
By Jack Ganssle

A Million Lines of Code

Published 1/14/2008

A million lines of code. It's a number bandied about more than ever as software sizes develop overactive pituitaries. Some cell phones use upwards of five million. Vista reputedly has 50 million. Everett Dirksen once may have said: "A billion here, a billion there, pretty soon you're talking real money." Well, a million lines of code here, a million there, pretty soon you're talking about a program that is as mind boggling and incomprehensible as our national debt.

A million lines of code printed out would be 18,000 pages. That's a stack six feet tall (on typical 20-pound paper). Ironically, the listing weighs in at 180 pounds while the actual operating code is mass-free; it'll live in a fraction of a gram of silicon. Like DNA, code's human-readable description requires tremendously more mass than its actual instantiation.
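For anyone who wants to check the paper arithmetic, here is a rough sketch. The 18,000-page figure comes from the text; the ream weight (5 pounds for 500 letter-size sheets of 20-pound bond), the sheet thickness (about 0.004 inch), and single-sided printing are my assumptions, not numbers from the original.

```python
# Back-of-the-envelope check of the printed-listing figures.
LINES = 1_000_000
PAGES = 18_000               # from the text

# Assumptions (not from the article):
SHEETS_PER_REAM = 500
REAM_WEIGHT_LB = 5.0         # letter-size ream of 20-pound bond
SHEET_THICKNESS_IN = 0.004   # approximate thickness of one sheet
# Single-sided printing, so one page per sheet.

lines_per_page = LINES / PAGES
weight_lb = PAGES / SHEETS_PER_REAM * REAM_WEIGHT_LB
height_ft = PAGES * SHEET_THICKNESS_IN / 12

print(f"lines per page: {lines_per_page:.1f}")
print(f"weight: {weight_lb:.0f} lb")
print(f"stack height: {height_ft:.1f} ft")
```

Under those assumptions the numbers line up with the 180 pounds and six feet quoted above, at roughly 56 lines per page.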

A million lines of code is probably on the order of 20 million instructions, or 600 million bits. That's not far off the 3 billion base pairs in human DNA. Unlike DNA, which has redundancies and so-called "junk" sequences, every single bit in the code must be perfect. A single error causes greater or lesser failure.

Since a typical atom is around 0.3 nm in diameter (http://hyperphysics.phy-astr.gsu.edu/hbase/particles/atomsiz.html), if one had as many atoms lined up as the number of instructions needed for a million lines of code, they would stretch about 6 mm. That many Ebola viruses would stretch 15 meters.

A million lines of code is as long as 14 copies of War And Peace, 25 of Ulysses, 63 copies of The Catcher in the Rye, or 66 copies of K&R's C Programming Language.

A million lines of code is not ten times more than 100,000. It's well known that schedules grow faster than the code. Barry Boehm estimates the exponent is around 1.35 for embedded software. So the schedule for developing a million lines of code is 22 times bigger than for 100,000 LOC.
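The factor of 22 falls straight out of the exponent. A minimal sketch, assuming the schedule scales as size raised to the 1.35 power as described above (the function name is mine, purely for illustration):

```python
def schedule_ratio(size_ratio: float, exponent: float = 1.35) -> float:
    """How much the schedule grows when the code grows by size_ratio,
    assuming schedule is proportional to size ** exponent."""
    return size_ratio ** exponent

# Going from 100,000 LOC to 1,000,000 LOC is a tenfold increase in size.
ratio = schedule_ratio(1_000_000 / 100_000)
print(f"schedule grows by a factor of about {ratio:.1f}")

# For comparison, an exponent of 1.0 (linear growth) would give exactly 10.
linear = schedule_ratio(10, exponent=1.0)
```

Ten times the code, a bit over 22 times the schedule. The gap between 10 and 22 is the price of the exponent.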

In the March 1996 issue of Computer, Watts Humphrey published crude rules of thumb for estimating software projects. Though hardly scientific, they do give a sense of scale. Using his estimates:

A million lines of code require 40,000 pages of external documentation.

A million lines of code will typically have 100,000 bugs pre-test. Best-in-class organizations will ship with around 1,000 bugs still lurking. The rest of us will do worse by an order of magnitude.

A million lines of code will occupy 67 people (including testers, tech writers, developers, etc.) for 40 months, or 223 person-years. Darwin needed just 1.5 person-years to write On the Origin of Species. Scale that to the 26 copies that equal a million lines of code in length, and it appears writing code is some 6 times more time-consuming than writing a revolutionary scientific tome.
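Those figures are easier to compare when normalized per thousand lines (KLOC). The sketch below only redoes the arithmetic from the numbers quoted above; the per-KLOC and per-person-month values are derived here, not taken from Humphrey's article, and "worse by an order of magnitude" is read as ten times the best-in-class count.

```python
LOC = 1_000_000
KLOC = LOC / 1_000

# Figures quoted in the text
doc_pages = 40_000
bugs_pretest = 100_000
bugs_shipped_best = 1_000
bugs_shipped_typical = bugs_shipped_best * 10  # "an order of magnitude" worse
people, months = 67, 40
darwin_person_years = 1.5
origin_copies = 26

# Derived values
docs_per_kloc = doc_pages / KLOC
pretest_per_kloc = bugs_pretest / KLOC
person_months = people * months
person_years = person_months / 12
loc_per_person_month = LOC / person_months
darwin_equivalent = darwin_person_years * origin_copies
code_vs_darwin = person_years / darwin_equivalent

print(f"{docs_per_kloc:.0f} doc pages and {pretest_per_kloc:.0f} pre-test bugs per KLOC")
print(f"{person_years:.0f} person-years, {loc_per_person_month:.0f} LOC per person-month")
print(f"code takes about {code_vs_darwin:.1f}x Darwin's effort per unit length")
```

Note that the lines-per-person-month figure counts everyone on the project, not just the developers.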

A million lines of code costs $20m to $40m. That's one or two 60s-era F-4 fighter jets (in today's dollars), a tenth of an F-22, a thousand cars or more (in America), nearly 20,000 Tata Nano cars, ten million gallons of gas, seven times the inflation-adjusted cost of the ENIAC, and a million times the cost of the flash chips it lives in.

Think about that last analogy: A million times the cost of the flash chips. Yet accounting screams over each added penny in recurring costs, while chanting the dual mantras "software is free," and "hey, it's only a software change."