A Call for Modern Compilers

Published in ESD February 2006

By Jack Ganssle

The compiler vendors are providing us with the same old crap we've put up with for 20 years.

Snazzy IDEs are at least two decades old, though now the GUI versions are a lot prettier than old text-based DOS windowing. Source level debugging appeared about the same time. But compilers haven't improved in any significant way since then.

In the intervening years our projects have changed tremendously. SoC, buried cores, and the adoption of high-powered processors with deep pipelines have made debugging harder. Embedded apps have grown from a few tens of thousands of lines of code to millions.

Yet our tools are about the same as ever.

Compilers still take the same old ins and generate the same old outs they've accepted since Fortran, the first compiled language, appeared in 1957. Big disks have (thankfully!) replaced 80 column punched cards, but that's due to the evolution of hardware, not compilers. Cheap processors did kill off batch processing (again, thankfully!) so we're working interactively with the computer. This too is a result of Moore's Law and improved operating systems.

All compilers still process plain old unformatted text source files. Once upon a time that made a lot of sense. Punched cards and 60s-era printers could only manage fixed-font upper-case characters. Even into the 70s most of us banged away on ASR-33 teletypes which generated upper-case printout on rolls of paper more like the Dead Sea Scrolls than the modern 8.5 x 11 or A4 documents ejected at furious rates from today's laser printers. Remember filing those yellow mounds of output? Accordion-style folds that never quite lined up yielded ugly and hard-to-manage source listings that we stuffed into filing cabinets.

But those of us working with Microsoft products haven't created a document using fixed-fonts since 1992 when they finally released a version of Windows that worked reasonably well. Apple devotees got Write, a word-processor much like any used today, a decade earlier.

Those tools are useless when creating source files, of course. Compilers still only accept plain old unformatted text. That's just plain dumb.

Why can't we format source code? I want italics to emphasize certain comments, bold-faced type where something must stand out, and various headings to break up sections of code. Sometimes a font change can help describe what the machine we're building does.

Different styles immediately come to mind: the code and the comments probably should look different. Then I might want to use another style for assert macros, and yet one more for Lint directives.

But no, that's impossible since compilers only accept plain text files.

Why don't the tools spell-check my comments as I type (as does pretty much any word processor)? Give me word-wrap for comments so I don't have to edit all those CR/LFs every time I make an edit! In my opinion the comments are as important as the code, so grammar checking is as important as syntax verification.

Why can't we document the transfer equations of a control algorithm with super- and sub-scripts? Clarity is our goal, and a formula written using asterisks for exponentiation is hard to read.

Shouldn't summation signs look like Σ instead of some laboriously-constructed non-standard text-only description?

One reason it's so hard to change code without injecting errors is that the comments are mind-numbing text that looks just like the C itself. Like driving on a long desert road, our eyes glaze over and we miss important warnings and notes about possible interactions. A formatted warning such as "DANGER: The following three lines of code are highly optimized for speed and must never be changed" leaps out at the maintainer in a way fixed fonts never will.

Give me reviewing tools that support collaboration. When change tracking is enabled, you can see the changes made to the file by yourself or other authors, and it's easy to insert notes that appear in a different color and different font that explain why a change was made. Or that ask for approval for a change. Word processors have had this fantastic feature for years. But no, we rely on poorly-maintained comments and another off-line tool (the version control system) that gets updated only on check-in. So most change descriptions are terse and incomplete.

Frankly, I think one reason comments are so bad today is that they're ugly and dull. Dress 'em up visually and developers will be more inclined to get them right.

Linkages

For nearly 15 years we've lived in a hyperlinked world. A single click takes us to sites and information scattered on a vast planet-wide network of computers. Everyone links everything everywhere.

Except in source code, which does not recognize the nature of a hyperlink. Ironically, there's an awful lot of code written to implement links. We give this tool, the ability to jump at will through cyberspace, to the entire world yet don't use it in our own work.

Commenting is truly an art form, as is any sort of writing. Masters know how to balance content contained in the source modules and that kept in other files. Do we document the intricate details of a complex algorithm in the comments or refer the reader to the original research paper? Judicious use of hyperlinks can tie the code to outside reference material.

Requirements traceability, mandated in many safety-critical applications, means you identify which code satisfies each requirement. A comment might say "The following meets requirement 14.3.2A." If we could add a hyperlink to section 14.3.2A of the requirements document the developer could instantly pop up the relevant section in another window to compare the code and the spec.
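Even with today's plain-text files, the raw material for such links is already sitting in the comments. The following is only a minimal sketch, assuming a hypothetical convention in which a comment cites a spec section with the word "requirement" followed by an ID like 14.3.2A. The function name and the sample source are invented for illustration; a real tool would turn each entry into a link into the requirements document.

```python
import re

# Assumed (hypothetical) convention: comments cite the spec as
# "requirement <dotted number><optional capital letter>", e.g. "requirement 14.3.2A".
REQ_PATTERN = re.compile(r"[Rr]equirement\s+(\d+(?:\.\d+)*[A-Z]?)")

def trace_requirements(source):
    """Map each requirement ID to the 1-based line numbers that cite it."""
    index = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in REQ_PATTERN.finditer(line):
            index.setdefault(match.group(1), []).append(lineno)
    return index

# Invented sample source, for illustration only.
SAMPLE = """\
/* The following meets requirement 14.3.2A. */
int clamp(int v) { return v < 0 ? 0 : v; }
/* Also covers requirement 14.3.2A and requirement 7.1. */
"""

if __name__ == "__main__":
    for req, lines in sorted(trace_requirements(SAMPLE).items()):
        print(req, "cited on lines", lines)
```

An index like this is the easy half; the point of the argument above is that the editor should make each entry clickable.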

Sometimes I just want to hyperlink inside a single file. The IDE should generate a table of contents at the top of each module with links to each function, interactively, as we work. Though we do sometimes have class browsers that give some of this capability, they analyze the program once it's written, not as it's being created.
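A rough idea of what such a table of contents needs can be sketched in a few lines. This is a deliberately naive illustration, not how any real IDE does it: it assumes C definitions start in column 0, look like "type name(args)", and do not end in a semicolon on that line. The pattern and function names are hypothetical, and a production tool would use a real parser.

```python
import re

# Naive assumption: a definition begins in column 0 with at least a type and a
# name, followed by "(", and has no ';' on the line (a ';' marks a prototype).
FUNC_DEF = re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$")

def table_of_contents(source):
    """Return (function_name, line_number) pairs in file order."""
    toc = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = FUNC_DEF.match(line.rstrip())
        if match:
            toc.append((match.group(1), lineno))
    return toc

# Invented sample module, for illustration only.
SAMPLE = """\
static int clamp(int v);

static int clamp(int v)
{
    return v < 0 ? 0 : v;
}

void motor_stop(void) {
    set_pwm(clamp(0));
}
"""

if __name__ == "__main__":
    for name, lineno in table_of_contents(SAMPLE):
        print(f"{name} ... line {lineno}")
```

Rerunning something like this on every keystroke is cheap, which is why there is little excuse for a browser that only works once the program is written.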

(I wonder if a new sort of language, one built of links, makes sense. A function call is nothing more than a real-time link, after all.)

One reason software is so difficult is that we're presented with a tiny view of a huge structure. It's like the old story of the blind man trying to identify an elephant. Hyperlinks bring the rest of the code and the rest of the project's documentation immediately and easily into view.

File Formats

How do we store all of this information? Text files are inadequate, which is why word processors use a variety of open and proprietary file formats to encode vast amounts of meta-data that go far beyond the words one sees on the screen.

If a vendor decides to create a format to give us all the capability I've described, well, the wise developer will run, fast, to another IDE purveyor. Don't get locked into one particular environment. Vendors go out of business or are assimilated into other organizations. Proprietary formats increase the risk that we won't be able to maintain the code years or decades hence.

Though we need a standard for source files, one as versatile as those used by word processors, I think it would be a mistake for the IEEE or other body to invent a new standard when so many extant ones work so well today.

A couple of standards already exist.

The OpenDocument Format (ODF) is a non-proprietary file format based on the XML format originally created for the Open Office suite of desktop tools.

XML, or Extensible Markup Language, looks vaguely like HTML and is as descriptive of data as is Microsoft's forever-changing .doc format. The files are text-based (though they may be compressed). So those precious libraries comprising 100s of megs of source code will be readable next year, and next century. Not many binary formats can make the same claim.
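To make the idea concrete, here is a small sketch of what a formatted source file and a compiler front end's first step might look like. The element names (source, comment, b, code) are purely hypothetical and are not taken from ODF or any other real schema; the only library used is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical rich-source file: formatting lives in markup, code stays text.
RICH_SOURCE = """\
<source>
<comment><b>DANGER:</b> tuned for speed, do not change.</comment>
<code>int clamp(int v) { return v &lt; 0 ? 0 : v; }</code>
</source>
"""

def extract_code(xml_text):
    """Return only the compilable text, one <code> element per line."""
    root = ET.fromstring(xml_text)
    return "\n".join("".join(el.itertext()) for el in root.iter("code"))

def extract_comments(xml_text):
    """Return comment text with the formatting markup stripped."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter("comment")]

if __name__ == "__main__":
    print(extract_code(RICH_SOURCE))
    print(extract_comments(RICH_SOURCE))
```

Because the container is plain text, stripping the markup back to something an existing compiler accepts takes a handful of lines, which is part of what makes a text-based format a safe bet for long-lived code.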

ODF is seen as an alternative to closed formats such as Microsoft's .doc, .ppt and .xls versions. It is gaining a lot of traction; some 13 companies including IBM, Oracle, Google and Sun have made significant commitments to it. This past September, amid great controversy, the State of Massachusetts decided to standardize on it for many of the State's agencies.

Microsoft, too, is migrating to an XML-derived version. In a recent move the company is seeking to make their Office Open XML format both open and free. They're hoping to get approval in 2006 by the International Organization for Standardization. In fact, today the company's Office XP and 2003 already use a zipped XML format.

Conclusion

I'm taking the compiler vendors to task as they usually provide the entire development IDE (at least, for non-Eclipse environments). And the compiler will have to accept much more complex and much richer input files. So it's up to them to effect change.

Long ago Donald Knuth conceived the compelling idea of Literate Programming. Write the software as a story, intertwining both code and description, using a fully graphical text-processing system he called TeX. But the poor state of compiler technology forced Knuth to create tools that split the input file into separate doc and source (text, of course) outputs.
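The splitting step itself is simple, which is rather the point. The sketch below is not Knuth's tooling and does not use his syntax; it assumes an invented convention in which lines between "@code" and "@end" markers are source and everything else is documentation.

```python
def split_literate(text):
    """Split a combined document into (documentation, code) strings.

    Assumes hypothetical "@code" / "@end" marker lines around each code chunk.
    """
    doc_lines, code_lines = [], []
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "@code":
            in_code = True          # marker line itself is dropped
        elif stripped == "@end":
            in_code = False         # marker line itself is dropped
        elif in_code:
            code_lines.append(line)
        else:
            doc_lines.append(line)
    return "\n".join(doc_lines), "\n".join(code_lines)

# Invented sample document, for illustration only.
SAMPLE = """\
The clamp routine keeps a sensor reading non-negative.
@code
int clamp(int v) { return v < 0 ? 0 : v; }
@end
It is called before every PWM update.
"""

if __name__ == "__main__":
    docs, code = split_literate(SAMPLE)
    print("DOC:\n" + docs)
    print("CODE:\n" + code)
```

The cost of this approach is that the compiler, and usually the debugger, only ever see the stripped output, so the story and the code part ways the moment the build starts.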

It's time to extend compilers. Let's keep both graphical docs and source code united, presented in both editing and source-level debugger windows. The function of the code will be much clearer, and the documentation will more likely stay in sync with the code.

First we need unity on a file format. That will surely happen. Vendors - and we - should be clamoring for this now.

The "I" in IDE means "integrated" but that's a lie. IDEs today are not integrated. They're a motley collection of random tools (compiler, editor, linker, etc.) duct-taped together to somewhat ease programming. A truly integrated environment binds the tools tightly together, performing syntax checks as one types, identifying unresolved link issues dynamically, and more.

Basic gave us those capabilities 40 years ago. It's time for our tools to catch up to the state of Office suites.