On comment docs

Something I’m looking at right now is generation of (in my case, HTML) API documentation from some simple markup format. The usual way to do this is by writing documentation markup inline in the source code, using specially formatted comments in header files.

The point

Some people argue that well-written source code should be its own documentation. Well, that’s true, it is: but it’s documentation with limited utility. Source code provides the following documentation:

  • Document, for the compiler’s benefit, the machine instructions that the compiler should generate
  • Document, for the programmer’s benefit, the machine instructions that the compiler should generate

Developing a high-level model of how software works from its source code is possible, but mentally taxing. It’s not designed for that. It would be like asking an ant to map the coastline of Africa: it can be done, but the information available is at entirely the wrong scale.

Several other approaches for high-level documentation of software systems exist. Of course, each of them is not actually the source code, but a model: a map of Africa is not actually Africa, but if you want to know what the coastline of Africa looks like then the map is a very useful model.

Comment documentation is one such model of software. Well-written comment docs explain why you might use a method or class, and how its properties or parameters help you to use it. It gives you a sense of how the classes fit together, and how you can exploit them. They’re called comment docs because they go into comments right alongside the source they document, usually marked up with particular tokens. As an example of a token, many documentation comment systems let me use a line like this:

@author Graham J. Lee

to indicate that I wrote a particular part of the project.

Of course, this documentation could go anywhere, so why put it into the header files? For a start, you already have the header files, so you’re not having to maintain two parallel hierarchies of content. Also, the proximity of the documentation to the source code means that there’s a higher probability (still not unity, but higher) that a developer who changes the intent or usage of a method will remember to update the documentation. Additionally, it means that the documentation can be no more verbose than required: anything that is obvious from the source (the names of methods and their parameters, for example) can be discovered from the source. This is not inconsistent with my earlier statement about source-as-documentation: you can easily find out a method signature without having to grub through the actual program instructions.

Finally, it means that when you’re working with the source, you have the documentation right there. This is one very common way to interact with comment documentation. The other way is to use a tool to create some friendly formatted output (for me, the goal is HTML) by reading the source files and extracting the useful information from the comments.

Doxygen

I have for the last forever (alright, three years) used Doxygen, for a couple of reasons:

  • The other people on my team at the time I adopted it were already using it
  • Since then, it has gained support for generating Xcode documentation sets, which can be viewed inside the Xcode organizer.

It’s not great, though. It’s a very complicated tool that tries to do all things for all people, so the configuration is huge and needs a lot of consideration. You can customise the look of the output using a CSS file but you pretty much need to as the default output looks like arse. Making it actually output different stuff is trickier, as it’s a C++ project and I’ve only ever learned enough C++ to customise Clang.

HeaderDoc

So I decided to fish around for alternatives. Of course, we know that Apple uses (and ships) HeaderDoc, but it uses its own comment format. Luckily, its comment format is identical to Javadoc, which is the format used by Doxygen. You can get headerdoc2html to look for the /** trigger by passing the -j flag.

Speaking of headerdoc2html, this is one of two programs in the HeaderDoc distribution. The other is gatherheaderdoc, which generates a table of contents from a collection of output files. Each of these is a well-written and well-documented perl script, which makes extension and modification super-easy.

Neither should be immediately required, in fact. The tool understands all of the Objective-C language features, and some extra bonus things like availability macros and groups of related methods. The default output basically looks like someone forgot to apply the style sheet to developer.apple.com. It’s trivial to configure HeaderDoc to use an external style sheet, and you can even create a custom template HTML file so that things appear in whatever fashion you want. Another useful configuration point is the C preprocessor, which you can get HeaderDoc to run through before interpreting the documentation.

Autogsdoc

The GNUstep project uses its own comment documentation tool called Autogsdoc. The fact that the autogsdoc documentation has buggy HTML does not fill one with confidence.

In fact, Autogsdoc’s output is unstyled XHTML 1.0, so it would be easy to style it to look more useful. Some projects that I’ve seen appear to have frames-based HTML, which is unfortunate.

Autogsdoc uses inline comment documentation, the same as the other options we’ve looked at. However, its markup is significantly different, as it uses SGML-style tags inside the documentation. It has a (well-documented, of course) Objective-C implementation with a good split of responsibilities between different classes. Hacking about on its innards shouldn’t be too hard for any motivated Cocoa coder.

Footnote

I know this has been annoying you all since the first paragraph in the section on HeaderDoc: */.

On Timeless Programming Books

Recently, the Dog Spanner wrote about Programming With Quartz, a book written at the tail end of 2005 but which is still useful to Mac developers everywhere. I have to agree, this book is still on my shelf and gets an airing every now and then when I need to do battle with custom drawing (even on iOS).

Today, I had a related moment of epiphany related to the immortal nature of a book, as I had a problem with memory allocation that saw me reaching to Mac OS X Internals: a Systems Approach by Amit Singh. This book was released just before Apple’s transition to Intel hardware, but is still for me a definitive reference on how Mac OS X works. If I need to know about the innards of HFS+, the memory allocator or anything at a similar level, it is to this book I turn.

There’s never really been anything that investigates the higher level components of Mac OS X in quite the same depth. When I need to refresh my knowledge of the UNIX APIs, I turn first to Advanced UNIX Programming, a book first published in 1985(!); but it is the 2004 edition that is still a canonical description of programming in the UNIX environment.

Notice that each of these books is around five years old, and yet still I find myself referring to them frequently. In the fast-changing world of software development, where books on the iPhone 3 SDK are woefully outdated, that’s a great achievement.

Honourable mention: a book I have on my shelf that deserves discussion here is Object Oriented Programming: An Evolutionary Approach by Brad Cox. Written in 1986, this lays out his vision for component-based software development using object oriented programming languages. In it he describes a little language he created called Objective-C and uses it to explain his vision. This book demonstrates that not all computing principles are long-lived. Today’s programming practices look nothing like the “Software ICs” of OOP, although we’re still stuck using Objective-C.