Of all people, I have a pretty high standard for doing things right. After all, I’m probably the last person who wants to be caught being The Real WTF. This makes things tricky at the Day Job, <shameless_plug>where I work on BuildMaster, a pretty cool system that streamlines and automates the entire development, build, test, and deployment process</shameless_plug>, as I’ll often spend an exorbitant amount of time wondering about the best way to do something. Case in point: what’s the right way to do documentation?
Just to be clear, by “documentation”, I don’t mean manuals, user guides, feature matrices, and the like. Those are produced by technical writers, who seemingly enjoy living in the third circle of hell. What I mean is UML, Data-Flow, Flow Charts, Module Breakdowns, and all other works used internally by the development organization. And expanding on the question, which of these documents are we supposed to produce, and how much documentation is needed?
Documentation Done Enterprisey
Whenever I begin a sentence with “in an ideal world,” what I usually mean is “given infinite time and resources.” And if anyone has near-infinite resources, it’s large development enterprises. Given that – and the need to transfer knowledge between the finitely-tenured resources – they must know how to do documentation right… right?
If they do, it’s certainly a well-kept secret. You see, back when I was in the Corporate IT world, I was assigned my very own project. This was a pretty big deal, seeing that projects had executive visibility and that “Papadimoulis” would actually appear in a column on an executive status report. To commemorate this momentous occasion, the Project Management Office sent me a “Project Start Kit” that outlined everything a new project manager would need to know.
Among other process outlines, the starter kit had a “documentation guide” that identified which of the several dozen types of project documentation were required (10, as it turned out) and which were recommended (18… or was it 23?). And the documents themselves were no picnic: there was a 6-page interface assessment document, a 12-page requirement checklist, an 8-page scoping worksheet, and so many more that I’ve since forgotten.
Clearly, executive visibility didn’t come cheap, and the barrage of required documentation was simply a tax. As I spent the next week poring over the documentation templates, I couldn’t help but wonder who would ever find these useful. The more forms I filled out, the more I realized that the answer was no one. When I finally arrived at the “Database Table Design” document (which documents exactly what its title implies), I realized exactly what it was that I was producing: write-only documentation.
Clearly, this isn’t documentation done right.
Let’s take a step back and consider the purpose of documentation.
Documentation aids in the understanding of the purpose and operation of software.
The key word here is “aid”. In and of itself, documentation alone is never sufficient to understand something. If you give your accountant a UML Statechart, you’ll need to prefix it with a primer on finite state automata. And you’d need to prefix that with a primer on digital logic. And then probably software engineering. And structured design. It might take her 1,000 hours – and a class or two at the community college – but eventually, she’d fully understand the purpose and operation of whatever code that UML diagram documented.
To understand a certain piece of documentation, one must have a fundamental understanding of the underlying domain. A circuit diagram is useless if you don’t know what a capacitor is, and a recipe will do you no good if “broil” is a foreign word.
In the same regard, this fundamental understanding will render some documentation entirely useless. If one knows what databases are, and has access to a particular database, providing him with a “database table diagram” would be as pointless as giving a chef a recipe for salt water.
Clearly, some things don’t need documentation.
Actually, let me take that last statement one step further:
The operation of all computer programs can be understood without any documentation.
Of course, that is as obviously axiomatic as “all things that exist are fully described.” That is to say, the sheer existence of a working process, program, widget, or thing means that its specifications are fully defined. The ease of extracting and understanding these specifications varies with the complexity of the subject at hand.
- An 8’ length of 2” x 4” is just that, less planing
- The human brain is somewhat understood, and will require lots of time and knowledge to fully understand
Software fits somewhere in the middle. With or without source code, it’s just a whole bunch of simple instructions strung together and run by a computer. Any of us could take the executable machine code of any program and, given infinite time, infinite whiteboard space, and the Andrew S. Tanenbaum collection, fully understand how it works. Actually, that’s a popular pastime in the second circle of hell.
While it’s theoretically possible to understand anything without documentation, there’s one thing that could make trying to understand something even worse: inaccurate documentation. While inaccurate documentation is sometimes obvious – for example, unscrew the (—) shaped screw with a Phillips-head – it’s often not that simple.
One of the most extreme examples, and one that I sadly hear of too often, is Source-Code Mismatch: the code files stored in “source control” (for shops with this symptom that usually means a share drive) produce a different program than what’s in production. Almost no one expects Source-Code Mismatch, so almost no one tests for it before deploying. And when weird errors and missing functionality appear, Source-Code Mismatch is one of the last things anyone considers.
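One hedged way to test for Source-Code Mismatch before a deployment is to build from source control and compare the result, file by file, against what is actually running in production. The sketch below is only an illustration: the function names are hypothetical, and it assumes your build is reproducible byte for byte, which many toolchains are not without extra configuration (an embedded timestamp alone will change a hash).

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def find_mismatches(built_dir, deployed_dir):
    """Compare a fresh build against a copy of the deployed files.

    Both arguments are directories; the names are placeholders for
    wherever your build output and production snapshot actually live.
    """
    built = digest_tree(built_dir)
    deployed = digest_tree(deployed_dir)
    shared = built.keys() & deployed.keys()
    return {
        # In the build, but never made it to production.
        "missing_in_production": sorted(built.keys() - deployed.keys()),
        # In production, but not produced by the current source.
        "unknown_in_production": sorted(deployed.keys() - built.keys()),
        # Same file name, different bytes.
        "content_differs": sorted(n for n in shared if built[n] != deployed[n]),
    }
```

Calling `find_mismatches` with the build output directory and a snapshot of the production directory returns three lists; if all three are empty, the two trees are byte-identical.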
While you may have never seen that in the wild, think of all the times you relied on vendor-supplied documentation to be correct. Verification that their program operates the way they say it does is not one of the first debugging steps; it’s often the last.
Note that the reason I said “vendor supplied documentation” as opposed to “internal documentation” is because I’m presuming that, like most organizations, your internal documentation is non-existent or known to be inaccurate. While those conditions may seem to be identical, the latter might be a little helpful in understanding the overall system. After all, a 50-year-old blueprint that’s known to be outdated is certainly better than no blueprints at all.
The Four Factors of Documentation
The “danger” factor in documentation is closely related to the perceived accuracy. But in all, there are four important factors to consider about documentation.
- Completeness – a measurement of how in-depth the documentation is; a module-relation diagram is less complete than a class-relation diagram
- Accuracy – how close the documentation is to the actual thing being documented; the more mistakes and inconsistencies, the less accurate the documentation is
- Perceived Accuracy – how accurate the documentation is perceived to be by its users
- Usefulness – how helpful the documentation is in facilitating understanding; having no documentation is merely useless, whereas inaccurate documentation perceived as accurate is less than useless (harmful)
Combining these factors with some common sense leads to some interesting results.
- The less Complete documentation is, the more likely it is to be Accurate, as big changes occur less often than small changes
- Achieving Completeness and Accuracy becomes exponentially expensive, as the more complete it is, the more there is to document and the more there will be to maintain
- Gains in Usefulness diminish the more Complete documentation becomes, as highly-detailed documentation will rarely get used
- The older the Documentation, the lower the Perceived Accuracy will be
- The lower the Perceived Accuracy, the less likely it is to be used
- Documentation becomes dangerous (i.e. less than useless) when Perceived Accuracy is greater than actual Accuracy
Are you noticing a trend? Less complete documentation is generally better all around.
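The rules above can be restated as a toy model. This is a sketch only: the `Doc` class, the 0-to-1 scales, and the three labels are assumptions made for illustration, not measurements anyone could actually take. The one rule it encodes directly is the last one in the list, that documentation turns dangerous once it is trusted more than it deserves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """A hypothetical scorecard for one piece of documentation."""
    name: str
    completeness: float        # 0.0 (rough sketch) .. 1.0 (every detail)
    accuracy: float            # 0.0 .. 1.0, how well it matches the system
    perceived_accuracy: float  # 0.0 .. 1.0, how much its readers trust it


def assess(doc: Optional[Doc]) -> str:
    """Label a document using the rules from the list above."""
    if doc is None:
        # No documentation at all: useless, but not misleading.
        return "useless"
    if doc.perceived_accuracy > doc.accuracy:
        # Trusted more than it deserves: less than useless.
        return "harmful"
    # Trusted no more than it deserves; readers know to double-check it.
    return "safe"


# Illustrative values only.
stale_erd = Doc("detailed ER diagram nobody has questioned", 0.9, 0.4, 0.8)
old_overview = Doc("module overview known to be dated", 0.2, 0.6, 0.5)
```

Here `assess(stale_erd)` returns `"harmful"` because readers trust it (0.8) more than it deserves (0.4), while `assess(old_overview)` returns `"safe"`: like the outdated blueprint, it is known to be imperfect and gets read accordingly.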
Embracing Inaccuracy and Incompleteness
The immediate answer to what’s the right way to do documentation is clear: produce the least amount of documentation needed to facilitate the most understanding, and be very explicit about which documentation is to be maintained and which is to be archived (i.e., read-only and left to rot).
Too much documentation and the costs of maintenance quickly outweigh the aid it provides in understanding. Documentation that is meant to be updated but never actually is will foster a mistrust of all documentation or, worse, cause harm through misunderstanding.
Of course this “right amount, archived appropriately” depends entirely on the size of the project and the complexity of its components. A one-man web-app needs little more than loosely-defined requirements that get archived and never touched again. A behemoth maintained by dozens of developers will need more, but surprisingly not that much more, as training and shared institutional knowledge will guide developers more than documentation ever will.
When trying to decide how a particular application fits in, it’s helpful to consider that documentation generally falls into the following categories, each of which has some guidelines to help decide what’s right, and when it’s right:
- Requirements Documentation is produced to capture what the business/customer believes the software should do. Of course, Acceptance testing often validates that what sounds good on paper doesn’t work as well in practice, and usually these problems are found at the last minute, leaving the requirements document woefully out of date. It’s best archived.
- Design Documentation includes things like Class Diagrams, which can be helpful for discussing initial design concepts. But like requirements, they rarely keep up with the first development iteration, let alone remain relevant for future developers. As such, they too should be archived.
- Database Documentation (such as Data Dictionaries, Entity-Relation Diagrams, etc.) describes the structure, relations, and purpose of tables within a database. While some of this information is readily available by looking at the code, it may not be apparent how data is represented in tables and which tables immediately relate; because this is high-level (and table concepts rarely change), this is often worth keeping as a current document, even if it’s not always up-to-date.
- Module/Feature Overviews provide brief descriptions of major functionality; unless it’s immediately apparent what these are by looking at the application’s navigation, they are often worth keeping up-to-date.
- Component/Dependency Diagrams describe different code modules (DLLs, JARs, etc.) and the relations among them. Though helpful, they are generally only useful if they’re up to date, and can be detrimental if outdated.
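One way to be explicit about which documents are maintained and which are archived is to write the decision down next to the code. The sketch below is a hypothetical manifest, not an established convention: the `Policy` names, the category keys, and the wording of each action are assumptions that paraphrase the guidelines in the list above.

```python
from enum import Enum


class Policy(Enum):
    ARCHIVE = "archive"                        # written once, then left alone
    KEEP_CURRENT = "keep current"              # maintained; brief staleness is tolerable
    CURRENT_OR_NOTHING = "current or nothing"  # detrimental when outdated


# One entry per category from the list above (keys are made up).
DOC_POLICIES = {
    "requirements": Policy.ARCHIVE,
    "design": Policy.ARCHIVE,
    "database": Policy.KEEP_CURRENT,
    "module_overview": Policy.KEEP_CURRENT,
    "component_diagram": Policy.CURRENT_OR_NOTHING,
}


def action_when_stale(category: str) -> str:
    """Say what to do once a document no longer matches the system."""
    policy = DOC_POLICIES[category]
    if policy is Policy.ARCHIVE:
        return "leave it alone; it is a historical record"
    if policy is Policy.KEEP_CURRENT:
        return "update it when you can; a slightly dated copy still helps"
    return "update it now or take it down"
```

For example, `action_when_stale("component_diagram")` returns `"update it now or take it down"`, while `action_when_stale("requirements")` tells the reader to leave the archived document alone.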
All that being said, the most important thing you can do, by far, to help others understand the systems you develop is to design them to be understandable. No amount of documentation will help the average developer understand brilliant systems like The Policy Entry System and The Customer-Friendly System. If nothing else, that documentation will serve as a good visual aid for a future TDWTF article.