Monday, October 26, 2015

Fuzzing with american fuzzy lop

September 22, 2015
This article was contributed by Hanno Böck

In September 2014 a serious security vulnerability that became known as Shellshock was found in Bash, which is the default shell in most Linux distributions. But it quickly turned out that the initial fix for Shellshock was incomplete. Various other related bugs were found only days after the publication, amongst them two severe vulnerabilities discovered by Michał Zalewski from the Google security team. In the blog post, Zalewski mentioned that he had found these bugs with a fuzzing tool that he wrote, which almost nobody knew back then: american fuzzy lop (afl). It was the beginning of a remarkable success story.
Fuzzing is an old technique to find bugs in software. The basic idea is simple: just feed an application lots of input data with randomly introduced errors in it and see if anything happens that would indicate a bug. The easiest thing to watch for is a crash. If an application crashes from invalid input, it is often a sign of an invalid memory access—a buffer overflow, for example. And these often are security vulnerabilities.

Between dumb and template-based fuzzers

In the past, most fuzzing tools fell into two categories: "dumb" fuzzers that take a valid input and only modify it randomly, and template-based fuzzing tools that are specific to a certain input data format. Both have their limitations. While dumb fuzzers can still be surprisingly successful, they will only find the most obvious bugs because they have no knowledge about the underlying data format. Template-based fuzzers can find more bugs, but creating them is a lot of work. They have to be adapted for every kind of input data they are applied to.
American fuzzy lop tries a new strategy. To use it, the first step is to recompile an application with a special compiler wrapper that adds assembly instrumentation code to the binary. Both Clang and GCC are supported. This instrumentation allows the fuzzer itself (afl-fuzz) to observe the code paths that a certain input file will trigger. If afl-fuzz finds an input sample that triggers a new code path, it uses that sample as a starting point for further fuzzing.
This strategy causes afl-fuzz to reach a high level of code coverage for its testing. At some point, the fuzzing process may transform an input file into one that has a certain rarely used feature enabled. The fuzzer will detect that a new code path has been triggered and the fuzzing process will create further input files that use that code path and may find bugs there. The big advantage is that the person using the fuzzing tool doesn't need to have any knowledge about this rarely used feature. The fuzzer will just find that out for itself.

Magically creating JPEG headers

An experiment by Zalewski shows how remarkably successful this strategy can be. He used a bogus file to start the fuzzing process on the djpeg tool that comes with libjpeg. After a while the fuzzer automatically created input files that contained a valid JPEG header. This does not mean that it's advisable to start a fuzzing process with bogus files. It saves time to start the fuzzing process with a valid input file. In addition, this will likely not work in all situations. For example, the fuzzer is unable to create large string sequences by chance.
American fuzzy lop has been responsible for the discovery of bugs and security vulnerabilities in many important free software package, including security-sensitive software like OpenSSL, GnuPG, and OpenSSH. Some of the vulnerabilities in the Android media framework, Stagefright, that were recently discovered by Joshua Drake [PDF] were attributed to it. In order to do this, Drake had to port Stagefright to Linux, since it was designed to only run on Android. More recently, two denial-of-service issues (CVE-2015-5722 and CVE-2015-5477), which would allow attackers to remotely crash DNS servers using BIND, were also found by american fuzzy lop.
Apart from finding new bugs, some people have experimented to see if american fuzzy lop would have been able to find certain already known bugs that were believed to be hard to discover. I was able to rediscover the Heartbleed bug by fuzzing the handshake of an old, vulnerable version of OpenSSL. One notable aspect of the bug was that it involved the heartbeat extension of TLS, which is a feature that almost nobody knew about before Heartbleed hit the news. Codenomicon, the company that found Heartbleed, also used a fuzzing tool, but their fuzzer had prior knowledge of the heartbeat extension and specifically targeted it with bogus inputs. In my experiment, I used no knowledge about this specific extension. American fuzzy lop was able to generate the correct handshake packet that would contain the extension, together with the data that would trigger Heartbleed, within six hours.
Another example of a hindsight finding was a bug in the OpenSSL function BN_sqr(), which is used to square large numbers (bignums). With some inputs, it would produce wrong results due to an error in the carry propagation. These inputs were rare corner cases—only one out of 2^128 input numbers would trigger that bug. Testing with random inputs would never have led to the discovery of such a bug. However, Ralf-Philipp Weinmann was able to rediscover this bug with the help of american fuzzy lop. In a talk given at the Black Hat conference in Las Vegas, he presented a small testing application that would compare the output of the squaring with the result of multiplying a number by itself. By definition these two calculations should produce the same output, so the application would give an assertion error if the results differed. Using that test program to recognize when BN_sqr() failed, american fuzzy lop was able to find an input that triggered the bug within an hour.
Although interesting, these results should be interpreted with caution. It is obviously easy to find a bug if one knows where to look. But they still indicate that even bugs that seem hard to find without code analysis may be susceptible to fuzz testing.

Easy to use and free software

Its novel fuzzing strategy is not the only reason for the success of american fuzzy lop. Two factors that likely play a major role are that the code is openly available under a free license—it uses the Apache 2 license—and that the tool is relatively easy to use. This separates it from many other tools in the security area. Lots of commercial tools only target a limited audience, because they are expensive or not available to the public at all. And IT security tools from the academic world often have the reputation of being hard to install and even harder to use without a background in the corresponding field.
Zalewski puts a high value on the usability of his tool and often implements recommendations from its users. While the first versions were very cumbersome to use, this has changed dramatically. Also, by now packages for it are available for most Linux distributions and BSD systems. A package for Mac OS X is also available. Currently, there is no version of american fuzzy lop for Windows or for Android.
To use american fuzzy lop, one first needs to recompile an application with the compiler wrapper shipped with afl (afl-gcc/afl-g++ or afl-clang/afl-clang++). The fuzzer needs a command-line tool that takes an input file. In most cases, libraries ship some small tools that allow parsing input files and should be suitable.
When compiling libraries, it's often advisable to statically link the library against the shipped tools, which will avoid having to do library preloading when running the executable. With software using the GNU autotools, this can usually be achieved with the configure parameter --disable-shared. Therefore the configure call should look something like this:

    ./configure --disable-shared CC=afl-gcc CXX=afl-g++

Next, the user needs one or more sample input files. It is usually advisable to create small input files; if possible, they shouldn't be larger than a few kilobytes. These need to be put into a directory. Then start the fuzzing:

    afl-fuzz -i [input_directory] -o [output_directory] ./[executable_to_fuzz] @@

The command afl-fuzz will replace @@ with the fuzzed inputs. If there is no @@ in the command line, the input will be passed via the standard input (stdin).

AddressSanitizer finds more memory access bugs

There are many more options and variants in how to use afl. One notable feature is the ability to combine the fuzzing with the use of a compiler feature called AddressSanitizer (ASan), which is part of the GCC and Clang compilers and can be enabled with the parameter -fsanitize=address. It adds detection code for invalid memory accesses to an executable. Many memory access bugs, such as out-of-bounds reads or use-after-free errors, often don't cause an application to crash. With normal fuzzing, they would pass by unnoticed. AddressSanitizer changes that and stops the application execution on every read or write to an invalid memory location. In american fuzzy lop, the use of AddressSanitizer can be enabled by setting the AFL_USE_ASAN environment variable to 1.
There are, however, some caveats. While AddressSanitizer is remarkably fast compared to other similar tools like Valgrind, it still slows down the execution significantly. It is therefore sometimes suggested to run the fuzzing process without it. The queue directory generated by afl-fuzz contains all the input samples it considered interesting because they triggered new code paths. These can then be manually tested with an ASan-compiled version of the software.
AddressSanitizer allocates a huge amount of virtual memory—on 64-bit systems, several terabytes. As it is only virtual memory and only small parts of it are actually used, this doesn't stop it from working. But american fuzzy lop limits the memory for its tested applications. One easy way to work around this is to disable the memory limit of afl-fuzz (using the parameter -m none). In rare cases, this could lead to system instabilities because some inputs may cause an application to use large amounts of memory. American fuzzy lop also ships a more elegant solution that limits the memory via control groups (cgroups).

Network fuzzing is a challenge

The biggest limitation of american fuzzy lop is that it only supports file inputs. In many cases, the most interesting pieces of code from a security perspective are the parsers for networking functions. Sometimes this limitation can be worked around. In the Heartbleed experiment mentioned earlier, it was possible to create a small tool for the OpenSSL handshake that would take a raw network packet on the command line and swap it with parts of the real handshake. This was possible because the OpenSSL API allows doing a handshake, without any real network connection, just by passing buffers between different contexts. But in many other situations it is not that easy.
Some attempts have been made to intercept networking functions by preloading a library with LD_PRELOAD that would then use a file input and pass it as networking input to an application. One such attempt, called Preeny, is publicly available. However, these attempts turned out to be relatively fragile and only work on a small fraction of real-world applications. Combining american fuzzy lop with network fuzzing in a way that works on a wide variety of applications is still an open challenge.
An active community has built a number of extensions and additional tools that can be used alongside american fuzzy lop. There are variants for Python, Rust, and Go, a special QEMU mode that allows fuzzing binary applications on Linux without having access to the source code, and many more. The developers of the LLVM compiler framework have implemented a special fuzzing mode for libraries that borrows several ideas from american fuzzy lop.
The origin of the name american fuzzy lop—a rabbit breed—can be traced back to an earlier tool Zalewski wrote: Bunny the Fuzzer. "Bunny wasn't particularly great, but when I decided to revisit the idea many years later, I felt that it's appropriate to allude to that in some way", Zalewski wrote in a post to the afl-users mailing list.
American fuzzy lop has helped redefine and revive the technique of fuzzing. In combination with AddressSanitizer, it is a powerful method to improve the quality of software and to find a lot of hidden and otherwise hard-to-find bugs. It has its limitations, but the biggest limitation is probably that it is not used widely enough yet. Almost every piece of software written in C or C++ that takes input from potentially dangerous sources can and should be tested using american fuzzy lop.
[The author started The Fuzzing Project last year, in an effort to fuzz as many free software applications as possible. His work for The Fuzzing Project is supported by the Linux Foundation's Core Infrastructure Initiative.]

Thursday, October 22, 2015

A small British firm shows that software bugs aren't inevitable

The Exterminators
A small British firm shows that software bugs aren't inevitable

Peter Amey was an aeronautical engineer serving in the United Kingdom's Royal Air Force in the early 1980s when he found a serious flaw in an aircraft missile-control system being deployed at the time. It wasn't a defect in any of the thousands of mechanical and electronic parts that constituted the system's hardware. The problem was in the system's software. Amey found an erroneous piece of program code--a bug [see photo, "Checking Code"]. Because of it, the unthinkable could happen: under rare circumstances, a missile could fire without anyone's having commanded it to do so.
Amey says his superiors, rather than commending his discovery, complained that it would delay the system's deployment. Like most project managers, they didn't like the idea of fixing errors at the end of the development process. After all, good design ought to keep errors out in the first place. Yet time and again, Amey knew, the software development process didn't prevent bugs; it merely put off dealing with them until the end. Did it have to be that way? Or could developers avoid bugs in the first place? He would find the answer to be "yes" when, years later, he joined Praxis High Integrity Systems [see photo, " Bug Killer"].
Praxis, headquartered in Bath, 2 hours from London by car, was founded in 1983 by a group of software experts who firmly believed they could put together a sound methodology to ruthlessly exterminate bugs during all stages of a software project.
At the time, the software world was in a malaise that it hasn't fully shaken even today [see "Why Software Fails," in this issue]. Software projects were getting larger and more complex, and as many as 70 percent of them, by some estimates, were running into trouble: going over budget, missing deadlines, or collapsing completely. Even projects considered successful were sometimes delivering software without all the features that had been promised or with too many errors--errors that, as in the missile-firing system, were sometimes extremely serious. The personal computer era, then just starting, only reinforced a development routine of "compile first, debug later."
Praxis armed itself not only with an arsenal of the latest software engineering methods but also with something a little more unusual in the field: mathematical logic. The company is one of the foremost software houses to use mathematically based techniques, known as formal methods, to develop software.
Basically, formal methods require that programmers begin their work not by writing code but rather by stringing together special symbols that represent the program's logic. Like a mathematical theorem, these symbol strings can be checked to verify that they form logically correct statements. Once the programmer has checked that the program doesn't have logical flaws, it's a relatively simple matter to convert those symbols into programming code. It's a way to eliminate bugs even before you start writing the actual program.
Praxis doesn't claim it can make bug-free software, says Amey, now the company's chief technical officer. But he says the methodology pays off. Bugs are notoriously hard to count, and estimates of how common they are vary hugely. With an average of less than one error in every 10 000 lines of delivered code, however, Praxis claims a bug rate that is at least 50--and possibly as much as 1000--times better than the industry standard.
Praxis is still a small, lonely asteroid compared to the Jupiter-size companies that dominate the software universe--companies like Microsoft, Oracle, and SAP. The tiny British software house doesn't make products for the masses; it focuses on complex, custom systems that need to be highly reliable. Such mission-critical systems are used to control military systems, industrial processes, and financial applications, among other things.
Sometimes the software needs to work 99.999 percent of the time, like an air-traffic control program Praxis delivered some years ago. Sometimes it needs to be really, really secure, like the one Praxis recently developed for the National Security Agency, the supersecret U.S. signals intelligence and cryptographic agency, in Fort Meade, Md.
And though Praxis employs just 100 people, its name has become surprisingly well known. "They're very, very talented, with a very different approach," says John C. Knight, a professor of computer science at the University of Virginia and the editor in chief of IEEE Transactions on Software Engineering. Praxis's founders, he says, believed that building software wasn't as hard as people made it out to be. "They thought, it isn't rocket science, just very careful engineering."
Watts S. Humphrey, who once ran software development at IBM and is now a fellow at the Software Engineering Institute at Carnegie Mellon University, in Pittsburgh, also speaks highly of Praxis. He says the company's methodology incorporates things like quality control that should be more widely used in the field. In fact, Humphrey spent this past summer at Praxis headquarters to learn how they do things. He wants to use that knowledge to improve a complementary methodology he developed to help organizations better manage their software projects.
Praxis's approach, however, isn't perfect and isn't for everybody. Formal methods obviously are no silver bullet. For one thing, using formal methods can take more time and require new skills, all of which can mean higher up-front costs for a client. In fact, Praxis charges more--50 percent more in some cases--than the standard daily rate. To this its engineers will say: "You get what you pay for; our bug rate speaks for itself."
And although formal methods have been used to great effect in small and medium-size projects, no one has yet managed to apply them to large ones. There's some reason to think no one ever will, except perhaps in a limited fashion. Nevertheless, even though the company may not have all the answers to make software projects more successful, those working in the field can learn plenty of lessons from it, say advocates like Knight and Humphrey.

Software was conceived as a mathematical artifact in the early days of modern computing, when British mathematician Alan Turing formalized the concept of algorithm and computation by means of his now famous Turing Machine, which boils the idea of a computer down to an idealized device that steps though logical states.
But over time, software development gradually became more of a craft than a science. Forget the abstractions and the mathematical philosophers. Enter the realm of fearless, caffeinated programmers who can churn out hundreds of lines of code a day (often by hastily gluing together existing pieces of code). The problem is that for some projects, even tirelessness, dedication, and skill aren't good enough if the strategy is wrong.
Large, complex software systems usually involve so many modules that dealing with them all can overwhelm a team following an insufficiently structured approach. That's especially true of the mission-critical applications Praxis develops, as well as of large enterprise resource-planning systems, of the sort used by Fortune 500 companies, and complex data-driven software, such as the FBI's Virtual Case File project [see "/sep05/1203 Who Killed the Virtual Case File?" in this issue].
Even when you break such a big program down into small, seemingly manageable pieces, making a change to one turns out to affect 10 others, which may in turn affect scores or maybe even hundreds of other pieces. It may happen that making all the fixes will require more time and money than you have. If your system's correct operation depends on those changes, you'll have to either admit defeat or scramble to find a way to salvage what you've done so far, perhaps by giving up on some of the capabilities or features you'd hoped to have in the software.
As it turns out, complete failure--projects canceled before completion--is the fate of 18 percent of all information technology projects surveyed in a 2004 study by consultancy Standish Group International Inc., in West Yarmouth, Mass. Apparently that's the good news; the rate 10 years ago, according to Standish, was 31 percent.
Still, the overall picture is pretty bleak. Standish asserts that more than 50 percent of the thousands of projects it surveyed faced problems, from being turned over without significant features to going well beyond their deadlines or budgets. In the end, according to the Standish numbers, only 28 percent of projects could be considered successes by any rigorous definition.
Standish's numbers, however, are far from universally accepted in the computer industry. For contract software projects, more specifically, other analyses in recent years have put the success rate as low as 16 percent and as high as 62 percent. Nevertheless, even using those numbers as a guide, it's hard not to see the contract software business as anything but an enterprise all too often mired in mediocrity. As one study by consultant Capers Jones, in Marlborough, Mass., put it: "Large software systems...have one of the highest failure rates of any manufactured object in human history."
Today, ever more sophisticated tools are available to help companies manage all aspects of their software projects. These tools help conceptualize and design the system; manage all people, files, computers, and documents involved; keep track of all versions and changes made to the system and its modules; and automate a number of tests that can be used to find system errors.
Indeed, worldwide sales of such software development tools, according to Stamford, Conn.based market research firm Gartner Inc., generate more than US $3 billion a year. Rational Software Corp., a company acquired by IBM in 2002 for $2.1 billion, is currently the market leader, followed by Microsoft, Computer Associates International, Compuware, Borland, and others, according to Gartner.
But the effect of widespread use of these tools on overall software quality hasn't been gauged in a detailed or rigorous way. Some would even argue that the sector is a little reminiscent of the market for diet products: it, too, is a billion-dollar industry, and yet, somehow, obesity as a public health problem hasn't gone away. And, just as the few successful diet strategies all seem to require a major change in lifestyle, perhaps, too, the software failure rates won't improve significantly without a basic and widespread shift in tactics.
Certainly, Praxis's experience supports that idea. Consider one of the company's recent projects, for Mondex International Inc., a financial services company founded in the UK that is now a subsidiary of MasterCard International Inc. First, a little background. Mondex had a product called an electronic purse, a credit cardlike payment card that stored electronic cash. That is, it did not debit a bank account or draw on a line of credit; it stored the cash digitally in an embedded chip. Mondex wanted to make the card flexible enough to run a variety of applications that would keep track not only of electronic cash but also of discount coupons, loyalty reward points, and other items still unimagined.
The critical issue was to make sure that only cards with legitimate applications would work; any other card, even if programmed to pass as a Mondex card, would be deemed invalid. The solution Mondex chose was to use a special program, known as a certification authority, that would run on a central computer at the company's headquarters. The certification authority would generate unique digital certificates--long strings of numbers--to accompany all applications on the cards. That way, a card reader at, say, a store could validate a card's certificates by running them through a series of mathematical operations that would prove unequivocally that they came from Mondex.
Mondex hired Praxis to develop the certification authority, which was the most critical part of the whole system. After all, if the security of one card was broken, then just that one card could be forged. But compromising the certification authority would allow mass forgery of cards.

The Praxis team began working on the solution in late 1998. The first step was to hammer out what, precisely, the Mondex system was supposed to do--in software jargon, the system's requirements. These are essentially English-language bullet points that detail everything the program will do but not how it will be done.
Getting the requirements right is perhaps the most critical part of Praxis's methodology. For that reason, Praxis engineers held many exhaustive meetings with the people from Mondex, during which they tried to imagine all possible scenarios of what could happen. As Praxis does for all its projects, it insisted that Mondex make available not only its IT people but everyone who would have any involvement with the product--salespeople, accountants, senior managers, and perhaps even the CEO. "We focus very hard on identifying all stakeholders, everybody that cares," says Roderick Chapman, a principal engineer at Praxis [see photo, " Spark Maker"].
To make sure Praxis was on target with the system requirements, it devised a prototype program that simulated the graphical interface of the proposed system. This prototype had no real system underlying it; data and commands entered through the interface served only to check the requirements. In fact, Praxis made no further use of the prototype--the real graphical interface would be developed later, using much more rigorous methods. In following this approach, Praxis was complying with an edict from Frederick P. Brooks's 1975 classic study of software development, The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley, 2nd edition, 1995):
In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done....
Hence plan to throw one away; you will, anyhow.
Once Praxis's engineers had a general idea of what the system would do, they began to describe it in great detail, in pages and pages of specifications. For example, if a requirement said that every user's action on the system should produce an audit report, then the corresponding specification would flesh out what data should be logged, how the information should be formatted, and so on.
This is the first math-intensive phase, because the specifications are written mostly in a special language called Z (pronounced the British way: "zed"). It's not a programming language--it doesn't tell a computer how to do something--but it is a formal specification language that expresses notions in ways that can be subjected to proof. Its purpose is simple: to detect ambiguities and inconsistencies. This forces engineers to resolve the problems right then and there, before the problems are built into the system.
Z, which was principally designed at the University of Oxford, in England, in the late 1970s and early 1980s, is based on set theory and predicate logic. Once translated into Z, a program's validity can be reviewed by eye or put through theorem-proving software tools. The goal is to spot bugs as soon as possible [see sidebar, " "].

The process is time-consuming. For the Mondex project, spec-writing took nearly a year, or about 25 percent of the entire development process. That was a long time to go without producing anything that looks like a payoff, concedes Andrew Calvert, Mondex's information technology liaison for the project. "Senior management would say: 'We are 20 percent into the project and we're getting nothing. Why aren't we seeing code? Why aren't we seeing implementation?' " he recalls. "I had to explain that we were investing much more than usual in the initial analysis, and that we wouldn't see anything until 50 percent of the way through." For comparison, in most projects, programmers start writing code before the quarter-way mark.
Only after Praxis's engineers are sure that they have logically correct specifications written in Z do they start turning the statements into actual computer code. The programming language they used in this case, called Spark, was also selected for its precision. Spark, based on Ada, a programming language created in the 1970s and backed by the U.S. Department of Defense, was designed by Praxis to eliminate all expressions, functions, and notations that can make a program behave unpredictably.
By contrast, many common programming languages suffer from ambiguity. Take, for example, the programming language C and the expression "i++ * i++," in which "*" denotes a multiplication and "++" means you should increment the variable "i" by 1. It's not an expression a programmer would normally use; yet it serves to illustrate the problem. Suppose "i" equals 7. What's the value of the expression? Answer: it is not possible to know. Different compilers--the special programs that transform source code into instructions that microprocessors can understand--would interpret the expression in different ways. Some would do the multiplication before incrementing either "i," giving 49 as the answer. Others would increment the first "i" only and then do the multiplication, giving 56 as the answer. Yet others would do unexpected things.
Such a problem could not happen in Spark, says Praxis's Chapman, because all such ambiguous cases were considered--and eliminated--when the language was created. Coding with Spark thus helps Praxis achieve reduced bug rates. In fact, once Spark code has been written, Chapman says, it has the uncanny tendency to work the first time, just as you wanted. "Our defect rate with Spark is at least 10 times, sometimes 100 times lower than those created with other languages," he says.
Peter Amey explains that the two-step translation--from English to Z and from Z to Spark--lets engineers keep everything in mind. "You can't reason across the semantic gap between English and code," he says, "but the gap from English to an unambiguous mathematical language is smaller, as is the gap from that language to code."
What's more, Spark lets engineers analyze certain properties of a program--the way data flows through the program's variables, for example--without actually having to compile and run it. Such a technique, called static analysis, often lets them prevent two serious software errors: using uninitialized variables, which may inject spurious values into the program, and allocating data to a memory area that is too small, a problem known as buffer overflow.
In practice, though, not everything can be put through the mathematical wringer. Problems with the way different modules exchange data, for instance, by and large have to be solved the old-fashioned way: by thinking. Nor can Praxis completely eliminate classic trial-and-error testing, in which the programmers try to simulate every situation the software is likely to confront.
But what Praxis does do is make such simulation a last resort, instead of the main line of defense against bugs. (As famed computer scientist Edsger Dijkstra wrote, "Program testing can be used to show the presence of bugs, but never to show their absence!") For the Mondex project, such testing took up 34 percent of the contract time. That's in the lower end of the usual share, which typically ranges from 30 to 90 percent. Reduced efforts on testing mean huge savings that go a long way toward balancing the extra time spent on the initial analysis.
The system went live in 1999. Though it cost more up front, the contract called for Praxis to fix for free any problem--that is, any deviation from the specification--that came up in the first year of operation, a guarantee rarely offered in the world of contract software. That first year, just four defects triggered the clause. According to Chapman, three of the problems were so trivial that they took no more than a few hours to correct. Only one was functionally significant; it took two days to fix. With about 100 000 lines of code, that's an average of 0.04 faults per 1000 lines. Fault rates for projects not using formal methods, by some estimates, can vary from 2 to 10 per 1000 lines of code, and sometimes more.
For Mondex, fewer bugs meant saving money. Calvert estimates that Mondex will spend 20 to 25 percent less than the norm in maintenance costs over the lifetime of the project.

Formal methods were relatively new when Praxis started using them, and after some ups and downs, they have recently been gaining popularity. Among their leading proponents are John Rushby at SRI International, Menlo Park, Calif.; Constance Heitmeyer, at the U.S. Naval Research Laboratory's Center for High Assurance Computer Systems, Washington, D.C.; Jonathan Bowen at London South Bank University; the developers of Z at the University of Oxford and other institutions; and the supporters of other specification languages, such as B, VDM, Larch, Specware, and Promela.
In recent years, even Microsoft has used formal methods, applying them to develop small applications, such as a bug-finding tool used in-house and also a theorem-proving "driver verifier," which makes sure device drivers run properly under Windows.
But still, the perceived difficulty of formal tools repels the rank-and-file programmer. After all, coders don't want to solve logical problems with the help of set theory and predicate logic. They want to, well, code. "Few people, even among those who complete computer science degrees, are skilled in those branches of pure mathematics," says Bernard Cohen, a professor in the department of computing at City University, in London.
In every other branch of engineering, he insists, practitioners master difficult mathematical notations. "Ask any professional engineer if he could do the job without math, and you'll get a very rude reply," Cohen says. But in programming, he adds, the emphasis has often been to ship it and let the customer find the bugs.
Until formal methods become easier to use, Cohen says, Praxis and companies like it will continue to rely on clients' "self-selection"--only those users who are highly motivated to get rock-solid software will beat a path to their door. Those that need software to handle functions critical to life, limb, national security, or the survival of a company will self-select; so will those that are contractually obligated to meet software requirements set by some regulator. That's the case with many military contractors that now need to demonstrate their use of formal methodologies to government purchasers; the same goes for financial institutions. Mondex, for instance, required the approval of the Bank of England, in London, and formal methods were part of that approval.
Yet even if regulators were omnipresent, not all problems would be amenable to formal methods, at least not to those that are available now. First, there is the problem of scaling. The largest system Praxis has ever built had 200 000 lines of code. For comparison, Microsoft Windows XP has around 40 million, and some Linux versions have more than 200 million. And that's nothing compared with the monster programs that process tax returns for the U.S. Internal Revenue Service or manage a large telecom company's infrastructure. Such systems can total hundreds of millions of lines of code.
What does Praxis say about that? "The simple answer is, we've never gone that big," says Chapman. "We believe these methods should scale, but we have no evidence that they won't or that they will." So what if a client approaches Praxis with a really big project? Would the company handle it? "The key weapon is abstraction," he says. "If you can build abstractions well enough, you should be able to break things down into bits you can handle." That maxim guides every other discipline in engineering, not least the design of computer hardware. Why not apply it to software, too?

About the Author

Philip E. Ross (IEEE Member) wrote "Managing Care Through the Air" for the December 2004 issue of IEEE Spectrum. His work has also appeared in Scientific American, Forbes, and Red Herring.

Tuesday, October 13, 2015

Her Code Got Humans on the Moon—And Invented Software Itself

http://www.wired.com/2015/10/margaret-hamilton-nasa-apollo/

Margaret Hamilton wasn’t supposed to invent the modern concept of software and land men on the moon. It was 1960, not a time when women were encouraged to seek out high-powered technical work. Hamilton, a 24-year-old with an undergrad degree in mathematics, had gotten a job as a programmer at MIT, and the plan was for her to support her husband through his three-year stint at Harvard Law. After that, it would be her turn—she wanted a graduate degree in math.
But the Apollo space program came along. And Hamilton stayed in the lab to lead an epic feat of engineering that would help change the future of what was humanly—and digitally—possible.
As a working mother in the 1960s, Hamilton was unusual; but as a spaceship programmer, Hamilton was positively radical. Hamilton would bring her daughter Lauren by the lab on weekends and evenings. While 4-year-old Lauren slept on the floor of the office overlooking the Charles River, her mother programmed away, creating routines that would ultimately be added to the Apollo’s command module computer.
“People used to say to me, ‘How can you leave your daughter? How can you do this?’” Hamilton remembers. But she loved the arcane novelty of her job. She liked the camaraderie—the after-work drinks at the MIT faculty club; the geek jokes, like saying she was “going to branch left minus” around the hallway. Outsiders didn’t have a clue. But at the lab, she says, “I was one of the guys.”
Then, as now, “the guys” dominated tech and engineering. Like female coders in today’s diversity-challenged tech industry, Hamilton was an outlier. It might surprise today’s software makers that one of the founding fathers of their boys’ club was, in fact, a mother—and that should give them pause as they consider why the gender inequality of the Mad Men era persists to this day.
‘When I first got into it, nobody knew what it was that we were doing. It was like the Wild West.’ — Margaret Hamilton
As Hamilton’s career got under way, the software world was on the verge of a giant leap, thanks to the Apollo program launched by John F. Kennedy in 1961. At the MIT Instrumentation Lab where Hamilton worked, she and her colleagues were inventing core ideas in computer programming as they wrote the code for the world’s first portable computer. She became an expert in systems programming and won important technical arguments. “When I first got into it, nobody knew what it was that we were doing. It was like the Wild West. There was no course in it. They didn’t teach it,” Hamilton says.
This was a decade before Microsoft and nearly 50 years before Marc Andreessen would observe that software is, in fact, “eating the world.” The world didn’t think much at all about software back in the early Apollo days. The original document laying out the engineering requirements of the Apollo mission didn’t even mention the word software, MIT aeronautics professor David Mindell writes in his book Digital Apollo. “Software was not included in the schedule, and it was not included in the budget.” Not at first, anyhow.

But as the Apollo project unfolded, the centrality of software in accomplishing the mission started to become clear. In 1965, Hamilton became responsible for the onboard flight software on the Apollo computers. It was an exciting time, and the US was depending on the work that she was doing. But sometimes the pressure kept Hamilton up at night. Once, after a late-night party, she rushed back to the computer lab to correct a piece of code she’d suddenly realized was flawed. “I was always imagining headlines in the newspapers, and they would point back to how it happened, and it would point back

By mid-1968, more than 400 people were working on Apollo’s software, because software was how the US was going to win the race to the moon. As it turned out, of course, software was going to help the world do so much more. As Hamilton and her colleagues were programming the Apollo spacecraft, they were also hatching what would become a $400 billion industry.
For Hamilton, programming meant punching holes in stacks of punch cards, which would be processed overnight in batches on a giant Honeywell mainframe computer that simulated the Apollo lander’s work. “We had to simulate everything before it flew,” Hamilton remembers. Once the code was solid, it would be shipped off to a nearby Raytheon facility where a group of women, expert seamstresses known to the Apollo program as the “Little Old Ladies,” threaded copper wires through magnetic rings (a wire going through a core was a 1; a wire going around the core was a 0). Forget about RAM or disk drives; on Apollo, memory was literally hardwired and very nearly indestructible.
Apollo flights carried two near-identical machines: one used in the lunar module—the Eagle that landed on the moon—and the other for the command module that carried the astronauts to and from Earth. These 70-pound Apollo computers were portable computers unlike any other. Conceived by MIT engineers such as Hal Laning and Hamilton’s boss, Dick Batton, it was one of the first important computers to use integrated circuits rather than transistors. As Mindell tells the story, it was the first computerized onboard navigation system designed to be operated by humans but with “fly-by-wire” autopilot technology—a precursor to the computerized navigation systems that are now standard on jetliners.
The system stored more than 12,000 “words” in its permanent memory—the copper “ropes” threaded by the Raytheon workers—and had 1,024 words in its temporary, erasable memory. “It was the first time that an important computer had been in a spacecraft and given a lot of responsibility for the mission,” says Don Eyles, who worked on the lunar module code while at MIT’s IL. “We showed that that could be done. We did it in what today seems an incredibly small amount of memory and very slow computation speed.” Without it, Neil Armstrong wouldn’t have made it to the moon. And without the software written by Hamilton, Eyles, and the team of MIT engineers, the computer would have been a dud.
This became clear on July 20, 1969, just minutes before Apollo 11 touched down on the Sea of Tranquility. Because of what Apollo software engineer Don Eyles has termed a “documentation error,” the Apollo computer started spitting out worrying error messages during this critical phase of the mission. But here’s where the technical arguments won by Hamilton and others saved the day. The error messages were popping up because the computer was being overwhelmed, tasked with doing a series of unnecessary calculations when, in fact, it was most needed to land the module on the surface of the moon. Back in Houston, engineers knew that because of Apollo’s unique asynchronous processing, the computer would focus on the task at hand—landing the Eagle on the Sea of Tranquility. When the software realized it didn’t have enough room to do all the functions it was supposed to be doing, it went through its error detection process and focused on the highest priority job, Hamilton says.

‘That would never happen’

One day, Lauren was playing with the MIT command module simulator’s display-and-keyboard unit, nicknamed the DSKY (dis-key). As she toyed with the keyboard, an error message popped up. Lauren had crashed the simulator by somehow launching a prelaunch program called P01 while the simulator was in midflight. There was no reason an astronaut would ever do this, but nonetheless, Hamilton wanted to add code to prevent the crash. That idea was overruled by NASA. “We had been told many times that astronauts would not make any mistakes,” she says. “They were trained to be perfect.” So instead, Hamilton created a program note—an add-on to the program’s documentation that would be available to NASA engineers and the astronauts: “Do not select P01 during flight,” it said. Hamilton wanted to add error-checking code to the Apollo system that would prevent this from messing up the systems. But that seemed excessive to her higher-ups. “Everyone said, ‘That would never happen,’” Hamilton remembers.
But it did. Right around Christmas 1968—five days into the historic Apollo 8 flight, which brought astronauts to the moon for the first-ever manned orbit—the astronaut Jim Lovell inadvertently selected P01 during flight. Hamilton was in the second-floor conference room at the Instrumentation Laboratory when the call came in from Houston. Launching the P01 program had wiped out all the navigation data Lovell had been collecting. That was a problem. Without that data, the Apollo computer wouldn’t be able to figure out how to get the astronauts home. Hamilton and the MIT coders needed to come up with a fix; and it needed to be perfect. After spending nine hours poring through the 8-inch-thick program listing on the table in front of them, they had a plan. Houston would upload new navigational data. Everything was going to be OK. Thanks to Hamilton—and Lauren—the Apollo astronauts came home.
Also thanks to Hamilton and the work she led, notions of what humanity could do, and be, changed not just beyond the stratosphere but also here on the ground. Software engineering, a concept Hamilton pioneered, has found its way from the moon landing to nearly every human endeavor. By the 1970s, Hamilton had moved on from NASA and the Apollo program. She went on to found and lead multiple software companies. Today her company, Hamilton Technologies, is just a few blocks away from MIT, where her career began—a hub of the code revolution that’s still looking toward the stars.

Monday, October 12, 2015

Software Estimation is a Losing Game

https://rclayton.silvrback.com/software-estimation-is-a-losing-game

This is an argument, and like all arguments, it's supports a specific position. I don't try to defend software estimation; I leave it to my colleagues to present that case. I write about this issue because I believe the software community needs to have an internal debate about whether software estimation in its current form is appropriate for projects.

Software estimation is the process of predicting the time and effort required to accomplish development or maintenance tasks. Estimates are generally used to help customers (this could be your own organization) make decisions. Customers often want a general idea of how long it will take to accomplish tasks and since the customer is always right, we engineers have always agreed to participate in a practice that we frankly despise. We estimate because it's easier to sell a software solution with a fixed delivery time than to actually admit that the customer is simply going to get whatever product your team can complete in the allotted time. The estimate becomes our commitment to delivering a piece of software under some arbitrary deadline.
Sadly, we commit to these deadlines, despite the likelihood of missing them, and in full acknowledgement of the consequences of missing our estimates. These consequences include, but are not limited to:

Loss of credibility with your customer and possibly your own team.
Internal team friction, especially when management over-promises on a delivery.
Poor morale, particularly when the team has to work excessive hours to meet a delivery.
Low quality software if developers rush to meet deadlines and skip quality control practices like testing.
Project Failure when estimates are consistently off and the customer has completely lost faith in your team's ability to deliver.

Over the last six years, I've participated in a number of software projects where poor software estimation has led to very bad outcomes. As a result, I've spent many hours pondering, and discussing with friends, strategies that would have led these projects to greater success. Given the difficulty of estimating effort on a software project, not to mention the near-certain guarantee of getting it wrong, I've come to a radical question...

Do we need software estimation?

I've posed this question a couple of times to project managers and corporate types, particularly when I'm asked to provide some crazy estimate for a customer. "Well of course we do!" is typically the answer I get back. "A customer's not going to give us a contract without some guarantee of delivery." And the sad thing is, this response is fair. It's fair, because we have allowed it to be fair, even though we know this is not how software projects work.
In reality, software teams can only provide a set of capabilities proportional to the amount of time and effort we are allotted. In that sense, we are very much like the hospitality industry...we can't guarantee a guest they will be fully refreshed when they leave our resort, but we can assert that the longer the guest stays, the more likely they will achieve this outcome.

The cult of software estimation.

Despite our inherent understanding of the difficulties of software estimation, we continue to believe that it is the correct course of action for helping us plan and manage projects. As an industry, we've invented a host of methodologies to help us manage the uncertainty of estimation (planning poker, t-shirt sizes, etc.) and written countless books on the topic. In fact, many people have made software estimation itself a career, coaching teams and project managers on the art of predicting uncertainty. Yet, we are still just as incompetent at the process as we were three decades ago (maybe even more so).
The belief that software estimation is critical is so pervasive that it has been ingrained in project management philosophies like Agile, giving the impression that they are inseparable. This highlights what I call the “cult of software estimation”, which is literally that the belief that the importance of software estimation is above reproach [1].

The uncertainty of software estimation.

Software estimation is not an inherently bad process. The problem is, we're really bad at it. It's literally the equivalent of making a weather forecast. You wouldn't commit a forecaster to a prediction for rain a month from now (well, if you do, you're naive). Like any prediction, it's only really accurate, and useful, when making near-term forecasts.
There are a number of factors that lead to skewed software estimates. From my own experience, I've seen all of the following (but I'm sure there's more):

Tasks not granularly specified - a task is too ambiguous leading to a lot of thrashing on how it should be implemented.
Task creates new tasks for the same sprint - common with research tasks whose outcomes include some sort of solution to be built in the sprint.
Estimates made for unfamiliar technologies - trap for developers who inaccurately estimate the time to completion based on past experience with disparate platforms.
Working on unrelated tasks - i.e. context-switching; the tasks assigned to a developer are unrelated and force them to "spin up" for each one.
Tasks with in-sprint dependencies - developers become immediately blocked because precursor tasks are unfinished. This is especially problematic if the dependencies need to be finished by multiple people.
Research tasks with non-tangible outcomes - I've seen people take weeks to research/prototype some solution. If you don't constrain ambiguous tasks, you're inviting the task-taker to waste time.
One person estimating for another - common when developer is absent during planning and another does the estimation for them.
Peer pressure - engineer(s) press another to accomplish a task faster than they believe they can. This generally happens in high-stress environments when a stronger personality bullies a developer into working harder.
Poor morale: teams under a lot of pressure tend to deliver slower.
External pressure: pressure by management to estimate less/more time-to-completion because of political and financial agenda.

None of these factors should surprise you. In fact, our software estimation processes are designed to deal with them. Our solution is to apply psychology and mathematical models to develop better estimates. In fact, I feel like we waste more effort (per capita) than any other industry analyzing our own development practices seeking better efficiencies.

The resource cost of software estimation.

When you think about it, even for small teams, there is generally someone dedicated to studying things like "burn down" or normalizing software estimates by person, over time. Traditionally, software development teams are really small (2-10 people). So when you think about all the time and effort that one person has dedicated in studying the work habits of their team, it's possible you're losing a significant percentage of your resources to overhead.
More importantly, creating software estimates tends to be a group activity, typically occurring at the beginning of a sprint during some form of planning meeting. If you are dealing with a fairly complex system, or have a lot of people on the team, you'll spend anywhere from 2-8 hours planning a sprint. Consider the fact that you may be wasting entire man-days (number of people estimating * time spent estimating) of development time just coming up with estimates.
When you think about it, that's a lot of time and effort lost to come up with estimates that are going to generally be wrong.

What do we gain from software estimation?

A better appreciation for how much a team can accomplish in a set amount of time.
A roadmap that can be used to determine the time-to-completion of a project.
A nice dashboard with charts that can show individual developer accountability to management.

What do we lose from software estimation?

If the estimation is bad, the ability to maintain a realistic pace of development. You’re either burning team out or looking stupid because you finished too early.
Flexibility in planning and execution, particularly if you are planning too far out and commit to a set of tasks. What happens when your approach changes and you are going back to the customer to explain a task that was planned to take two weeks will now take 4 weeks?
Lost resources if you don’t have a significantly long project, the overhead of having someone study and normalize estimations will burden you.
Efficiency, particularly when estimation is combined with time-boxed development cycles like a sprint.

Software Estimation and Time-Boxed Development Cycles.

I would like to expand that last bullet. We typically use software estimation as a tool to help us plan our sprints. How else are we to know how many tasks we should take on in a given period of time? I contend that this practice really just promotes a host of inefficiencies in your development process:

Lost effort due to context-switching - Engineers only take as many tasks that fit within their window (say 2 weeks). This usually means the developer takes a couple of large tasks and several unrelated ones (hell they may all be unrelated). Working on unrelated tasks causes context-switching, which reduces overall efficiency since the developer has to “retool” for the new tasks.
Lost time due to fragmentation - Just like blocks in a hard drive, you can’t perfectly fit tasks into time slots (3-hour task into a half-day slot). There’s always going to be some amount of lost time. Of course, you could always reduce the “block size”, but is that even realistic (i.e. planning tasks down to the minute!)?
Discourages maximum productivity - What happens when you accomplish all of your tasks in a sprint? Some engineers may choose to pull more tasks off the backlog; others might gold plate. I’ll tell you what I might do…take a nap. Especially, if I just spent the last four months working my ass off to get a system up and running!

Why estimate?

So I ask again, why should we estimate? The only benefits rendered to our managers and customers is a prediction that really can't be trusted. The consequences, however, tend to range from reduced development efficiency, morale issues, low quality software, and potential loss of work.

What is the alternative to not estimating?

I don't have an exact methodology. Maybe it's something like Kanban, though I don't know enough about Kanban to say one way or the other. I can tell you what this paradigm might look like.
Let's call it, Budget-Driven Development, or BuDD for short. In BuDD, development teams are given a set of resources and develop what they can until the budget is gone. I’m sure I just gave someone a heart attack, but when you think about it, it’s not too unreasonable. Many contracts have fixed time and costs. You make a guarantee that the customer will get a specific number of man-hours of development. Given a prioritized backlog, developers work on the three most important items. I imagine it would work similar Pandora's prioritization model, but not be constrained by time.
You guarantee quality to your customer by:

Delivering capability often.
Allowing the customer to prioritize development.

Isn't that just Agile? - Yes! Without arbitrary estimates!
The benefit to the customer is that you’re making a guarantee that you will (hopefully) maximize their value by not delivering early and pocketing the rest of their investment. Your team will also discouraged from remaining idle because it will be immediately apparent by the substance of your deliveries.
Not estimating does not mean your are not managing your team. In fact, effective management (the kind that should be happening in the first place) becomes more important:

Team is constantly collaborating. Not estimating does not negate planning meetings, more sync’s or scrums, etc.
Backlog is managed. This includes managing dependencies and ordering and prioritizing tasks, etc.
Tasks are sized (as opposed to estimating delivery time). You should still be able to say, "this task is too big for one delivery."
Development cycles are scoped to tangible deliveries and you should favor smaller deliveries over larger ones.

Hopefully, I have convinced you (on some level) to rethink the use of software estimation. The next section discusses principals you can employ if you decide to move from estimation.

Moving towards a no-estimate methodology.

Whatever strategy you choose to employ to manage your team, I recommend at least adopting the following principals for managing your team:

Prefer granular specifications - The more granular the specification, the more accurate you can gauge the scope of a task. This means, don't settle on stories as a unit of work. Instead, break that story down into technical tasks. Use schema and contract specification technologies that explicitly define interfaces. Draw mockups. Essentially, remove the ambiguity from the task.
Choose task-oriented development cycles instead of time-based ones. This strategy minimizes context-switching and time fragmentation.
Group related tasks; prevent context switching and attempt to maximize developer flow.
Opt for smaller development cycles, particularly if more flexibility is needed. More importantly, this allows you to demonstrate your work more often and receive feedback.
Stagger your team’s responsibilities and efforts. Have the tech-lead/architect specify tasks a sprint ahead of when they are planned to be done. Test and QA should happen after a development cycle. Don't try to cram dependent tasks into the same cycle.
Choose the three most important tasks and work on them until completion. Don't waste your time with filler tasks just because you need to fill a time slot.
Stop and regroup - don’t be afraid to stop the sprint if you feel it's necessary. This might occur if you feel tasks aren’t specified well enough or if developers are thrashing (not getting work done efficiently). Take time to organize the team and then carry on.
Pivot - if things aren't working, or you feel like the team needs to go another direction, do so. The worst thing you can do in Agile development is not be agile.

Conclusion.

Whether you buy the argument that software estimation is unnecessary, or are still an ardently supporting the estimation camp, I think we can agree on a couple of ideas:

Software estimation is difficult.
Many teams fail to accurately make estimates.
Trying to meet unrealistic estimates can destroy a team's morale.
Failing to meet an estimate hurts a team's credibility.

The question is, if this is such a big problem, what should we do about it?

Comments, Notes, and Further Explanation

While you read this, you are probably thinking that I believe software estimation seriously flawed. I actually think software estimation can work. But to do it right, it takes a highly-cohesive team working in a well-known problem domain, with significant experience in the technologies being used. Even then, there's still the potential for estimation failure. For instance, think about the effects of holidays, vacation, and illness. What about significant weather or internal or external political issues. This might not happen often, but it can still cause problems. What I really believe is that it only works for about 5% of teams.

Richard Clayton –

Unrepentant Thoughts on Software and Management.

Saturday, October 10, 2015

Software That Lasts 200 Years

http://www.bricklin.com/200yearsoftware.htm

I've been following some of the writings and actions of the Massachusetts State Executive Office for Administration and Finance as it deals with its Information Technology needs. It was through listening to Secretary Kriss and reading the writings he and other Massachusetts government officials have produced that I've come to look at software development from a whole new perspective. This essay tries to present that perspective and examine some of its implications.

Many things in society are long-term

In many human endeavors, we create infrastructure to support our lives which we then rely upon for a long period of time. We have always built shelter. Throughout most of recorded history, building or buying a home was a major starting step to growing up. This building would be maintained and used after that, often for the remainder of the builder's life span and in many instances beyond. Components would be replaced as they wore out, and the design often took the wear and tear of normal living into account. As needs changed, the house might be modified. In general, though, you thought of a house as having changes measured in decades.

Likewise, human societies also create infrastructure that are built once, then used and trusted for a long period of time. Such infrastructure includes roads, bridges, water and power distribution systems, sewers, seaports and airports, and public recreational areas. These also would be used and maintained without major modifications after they were built, often for many decades or even centuries.

Software has been short-term

By contrast, software has historically been built assuming that it will be replaced in the near future (remember the Y2K problem). Most developers observe the constant upgrading and replacement of software written before them and follow in those footsteps with their creations. In the early days of computer software, the software was intimately connected to the hardware on which it ran, and as that hardware was replaced by new, better hardware, new software was built to go with it. In the early days, many uses of computing power were new -- they were the first application of software to problems that were previously done manually or not at all. The world got used to the fact that the computer version was an alternative and the special features and cost savings were what was special.

Today, hardware is capable enough that software can be written that will continue to run unmodified as hardware is changed. Computers are no longer new alternatives to other applications -- they are the only alternative. Despite this, old thinking and methodologies have remained.

Computers and computer software have been viewed as being valuable for no longer than common short-term durable goods like an automobile or sometimes even tires. In accounting, common depreciation terms for software are 3 to 5 years; 10 at most. Contrast this to residential rental property which is depreciated over 27.5 years and water mains and brick walls which are depreciated over 60 years or more.

Records

Another aspect of human society is the keeping of records. Common records kept by governments include property ownership, citizenship and census information, and laws. Personal records include images (such as portraits) and birth, death, and genealogical information. General information kept by society includes knowledge and expression, and artifacts representative of culture. Again, the time frame for keeping such records is measured in decades or centuries. I can go to city hall and find out the details of ownership relating to my house going back to when it was built in the late 1800's. "Family bible" records go back generations. The Boston Public Library, like many city libraries, has newspapers from over 200 years ago available on microfilm, and many from the last 150 years in the original paper form.

Most of these societal records have been kept on paper. When computers were first introduced, they were an adjunct to the "real" paper records, and paper printouts were made. Computer-readable "backups" and transaction logs were produced and stored on removable media such as magnetic tapes, or even paper printouts. These were usually written and then rarely accessed, and even then accessed in a manner akin to the newspaper stacks of a library. Only the recent, working copies of data were actually available to the computers on an instantaneous basis. Much of the use of computers was for "transactions", and only the totals at the end of the time period of interest needed to be carried forward except in rare circumstances, such as disaster recovery or audits. Switching to a new computer system meant copying the totals and then switching the processing of new transactions to the new system instead of the old.

When it comes to moving ahead, most new software and hardware can only access the most recent or most popular old data. Old manuscripts created with old word processors, often archived on obsolete disk cartridges in obsolete backup formats, are almost impossible to retrieve, even though they are less than 25 years old. The companies that built the software and hardware are often long gone and the specifications lost. (If you are older than 30, contrast this to your own grade school compositions saved by your parents, or letters from their parents, still readable years later.)

Today's world and Societal Infrastructure Software

The world is different now than it was even just a decade or two ago. In more and more cases, there are no paper records. People expect all information to be available at all times and for new uses, just as they expect to drive the latest vehicle over an old bridge, or fill a new high-tech water bottle from an old well's pump. Applications need to have access to all of the records, not just summaries or the most recent. Computers are involved in, or even control, all aspects of the running society, business, and much of our lives. What were once only bricks, pipes, and wires, now include silicon chips, disk drives, and software. The recent acquisition and operating cost and other advantages of computer-controlled systems over the manual, mechanical, or electrical designs of the past century and millennia have caused this switch.

I will call this software that forms a basis on which society and individuals build and run their lives "Societal Infrastructure Software". This is the software that keeps our societal records, controls and monitors our physical infrastructure (from traffic lights to generating plants), and directly provides necessary non-physical aspects of society such as connectivity.

We need to start thinking about software in a way more like how we think about building bridges, dams, and sewers. What we build must last for generations without total rebuilding. This requires new thinking and new ways of organizing development. This is especially important for governments of all sizes as well as for established, ongoing businesses and institutions.

There is so much to be built and maintained. The number of applications for software is endless and continue to grow with every advance in hardware for sensors, actuators, communications, storage, and speed. Outages and switchovers are very disruptive. Having every part of society need to be upgraded on a yearly or even tri-yearly basis is not feasible. Imagine if every traffic light and city hall record of deeds and permits needed to be upgraded or "patched" like today's browsers or email programs. Needing every application to have a self-sustaining company with long-term management is not practical. How many of the software companies of 20 years ago are still around and maintaining their original products?

Software development culture

Traditional software development falls into two general categories: Prepackaged and Custom. Prepackaged software is written by Application Software Companies (often called Independent Software Vendors, ISVs) who produce a program and then sell the same product to multiple customers. Custom software is written either by an independent company under contract or by in-house developers for a specific user. Common elements may be reused from project to project, but the overall program is unique.

Prepackaged software has the advantage of using the leverage one gets by spreading development costs over multiple users. Custom software has the advantage of being able to be tuned to very specific needs and circumstances of each user. A challenge when developing prepackaged software is developing a product that appeals to a wide audience. A challenge when developing custom software is to take advantage of "generic" prepackaged components to lower development costs.

The most successful prepackaged software applications have been those that may be inexpensively customized to meet the needs of users by developers with less and less computer skills, most desirably by the users themselves, or that form a base on which other prepackaged or custom software are built. Examples of such software are the common "productivity" applications like word processors and spreadsheets and "plumbing" software like operating systems, database engines, and web servers. The developers of prepackaged software are driven by a need to make their products appeal to today's potential users (and buyers), usually through features that distinguish them from competition.

A traditional prepackaged software company is organized as an ongoing enterprise, usually with a desire and plans for growth. An initial core of technical and product design people build the first version of the product. Marketing and sales people are added to sell the product and bring in revenues. Development continues and new, better versions are produced. New revenue comes from selling to existing customers, with each new version needing to give existing users a reason to replace the old product. The mentality, and the resulting major investments in corporate marketing, sales, and research activities, is focused on obsolescence and "upgrading" -- but only upgrading to products from that company. The potential for new customers and upgrade revenue is often a requirement to procure initial funding.

There are prepackaged software companies that are structured to make their profits from services and activities separate from the actual delivery of software code. The software itself may be available with no or little charge, but the organization is set up so that support of various sorts is provided by the company which has special knowledge of, and access to, the product. Again, there is a culture of obsolescence, to keep customers upgrading to new versions and paying for maintenance.

The needs of Societal Infrastructure Software

Let us look at the needs for societal infrastructure software. They include the following:

Meet the functional requirements of the task.

Robustness and long-term stability and security.

Transparency to determine when changes are needed and that undesired functions are not being performed.

Verifiable trustworthiness of all three of the above.

Ease and low cost of training for effective use.

Ease and low cost of maintenance.

Minimization of maintenance.

Ease and low cost of modification.

Ease of replacement.

Compatibility and ease of integration with other applications.

Long-term availability of individuals able to train, maintain, modify, determine need for changes, etc.

The structure and culture of a typical prepackaged software company is not attuned to the needs of societal infrastructure software. The "ongoing business entity" and "new version" mentality downplay the value of the needs of societal infrastructure software and are sometimes at odds.

By contrast, custom software development can be tuned better to the needs of societal infrastructure software. The mentality is more around the one-time project leaving an ongoing result, and the cost structures are sometimes such that low maintenance is encouraged. The drivers of custom software are often the eventual users themselves, paying upfront for development.

Some of the problems with custom development with regards to societal infrastructure software are the inability to spread the development and maintenance costs among a large number of customers and the narrow focus on the current requirements of the particular customer and their current stage of need (which often may change in ways visible to other customers but not yet to them).

A new style of development

What is needed is some hybrid combination of custom and prepackaged development that better meets the requirements of societal infrastructure software.

How should such development look? What is the "ecosystem" of entities that are needed to support it? Here are some thoughts:

Funding for initial development should come from the users. Bridges and water systems are usually funded by governments, not by private entities that will run them for generations. The long-term needs of the funders must be more inline with the project requirements than the investment return needs of most private sources of capital.

The projects need to be viewed as for more than one customer. A system for tracking parking tickets is needed by many municipalities. There is little need to have a different one for each. As a result, the funding should also be able to come from a combination of multiple sources. Funding or cost-sharing "cooperatives" need to exist.

The requirements for the project must be set by the users, not the developers. The long-term aspects of the life of the results must be very explicit. Best-practices must be established, tracked, and revisited.

There is the whole issue of data storage and interchange standards that is critical to the long-term success and ability to do migration. Impediments such as intellectual property restrictions and "digital rights management" chokepoints must be avoided. (Lawmakers today must realize how important data interchange and migration is to the basic needs of society. They must be careful not to pollute the waters in an attempt to deal with perceived threats to a minor part of the economy.)

Another critical issue is platform (hardware and software) independence. All development of long-term software needs to be created with the possibility of new hardware, operating systems, and other "computer infrastructure" in mind.

The actual development may be done by business entities which are built around implementing such projects, and not around long-term upgrade revenue. Other entities are needed for providing the ongoing services with a mentality of keeping existing systems running. (The two entities may or may not be related.) Many such companies already exist.

The attributes of open source software need to be exploited. This includes the transparency of the source code and the availability for modification and customization. Much has been written with regards to open source and its value for bug finding, security checking, etc., which is why this is needed. The added benefit here is that society as a whole may benefit in unforeseen ways as new applications are found for programs, be they in the private or public sector. The availability of the source code, as well as the multi-customer targeting and other aspects, enables a market for the various services needed for support, maintenance, and training as well as connected and adjunct products.

The development may be done in-house if that is appropriate, but in many cases there are legal advantages as well as structural for using independent entities. Some governmental agencies may be precluded from licensing their results under licenses that are most appropriate for the long-term health of the projects. For example, they may be required to release the program code into the public domain where it may then be improved by others (and re-released under restrictive licenses) without a return benefit to the original funders.

Unlike much of the discussion about open source, serendipitous volunteer labor must not be a major required element. A very purposeful ecosystem of workers, doing their normal scheduled work, needs to be established to ensure quality, compatibility, modifications, testing, security, etc. Educational and other institutions may be employed with the appearance of volunteer labor as students and other interested parties are used, much as courts and other governmental agencies have used interns and volunteers for other activities. The health of the applications being performed by the software must not be dependent upon the hope that someone will be interested in it; like garbage collecting, sewer cleaning, and probate court judging, people must be paid.

The ecosystem of software development this envisions is different than that most common today. The details must be worked out. Certain entities that do not now exist need to be bootstrapped and perhaps subsidized. There must be a complete ecosystem, and as many aspects of a market economy as possible must be present.

Learning from civil engineering

My friend Peter Levin pointed out to me that the analogy between software engineering and civil engineering (the building of bridges, dams, and sewers) should be used to help flesh out a potential structure of the ecosystem. Here are some more thoughts inspired by that:

Architects, civil engineers, and contractors as part of their training learn a set curriculum, pass tests, and are often licensed. They are supposed to share a body of knowledge and experience and demonstrate competence. What thrust should be part of the training of software engineers? For years we emphasized execution speed, memory constraints, data organization, flashy graphics, and algorithms for accomplishing this all. Curricula needs to also emphasize robustness, testing, maintainability, ease of replacement, security, and verifiability.

Standards bodies publish best practices (how high should a railing be above the stair tread, how thick should a concrete footing be under a supporting pillar, etc.). Even though a project might be novel (such as a new bridge or Boston's Big Dig), there are many standards that can (and must) be applied. By standards here we mean a conservative approach that is intended to minimize error, increase security, and lower maintenance costs, not just facilitate data interchange. Like all engineering, new software, as we know, commits old errors. We need to teach the right "war stories".

Physical projects are subject to inspection by standards bodies. When you have electrical or plumbing work done, the town inspector comes to check the work before the job can be considered finished. Transparent societal infrastructural software needs inspection. This will raise the role of independent testing entities. There is much talk about such roles in the discussion about electronic voting and gambling machines, but it is also important for the software we are covering here. These jobs -- part QA, part auditor, part private investigator -- can be very high status because of the range and depth of knowledge and experience needed. For public projects, the transparency of open source is needed to allow multiple, independent inspections. There are also different inspection specialties, including standards compliance, security and other stressing, maintainability, and functionality.

When physical projects fail (a suspension bridge twists in heavy winds, an elevated freeway falls down in an earthquake, an airplane crashes) public inquiries are performed, reports are published, and fixes are designed and retrofitted to existing projects. What we learn from failures enter the standards lexicon and is used for training and new design. We don't do this yet in the world of software. Access to the source code, the right to discuss it in detail, and the ability to search for similar code elsewhere is crucial to many such studies.

Further thoughts

The heart of this is some sort of open source software. The exact license requirements are not yet clear, and will probably vary depending upon the project. The depth of thinking that went into the GNU General Public License is needed, and it is a good start.

The role of open source software scares many traditional software developers. There is an image of a need for volunteer labor, and developers not getting paid. This is far from the case. Developers and companies are still needed, and the best will be in high demand and well paid. Criteria of what is "good" may change -- the ability to write clear, robust, maintainable code with an eye to the future, or do clean modifications, or explain how to use old software in new contexts, will become even more important. Documentation, training, servicing, testing, and more will still be paid for. In fact, the knowledge that such work has long-term consequences and may be amortized over longer periods of time raises their value. What does go away is the effort spent on making upgrading and replacement a desirable thing, both in development time and marketing dollars.

What about competition? There is nothing that says that there should only be one product for each application. Competition is very helpful for bringing out the best in product development. With that knowledge, funders should consider funding more than one project and keep all promising ones alive even if, as is the tendency with software, one comes to dominate in deployments.

What does this say about the size of the development entities? There is no special requirement. Some may be very big, some may be very small. Smaller entities (and projects) have a better chance than today, because in such an ecosystem they would not be evaluated based on their own ability to provide long-term support (a major impediment today), but rather on their products' characteristics for fitting into the ecosystem.

The structure of the development may be concentrated mainly all in one entity, much as with a product like MySQL or Adobe Photoshop. Alternatively, it may instead be coordinated by a strong center, but distributed among many players, such as with Linux. The key skills will include the ability to manage such projects in the ecosystem. There is probably a separation between managing the initial development and the long-term maintenance and monitoring.

Is this "socialized software", with the government making all decisions? No. While funding and management cooperatives seem a likely part of the ecosystem, there is no need for a single such entity; in fact, that would be bad. Developers with promising ideas can still use risk capital for initial development, and would still be able to find single customers to provide the funding. Also, some projects may be worth funding solely because they are synergistic with existing products that are being supported by existing entities. So, for example, a training and support company may help fund a product that lowers maintenance costs and that will need training.

Remember, this is only for one part of the software world -- that of social infrastructure software. There are many other uses of software, each with their own preferred ecosystem for development and support.

As part of this, buyers must get used to funding projects in advance. This is already the case in many areas, and the addition of cooperative funding with others can lower the costs or increase the scope of potential projects. Buyer funding lowers the requirement for potential "big hits" to incentivize development.

There is much talk about open source software in relation to existing software firms and lowering costs. What we are discussing here is opening up new types of firms, with huge potentials for revenues stemming from valuable services.

Open source essays often revolve around cost savings of acquisition and the use of volunteer labor for testing and maintenance. That is not the thrust here. In fact, the acquisition costs may actually be higher, and paid labor is assumed. The key is a model for long-term use, with a lowering of total cost of ownership, less disruption, and better integration. Open source discussion for government and business is often just in regards to existing open source applications, such as Linux and hoped-for desktop applications. There needs to be more discussion about projects of less general interest to the common software developer, such as EPA compliance monitoring systems, government record keeping, court workflow systems, and e-government components. Open source software discussion should be about keeping the trains running on time and not just saying it should run on Linux. The discussions should be about funding the companies needed in such an ecosystem and assuring their sources of healthy revenue. The code is not the only part of the equation, and leadership for all aspects of the ecosystem need to be addressed.

I hope that this essay is helpful to people that need to be involved in bringing about this needed ecosystem.

-Dan Bricklin, 14 July 2004

As a continuation of examining the area of long-term software, I've written another essay as part of a process of looking for design principles to follow. See "Learning From Accidents and a Terrorist Attack".

-Dan Bricklin, 7 September 2004

Related material:

Massachusetts Secretary of Administration and Finance Eric Kriss: "Open Mind on Open Source". History and rationale for sharing source in government.

Dan Bricklin's Log reports of meetings with Secretary Kriss: October 8, 2003 and January 12, 2004.

GNU Project: "Philosophy of the GNU Project". Links to essays about Free Software and free software licenses such as the GPL.

Peruvian letter about Open Source in government: Reactions from a Peruvian lawmaker to statements about open source submitted by a Microsoft representative.

Government Open Code Collaborative Repository: "...a voluntary collaboration between public sector entities and non-profit academic institutions created for the purpose of encouraging the sharing, at no cost, of computer code developed for and by government entities where the redistribution of this code is allowed." Includes Massachusetts, Rhode Island, Pennsylvania, Utah, Kansas, Missouri, West Virginia, and a variety of cities.

Avalanche Corporate Technology Cooperative: "...a private exchange that enables it's members to contribute, collaborate and legally distribute intellectual property to other members." Founded by Best Buy, Cargill, and others. Shares code and procedures among members. It is not open source.

Leopard Project of the Open Government Interoperability Project of the Open Source Software Institute: "The Open-Source Software Institute (OSSI) is a non-profit (501 c 6) organization comprised of corporate, government and academic representatives whose mission is to promote the development and implementation of open-source software solutions within U.S. Federal and State government agencies and academic entities." The Leopard Project is an eGovernment web services platform based on LAMP (Linux, Apache, MySQL, PHP/Perl/Python open source components).

Copy Protection Robs The Future: Dan Bricklin's essay about the long-term dangers of Digital Rights Management.

The New York Times article on why slot machines are more trustworthy than voting machines because of testing and enforcement: "Gambling on Voting".

Books about the role of failure in engineering:

Henry Petroski's "To Engineer is Human: The Role of Failure in Successful Design". This book, which has a picture of the Tacoma Narrows bridge collapsing and the Challenger in flight on its cover, discusses several well-known engineering failures. It goes into detail about how the failures were analyzed and what we can learn from them.

Another Petroski book: "Design Paradigms: Case Histories of Error and Judgment in Engineering". Petroski presents several general paradigms of error, such as errors in conceptual design, errors related to scale in size, errors in logic, success masking failure, and others. To quote from the Preface: "This book argues for a more pervasive use of historical case studies in the engineering curriculum."

Charles Perrow's "Normal Accidents: Living with High-Risk Technologies". This book emphasizes learning about failures through detailed study of many "accidents" and especially "near-misses" and the systems around them. Don't say "Whew!" and ignore "almosts", or say "well, it was an accident" -- learn from them both. There are about 5,000 people a year killed in U.S. industry. This book is covered in great depth in my "Learning From Accidents and a Terrorist Attack" essay.