Wednesday, May 11, 2016

Artificial Intelligence Coding Competition

http://theaigames.com/
https://www.quora.com/What-are-the-most-interesting-online-AI-competitions

Monday, May 9, 2016

Panopticon Disassembler

Panopticon is a disassembler that understands the semantics of opcodes. This way it's able to help the user by discovering and displaying invariants that would have to be discovered "by hand" in traditional disassemblers. This allows an interactive search through the space of all possible program executions.

Development is coordinated on Github https://github.com/das-labor/panopticon

Friday, December 25, 2015

A highly visual ARM simulator

Move on ke ARM: http://salmanarif.bitbucket.org/visual/index.html

Monday, October 26, 2015

Fuzzing with american fuzzy lop

http://lwn.net/Articles/657959/

Fuzzing with american fuzzy lop

September 22, 2015
This article was contributed by Hanno Böck

In September 2014 a serious security vulnerability that became known as Shellshock was found in Bash, which is the default shell in most Linux distributions. But it quickly turned out that the initial fix for Shellshock was incomplete. Various other related bugs were found only days after the publication, amongst them two severe vulnerabilities discovered by Michał Zalewski from the Google security team. In the blog post, Zalewski mentioned that he had found these bugs with a fuzzing tool that he wrote, which almost nobody knew back then: american fuzzy lop (afl). It was the beginning of a remarkable success story.
Fuzzing is an old technique to find bugs in software. The basic idea is simple: just feed an application lots of input data with randomly introduced errors in it and see if anything happens that would indicate a bug. The easiest thing to watch for is a crash. If an application crashes from invalid input, it is often a sign of an invalid memory access—a buffer overflow, for example. And these often are security vulnerabilities.

Between dumb and template-based fuzzers

In the past, most fuzzing tools fell into two categories: "dumb" fuzzers that take a valid input and only modify it randomly, and template-based fuzzing tools that are specific to a certain input data format. Both have their limitations. While dumb fuzzers can still be surprisingly successful, they will only find the most obvious bugs because they have no knowledge about the underlying data format. Template-based fuzzers can find more bugs, but creating them is a lot of work. They have to be adapted for every kind of input data they are applied to.
American fuzzy lop tries a new strategy. To use it, the first step is to recompile an application with a special compiler wrapper that adds assembly instrumentation code to the binary. Both Clang and GCC are supported. This instrumentation allows the fuzzer itself (afl-fuzz) to observe the code paths that a certain input file will trigger. If afl-fuzz finds an input sample that triggers a new code path, it uses that sample as a starting point for further fuzzing.
This strategy causes afl-fuzz to reach a high level of code coverage for its testing. At some point, the fuzzing process may transform an input file into one that has a certain rarely used feature enabled. The fuzzer will detect that a new code path has been triggered and the fuzzing process will create further input files that use that code path and may find bugs there. The big advantage is that the person using the fuzzing tool doesn't need to have any knowledge about this rarely used feature. The fuzzer will just find that out for itself.

Magically creating JPEG headers

An experiment by Zalewski shows how remarkably successful this strategy can be. He used a bogus file to start the fuzzing process on the djpeg tool that comes with libjpeg. After a while the fuzzer automatically created input files that contained a valid JPEG header. This does not mean that it's advisable to start a fuzzing process with bogus files. It saves time to start the fuzzing process with a valid input file. In addition, this will likely not work in all situations. For example, the fuzzer is unable to create large string sequences by chance.
American fuzzy lop has been responsible for the discovery of bugs and security vulnerabilities in many important free software package, including security-sensitive software like OpenSSL, GnuPG, and OpenSSH. Some of the vulnerabilities in the Android media framework, Stagefright, that were recently discovered by Joshua Drake [PDF] were attributed to it. In order to do this, Drake had to port Stagefright to Linux, since it was designed to only run on Android. More recently, two denial-of-service issues (CVE-2015-5722 and CVE-2015-5477), which would allow attackers to remotely crash DNS servers using BIND, were also found by american fuzzy lop.
Apart from finding new bugs, some people have experimented to see if american fuzzy lop would have been able to find certain already known bugs that were believed to be hard to discover. I was able to rediscover the Heartbleed bug by fuzzing the handshake of an old, vulnerable version of OpenSSL. One notable aspect of the bug was that it involved the heartbeat extension of TLS, which is a feature that almost nobody knew about before Heartbleed hit the news. Codenomicon, the company that found Heartbleed, also used a fuzzing tool, but their fuzzer had prior knowledge of the heartbeat extension and specifically targeted it with bogus inputs. In my experiment, I used no knowledge about this specific extension. American fuzzy lop was able to generate the correct handshake packet that would contain the extension, together with the data that would trigger Heartbleed, within six hours.
Another example of a hindsight finding was a bug in the OpenSSL function BN_sqr(), which is used to square large numbers (bignums). With some inputs, it would produce wrong results due to an error in the carry propagation. These inputs were rare corner cases—only one out of 2^128 input numbers would trigger that bug. Testing with random inputs would never have led to the discovery of such a bug. However, Ralf-Philipp Weinmann was able to rediscover this bug with the help of american fuzzy lop. In a talk given at the Black Hat conference in Las Vegas, he presented a small testing application that would compare the output of the squaring with the result of multiplying a number by itself. By definition these two calculations should produce the same output, so the application would give an assertion error if the results differed. Using that test program to recognize when BN_sqr() failed, american fuzzy lop was able to find an input that triggered the bug within an hour.
Although interesting, these results should be interpreted with caution. It is obviously easy to find a bug if one knows where to look. But they still indicate that even bugs that seem hard to find without code analysis may be susceptible to fuzz testing.

Easy to use and free software

Its novel fuzzing strategy is not the only reason for the success of american fuzzy lop. Two factors that likely play a major role are that the code is openly available under a free license—it uses the Apache 2 license—and that the tool is relatively easy to use. This separates it from many other tools in the security area. Lots of commercial tools only target a limited audience, because they are expensive or not available to the public at all. And IT security tools from the academic world often have the reputation of being hard to install and even harder to use without a background in the corresponding field.
Zalewski puts a high value on the usability of his tool and often implements recommendations from its users. While the first versions were very cumbersome to use, this has changed dramatically. Also, by now packages for it are available for most Linux distributions and BSD systems. A package for Mac OS X is also available. Currently, there is no version of american fuzzy lop for Windows or for Android.
To use american fuzzy lop, one first needs to recompile an application with the compiler wrapper shipped with afl (afl-gcc/afl-g++ or afl-clang/afl-clang++). The fuzzer needs a command-line tool that takes an input file. In most cases, libraries ship some small tools that allow parsing input files and should be suitable.
When compiling libraries, it's often advisable to statically link the library against the shipped tools, which will avoid having to do library preloading when running the executable. With software using the GNU autotools, this can usually be achieved with the configure parameter --disable-shared. Therefore the configure call should look something like this:

    ./configure --disable-shared CC=afl-gcc CXX=afl-g++

Next, the user needs one or more sample input files. It is usually advisable to create small input files; if possible, they shouldn't be larger than a few kilobytes. These need to be put into a directory. Then start the fuzzing:

    afl-fuzz -i [input_directory] -o [output_directory] ./[executable_to_fuzz] @@

The command afl-fuzz will replace @@ with the fuzzed inputs. If there is no @@ in the command line, the input will be passed via the standard input (stdin).

AddressSanitizer finds more memory access bugs

There are many more options and variants in how to use afl. One notable feature is the ability to combine the fuzzing with the use of a compiler feature called AddressSanitizer (ASan), which is part of the GCC and Clang compilers and can be enabled with the parameter -fsanitize=address. It adds detection code for invalid memory accesses to an executable. Many memory access bugs, such as out-of-bounds reads or use-after-free errors, often don't cause an application to crash. With normal fuzzing, they would pass by unnoticed. AddressSanitizer changes that and stops the application execution on every read or write to an invalid memory location. In american fuzzy lop, the use of AddressSanitizer can be enabled by setting the AFL_USE_ASAN environment variable to 1.
There are, however, some caveats. While AddressSanitizer is remarkably fast compared to other similar tools like Valgrind, it still slows down the execution significantly. It is therefore sometimes suggested to run the fuzzing process without it. The queue directory generated by afl-fuzz contains all the input samples it considered interesting because they triggered new code paths. These can then be manually tested with an ASan-compiled version of the software.
AddressSanitizer allocates a huge amount of virtual memory—on 64-bit systems, several terabytes. As it is only virtual memory and only small parts of it are actually used, this doesn't stop it from working. But american fuzzy lop limits the memory for its tested applications. One easy way to work around this is to disable the memory limit of afl-fuzz (using the parameter -m none). In rare cases, this could lead to system instabilities because some inputs may cause an application to use large amounts of memory. American fuzzy lop also ships a more elegant solution that limits the memory via control groups (cgroups).

Network fuzzing is a challenge

The biggest limitation of american fuzzy lop is that it only supports file inputs. In many cases, the most interesting pieces of code from a security perspective are the parsers for networking functions. Sometimes this limitation can be worked around. In the Heartbleed experiment mentioned earlier, it was possible to create a small tool for the OpenSSL handshake that would take a raw network packet on the command line and swap it with parts of the real handshake. This was possible because the OpenSSL API allows doing a handshake, without any real network connection, just by passing buffers between different contexts. But in many other situations it is not that easy.
Some attempts have been made to intercept networking functions by preloading a library with LD_PRELOAD that would then use a file input and pass it as networking input to an application. One such attempt, called Preeny, is publicly available. However, these attempts turned out to be relatively fragile and only work on a small fraction of real-world applications. Combining american fuzzy lop with network fuzzing in a way that works on a wide variety of applications is still an open challenge.
An active community has built a number of extensions and additional tools that can be used alongside american fuzzy lop. There are variants for Python, Rust, and Go, a special QEMU mode that allows fuzzing binary applications on Linux without having access to the source code, and many more. The developers of the LLVM compiler framework have implemented a special fuzzing mode for libraries that borrows several ideas from american fuzzy lop.
The origin of the name american fuzzy lop—a rabbit breed—can be traced back to an earlier tool Zalewski wrote: Bunny the Fuzzer. "Bunny wasn't particularly great, but when I decided to revisit the idea many years later, I felt that it's appropriate to allude to that in some way", Zalewski wrote in a post to the afl-users mailing list.
American fuzzy lop has helped redefine and revive the technique of fuzzing. In combination with AddressSanitizer, it is a powerful method to improve the quality of software and to find a lot of hidden and otherwise hard-to-find bugs. It has its limitations, but the biggest limitation is probably that it is not used widely enough yet. Almost every piece of software written in C or C++ that takes input from potentially dangerous sources can and should be tested using american fuzzy lop.
[The author started The Fuzzing Project last year, in an effort to fuzz as many free software applications as possible. His work for The Fuzzing Project is supported by the Linux Foundation's Core Infrastructure Initiative.]

Thursday, October 22, 2015

A small British firm shows that software bugs aren't inevitable

The Exterminators
A small British firm shows that software bugs aren't inevitable

Peter Amey was an aeronautical engineer serving in the United Kingdom's Royal Air Force in the early 1980s when he found a serious flaw in an aircraft missile-control system being deployed at the time. It wasn't a defect in any of the thousands of mechanical and electronic parts that constituted the system's hardware. The problem was in the system's software. Amey found an erroneous piece of program code--a bug [see photo, "Checking Code"]. Because of it, the unthinkable could happen: under rare circumstances, a missile could fire without anyone's having commanded it to do so.
Amey says his superiors, rather than commending his discovery, complained that it would delay the system's deployment. Like most project managers, they didn't like the idea of fixing errors at the end of the development process. After all, good design ought to keep errors out in the first place. Yet time and again, Amey knew, the software development process didn't prevent bugs; it merely put off dealing with them until the end. Did it have to be that way? Or could developers avoid bugs in the first place? He would find the answer to be "yes" when, years later, he joined Praxis High Integrity Systems [see photo, " Bug Killer"].
Praxis, headquartered in Bath, 2 hours from London by car, was founded in 1983 by a group of software experts who firmly believed they could put together a sound methodology to ruthlessly exterminate bugs during all stages of a software project.
At the time, the software world was in a malaise that it hasn't fully shaken even today [see "Why Software Fails," in this issue]. Software projects were getting larger and more complex, and as many as 70 percent of them, by some estimates, were running into trouble: going over budget, missing deadlines, or collapsing completely. Even projects considered successful were sometimes delivering software without all the features that had been promised or with too many errors--errors that, as in the missile-firing system, were sometimes extremely serious. The personal computer era, then just starting, only reinforced a development routine of "compile first, debug later."
Praxis armed itself not only with an arsenal of the latest software engineering methods but also with something a little more unusual in the field: mathematical logic. The company is one of the foremost software houses to use mathematically based techniques, known as formal methods, to develop software.
Basically, formal methods require that programmers begin their work not by writing code but rather by stringing together special symbols that represent the program's logic. Like a mathematical theorem, these symbol strings can be checked to verify that they form logically correct statements. Once the programmer has checked that the program doesn't have logical flaws, it's a relatively simple matter to convert those symbols into programming code. It's a way to eliminate bugs even before you start writing the actual program.
Praxis doesn't claim it can make bug-free software, says Amey, now the company's chief technical officer. But he says the methodology pays off. Bugs are notoriously hard to count, and estimates of how common they are vary hugely. With an average of less than one error in every 10 000 lines of delivered code, however, Praxis claims a bug rate that is at least 50--and possibly as much as 1000--times better than the industry standard.
Praxis is still a small, lonely asteroid compared to the Jupiter-size companies that dominate the software universe--companies like Microsoft, Oracle, and SAP. The tiny British software house doesn't make products for the masses; it focuses on complex, custom systems that need to be highly reliable. Such mission-critical systems are used to control military systems, industrial processes, and financial applications, among other things.
Sometimes the software needs to work 99.999 percent of the time, like an air-traffic control program Praxis delivered some years ago. Sometimes it needs to be really, really secure, like the one Praxis recently developed for the National Security Agency, the supersecret U.S. signals intelligence and cryptographic agency, in Fort Meade, Md.
And though Praxis employs just 100 people, its name has become surprisingly well known. "They're very, very talented, with a very different approach," says John C. Knight, a professor of computer science at the University of Virginia and the editor in chief of IEEE Transactions on Software Engineering. Praxis's founders, he says, believed that building software wasn't as hard as people made it out to be. "They thought, it isn't rocket science, just very careful engineering."
Watts S. Humphrey, who once ran software development at IBM and is now a fellow at the Software Engineering Institute at Carnegie Mellon University, in Pittsburgh, also speaks highly of Praxis. He says the company's methodology incorporates things like quality control that should be more widely used in the field. In fact, Humphrey spent this past summer at Praxis headquarters to learn how they do things. He wants to use that knowledge to improve a complementary methodology he developed to help organizations better manage their software projects.
Praxis's approach, however, isn't perfect and isn't for everybody. Formal methods obviously are no silver bullet. For one thing, using formal methods can take more time and require new skills, all of which can mean higher up-front costs for a client. In fact, Praxis charges more--50 percent more in some cases--than the standard daily rate. To this its engineers will say: "You get what you pay for; our bug rate speaks for itself."
And although formal methods have been used to great effect in small and medium-size projects, no one has yet managed to apply them to large ones. There's some reason to think no one ever will, except perhaps in a limited fashion. Nevertheless, even though the company may not have all the answers to make software projects more successful, those working in the field can learn plenty of lessons from it, say advocates like Knight and Humphrey.

Software was conceived as a mathematical artifact in the early days of modern computing, when British mathematician Alan Turing formalized the concept of algorithm and computation by means of his now famous Turing Machine, which boils the idea of a computer down to an idealized device that steps though logical states.
But over time, software development gradually became more of a craft than a science. Forget the abstractions and the mathematical philosophers. Enter the realm of fearless, caffeinated programmers who can churn out hundreds of lines of code a day (often by hastily gluing together existing pieces of code). The problem is that for some projects, even tirelessness, dedication, and skill aren't good enough if the strategy is wrong.
Large, complex software systems usually involve so many modules that dealing with them all can overwhelm a team following an insufficiently structured approach. That's especially true of the mission-critical applications Praxis develops, as well as of large enterprise resource-planning systems, of the sort used by Fortune 500 companies, and complex data-driven software, such as the FBI's Virtual Case File project [see "/sep05/1203 Who Killed the Virtual Case File?" in this issue].
Even when you break such a big program down into small, seemingly manageable pieces, making a change to one turns out to affect 10 others, which may in turn affect scores or maybe even hundreds of other pieces. It may happen that making all the fixes will require more time and money than you have. If your system's correct operation depends on those changes, you'll have to either admit defeat or scramble to find a way to salvage what you've done so far, perhaps by giving up on some of the capabilities or features you'd hoped to have in the software.
As it turns out, complete failure--projects canceled before completion--is the fate of 18 percent of all information technology projects surveyed in a 2004 study by consultancy Standish Group International Inc., in West Yarmouth, Mass. Apparently that's the good news; the rate 10 years ago, according to Standish, was 31 percent.
Still, the overall picture is pretty bleak. Standish asserts that more than 50 percent of the thousands of projects it surveyed faced problems, from being turned over without significant features to going well beyond their deadlines or budgets. In the end, according to the Standish numbers, only 28 percent of projects could be considered successes by any rigorous definition.
Standish's numbers, however, are far from universally accepted in the computer industry. For contract software projects, more specifically, other analyses in recent years have put the success rate as low as 16 percent and as high as 62 percent. Nevertheless, even using those numbers as a guide, it's hard not to see the contract software business as anything but an enterprise all too often mired in mediocrity. As one study by consultant Capers Jones, in Marlborough, Mass., put it: "Large software systems...have one of the highest failure rates of any manufactured object in human history."
Today, ever more sophisticated tools are available to help companies manage all aspects of their software projects. These tools help conceptualize and design the system; manage all people, files, computers, and documents involved; keep track of all versions and changes made to the system and its modules; and automate a number of tests that can be used to find system errors.
Indeed, worldwide sales of such software development tools, according to Stamford, Conn.based market research firm Gartner Inc., generate more than US $3 billion a year. Rational Software Corp., a company acquired by IBM in 2002 for $2.1 billion, is currently the market leader, followed by Microsoft, Computer Associates International, Compuware, Borland, and others, according to Gartner.
But the effect of widespread use of these tools on overall software quality hasn't been gauged in a detailed or rigorous way. Some would even argue that the sector is a little reminiscent of the market for diet products: it, too, is a billion-dollar industry, and yet, somehow, obesity as a public health problem hasn't gone away. And, just as the few successful diet strategies all seem to require a major change in lifestyle, perhaps, too, the software failure rates won't improve significantly without a basic and widespread shift in tactics.
Certainly, Praxis's experience supports that idea. Consider one of the company's recent projects, for Mondex International Inc., a financial services company founded in the UK that is now a subsidiary of MasterCard International Inc. First, a little background. Mondex had a product called an electronic purse, a credit cardlike payment card that stored electronic cash. That is, it did not debit a bank account or draw on a line of credit; it stored the cash digitally in an embedded chip. Mondex wanted to make the card flexible enough to run a variety of applications that would keep track not only of electronic cash but also of discount coupons, loyalty reward points, and other items still unimagined.
The critical issue was to make sure that only cards with legitimate applications would work; any other card, even if programmed to pass as a Mondex card, would be deemed invalid. The solution Mondex chose was to use a special program, known as a certification authority, that would run on a central computer at the company's headquarters. The certification authority would generate unique digital certificates--long strings of numbers--to accompany all applications on the cards. That way, a card reader at, say, a store could validate a card's certificates by running them through a series of mathematical operations that would prove unequivocally that they came from Mondex.
Mondex hired Praxis to develop the certification authority, which was the most critical part of the whole system. After all, if the security of one card was broken, then just that one card could be forged. But compromising the certification authority would allow mass forgery of cards.

The Praxis team began working on the solution in late 1998. The first step was to hammer out what, precisely, the Mondex system was supposed to do--in software jargon, the system's requirements. These are essentially English-language bullet points that detail everything the program will do but not how it will be done.
Getting the requirements right is perhaps the most critical part of Praxis's methodology. For that reason, Praxis engineers held many exhaustive meetings with the people from Mondex, during which they tried to imagine all possible scenarios of what could happen. As Praxis does for all its projects, it insisted that Mondex make available not only its IT people but everyone who would have any involvement with the product--salespeople, accountants, senior managers, and perhaps even the CEO. "We focus very hard on identifying all stakeholders, everybody that cares," says Roderick Chapman, a principal engineer at Praxis [see photo, " Spark Maker"].
To make sure Praxis was on target with the system requirements, it devised a prototype program that simulated the graphical interface of the proposed system. This prototype had no real system underlying it; data and commands entered through the interface served only to check the requirements. In fact, Praxis made no further use of the prototype--the real graphical interface would be developed later, using much more rigorous methods. In following this approach, Praxis was complying with an edict from Frederick P. Brooks's 1975 classic study of software development, The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley, 2nd edition, 1995):
In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done....
Hence plan to throw one away; you will, anyhow.
Once Praxis's engineers had a general idea of what the system would do, they began to describe it in great detail, in pages and pages of specifications. For example, if a requirement said that every user's action on the system should produce an audit report, then the corresponding specification would flesh out what data should be logged, how the information should be formatted, and so on.
This is the first math-intensive phase, because the specifications are written mostly in a special language called Z (pronounced the British way: "zed"). It's not a programming language--it doesn't tell a computer how to do something--but it is a formal specification language that expresses notions in ways that can be subjected to proof. Its purpose is simple: to detect ambiguities and inconsistencies. This forces engineers to resolve the problems right then and there, before the problems are built into the system.
Z, which was principally designed at the University of Oxford, in England, in the late 1970s and early 1980s, is based on set theory and predicate logic. Once translated into Z, a program's validity can be reviewed by eye or put through theorem-proving software tools. The goal is to spot bugs as soon as possible [see sidebar, " "].

The process is time-consuming. For the Mondex project, spec-writing took nearly a year, or about 25 percent of the entire development process. That was a long time to go without producing anything that looks like a payoff, concedes Andrew Calvert, Mondex's information technology liaison for the project. "Senior management would say: 'We are 20 percent into the project and we're getting nothing. Why aren't we seeing code? Why aren't we seeing implementation?' " he recalls. "I had to explain that we were investing much more than usual in the initial analysis, and that we wouldn't see anything until 50 percent of the way through." For comparison, in most projects, programmers start writing code before the quarter-way mark.
Only after Praxis's engineers are sure that they have logically correct specifications written in Z do they start turning the statements into actual computer code. The programming language they used in this case, called Spark, was also selected for its precision. Spark, based on Ada, a programming language created in the 1970s and backed by the U.S. Department of Defense, was designed by Praxis to eliminate all expressions, functions, and notations that can make a program behave unpredictably.
By contrast, many common programming languages suffer from ambiguity. Take, for example, the programming language C and the expression "i++ * i++," in which "*" denotes a multiplication and "++" means you should increment the variable "i" by 1. It's not an expression a programmer would normally use; yet it serves to illustrate the problem. Suppose "i" equals 7. What's the value of the expression? Answer: it is not possible to know. Different compilers--the special programs that transform source code into instructions that microprocessors can understand--would interpret the expression in different ways. Some would do the multiplication before incrementing either "i," giving 49 as the answer. Others would increment the first "i" only and then do the multiplication, giving 56 as the answer. Yet others would do unexpected things.
Such a problem could not happen in Spark, says Praxis's Chapman, because all such ambiguous cases were considered--and eliminated--when the language was created. Coding with Spark thus helps Praxis achieve reduced bug rates. In fact, once Spark code has been written, Chapman says, it has the uncanny tendency to work the first time, just as you wanted. "Our defect rate with Spark is at least 10 times, sometimes 100 times lower than those created with other languages," he says.
Peter Amey explains that the two-step translation--from English to Z and from Z to Spark--lets engineers keep everything in mind. "You can't reason across the semantic gap between English and code," he says, "but the gap from English to an unambiguous mathematical language is smaller, as is the gap from that language to code."
What's more, Spark lets engineers analyze certain properties of a program--the way data flows through the program's variables, for example--without actually having to compile and run it. Such a technique, called static analysis, often lets them prevent two serious software errors: using uninitialized variables, which may inject spurious values into the program, and allocating data to a memory area that is too small, a problem known as buffer overflow.
In practice, though, not everything can be put through the mathematical wringer. Problems with the way different modules exchange data, for instance, by and large have to be solved the old-fashioned way: by thinking. Nor can Praxis completely eliminate classic trial-and-error testing, in which the programmers try to simulate every situation the software is likely to confront.
But what Praxis does do is make such simulation a last resort, instead of the main line of defense against bugs. (As famed computer scientist Edsger Dijkstra wrote, "Program testing can be used to show the presence of bugs, but never to show their absence!") For the Mondex project, such testing took up 34 percent of the contract time. That's in the lower end of the usual share, which typically ranges from 30 to 90 percent. Reduced efforts on testing mean huge savings that go a long way toward balancing the extra time spent on the initial analysis.
The system went live in 1999. Though it cost more up front, the contract called for Praxis to fix for free any problem--that is, any deviation from the specification--that came up in the first year of operation, a guarantee rarely offered in the world of contract software. That first year, just four defects triggered the clause. According to Chapman, three of the problems were so trivial that they took no more than a few hours to correct. Only one was functionally significant; it took two days to fix. With about 100 000 lines of code, that's an average of 0.04 faults per 1000 lines. Fault rates for projects not using formal methods, by some estimates, can vary from 2 to 10 per 1000 lines of code, and sometimes more.
For Mondex, fewer bugs meant saving money. Calvert estimates that Mondex will spend 20 to 25 percent less than the norm in maintenance costs over the lifetime of the project.

Formal methods were relatively new when Praxis started using them, and after some ups and downs, they have recently been gaining popularity. Among their leading proponents are John Rushby at SRI International, Menlo Park, Calif.; Constance Heitmeyer, at the U.S. Naval Research Laboratory's Center for High Assurance Computer Systems, Washington, D.C.; Jonathan Bowen at London South Bank University; the developers of Z at the University of Oxford and other institutions; and the supporters of other specification languages, such as B, VDM, Larch, Specware, and Promela.
In recent years, even Microsoft has used formal methods, applying them to develop small applications, such as a bug-finding tool used in-house and also a theorem-proving "driver verifier," which makes sure device drivers run properly under Windows.
But still, the perceived difficulty of formal tools repels the rank-and-file programmer. After all, coders don't want to solve logical problems with the help of set theory and predicate logic. They want to, well, code. "Few people, even among those who complete computer science degrees, are skilled in those branches of pure mathematics," says Bernard Cohen, a professor in the department of computing at City University, in London.
In every other branch of engineering, he insists, practitioners master difficult mathematical notations. "Ask any professional engineer if he could do the job without math, and you'll get a very rude reply," Cohen says. But in programming, he adds, the emphasis has often been to ship it and let the customer find the bugs.
Until formal methods become easier to use, Cohen says, Praxis and companies like it will continue to rely on clients' "self-selection"--only those users who are highly motivated to get rock-solid software will beat a path to their door. Those that need software to handle functions critical to life, limb, national security, or the survival of a company will self-select; so will those that are contractually obligated to meet software requirements set by some regulator. That's the case with many military contractors that now need to demonstrate their use of formal methodologies to government purchasers; the same goes for financial institutions. Mondex, for instance, required the approval of the Bank of England, in London, and formal methods were part of that approval.
Yet even if regulators were omnipresent, not all problems would be amenable to formal methods, at least not to those that are available now. First, there is the problem of scaling. The largest system Praxis has ever built had 200 000 lines of code. For comparison, Microsoft Windows XP has around 40 million, and some Linux versions have more than 200 million. And that's nothing compared with the monster programs that process tax returns for the U.S. Internal Revenue Service or manage a large telecom company's infrastructure. Such systems can total hundreds of millions of lines of code.
What does Praxis say about that? "The simple answer is, we've never gone that big," says Chapman. "We believe these methods should scale, but we have no evidence that they won't or that they will." So what if a client approaches Praxis with a really big project? Would the company handle it? "The key weapon is abstraction," he says. "If you can build abstractions well enough, you should be able to break things down into bits you can handle." That maxim guides every other discipline in engineering, not least the design of computer hardware. Why not apply it to software, too?

About the Author

Philip E. Ross (IEEE Member) wrote "Managing Care Through the Air" for the December 2004 issue of IEEE Spectrum. His work has also appeared in Scientific American, Forbes, and Red Herring.

Tuesday, October 13, 2015

Her Code Got Humans on the Moon—And Invented Software Itself

http://www.wired.com/2015/10/margaret-hamilton-nasa-apollo/

Margaret Hamilton wasn’t supposed to invent the modern concept of software and land men on the moon. It was 1960, not a time when women were encouraged to seek out high-powered technical work. Hamilton, a 24-year-old with an undergrad degree in mathematics, had gotten a job as a programmer at MIT, and the plan was for her to support her husband through his three-year stint at Harvard Law. After that, it would be her turn—she wanted a graduate degree in math.
But the Apollo space program came along. And Hamilton stayed in the lab to lead an epic feat of engineering that would help change the future of what was humanly—and digitally—possible.
As a working mother in the 1960s, Hamilton was unusual; but as a spaceship programmer, Hamilton was positively radical. Hamilton would bring her daughter Lauren by the lab on weekends and evenings. While 4-year-old Lauren slept on the floor of the office overlooking the Charles River, her mother programmed away, creating routines that would ultimately be added to the Apollo’s command module computer.
“People used to say to me, ‘How can you leave your daughter? How can you do this?’” Hamilton remembers. But she loved the arcane novelty of her job. She liked the camaraderie—the after-work drinks at the MIT faculty club; the geek jokes, like saying she was “going to branch left minus” around the hallway. Outsiders didn’t have a clue. But at the lab, she says, “I was one of the guys.”
Then, as now, “the guys” dominated tech and engineering. Like female coders in today’s diversity-challenged tech industry, Hamilton was an outlier. It might surprise today’s software makers that one of the founding fathers of their boys’ club was, in fact, a mother—and that should give them pause as they consider why the gender inequality of the Mad Men era persists to this day.
‘When I first got into it, nobody knew what it was that we were doing. It was like the Wild West.’ — Margaret Hamilton
As Hamilton’s career got under way, the software world was on the verge of a giant leap, thanks to the Apollo program launched by John F. Kennedy in 1961. At the MIT Instrumentation Lab where Hamilton worked, she and her colleagues were inventing core ideas in computer programming as they wrote the code for the world’s first portable computer. She became an expert in systems programming and won important technical arguments. “When I first got into it, nobody knew what it was that we were doing. It was like the Wild West. There was no course in it. They didn’t teach it,” Hamilton says.
This was a decade before Microsoft and nearly 50 years before Marc Andreessen would observe that software is, in fact, “eating the world.” The world didn’t think much at all about software back in the early Apollo days. The original document laying out the engineering requirements of the Apollo mission didn’t even mention the word software, MIT aeronautics professor David Mindell writes in his book Digital Apollo. “Software was not included in the schedule, and it was not included in the budget.” Not at first, anyhow.

But as the Apollo project unfolded, the centrality of software in accomplishing the mission started to become clear. In 1965, Hamilton became responsible for the onboard flight software on the Apollo computers. It was an exciting time, and the US was depending on the work that she was doing. But sometimes the pressure kept Hamilton up at night. Once, after a late-night party, she rushed back to the computer lab to correct a piece of code she’d suddenly realized was flawed. “I was always imagining headlines in the newspapers, and they would point back to how it happened, and it would point back

By mid-1968, more than 400 people were working on Apollo’s software, because software was how the US was going to win the race to the moon. As it turned out, of course, software was going to help the world do so much more. As Hamilton and her colleagues were programming the Apollo spacecraft, they were also hatching what would become a $400 billion industry.
For Hamilton, programming meant punching holes in stacks of punch cards, which would be processed overnight in batches on a giant Honeywell mainframe computer that simulated the Apollo lander’s work. “We had to simulate everything before it flew,” Hamilton remembers. Once the code was solid, it would be shipped off to a nearby Raytheon facility where a group of women, expert seamstresses known to the Apollo program as the “Little Old Ladies,” threaded copper wires through magnetic rings (a wire going through a core was a 1; a wire going around the core was a 0). Forget about RAM or disk drives; on Apollo, memory was literally hardwired and very nearly indestructible.
Apollo flights carried two near-identical machines: one used in the lunar module—the Eagle that landed on the moon—and the other for the command module that carried the astronauts to and from Earth. These 70-pound Apollo computers were portable computers unlike any other. Conceived by MIT engineers such as Hal Laning and Hamilton’s boss, Dick Batton, it was one of the first important computers to use integrated circuits rather than transistors. As Mindell tells the story, it was the first computerized onboard navigation system designed to be operated by humans but with “fly-by-wire” autopilot technology—a precursor to the computerized navigation systems that are now standard on jetliners.
The system stored more than 12,000 “words” in its permanent memory—the copper “ropes” threaded by the Raytheon workers—and had 1,024 words in its temporary, erasable memory. “It was the first time that an important computer had been in a spacecraft and given a lot of responsibility for the mission,” says Don Eyles, who worked on the lunar module code while at MIT’s IL. “We showed that that could be done. We did it in what today seems an incredibly small amount of memory and very slow computation speed.” Without it, Neil Armstrong wouldn’t have made it to the moon. And without the software written by Hamilton, Eyles, and the team of MIT engineers, the computer would have been a dud.
This became clear on July 20, 1969, just minutes before Apollo 11 touched down on the Sea of Tranquility. Because of what Apollo software engineer Don Eyles has termed a “documentation error,” the Apollo computer started spitting out worrying error messages during this critical phase of the mission. But here’s where the technical arguments won by Hamilton and others saved the day. The error messages were popping up because the computer was being overwhelmed, tasked with doing a series of unnecessary calculations when, in fact, it was most needed to land the module on the surface of the moon. Back in Houston, engineers knew that because of Apollo’s unique asynchronous processing, the computer would focus on the task at hand—landing the Eagle on the Sea of Tranquility. When the software realized it didn’t have enough room to do all the functions it was supposed to be doing, it went through its error detection process and focused on the highest priority job, Hamilton says.

‘That would never happen’

One day, Lauren was playing with the MIT command module simulator’s display-and-keyboard unit, nicknamed the DSKY (dis-key). As she toyed with the keyboard, an error message popped up. Lauren had crashed the simulator by somehow launching a prelaunch program called P01 while the simulator was in midflight. There was no reason an astronaut would ever do this, but nonetheless, Hamilton wanted to add code to prevent the crash. That idea was overruled by NASA. “We had been told many times that astronauts would not make any mistakes,” she says. “They were trained to be perfect.” So instead, Hamilton created a program note—an add-on to the program’s documentation that would be available to NASA engineers and the astronauts: “Do not select P01 during flight,” it said. Hamilton wanted to add error-checking code to the Apollo system that would prevent this from messing up the systems. But that seemed excessive to her higher-ups. “Everyone said, ‘That would never happen,’” Hamilton remembers.
But it did. Right around Christmas 1968—five days into the historic Apollo 8 flight, which brought astronauts to the moon for the first-ever manned orbit—the astronaut Jim Lovell inadvertently selected P01 during flight. Hamilton was in the second-floor conference room at the Instrumentation Laboratory when the call came in from Houston. Launching the P01 program had wiped out all the navigation data Lovell had been collecting. That was a problem. Without that data, the Apollo computer wouldn’t be able to figure out how to get the astronauts home. Hamilton and the MIT coders needed to come up with a fix; and it needed to be perfect. After spending nine hours poring through the 8-inch-thick program listing on the table in front of them, they had a plan. Houston would upload new navigational data. Everything was going to be OK. Thanks to Hamilton—and Lauren—the Apollo astronauts came home.
Also thanks to Hamilton and the work she led, notions of what humanity could do, and be, changed not just beyond the stratosphere but also here on the ground. Software engineering, a concept Hamilton pioneered, has found its way from the moon landing to nearly every human endeavor. By the 1970s, Hamilton had moved on from NASA and the Apollo program. She went on to found and lead multiple software companies. Today her company, Hamilton Technologies, is just a few blocks away from MIT, where her career began—a hub of the code revolution that’s still looking toward the stars.

Monday, October 12, 2015

Software Estimation is a Losing Game

https://rclayton.silvrback.com/software-estimation-is-a-losing-game

This is an argument, and like all arguments, it's supports a specific position. I don't try to defend software estimation; I leave it to my colleagues to present that case. I write about this issue because I believe the software community needs to have an internal debate about whether software estimation in its current form is appropriate for projects.

Software estimation is the process of predicting the time and effort required to accomplish development or maintenance tasks. Estimates are generally used to help customers (this could be your own organization) make decisions. Customers often want a general idea of how long it will take to accomplish tasks and since the customer is always right, we engineers have always agreed to participate in a practice that we frankly despise. We estimate because it's easier to sell a software solution with a fixed delivery time than to actually admit that the customer is simply going to get whatever product your team can complete in the allotted time. The estimate becomes our commitment to delivering a piece of software under some arbitrary deadline.
Sadly, we commit to these deadlines, despite the likelihood of missing them, and in full acknowledgement of the consequences of missing our estimates. These consequences include, but are not limited to:

Loss of credibility with your customer and possibly your own team.
Internal team friction, especially when management over-promises on a delivery.
Poor morale, particularly when the team has to work excessive hours to meet a delivery.
Low quality software if developers rush to meet deadlines and skip quality control practices like testing.
Project Failure when estimates are consistently off and the customer has completely lost faith in your team's ability to deliver.

Over the last six years, I've participated in a number of software projects where poor software estimation has led to very bad outcomes. As a result, I've spent many hours pondering, and discussing with friends, strategies that would have led these projects to greater success. Given the difficulty of estimating effort on a software project, not to mention the near-certain guarantee of getting it wrong, I've come to a radical question...

Do we need software estimation?

I've posed this question a couple of times to project managers and corporate types, particularly when I'm asked to provide some crazy estimate for a customer. "Well of course we do!" is typically the answer I get back. "A customer's not going to give us a contract without some guarantee of delivery." And the sad thing is, this response is fair. It's fair, because we have allowed it to be fair, even though we know this is not how software projects work.
In reality, software teams can only provide a set of capabilities proportional to the amount of time and effort we are allotted. In that sense, we are very much like the hospitality industry...we can't guarantee a guest they will be fully refreshed when they leave our resort, but we can assert that the longer the guest stays, the more likely they will achieve this outcome.

The cult of software estimation.

Despite our inherent understanding of the difficulties of software estimation, we continue to believe that it is the correct course of action for helping us plan and manage projects. As an industry, we've invented a host of methodologies to help us manage the uncertainty of estimation (planning poker, t-shirt sizes, etc.) and written countless books on the topic. In fact, many people have made software estimation itself a career, coaching teams and project managers on the art of predicting uncertainty. Yet, we are still just as incompetent at the process as we were three decades ago (maybe even more so).
The belief that software estimation is critical is so pervasive that it has been ingrained in project management philosophies like Agile, giving the impression that they are inseparable. This highlights what I call the “cult of software estimation”, which is literally that the belief that the importance of software estimation is above reproach [1].

The uncertainty of software estimation.

Software estimation is not an inherently bad process. The problem is, we're really bad at it. It's literally the equivalent of making a weather forecast. You wouldn't commit a forecaster to a prediction for rain a month from now (well, if you do, you're naive). Like any prediction, it's only really accurate, and useful, when making near-term forecasts.
There are a number of factors that lead to skewed software estimates. From my own experience, I've seen all of the following (but I'm sure there's more):

Tasks not granularly specified - a task is too ambiguous leading to a lot of thrashing on how it should be implemented.
Task creates new tasks for the same sprint - common with research tasks whose outcomes include some sort of solution to be built in the sprint.
Estimates made for unfamiliar technologies - trap for developers who inaccurately estimate the time to completion based on past experience with disparate platforms.
Working on unrelated tasks - i.e. context-switching; the tasks assigned to a developer are unrelated and force them to "spin up" for each one.
Tasks with in-sprint dependencies - developers become immediately blocked because precursor tasks are unfinished. This is especially problematic if the dependencies need to be finished by multiple people.
Research tasks with non-tangible outcomes - I've seen people take weeks to research/prototype some solution. If you don't constrain ambiguous tasks, you're inviting the task-taker to waste time.
One person estimating for another - common when developer is absent during planning and another does the estimation for them.
Peer pressure - engineer(s) press another to accomplish a task faster than they believe they can. This generally happens in high-stress environments when a stronger personality bullies a developer into working harder.
Poor morale: teams under a lot of pressure tend to deliver slower.
External pressure: pressure by management to estimate less/more time-to-completion because of political and financial agenda.

None of these factors should surprise you. In fact, our software estimation processes are designed to deal with them. Our solution is to apply psychology and mathematical models to develop better estimates. In fact, I feel like we waste more effort (per capita) than any other industry analyzing our own development practices seeking better efficiencies.

The resource cost of software estimation.

When you think about it, even for small teams, there is generally someone dedicated to studying things like "burn down" or normalizing software estimates by person, over time. Traditionally, software development teams are really small (2-10 people). So when you think about all the time and effort that one person has dedicated in studying the work habits of their team, it's possible you're losing a significant percentage of your resources to overhead.
More importantly, creating software estimates tends to be a group activity, typically occurring at the beginning of a sprint during some form of planning meeting. If you are dealing with a fairly complex system, or have a lot of people on the team, you'll spend anywhere from 2-8 hours planning a sprint. Consider the fact that you may be wasting entire man-days (number of people estimating * time spent estimating) of development time just coming up with estimates.
When you think about it, that's a lot of time and effort lost to come up with estimates that are going to generally be wrong.

What do we gain from software estimation?

A better appreciation for how much a team can accomplish in a set amount of time.
A roadmap that can be used to determine the time-to-completion of a project.
A nice dashboard with charts that can show individual developer accountability to management.

What do we lose from software estimation?

If the estimation is bad, the ability to maintain a realistic pace of development. You’re either burning team out or looking stupid because you finished too early.
Flexibility in planning and execution, particularly if you are planning too far out and commit to a set of tasks. What happens when your approach changes and you are going back to the customer to explain a task that was planned to take two weeks will now take 4 weeks?
Lost resources if you don’t have a significantly long project, the overhead of having someone study and normalize estimations will burden you.
Efficiency, particularly when estimation is combined with time-boxed development cycles like a sprint.

Software Estimation and Time-Boxed Development Cycles.

I would like to expand that last bullet. We typically use software estimation as a tool to help us plan our sprints. How else are we to know how many tasks we should take on in a given period of time? I contend that this practice really just promotes a host of inefficiencies in your development process:

Lost effort due to context-switching - Engineers only take as many tasks that fit within their window (say 2 weeks). This usually means the developer takes a couple of large tasks and several unrelated ones (hell they may all be unrelated). Working on unrelated tasks causes context-switching, which reduces overall efficiency since the developer has to “retool” for the new tasks.
Lost time due to fragmentation - Just like blocks in a hard drive, you can’t perfectly fit tasks into time slots (3-hour task into a half-day slot). There’s always going to be some amount of lost time. Of course, you could always reduce the “block size”, but is that even realistic (i.e. planning tasks down to the minute!)?
Discourages maximum productivity - What happens when you accomplish all of your tasks in a sprint? Some engineers may choose to pull more tasks off the backlog; others might gold plate. I’ll tell you what I might do…take a nap. Especially, if I just spent the last four months working my ass off to get a system up and running!

Why estimate?

So I ask again, why should we estimate? The only benefits rendered to our managers and customers is a prediction that really can't be trusted. The consequences, however, tend to range from reduced development efficiency, morale issues, low quality software, and potential loss of work.

What is the alternative to not estimating?

I don't have an exact methodology. Maybe it's something like Kanban, though I don't know enough about Kanban to say one way or the other. I can tell you what this paradigm might look like.
Let's call it, Budget-Driven Development, or BuDD for short. In BuDD, development teams are given a set of resources and develop what they can until the budget is gone. I’m sure I just gave someone a heart attack, but when you think about it, it’s not too unreasonable. Many contracts have fixed time and costs. You make a guarantee that the customer will get a specific number of man-hours of development. Given a prioritized backlog, developers work on the three most important items. I imagine it would work similar Pandora's prioritization model, but not be constrained by time.
You guarantee quality to your customer by:

Delivering capability often.
Allowing the customer to prioritize development.

Isn't that just Agile? - Yes! Without arbitrary estimates!
The benefit to the customer is that you’re making a guarantee that you will (hopefully) maximize their value by not delivering early and pocketing the rest of their investment. Your team will also discouraged from remaining idle because it will be immediately apparent by the substance of your deliveries.
Not estimating does not mean your are not managing your team. In fact, effective management (the kind that should be happening in the first place) becomes more important:

Team is constantly collaborating. Not estimating does not negate planning meetings, more sync’s or scrums, etc.
Backlog is managed. This includes managing dependencies and ordering and prioritizing tasks, etc.
Tasks are sized (as opposed to estimating delivery time). You should still be able to say, "this task is too big for one delivery."
Development cycles are scoped to tangible deliveries and you should favor smaller deliveries over larger ones.

Hopefully, I have convinced you (on some level) to rethink the use of software estimation. The next section discusses principals you can employ if you decide to move from estimation.

Moving towards a no-estimate methodology.

Whatever strategy you choose to employ to manage your team, I recommend at least adopting the following principals for managing your team:

Prefer granular specifications - The more granular the specification, the more accurate you can gauge the scope of a task. This means, don't settle on stories as a unit of work. Instead, break that story down into technical tasks. Use schema and contract specification technologies that explicitly define interfaces. Draw mockups. Essentially, remove the ambiguity from the task.
Choose task-oriented development cycles instead of time-based ones. This strategy minimizes context-switching and time fragmentation.
Group related tasks; prevent context switching and attempt to maximize developer flow.
Opt for smaller development cycles, particularly if more flexibility is needed. More importantly, this allows you to demonstrate your work more often and receive feedback.
Stagger your team’s responsibilities and efforts. Have the tech-lead/architect specify tasks a sprint ahead of when they are planned to be done. Test and QA should happen after a development cycle. Don't try to cram dependent tasks into the same cycle.
Choose the three most important tasks and work on them until completion. Don't waste your time with filler tasks just because you need to fill a time slot.
Stop and regroup - don’t be afraid to stop the sprint if you feel it's necessary. This might occur if you feel tasks aren’t specified well enough or if developers are thrashing (not getting work done efficiently). Take time to organize the team and then carry on.
Pivot - if things aren't working, or you feel like the team needs to go another direction, do so. The worst thing you can do in Agile development is not be agile.

Conclusion.

Whether you buy the argument that software estimation is unnecessary, or are still an ardently supporting the estimation camp, I think we can agree on a couple of ideas:

Software estimation is difficult.
Many teams fail to accurately make estimates.
Trying to meet unrealistic estimates can destroy a team's morale.
Failing to meet an estimate hurts a team's credibility.

The question is, if this is such a big problem, what should we do about it?

Comments, Notes, and Further Explanation

While you read this, you are probably thinking that I believe software estimation seriously flawed. I actually think software estimation can work. But to do it right, it takes a highly-cohesive team working in a well-known problem domain, with significant experience in the technologies being used. Even then, there's still the potential for estimation failure. For instance, think about the effects of holidays, vacation, and illness. What about significant weather or internal or external political issues. This might not happen often, but it can still cause problems. What I really believe is that it only works for about 5% of teams.

Richard Clayton –

Unrepentant Thoughts on Software and Management.