Sunday, July 31, 2016
Friday, July 29, 2016
"Reverse Engineering for Beginners" free book
http://beginners.re/#lite
"Reverse Engineering for Beginners" free book
Also known as RE4B. Written by Dennis Yurichev (yurichev.com).My services
Praise for the book
- Its very well done .. and for free .. amazing.' (Daniel Bilar, Siege Technologies, LLC.)
- ...excellent and free (Pete Finnigan, Oracle RDBMS security guru.).
- ... book is interesting, great job! (Michael Sikorski, author of Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software.)
- ... my compliments for the very nice tutorial! (Herbert Bos, full professor at the Vrije Universiteit Amsterdam, co-author of Modern Operating Systems (4th Edition).)
- ... It is amazing and unbelievable. (Luis Rocha, CISSP / ISSAP, Technical Manager, Network & Information Security at Verizon Business.)
- Thanks for the great work and your book. (Joris van de Vis, SAP Netweaver & Security specialist.)
- ... reasonable intro to some of the techniques. (Mike Stay, teacher at the Federal Law Enforcement Training Center, Georgia, US.)
- I love this book! I have several students reading it at the moment, plan to use it in graduate course. (Sergey Bratus, Research Assistant Professor at the Computer Science Department at Dartmouth College)
- Dennis @Yurichev has published an impressive (and free!) book on reverse engineering (Tanel Poder, Oracle RDBMS performance tuning expert)
- This book is some kind of Wikipedia to beginners... (Archer, Chinese Translator, IT Security Researcher.)
- Texas A&M University (4th page; archived);
- Comenius University in Bratislava (link; archived);
- Masaryk University (link; archived);
- Technical University of Munich (link; archived);
- Hasso Plattner Institute (link; archived);
- Ivanovo Power Engineering Institute (link; archived);
- Chelyabinsk State University (link; archived);
- Aalto University (link; archived).
- Amsterdam University of Applied Sciences (link: "Recommended reading").
- Edith Cowan University (link; archived).
If you know about others, please drop me a note!
Download PDF files
Download English version | A4 (for browsing or printing) | A5 (for ebook readers) |
Скачать русскую (Russian) версию | A4 (для просмотра или печати) | A5 (для электронных читалок) |
PGP Signatures
For those, who wants to be sure the PDF files has been compiled by me, it's possible to check PGP signatures, which are: RE4B-EN-A5.pdf.sig, RE4B-EN.pdf.sig, RE4B-RU-A5.pdf.sig, RE4B-RU.pdf.sig.My PGP public keys are here: http://yurichev.com/pgp.html.
Contents
Topics discussed: x86/x64, ARM/ARM64, MIPS, Java/JVM.Topics touched: Oracle RDBMS, Itanium, copy-protection dongles, LD_PRELOAD, stack overflow, ELF, win32 PE file format, x86-64, critical sections, syscalls, TLS, position-independent code (PIC), profile-guided optimization, C++ STL, OpenMP, win32 SEH.
Call for translators!
You may want to help me with translation this work into languages other than English and Russian.Just send me any piece of translated text (no matter how short) and I'll put it into my LaTeX source code.
Korean, Chinese and Persian languages are reserved by publishers.
English and Russian versions I do by myself, but my English is still that horrible, so I'm very grateful for any notes about grammar, etc. Even my Russian is also flawed, so I'm grateful for notes about Russian text as well!
So do not hesitate to contact me: dennis(a)yurichev.com
Donors
Those who supported me during the time when I wrote significant part of the book:2 * Oleg Vygovsky (50+100 UAH), Daniel Bilar ($50), James Truscott ($4.5), Luis Rocha ($63), Joris van de Vis ($127), Richard S Shultz ($20), Jang Minchang ($20), Shade Atlas (5 AUD), Yao Xiao ($10), Pawel Szczur (40 CHF), Justin Simms ($20), Shawn the R0ck ($27), Ki Chan Ahn ($50), Triop AB (100 SEK), Ange Albertini (€10+50), Sergey Lukianov (300 RUR), Ludvig Gislason (200 SEK), Gérard Labadie (€40), Sergey Volchkov (10 AUD), Vankayala Vigneswararao ($50), Philippe Teuwen ($4), Martin Haeberli ($10), Victor Cazacov (€5), Tobias Sturzenegger (10 CHF), Sonny Thai ($15), Bayna AlZaabi ($75), Redfive B.V. (€25), Joona Oskari Heikkilä (€5), Marshall Bishop ($50), Nicolas Werner (€12), Jeremy Brown ($100), Alexandre Borges ($25), Vladimir Dikovski (€50), Jiarui Hong (100.00 SEK), Jim Di (500 RUR), Tan Vincent ($30), Sri Harsha Kandrakota (10 AUD), Pillay Harish (10 SGD), Timur Valiev (230 RUR), Carlos Garcia Prado (€10), Salikov Alexander (500 RUR), Oliver Whitehouse (30 GBP), Katy Moe ($14), Maxim Dyakonov ($3), Sebastian Aguilera (€20), Hans-Martin Münch (€15), Jarle Thorsen (100 NOK), Vitaly Osipov ($100), Yuri Romanov (1000 RUR), Aliaksandr Autayeu (€10), Tudor Azoitei ($40), Z0vsky (€10), Yu Dai ($10).
Thanks a lot to every donor!
As seen on...
... hacker news, reddit, habrahabr.ru, Russian-speaking RE forum. There are some parts translated to Chinese.The book at Goodreads website.
mini-FAQ
Q: I clicked on hyperlink inside of PDF-document, how to get back?A: (Adobe Acrobat Reader) Alt + LeftArrow
Q: May I print this book? Use it for teaching?
A: Of course, that's why book is licensed under Creative Commons terms (CC BY-SA 4.0). Someone may also want to build their own version of book, read here about it.
Q: Why this book is free? You've done great job. This is suspicious, as many other free things.
A: To my own experience, authors of technical literature do this mostly for self-advertisement purposes. It's not possible to gain any decent money from such work.
Q: I have a question...
A: Write me it by email (dennis(a)yurichev.com).
Supplementary materials
All exercises are moved to standalone website: challenges.re.Be involved!
Feel free to send me corrections, or, it's even possible to submit patches on book's source code (LaTeX) on GitHub or BitBucket, or SourceForge!Any suggestions, what also should be added to my book?
Write me an email: dennis(a)yurichev.com
News
See ChangeLogStay tuned!
My current plans for this book: Objective-C, Visual Basic, anti-debugging tricks, Windows NT kernel debugger, .NET, Oracle RDBMS.Here is also my blog and facebook.Web 2.0 hater? Subscribe to my mailing list for receiving updates of this book to email.
About Korean publication
In January 2015, Acorn publishing company (www.acornpub.co.kr) in South Korea did huge amount of work in translating and publishing my book (state which is it in August 2014) in Korean language.Now it's available at their website.
![]() | ![]() |
Cover pictures was done by my artist friend Andy Nechaevsky: facebook/andydinka.
They are also the Korean translation copyright holder.
So if you want to have a "real" book on your shelf in Korean language and/or want to support my work, now you may buy it.

Thursday, July 28, 2016
Scipy Lecture Notes
http://www.scipy-lectures.org/
Tutorials on the scientific Python ecosystem: a quick introduction to
central tools and techniques. The different chapters each correspond
to a 1 to 2 hours course with increasing level of expertise, from
beginner to expert.
One document to learn numerics, science, and data with Python
Download
Tuesday, July 19, 2016
Why code review beats testing: evidence from decades of programming research
https://kev.inburke.com/kevin/the-best-ways-to-find-bugs-in-your-code/
tl;dr If you want to ship high quality code, you should invest in more than one of formal code review, design inspection, testing, and quality assurance. Testing catches fewer bugs per hour than human inspection of code, but it might be catching different types of bugs.
Everyone wants to find bugs in their programs. But which methods are the most effective for finding bugs? I found this remarkable chart in chapter 20 of Steve McConnell’s Code Complete. Summarized here, the chart shows the typical percentage of bugs found using each bug detection technique. The range shows the high and low percentage across software organizations, with the center dot representing the average percentage of bugs found using that technique.
As McConnell notes, “The most interesting facts … are that the modal rates
don’t rise above 75% for any one technique, and that the techniques average
about 40 percent.” Especially noteworthy is the poor performance of testing
compared to formal design review (human inspection). There are three pages of
fascinating discussion that follow; I’ll try to summarize.
Shull et al (2002) estimate that non-severe defects take approximately 14 hours of debugging effort after release, but only 7.4 hours before release, meaning that non-critical bugs are twice as expensive to fix after release, on average. However, the multiplier becomes much, much larger for severe bugs: according to several estimates severe bugs are 100 times more expensive to fix after shipping than they are to fix before shipping.
More generally, inspections are a cheaper method of finding bugs than testing; according to Basili and Selby (1987), code reading detected 80 percent more faults per hour than testing, even when testing programmers on code that contained zero comments. This went against the intuition of the professional programmers, which was that structural testing would be the most efficient method.
The researchers conducted a number of different experiments to try and measure numbers accurately. Here are some examples:
Before any code at Google gets checked in, one owner of the code base must review and approve the change (formal code review and design inspection). Google also enforces unit tests, as well as a suite of automated tests, fuzz tests, and end to end tests. In addition, everything gets dogfooded internally before a release to the public (high-volume beta test).
It’s likely that any company with a reputation for shipping high-quality code will have systems in place to test at least three of the categories mentioned above for software quality.
References
tl;dr If you want to ship high quality code, you should invest in more than one of formal code review, design inspection, testing, and quality assurance. Testing catches fewer bugs per hour than human inspection of code, but it might be catching different types of bugs.
Everyone wants to find bugs in their programs. But which methods are the most effective for finding bugs? I found this remarkable chart in chapter 20 of Steve McConnell’s Code Complete. Summarized here, the chart shows the typical percentage of bugs found using each bug detection technique. The range shows the high and low percentage across software organizations, with the center dot representing the average percentage of bugs found using that technique.
10
20
30
40
50
60
70
80
90
100
Regression test
Informal code reviews
Unit test
New function (component) test
Integration test
Low-volume beta test (< 10 users)
Informal design reviews
Personal desk checking of code
System test
Formal design inspections
Formal code inspections
Modeling or prototyping
High-volume beta test (> 1000 users)
What does this mean for me?
No one approach to bug detection is adequate. Capers Jones – the researcher behind two of the papers McConnell cites – divides bug detection into four categories: formal design inspection, formal code inspection, formal quality assurance, and formal testing. The best bug detection rate, if you are only using one of the above four categories, is 68%. The average detection rate if you are using all four is 99%.But a less-effective method may be worth it, if it’s cheap enough
It’s well known that bugs found early on in program development are cheaper to fix; costs increase as you have to push changes to more users, someone else has to dig through code that they didn’t write to find the bug, etc. So while a high-volume beta test is highly effective at finding bugs, it may be more expensive to implement this (and you may develop a reputation for releasing highly buggy software).Shull et al (2002) estimate that non-severe defects take approximately 14 hours of debugging effort after release, but only 7.4 hours before release, meaning that non-critical bugs are twice as expensive to fix after release, on average. However, the multiplier becomes much, much larger for severe bugs: according to several estimates severe bugs are 100 times more expensive to fix after shipping than they are to fix before shipping.
More generally, inspections are a cheaper method of finding bugs than testing; according to Basili and Selby (1987), code reading detected 80 percent more faults per hour than testing, even when testing programmers on code that contained zero comments. This went against the intuition of the professional programmers, which was that structural testing would be the most efficient method.
How did the researchers measure efficiency?
In each case the efficiency was calculated by taking the number of bug reports found through a specific bug detection technique, and then dividing by the total number of reported bugs.The researchers conducted a number of different experiments to try and measure numbers accurately. Here are some examples:
- Giving programmers a program with 15 known bugs, telling them to use a variety of techniques to find bugs, and observing how many they find (no one found more than 9, and the average was 5)
- Giving programmers a specification, measuring how long it took them to write the code, and how many bugs existed in the code base
- Formal inspections of processes at companies that produce millions of lines of code, like NASA.
In our company we have low bug rates, thanks to our adherence to software philosophy X.
Your favorite software philosophy probably advocates using several different methods listed above to help detect bugs. Pivotal Labs is known for pair programming, which some people rave about, and some people hate. Pair programming means that with little effort they’re getting informal code review, informal design review and personal desk-checking of code. Combine that with any kind of testing and they are going to catch a lot of bugs.Before any code at Google gets checked in, one owner of the code base must review and approve the change (formal code review and design inspection). Google also enforces unit tests, as well as a suite of automated tests, fuzz tests, and end to end tests. In addition, everything gets dogfooded internally before a release to the public (high-volume beta test).
It’s likely that any company with a reputation for shipping high-quality code will have systems in place to test at least three of the categories mentioned above for software quality.
Conclusions
If you want to ship high quality code, you should invest in more than one of formal code review, design inspection, testing, and quality assurance. Testing catches fewer bugs per hour than human inspection of code, but it might be catching different types of bugs. You should definitely try to measure where you are finding bugs in your code and the percentage of bugs you are catching before release – 85% is poor, and 99% is exceptional.Appendix
The chart was created using the jQuery plotting library flot. Here’s the raw Javascript, and the CoffeeScript that generated the graph.References
- Steve McConnell’s Code Complete, which I’d recommend for anyone who’s interested in improving the quality of the code they write.
- Capers Jones, “Software defect-removal efficiency”, published in Computer, volume 29, issue 4, unfortunately the paper is gated.
- Forrest Shull et al, “What We Have Learned About Fighting Software Defects,” 2002
- Victor Basili and Richard Selby, “Comparing the Effectiveness of Software Testing Strategies,” IEEE Transactions on Software Engineering, volume 13, issue 12.
Computers in Spaceflight: The NASA Experience
http://history.nasa.gov/computers/Ch4-5.html
- Chapter Four - - Computers in the Space Shuttle Avionics System - Developing software for the space shuttle - [108] During 1973 and 1974 the first requirements began to be specified for what has become one of the most interesting software systems ever designed. It was obvious from the very beginning that developing the Shuttle's software would be a complicated job. Even though NASA engineers estimated the size of the flight software to be smaller than that on Apollo, the ubiquitous functions of the Shuttle computers meant that no one group of engineers and no one company could do the software on its own. This increased the size of the task because of the communication necessary between the working groups. It also increased the complexity of a spacecraft already made complex by flight requirements and redundancy. Besides these realities, no one could foresee the final form that the software for this pioneering vehicle would take, even after years of development work had elapsed, since there continued to be both minor and major changes. NASA and its contractors made over 2,000 requirements changes between 1975 and the first flight in 198180. As a result, about $200 million was spent on software, as opposed to an initial estimate of $20 million. Even so, NASA lessened the difficulties by making several early decisions that were crucial for the program's success. NASA separated the software contract from the hardware contract, closely managed the contractors and their methods, chose a high-level language, and maintained conceptual integrity.
- NASA awarded IBM Corporation the first independent Shuttle software contract on March 10, 1973. IBM and Rockwell International had worked together during the period of competition for the orbiter contract81. Rockwell bid on the entire aerospacecraft, intending to subcontract the computer hardware and software to IBM. But to Rockwell's dismay, NASA decided to separate the software contract from the orbiter contract. As a result, Rockwell still subcontracted with IBM for the computers, but IBM hand a separate software contract monitored closely by the Spacecraft Software Division of the Johnson Space Center. There are several reasons why this division of labor occurred. Since software does not weigh anything in and of itself, it is used to overcome hardware problems that would require extra systems and components (such as a mechanical control system)82. Thus software is in many ways the most critical component of the Shuttle, as it ties the other components together. Its importance to the overall program alone justified a separate contract, since it made the contractor directly accountable to NASA. Moreover, during the operations phase, software underwent the most changes, the hardware being essentially fixed83. As time went on, Rockwell's responsibilities as [109] prime hardware contractor were phased out, and the shuttles were turned over to an operations group. In late 1983, Lockheed Corporation, not Rockwell, won the competition for the operations contract. By keeping the software contract separate, NASA could develop the code on a continuing basis. There is a considerable difference between changing maintenance mechanics on an existing hardware system and changing software companies on a not yet perfect system because to date the relationships between components in software are much harder to define than those in hardware. Personnel experienced with a specific software system are the best people to maintain it. Lastly, Christopher Kraft of Johnson Space Center and George Low of NASA Headquarters, both highly influential in the manned spacecraft program during the early 1970's, felt that Johnson had the software management expertise to handle the contract directly84.
- One of the lessons learned from monitoring Draper Laboratory in the Apollo era was that by having the software development at a remote site (like Cambridge), the synergism of informally exchanged ideas is lost; sometimes it took 3 to 4 weeks for new concepts to filter over85. IBM had a building and several hundred personnel near Johnson because of its Mission Control Center contracts. When IBM won the Shuttle contract, it simply increased its local force.
- The closeness of IBM to Johnson Space Center also facilitated the ability of NASA to manage the project. The first chief of the Shuttle's software, Richard Parten, observed that the experience of NASA managers made a significant contribution to the success of the programming effort86. Although IBM was a giant in the data processing industry, a pioneer in real time systems, and capable of putting very bright people on a project, the company had little direct experience with avionics software. As a consequence, Rockwell had to supply a lot of information relating to flight control. Conversely, even though Rockwell projects used computers, software development on the scale needed for the Shuttle was outside its experience. NASA Shuttle managers provided the initial requirements for the software and facilitated the exchange of information between the principal contractors. This situation was similar to that during the 1960s when NASA had the best rendezvous calculations people in the world and had to contribute that expertise to IBM during the Gemini software development. Furthermore, the lessons of Apollo inspired the NASA managers to push IBM for quality at every point87.
- The choice of a high level language for doing the majority of the coding was important because, as Parten noted, with all the changes, "we'd still be trying to get the thing off the ground if we'd used assembly language"88. Programs written in high level languages are far easier to modify. Most of the operating system software, which is rarely changed, is in assembler, but all applications software and some of the interfaces and redundancy management code is in HAL/S89.
- [110] Although the decision to program in a high-level language meant that a large amount of support software and development tools had to be written, the high-level language nonetheless proved advantageous, especially since it has specific statements created for real-time programming.
Defining the Shuttle Software - In the end, the success of the Shuttle's software development was due to the conceptual integrity established by using rigorously maintained requirements documents. The requirements phase is the beginning of the software life cycle, when the actual functions, goals, and user interfaces of the eventual software are determined in full detail. If a team of a thousand workers was asked to set software requirements, chaos would result90. On the other hand, if few do the requirements but many can alter them later, then chaos would reign again. The strategy of using a few minds to create the software architecture and interfaces and then ensuring that their ideas and theirs alone are implemented, is termed "maintaining conceptual integrity," which is well explained in Frederick C. Brooks' The Mythical Man Month 91. As for other possible solutions, Parten says, "the only right answer is the one you pick and make to work"92.
- Shuttle requirements documents were arranged in three Levels: A, B, and C, the first two written by Johnson Space Center engineers. John R. Garman prepared the Level A document, which is comprised of a comprehensive description of the operating system, applications programs, keyboards, displays, and other components of the software system and its interfaces. William Sullivan wrote the guidance, navigation and control requirements, and John Aaron, the system management and payload specifications of Level B. They were assisted by James Broadfoot and Robert Ernull93. Level B requirements are different in that they are more detailed in terms of what functions are executed when and what parameters are needed94. The Level Bs also define what information is to be kept in COMPOOLS, which are HAL/S structures for maintaining common data accessed by different tasks95. The Level C requirements were more of a design document, forming a set with Level B requirements, since each end item at Level C must be traceable to a Level B requirement96. Rockwell International was responsible for the development of the Level C require ments as, technically, this is where the contractors take over from the customer, NASA, in developing the software.
- Early in the program, however, Draper Laboratory had significant influence on the software and hardware systems for the Shuttle. Draper was retained as a consultant by NASA and contributed two [111] key items to the software development process. The first was a document that "taught" NASA and other contractors how to write require ments for software, how to develop test plans, and how to use func tional flow diagrams, among other tools97. It seems ironic that Draper was instructing NASA and IBM on such things considering its difficulties in the mid-1960s with the development of the Apollo flight software. It was likely those difficult experiences that helped motivate the MIT engineers to seriously study software techniques and to become, within a very short time, one of the leading centers of software engineering theory. The Draper tutorial included the concept of highly modular software, software that could be "plugged into" the main circuits of the Shuttle. This concept, an application of the idea of interchangeable parts to software, is used in many software systems today, one example being the UNIX*** operating system developed at Bell Laboratories in the 1970s, under which single function software tools can be combined to perform a large variety of functions.
- Draper's second contribution was the actual writing of some early Level C requirements as a model98. This version of the Level C documents contained the same components as in the later versions delivered by Rockwell to IBM for coding. Rockwell's editions, however, were much more detailed and complete, reflecting their practical, rather than theoretical purpose and have been an irritation for IBM. IBM and NASA managers suspect that Rockwell, miffed when the software contract was taken away from them, may have delivered incredibly precise and detailed specifications to the software contractor. These include descriptions of flight events for each major portion of the software, a structure chart of tasks to be done by the software during that major segment, a functional data flowchart, and, for each module, its name, calculations, and operations to be performed, and input and output lists of parameters, the latter already named and accompanied by a short definition, source, precision, and what units each are in. This is why one NASA manager said that "you can't see the forest for the trees" in Level C, oriented as it is to the production of individual modules99. One IBM engineer claimed that Rockwell went "way too far" in the Level C documents, that they told IBM too much about how to do things rather than just what to do100. He further claimed that the early portion of the Shuttle development was "underengineered" and that Rockwell and Draper included some requirements that were not passed on by NASA. Parten, though, said that all requirements documents were subject to regular review by joint teams from NASA and Rockwell101.
- The impression one gains from documents and interviews is that both Rockwell and IBM fell victim to the "not invented here" [112] syndrome: If we didn't do it, it wasn't done right. For example, Rockwell delivered the ascent requirements, and IBM coded them to the letter, thereby exceeding the available memory by two and a third times and demonstrating that the requirements for ascent were excessive. Rockwell, in return, argued for 2 years about the nature of the operating system, calling for a strict time-sliced system, which allocates predefined periods of time for the execution of each task and then suspends tasks unfinished in that time period and moves on to the next one. The system thus cycles through all scheduled tasks in a fixed period of time, working on each in turn. Rockwell's original proposal was for a 40-millisecond cycle with synchronization points at the end of each102. IBM, at NASA's urging, countered with a priority-interrupt-driven system similar to the one on Apollo Rockwell, experienced with time-slice systems, fought this from 1973 to 1975, convinced it would never workl03.
- The requirements specifications for the Shuttle eventually contained in their three levels what is in both the specification and design stage of the software life cycle. In this sense, they represent a fairly complete picture of the software at an early date. This level of detail at least permitted NASA and its contractors to have a starting point in the development process. IBM constantly points to the number of changes and alterations as a continuing problem, partially ameliorated by implementing the most mature requirements first104. Without the attempt to provide detail at an early date, IBM would not have had any mature requirements when the time came to code. Even now, requirements are being changed to reflect the actual software, so they continue to be in a process of maturation. But early development of specifications became the means by which NASA could enforce conceptual integrity in the shuttle software.
Architecture of the Primary Avionics Software System - The Primary Avionics Software System, or PASS, is the software that runs in all the Shuttle's four primary computers. PASS is divided into two parts: system software and applications software. The system software is the Flight Computer Operating System (FCOS), the user interface programming, and the system control programs, whereas the applications software is divided into guidance, navigation and control, orbiter systems management, payload and checkout programs. Further divisions are explained in Box 4-3.
-
- [113] Box 4-3: Structure of PASS Applications Software
- The PASS guidance and navigation software is divided into major functions, dictated by mission phases, the most obvious of which are preflight, ascent, on-orbit, and descent. The requirements state that these major functions be called OPS, or operational sequences. (e.g., OPS-1 is ascent; OPS-3, descent.) Within the OPS are major modes. In OPS-1, the first-stage burn, second-stage burn, first orbital insertion burn, second orbital insertion burn, and the initial on-orbit coast are major modes; transition between major modes is automatic. Since the total mission software exceeds the capacity of the memory, OPS transitions are normally initiated by the crew and require the OPS to be loaded from the MMU. This caused considerable management concern over the preservation of data, such as the state vector, needed in more than one OPS105. NASA's solution is to keep common data in a major function base, which resides in memory continuously and is not overlaid by new OPS being read into the computers.
- Within each OPS, there are special functions (SPECs) and display functions (DISPs). These are available to the crew as a supplement to the functions being performed by the current OPS. For example, the descent software incorporates a SPEC display showing the horizontal situation as a supplement to the OPS display showing the vertical situation. This SPEC is obviously not available in the on-orbit OPS. A DISP for the on-orbit OPS may show fuel cell output levels, fuel reserves in the orbital maneuvering system, and other such information. SPECs usually contain items that can be selected by the crew for execution. DISPs are just what their name means, displays and not action items. Since SPECs and DISPs have lower priority than OPS, when a big OPS is in memory they have to be kept on the tape and rolled in when requested106. The actual format of the SPECs, DISPs, OPS displays, and the software that interprets crew entries on the keyboard is in the user interface portion of the system software.
- The most critical part of the system software is the FCOS. NASA, Rockwell, and IBM solved most of the grand conceptual problems, such as the nature of the operating system and the redundancy management scheme, by 1975. The first task was to convert the FCOS from the proposed 40-millisecond loop operating system to a priority-driven [113] system107. Priority interrupt systems are superior to time-slice systems because they degrade gracefully when overloaded108. In a time-slice system, if the tasks scheduled in the current cycle get bogged down by excessive I/O operations, they tend to slow down the total time of execution of processes. IBM's version of the FCOS actually has cycles, but they are similar to the ones in the Skylab system described in the previous chapter. The minor cycle is the high-frequency cycle; tasks within it are scheduled every 40 milliseconds. Typical tasks in this cycle are those related to active flight control in the atmosphere. The major cycle is 960 milliseconds, and many monitoring and system management tasks are scheduled at that frequency109. If a process is still running when its time to.....
[114]Figure 4-6. A block diagram of the Shuttle flight computer software architecture. (From NASA, Data Processing System Workbook)
- .....restart comes up due to excessive I/O or because it was interrupted, it cancels its next cycle and finishes up110. If a higher priority process is called when another process is running, then the current process is interrupted and a program status word (PSW) containing such items as the address of the next instruction to be executed is stored until the interruption is satisfied. The last instruction of an interrupt is to restore the old PSW as the current PSW so that the interrupted process can continue111. The ability to cancel processes and to interrupt them asynchronously provides flexibility that a strict time-slice system does not.
- A key requirement of the FCOS is to handle the real-time statements in the HAL/S language. The most important of these are SCHEDULE, which establishes and controls the frequency of execution of processes; TERMINATE and CANCEL, which are the opposite of SCHEDULE; and WAIT, which conditionally suspends execution112. The method of implementing these statements is controlled [115] by a separate interface control document113. SCHEDULE is generally programmed at the beginning of each operational sequence to set up which tasks are to be done in that software segment and how often they are to be done. The syntax of SCHEDULE permits the programmer to assign a frequency and priority to each task. TERMINATE and CANCEL are used at the end of software phases or to stop an unneeded process while others continue. For example, after the solid rocket boosters burn out and separate, tasks monitoring them can cease while tasks monitoring the main engines continue to run. WAIT, although handy, is avoided by IBM because of the possibility of the software being "hung up" while waiting for the I/O or other condition required to continue the process114. This is called a race condition or "deadly embrace" and is the bane of all shared resource computer operating systems.
- The FCOS and displays occupy 35K of memory at all times115. Add the major function base and other resident items, and about 60K of the 106K of core remains available for the applications programs. Of the required applications programs, ascent and descent proved the most troublesome. Fully 75% of the software effort went into those two programs116. After the first attempts at preparing the ascent software resulted in a 140K load, serious code reduction began. By 1978, IBM reduced the size of the ascent program to 116K, but NASA Headquarters demanded it be further knocked down to 80K117. The lowest it ever got was 98,840 words (including the system software), but its size has since crept back up to nearly the full capacity of the memory. IBM accomplished the reduction by moving functions that could wait until later operational sequences118. The actual figures for the test flight series programs are in Table 4-1119. The total size of the flight test software was 500,000 words of code. Producing it and modifying it for later missions required the development of a complete production facility.
. | |
NAME | K WORDS |
. | . |
Preflight initialization | 72.4 |
Preflight checkout | 81.4 |
Ascent and abort | 105.2 |
On-orbit | 83.1 |
On-orbit checkout | 80.3 |
On-orbit system management | 84.1 |
Entry | 101.1 |
Mass memory utility | 70.1 |
Implementing PASS - NASA planned that PASS would be a continuing development process. After the first flight programs were produced, new functions needed to be added and adapted to changing payload and mission requirements. For instance, over 50% of PASS modules changed during the first 12 flights in response to requested enhancements120. To do this work, NASA established a Software Development Laboratory at Johnson Space Center in 1972 to prepare for the implementation of the Shuttle programs and to make the software tools needed for efficient coding and maintenance. The Laboratory evolved into the Software Production Facility (SPF) in which the software development is carried on in the operations era. Both the facilities were equipped and managed by NASA but used largely by contractors.
- The concept of a facility dedicated to the production of onboard software surfaced in a Rand Corporation memo in early 1970121. The memo summarized a study of software requirements for Air Force space missions during the decade of the 1970s. One reason for a government-owned and operated software factory was that it would be easier to establish and maintain security. Most modules developed for [117] the Shuttle, such as the general flight control software and memory displays, would be unclassified. However, Department of Defense (DoD) payloads require system management and payload management software, plus occasional special maneuvering modules. These were expected to be classified. Also, if the software maintenance contract moved from the original prime contractor to some different operations contractor, it would be considerably simpler to accomplish the transfer if the software library and development computers were government owned and on government property. Lastly, having such close control over existing software and new development would eliminate some of the problems in communication, verification, and maintenance encountered in the three previous manned programs.
- Developing the SPF turned out to be as large a task as developing the flight software itself. During the mid-1970s, IBM had as many people doing software for the development lab as they had working on PASS122. The ultimate purpose of the facility is to provide a programming team with sufficient tools to prepare a software load for a flight. This software load is what is put on to the MMU tape that is flown on the spacecraft. In the operations era of the 1980s, over 1,000 compiled modules are available. These are fully tested, and often previously used, versions of tasks such as main engine throttling, memory modification, and screen displays that rarely change from flight to flight. New, mission-specific modules for payloads or rendezvous maneuvers are developed and tested using the SPF's programming tools, which themselves represent more than a million lines of code123. The selection of existing modules and the new modules are then combined into a flight load that is subject to further testing. NASA achieved the goal of having such an efficient software production system through an 8-year development process when the SPF was still the Laboratory.
- In 1972, NASA studied what sort of equipment would be required for the facility to function properly. Large mainframe computers compatible with the AP-101 instruction set were a must. Five IBM 360/75 computers, released from Apollo support functions, were available124. These were the development machines until January of 1982125. Another requirement was for actual flight equipment on which to test developed modules. Three AP-101 computers with associated display electronics units connected to the 360s with a flight equipment interface device (FEID) especially developed for the purpose. Other needed components, such as a 6-degree-of-freedom flight simulator, were implemented in software126. The resulting group of equipment is capable of testing the flight software by interpreting instructions, simulating functions, and running it in the actual flight hardware127.
- In the late 1970s, NASA realized that more powerful computers were needed as the transition was made from development to operations. The 360s filled up, so NASA considered the Shuttle Mission [118] Simulator (SMS), the Shuttle Avionics Instrumentation Lab (SAIL), and the Shuttle Data Processing Center's computers as supplementary development sites, but this idea was rejected because they were all too busy doing their primary functions128. In 1981, the Facility added two new IBM 3033N computers, each with 16 million bytes of primary memory. The SPF then consisted of those mainframes, the three AP-101 computers and the interface devices for each, 20 magnetic tape drives, six line printers, 66 million bytes of drum memory, 23.4 billion bytes of disk memory, and 105 terminals129. NASA accomplished rehosting the development software to the 3033s from the 360s during the last quarter of 1981. Even this very large computer center was not enough. Plans at the time projected on-line primary memory to grow to 100 million bytes130, disk storage to 160 billion bytes131, and two more interface units, display units, and AP-101s to handle the growing DOD business132. Additionally, terminals connected directly to the SPF are in Cambridge, Massachusetts, and at Goddard Space Flight Center, Marshall Space Flight Center, Kennedy Space Center, and Rockwell International in Downey, California133.
- Future plans for the SPF included incorporating backup system software development, then done at Rockwell, and introducing more automation. NASA managers who experienced both Apollo and the Shuttle realize that the operations software preparation is not enough to keep the brightest minds sufficiently occupied. Only a new project can do that. Therefore, the challenge facing NASA is to automate the SPF, use more existing modules, and free people to work on other tasks. Unfortunately, the Shuttle software still has bugs, some of which are no fault of the flight software developers, but rather because all the tools used in the SPF are not yet mature. One example is the compiler for HAL/S. Just days before the STS-7 flight, in June, 1983, an IBM employee discovered that the latest release of the compiler had a bug in it. A quick check revealed that over 200 flight modules had been modified and recompiled using it. All of those had to be checked for errors before the flight could go. Such problems will continue until the basic flight modules and development tools are no longer constantly subject to change. In the meantime, the accuracy of the Shuttle software is dependent on the stringent testing program conducted by IBM and NASA before each flight.
Verification and Change Management of the Shuttle Software - IBM established a separate line organization for the verification of the Shuttle software. IBM's overall Shuttle manager has two managers reporting to him, one for design and development, and one for verification and field operations. The verification group has just [119] less than half the members of the development group and uses 35% of the software budget134. There are no managerial or personnel ties to the development group, so the test team can adopt an "adversary relationship" with the development team. The verifiers simply assume that the software is untested when received135. In addition, the test team can also attempt to prove that the requirements documents are wrong in cases where the software becomes unworkable. This enables them to act as the "conscience" of the entire project136.
- IBM began planning for the software verification while the requirements were being completed. By starting verification activity as the software took shape, the test group could plan its strategy and begin to write its own books. The verification documentation consists of test specifications and test procedures including the actual inputs to be used and the outputs expected, even to the detail of showing the content of the CRT screens at various points in the test137. The software for the first flight had to survive 1,020 of these tests138. Future flight loads could reuse many of the test cases, but the preparation of new ones is a continuing activity to adjust to changes in the software and payloads, each of which must be handled in an orderly manner.
- Suggestions for changes to improve the system are unusually welcome. Anyone, astronaut, flight trainer, IBM programmer, or NASA manager, can submit a change request139. NASA and IBM were processing such requests at the rate of 20 per week in 1981140. Even as late as 1983 IBM kept 30 to 40 people on requirements analysis, or the evaluation of requests for enhancements141. NASA has a corresponding change evaluation board. Early in the program, it was chaired by Howard W. Tindall, the Apollo software manager, who by then was head of the Data Systems and Analysis Directorate. This turned out to be a mistake, as he had conflicting interests142. The change control board moved to the Shuttle program office. Due to the careful review of changes, it takes an average of 2 years for a new requirement to get implemented, tested, and into the field143. Generally, requests for extra functions that would push out current software due to memory restrictions are turned down144.
- [120] Box 4-4: How IBM Verifies the Shuttle Flight Software
- The Shuttle software verification process actually begins before the test group gets the software, in the sense that the development organization conducts internal code reviews and unit tests of individual modules and then integration tests of groups of modules as they are assembled into a software load. There are two levels of code inspection, or "eyeballing" the software looking for logic errors. One level of inspection is by the coders themselves and their peer reviewers. The second level is done by the outside verification team. This activity resulted in over 50% of the discrepancy reports (failures of the software to meet the specification) filed against the software, a percentage similar to the Apollo experience and reinforcing the value of the idea145. When the software is assembled, it is subject to the First Article Configuration Inspection (FACI), where it is reviewed as a complete unit for the first time. It then passes to the outside verification group.
- Because of the nature of the software as it is delivered, the verification team concentrates on proving that it meets the customer's requirements and that it functions at an acceptable level of performance. Consistent with the concept that the software is assumed untested, the verification group can go into as much detail as time and cost allow. Primarily, the test group concentrates on single software loads, such as ascent, on-orbit, and so forth146. To facilitate this, it is divided into teams that specialize in the operating system and detail, or functional verification; teams that work on guidance, navigation, and control; and teams that certify system performance. These groups have access to the software in the SPF, which thus doubles as a site for both development and testing. Using tools available in the SPF, the verification teams can use the real flight computers for their tests (the preferred method). The testers can freeze the execution of software on those machines in order to check intermediate results, alter memory, and even get a log of what commands resulted in response to what inputs147.
- After the verification group has passed the software, it is given an official Configuration Inspection and turned over to NASA. At that point NASA assumes configuration control, and any changes must be approved through Agency channels. Even though NASA then has the software, IBM is not finished with it148.
- [121] The software is usually installed in the SAIL for prelaunch, ascent, and abort simulations, the Flight Simulation Lab (FSL) in Downey for orbit, de-orbit, and entry simulations, and the SMS for crew training. Although these installations are not part of the preplanned verification process, the discrepancies noted by the users of the software in the roughly 6 months before launch help complete the testing in a real environment. Due to the nature of real-time computer systems, however, the software can never be fully certified, and both IBM and NASA are aware of this149. There are simply too many interfaces and too many opportunities for asynchronous input and output.
- Discrepancy reports cause changes in software to make it match the requirements. Early in the program, the software found its way into the simulators after less verification because simulators depend on software just to be turned on. At that time, the majority of the discrepancy reports were from the field installations. Later, the majority turned up in the SPF150. All discrepancy reports are formally disposed of, either by appropriate fixes to the software, or by waiver. Richard Parten said, "Sometimes it is better to put in an 'OPS Note' or waiver than to fix (the software). We are dependent on smart pilots"151. If the discrepancy is noted too close to a flight, if it requires too much expense to fix, it can be waived if there is no immediate danger to crew safety. Each Flight Data File carried on board lists the most important current exceptions of which the crew must be aware. By STS-7 in June of 1983, over 200 pages of such exceptions and their descriptions existed152. Some will never be fixed, but the majority were addressed during the Shuttle launch hiatus following the 51L accident in January 1986.
- So, despite the well-planned and well-manned verification effort, software bugs exist. Part of the reason is the complexity of the real-time system, and part is because, as one IBM manager said, "we didn't do it up front enough," the "it" being thinking through the program logic and verification schemes153. Aware that effort expended at the early part of a project on quality would be much cheaper and simpler than trying to put quality in toward the end, IBM and NASA tried to do much more at the beginning of the Shuttle software development than in any previous effort, but it still was not enough to ensure perfection.
- [122] Box 4-5: The Nature of the Backup Flight System
- The Backup Flight System consists of a single computer and a software load that contains sufficient functions to handle ascent to orbit, selected aborts during ascent, and descent from orbit to landing site. In the interest of avoiding a generic software failure, NASA kept its development separate from PASS. An engineering directorate, not the on-board software division, managed the software contract for the backup, won by Rockwell154.
- The major functional difference between PASS and the backup is that the latter uses a time-slice operating system rather than the asynchronous priority-driven system of PASS155. This is consistent with Rockwell's opinion on how that system was to be designed. Ironically, since the backup must listen in on PASS operations so as to be ready for instant takeover, PASS had to be modified to make it more synchronous156. Sixty engineers were still working on the Backup Flight System software as late as 1983157.
-
-
*** UNIX is a trademark of AT&T.

Code Is Never "Perfect", Code Is Only Ever "Good Enough"
I am a perfectionist. I know this, I accept it, and it still bites me in the ass. I want my code to be The Perfect Code, unbreakable, like a flawless diamond in a sea of cubic zirconium. I spend entirely too much time trying to cut that diamond from the rock it's buried in. But is The Perfect Code even attainable, or should we just settle for "good enough?" That's the question I've been wrestling with this week.
The Return of Dick and Bob
I've mentioned before that I often imagine having two awful sports announcers in my mind whose only goal in life is to subtly trash whatever I think or do or say. They're pretty much my impostor syndrome given a voice, and lately they've decided to make an unwelcome resurgence into my daily life.Dick: Well now, what do we have here, Bob? Looks like Matthew's writing that same function all over again! He must be trying for the Perfect Code!I want The Perfect CodeTM. There, I said it. The Perfect Code doesn't break, needs little maintenance, and is so easy to read you might mistake it for a novel. I am constantly in search of this mythical creature, and evermore disappointed when I can't summon it despite my best efforts. Sometimes I get so close I can almost make it out through the fog of controllers and repositories, only for it to slink further back into the overgrown jungle of code that I'm currently attempting to tame.
Bob: Sure does, Dick. This is his most recent attempt at it; the wily veteran giving one last go at the big win he's desired for so long.
Dick: We've seen this kind of thing before from him, especially during that last match against the file uploader. You can tell he's trying to make The Perfect Code by the never ending stream of invective he lets fly. Let's listen in:
Matthew: Piece of [bleep]! I know you [bleeping] work! WHY WON'T YOU [BLEEPING] WORK, MOTHER[BLEEPER]?!
Bob: Wow. He really wants this one, doesn't he?
Dick: He certainly does, Bob. He's reaching for perfection, and here, on his third attempt, he seems to be no closer than he was during the first two! Why is that?
Bob: Well, Dick, seems to me that he just can't write perfect code. It's a good effort, to be sure, but it looks like he just doesn't have it in him today.
Dick: After the break, we'll find out if this angry little programmer can achieve the greatest feat in software development: the Perfect Code! Hey, you never know. Stay tuned!
The trophy yet eludes me, and I fear it shall do so forever. I've been working this hard to make The Perfect Code, and yet I always fall short of my goal. Am I doomed to write imperfect, buggy code every working day for the rest of my career?
Short answer? Yep.
Myths and Legends
The Perfect Code is a myth; a unattainable legend which we devs nevertheless strive to achieve. The Perfect Code exists only in one's mind, a product of their own tendencies and biases, incompatible with any other developer's vision of it. You cannot hope to obtain it, you can only hope to approach it, and even that's a mighty difficult feat.Code cannot be "perfect." What is perfect to you is a steaming pile of crap to your neighbor. Once you have more than one person looking at any given piece of code, it can never again be considered "perfect" (and woe be unto the person who only ever has one person reviewing his code). There is no "Perfect Code" because no two programmers can agree on what that means. But they very often can agree on what might be "good enough."
There is no such thing as "perfect," there is only "good enough." But that leads to a more difficult question: when is our code good enough?
Aiming For "Good Enough"
If perfection is unattainable, at what point do we think our work is "good enough?" Is it when our code is readable, when our fellow developers can easily understand what it is doing? Is it when the code does what it is expected to do? Is it when it passes the tests we've defined? Is it when it fulfills what the customer wanted? There's no clear definition for "good enough."Funny thing about code being "good enough" is that, if you're like me, you often can't see it for yourself. You're too busy being absorbed into its world that you can't see the forest for the trees and miss the point at which the code becomes "good enough." To truly understand when your app has reached that point, you need an outside opinion. You need a coworker.
IMO striving for perfection is failing before you even start. I'm not omniscient, I'm not superhuman, I cannot possibly plan for all possibilities. Hell, I can't even plan my breakfast; I just take things out of the pantry until some combination of them sounds good enough to eat. I might end up with bacon, apples, and peanut butter sometimes, but its food and I can eat it, so it's "good enough."
And that's the goal, isn't it? Just make it good enough, whatever that means to you.
Every day, I will write buggy, imperfect code. Every day, I will make mistakes. Every day, I will be less than perfect. And every day, I attempt to be OK with that, even if Dick and Bob are not.
Subscribe to:
Comments (Atom)