Security: probably doing it wrong

Being knowledgeable in the field of information security is useful and beneficial. However, it’s not sufficient, and while it’s (somewhat) easy to argue that it’s necessary, there’s a big gap between being a security expert and making software better, or even making software more secure.

The security interaction on many projects goes something like this:

  • Develop software
  • Get a penetration tester in
  • Oh, shit
  • Fix anything that won’t take more than two days
  • Get remaining risk signed off by senior management
  • Ship
  • Observe that most of the time, this doesn’t cause much trouble

Now whether or not a company can afford to rely on that last bullet point being correct is a matter for the executives to decide, but let’s assume that they don’t want to depend on it. The problem they’ll have is that they must depend on it anyway, because the preceding software project was done wrong.

Security people love to think that they’re important and clever (and they are, just not any more than other software people). Throughout the industry you hear talk of “fail” or even “epic fail”. This is not jargon; it’s an example of the mentality that promotes calling developers idiots.

Did the developer get the security wrong because he’s an idiot, or was it because you didn’t tell him it was wrong until after he had finished?

“But we’re penetration testers; we weren’t engaged until after the developers had written the software.” Whose fault is that? Did you tell anyone you had advice to give in the earlier stages of development? Did you offer to help with the system architecture, or with the requirements, or with tool selection?

You may think at this point that I shouldn’t rock the boat; that if we carry on allowing people to write insecure software, there’ll be more money to be made in testing it and writing reports about how many high-severity issues there are that need fixing. That may be true, though it won’t actually lead to software becoming more secure.

Take another look at the list of actions above. Once the project manager knows that the software has a number of high-priority issues, the decision that project manager will have to take looks like this:

If I leave these problems in the software, will that cause more work in the project, or in maintenance? Do I look like my bonus depends on what happens in maintenance?

So, as intimated in the process at the top of the post, you’ll see the quick fixes done – anything that doesn’t affect the ship date – but more fundamental problems will be left alone, or perhaps documented as “nice to haves” for a future version. Anything that requires huge changes, like architectural modification or component rewrites, isn’t going to happen.

If we actually want to get security problems fixed, we have to distribute the importance assigned to security more evenly. It’s no good having security people who think that security is the most important thing ever if they’re not going to be the people making the stuff; conversely, it’s no good having the people who make the thing unaware of security if it really does have some importance associated with it.

Here’s my proposal: it should be the responsibility of the software architect to know security or to know someone who knows security. Security is a requirement of a software system, and it’s the architect’s job to understand what the requirements are, how the software is to implement them and how to make any trade-off needed if the requirements come into conflict. It’s the architect’s job to justify those decisions, and to make them and see them followed throughout development.

That makes the software architect the perfect person to ensure that the relative importance of security versus performance, correctness, responsiveness, user experience and other aspects of the product is both understood and correctly executed in building the software. It promotes (or demotes, depending on your position) software security to its correct position in the firmament: as an aspect of constructing software.

Irresponsible tolerance

Context

@unclebobmartin said:

One of the bad behaviors that destroys projects is “irresponsible tolerance”. Tolerating what you know you should fix.

This triggered a discussion between @phil_nash and me. As far as this got on the Twitters, we agreed that it’s not necessarily irresponsible to ignore a problem for now as long as what you’re actually doing is deferring the fix until you’ve got time…except that it’s easy for deferral to slip into tacit acceptance as other work comes up. We may even be able to delude ourselves into thinking we still intend to fix that issue “some day”, even though the reality is that it will never happen.

My >140char response

Yes, that is easy to do. I’ve done it myself. I’ve even – though not in a number of years – used tolerance of a badly-written component as an excuse to avoid not just cleaning it up, but also doing other useful work on the same component. “Touching that spaghetti code would be too risky, and rewriting it would take too long, so let’s just leave it as it is.”

Since reading the GTD book, I’ve tried a new approach which has, for the most part, been more successful. It’s not exactly a GTD technique, but borrows the spirit. In GTD there’s a two-minute rule: if you think of something you need to do that would take less than two minutes, just do it. If it would take longer, add it to your backlog.

The analogous approach for refusing to tolerate software problems is this: if you see something you think needs fixing, and you have time to fix it now, fix it now. If you do not have time to fix it, write a bug report.

What goes into the bug report?

All of the things a good software architect should be logging as part of their work anyway: a description of the problem, discussion of potential solutions, choice of solution and justification of that solution. So if there’s some ugly class that needs rewriting, explain why it’s ugly. Describe what would be better, and why.

What do I get from this?

In the first instance, the act of describing what it is that you dislike about the current code often makes it easier to see that the fix actually wouldn’t take too long. So that really disgusting class is full of long methods: what’s three minutes with the “extract method” tool between friends?
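
Here is a minimal sketch of what that three-minute “extract method” step looks like – hypothetical Python of my own, not code from any project mentioned here:

    # Before: one long function that both totals and formats an order.
    def print_invoice(order):
        total = 0
        for item in order["items"]:
            total += item["price"] * item["quantity"]
        print("Invoice for", order["customer"])
        for item in order["items"]:
            print(item["name"], item["price"] * item["quantity"])
        print("Total:", total)

    # After: the totalling loop is extracted into a named helper, so the
    # intent is clearer and the calculation can be tested on its own.
    def order_total(order):
        return sum(item["price"] * item["quantity"] for item in order["items"])

    def print_invoice(order):
        print("Invoice for", order["customer"])
        for item in order["items"]:
            print(item["name"], item["price"] * item["quantity"])
        print("Total:", order_total(order))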

Oftentimes the solution will still be too big to work on right now. So hit the “report” button, and get the bug report into the tracker (or the backlog, or icebox, or whatever this week’s cool term is. I can no longer keep up; I’m in my thirties now). You know how they say a problem shared is a problem halved? It’s crap. A problem shared is a problem everyone is burdened with, so there are more people to go “oh crap, yeah, I hate that too”. Maybe one (or more) of you has the time to spend a day or two sorting the issue out, or is willing to make time. Maybe someone else knows enough about that code to propose a better alternative.

Even if not, the whole team can no longer ignore the issue: every time someone looks at the outstanding issues, there’s your problem, reminding everyone not to tolerate it. It’s harder to say “oh yeah, I thought about fixing that once but I didn’t have the time” if every time you read the bug list you are forced to think about it again. One of these days you, or someone else, will have time to fix it, and so will have to either do that or think of a convincing excuse to shelve it again, then explain at the next bug review why the issue is still there. If you wrote a good justification for why the proposed solution would be better, shelving it yet again makes it look like you’re actively trying to avoid making a better product.

Depending on how you track bugs, you may have an additional benefit: the ability to link your complaint to other issues. So maybe (and this is a real-world example from my experience), the problem is that a class for reading files has a hard-coded list of search paths. Then a request comes in saying that an extra filesystem has been provided by IT, and they want to put some of the files into a location on this new filesystem. Link them. Someone will be assigned the user request that’s been prioritised as a business issue, and when they pick it up they’ll see your report that a good way to fix the problem would also clean up the product, so they can do both at the same time. If the issue is linked to enough problems in the product, it becomes clear that addressing the underlying issue will benefit the customers, and the work will be scheduled. Then you really have no excuse for not finding the time to fix it: it’s your job.
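
A minimal sketch of the kind of change such a linked report might propose – invented names and Python, not the actual code from that project – turning the hard-coded search paths into something the caller can configure, so the new filesystem location becomes a setting rather than another code edit:

    import os

    # Before: the search locations are baked into the class, so supporting a new
    # filesystem location means editing the class and shipping a new build.
    class HardCodedFileReader:
        SEARCH_PATHS = ["/usr/local/share/app", "/opt/app/data"]

        def open_data_file(self, name):
            for path in self.SEARCH_PATHS:
                candidate = os.path.join(path, name)
                if os.path.exists(candidate):
                    return open(candidate)
            raise FileNotFoundError(name)

    # After: the paths are supplied by the caller, with the old list as a default,
    # so the new location is configuration rather than a code change.
    class ConfigurableFileReader:
        def __init__(self, search_paths=("/usr/local/share/app", "/opt/app/data")):
            self.search_paths = list(search_paths)

        def open_data_file(self, name):
            for path in self.search_paths:
                candidate = os.path.join(path, name)
                if os.path.exists(candidate):
                    return open(candidate)
            raise FileNotFoundError(name)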

So this is a silver bullet, is it?

No. It’s worked well for me, but it’s not foolproof. Some projects I’ve worked on have suffered from a form of bug tracker malaise, where the backlog is so great that it’s easy to ignore the vast number of open issues—some of which are no longer relevant—meaning that adding another straw to the haystack isn’t going to help anyone. That’s an extreme position for a project to get into; it’s basically a slippery-slope version of Uncle Bob’s “irresponsible tolerance”, where even problems being reported by customers can be tolerated. In those cases, a special injection of enthusiasm into the development team is required: the whole product is already on a death march.

For most projects, though, reporting an issue is a good way to avoid ignoring that issue.

On standards in free software engineering

I have previously written on the economics of software insecurity, and I quote a couple of paragraphs from that post below:

One option that is not fully explored in the book, but which I believe could be worth exploring, is this: development of critical infrastructure software could be taken away from the free market.

Now the size of even the U.S. government IT budget probably isn’t sufficient to completely fund a bunch of infrastructure developers, but there are other options. Rice correctly notes the existence of not-for-profit software development organisations (particularly the Open Source Initiative and Free Software Foundation), and discusses the benefits and drawbacks of the open source model as it applies to commercial software. He does not explore the possibility that charity development organisations could withdraw from market competition, and focus on engineering practice, quality and security without feature parity or first-to-market speed.

Today I was re-reading Free Software, Free Society by Richard Stallman, a collection of his essays and speeches on topics including copyleft, the GNU system and the General Public Licence. In thinking about this book, I went wandering back to the idea of a non-commercial driver for good-quality software.

I am now convinced that the Free Software Foundation should be investigating, researching and promoting standards, practices and quality in software construction.

The principal immediate benefit the FSF would gain is in terms of visibility and support. Everywhere that software is used—public, private and academic sectors—organisations are interested in finding out ways to improve quality, reliability, deliverability: in other words, the success of their software. An entity that could offer to evaluate and report on whether particular techniques are feasible and offer improvement—in return for funding and staffing the production of their sought-after free software—would be welcomed and would be put to good use.

The FSF is well-placed to achieve this goal, because all of its output is copyleft. A large problem with analysing the success or otherwise of development practices is that the outputs are proprietary: not just the code, but the project documentation, meeting minutes and so forth. With an FSF project everything is (or should be) freely available so inspecting how a project was run, what the developers did and—crucially—whether users are happy with the end result is much easier. Conclusions should be reproducible because everybody can see everything that went on.

Notice that in this scheme, relationships between the FSF and other (proprietary, open source, whatever) organisations are mutually beneficial, not antagonistic as is often either actually the case or just assumed. The benefits seen by external parties are the improvements in process and technique; benefits that all developers can make use of. The discussion moves away from free vs. fettered, and becomes making the field of software engineering better for everyone.

Incidentally, such a focus would also put free software at the forefront of discussions on software quality and deliverability. This would be something of a coup for free software, which is often associated with chaotic management, lack of road maps, and paucity of documentation and support. OK, the FSF wants people to consider freedom as a value in itself, but there’s nothing wrong with ensuring that free software is the best software too, surely?

On the economics of software insecurity

This post is mainly motivated by having read Geekonomics: the real cost of insecure software, by David Rice. Since writing the book Rice has apparently been hired by Apple, though his bio at the Geekonomics site doesn’t mention that (nor does his LinkedIn profile).

Geekonomics is a thoroughly interesting read. It’s evidently designed as a call to arms for users to demand better security, and as a result resorts to hyperbole in parts: you are a crash test dummy for software manufacturers, and you are paying extravagantly for the privilege. In this way it reads as if it is to security what The Inmates are Running the Asylum was to user experience in 1999: do you realise just how shoddy all of this software you use is?

That said, once you actually dig into Rice’s arguments, the hyperbole disappears and the book becomes well-sourced, internally consistent and rational. He explains why the market forces in the software industry don’t lead to security (or even high quality) being either the primary customer requirement or the key focus of producers.

Interestingly, while Geekonomics only incidentally touches on the role of security researchers in the software economy, their position is roughly consistent with the one I outlined in On Securing Lion: they are in it to get money (and sometimes fame) from selling either the vulnerabilities they discover or their skill at discovering them.

The book ends by describing the different options a curated free market like the US market has for correcting the situation where market forces lead to socially undesirable outcomes: these options are redress via contract law, via tort law or via strict liability legislation. The impact of each of the above on the software industry is estimated.

One option that is not fully explored in the book, but which I believe could be worth exploring, is this: development of critical infrastructure software could be taken away from the free market.

Now the size of even the U.S. government IT budget probably isn’t sufficient to completely fund a bunch of infrastructure developers, but there are other options. Rice correctly notes the existence of not-for-profit software development organisations (particularly the Open Source Initiative and Free Software Foundation), and discusses the benefits and drawbacks of the open source model as it applies to commercial software. He does not explore the possibility that charity development organisations could withdraw from market competition, and focus on engineering practice, quality and security without feature parity or first-to-market speed.

Governments, trade groups, communications carriers and other organisations with an interest in using software as infrastructure (e.g. so-called “cloud” companies) could fund non-profits (maybe with money, maybe with staff) that develop infrastructure-grade software. Those non-profits would have a mission to do quality-centric development, and would put confidentiality, integrity, availability, reliability and correctness before feature richness or novelty. Their governance (the bit I haven’t fully thought through, admittedly) would be organised to promote and reward exactly that approach to development.

The software, its documentation and its engineering methodologies would be open, so that commercial software can take advantage of its advances at low cost. This matters partly because, where security is a “hygiene factor” to software purchasers, the “security gap” between infrastructure-grade and commercial-grade software would become clear, artificially introducing infrastructure-grade robustness to the marketplace. Commercial vendors who could cheaply pick up parts of the infrastructure-grade software for their own products would be, in a self-interested manner, bringing that software’s quality into the commercial marketplace and making it a competition point.

“But,” some people say, “such software would be feature-poor. Why would anyone choose [SafeOS, SafeWebServer, SafeSmartPhone, whatever] over a feature-rich commercial offering?” The point is that, in infrastructure, correct function is more important than features. It’s only the fact that software exists purely in a competitive world that means the focus is on features.

Case in point: one analogy used throughout Geekonomics is that infrastructure software is like cement (actually, in a book I’m currently writing on software testing, I make the same analogy, though relating to design rather than function). Well even taking into account innovations like Portland cement, the feature list of cement hasn’t changed in thousands of years. It sticks aggregate together to make concrete or mortar. It’s only the quality of its stick-aggregate-together-ness that has changed.

In relation to software, most computers are still “stuck together” using RFC791 (Internet Protocol version 4), which was documented in 1981 but was in use already at that time. The main advantages of RFC2460 (Internet Protocol version 6), written in 1998, are increased address space and reduced overhead. It’s better at stick-computers-together-ness, but doesn’t really do anything new. There may have been new applications of networks recently (and of course, the late addition of confidentiality in the mid-1990s), but networking itself doesn’t frequently need new features.
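
To put a number on “increased address space” – my own illustration using Python’s standard ipaddress module, not anything from the original discussion – the arithmetic looks like this:

    import ipaddress

    # IPv4 addresses are 32 bits wide; IPv6 addresses are 128 bits wide.
    print(f"IPv4 addresses: {2 ** 32:,}")    # 4,294,967,296
    print(f"IPv6 addresses: {2 ** 128:,}")   # roughly 3.4 x 10^38

    # Both still do the same job: identify an endpoint so packets can be routed.
    print(ipaddress.ip_address("192.0.2.1"))    # an IPv4 documentation address
    print(ipaddress.ip_address("2001:db8::1"))  # an IPv6 documentation address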

Or even operating system software. The last version of Mac OS X that added any features for its users was version 10.5; those new features were:

  • Time Machine: computers have been doing backup for years; this added a new UI.
  • Spaces: an improvement on the ability to draw windows on the screen.
  • Back to my Mac: an integration of existing capabilities (VNC and wide-area zeroconf networking).
  • Boot Camp: managing partitions, and giving the primary bootloader compatibility with a 1983 computer standard.

All of the other enhancements were in the applications, which still all require the same things of the OS: schedule processes, protect memory, abstract the file systems, manage devices. Again, there may have been new applications of an operating system, but there hasn’t been much newness in the operating system itself.

The part where such non-profit infrastructure software becomes tricky is in integrating with the rest of the “stack”. On the hardware side, it would be inappropriate to require that a government-sponsored and not-for-profit software project run on proprietary hardware. On the other hand, it might be inappropriate to bar deployment on proprietary hardware—but is infrastructure-grade software on commercial-grade hardware still an infrastructure-grade deployment?

That’s particularly difficult in our world—the world of smartphones—because there isn’t really any open hardware. There are somewhat open definitions: the Android Compatibility Definition Document for example. But as Ken Thompson taught us: in a trusted system, we need to question who we trust and why.

Going the other way, of course, is much easier. Anyone could write an application that interoperates with infrastructure-grade software, or a system partially constructed out of such software. But the same question would still exist: how much of an impact do the non-infrastructure-grade components have on the reliability of the system?