On squeezing out that last ounce of performance

Getting confused by a component of an application that should be network-bound but is actually limited by CPU availability reminds me of the times in my career that I’ve dealt with application performance.

I used to work on a platform for distributing MMS and SMS messages, written using GNUstep, Linux and PostgreSQL. I had a duplicate of the production hardware stack sitting in a data centre in our office, which ran its own copy of the production software and even had a copy of the production data. I could use this to run simulations or even replays of real events, finding the locations of the slowdowns and trying various hypotheses to remove the bottlenecks.

My next job was working on antivirus. The world of antivirus evaluations is dominated by independent testers, who produce huge bake-off articles comparing the various products. For at least a decade the business of actually detecting the stuff has been routine, so awards like VB100 are meaningless. In addition to detection stats, analysts like VB and AV-Comparatives measure resource consumption, and readers take those measurements seriously.

That’s because they don’t want to use anti-virus, and they didn’t pay for their RAM and CPUs to waste it on software they don’t want to use. So given a bunch of apps that all do the same thing, they’ll look at which does it with less impact on everything else. This means that performance is an important requirement of new projects in AV software: on the product I worked on we had a defined set of performance tests. A new project release could not ship, regardless of how shiny the new features were, if the tests took 5% or more extra time or RAM compared with the current shipping version on like hardware. What that really means is that, due to developments in hardware, AV software was getting monotonically faster up until a few years ago.

Since then, my relationship with performance optimisation has been more sporadic. I’ve worked on contracts to speed up iOS apps, and even almost took a performance analysis and improvement job on a mobile phone operating system team. But what I usually do is make software work (and make it secure), with making it work in such a way that people can actually use it being a part of that. Here, then, are my Reflections On Making Efficient Software™.

Start at the beginning

You may have heard the joke about a man on a driving holiday who gets lost and asks a local for directions. The local thinks for a bit, and says “well to get to where you’re going, I wouldn’t start from here if I were you”.

Performance analysis can be like that, too. If you build up all of the functionality first, and optimise it later, you will almost certainly not get a well-performing product. Furthermore, fixing it will be very expensive. You may be able to squeeze a few kilobytes out here, or get a couple of percent speed increase there, but basically the resources used by your app will not change much.

The reason is simple: most of the performance characteristics of your app are baked into the top-level architecture. You need to start thinking about how your app will perform when you start to design what the various parts do and how they fit together. If you don’t design out the architectural bottlenecks, you’ll be stuck with them no matter how good your implementation is.

I’ve been involved with projects like this. It gets to near the ship date, and the app works but is useless because it consumes all of the RAM and/or takes too long to do any work. The project manager gets a developer in (hello!) to address the performance issues. After a few weeks, the developer has managed to understand the code, do some analysis, and improved things by a couple of percent. It’s gone from “sucky” to “quite sucky”, and the ship date isn’t any further away.

An example: if you build a component that can process, say, ten objects per second, then hands them on to another component that can display results at 100 objects per second, you’re always limited to 10 Hz, give or take. You might get 12; you’ll never get 100 without replacing the first component or the whole app. Both of these options are more expensive after you’ve written the component than before. Which leads me on to the next top tip…

Simulate, simulate, simulate

So how are you supposed to know how fast your putative architecture is going to run on paper? That’s easy: simulate each component. If you believe that you’ll usually get data from the network at, say, 100 objects/sec, then write a driver that sends fake objects at about 100 Hz. If you think that might spike at 10,000 objects/sec, then simulate that spike too. You’ll be able to see what it takes to develop an app that can respond to those demands.
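Here’s what such a driver can look like as a sketch, in Objective-C, using a GCD timer; the ObjectSink protocol is hypothetical, standing in for whatever interface your real component exposes:

```objc
#import <Foundation/Foundation.h>

// Hypothetical interface for the component under test.
@protocol ObjectSink <NSObject>
- (void)handleObject:(id)object;
@end

// Feed fake objects to the sink at roughly `hz` objects per second.
// The caller keeps the returned timer alive for the duration of the run.
static dispatch_source_t startFakeFeed(id <ObjectSink> sink, double hz)
{
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
    uint64_t interval = (uint64_t)(NSEC_PER_SEC / hz);
    dispatch_source_set_timer(timer, DISPATCH_TIME_NOW, interval, interval / 10);
    dispatch_source_set_event_handler(timer, ^{
        // The payload doesn't matter yet; only the arrival rate does.
        [sink handleObject:[NSDate date]];
    });
    dispatch_resume(timer);
    return timer;
}
```

Call startFakeFeed(sink, 100) for the steady state and startFakeFeed(sink, 10000) for the spike, and watch how the rest of the system copes.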

What’s more, you’ll be able to drop your real components into the simulated environment, and see how they really handle the situations you cook up. You can even use these harnesses as an integration test framework at the intermediate level (i.e. larger than classes, smaller than the whole app).

Your simulated components should use the same interfaces to the filesystem and each other that the real code will use, and the same frameworks or libraries. But they shouldn’t do any real work. E.g. if you’ve got a component that should read a JSON stream from the network, break it into objects, do around 10ms of work on each object and post a notification after each one is finished, you can write a simulation to do that using the JSON library you plan on deploying, the usleep() function and NSNotificationCenter. Then you can play around with its innards, trying out operation queues, dispatch queues, caching and other techniques to see how the system responds.
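As a sketch of that simulated component (the notification name is made up, and NSJSONSerialization stands in for whichever JSON library you actually plan to deploy):

```objc
#import <Foundation/Foundation.h>
#import <unistd.h>

// Hypothetical notification name; the real component would define its own.
static NSString *const FakeObjectProcessedNotification = @"FakeObjectProcessed";

static void simulateProcessingOfJSONData(NSData *data)
{
    // Use the JSON library you intend to deploy; NSJSONSerialization
    // is only a stand-in here.
    NSArray *objects = [NSJSONSerialization JSONObjectWithData:data
                                                       options:0
                                                         error:NULL];
    for (id object in objects) {
        usleep(10000); // stand-in for roughly 10 ms of real work per object
        [[NSNotificationCenter defaultCenter]
            postNotificationName:FakeObjectProcessedNotification
                          object:object];
    }
}
```

Swapping the innards of that loop for an operation queue or a dispatch queue then lets you compare approaches before committing to one.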

Performance isn’t all about threads

Yes, Apple has a good concurrency programming guide. Yes, dispatch queues are new and cool. But no, not everything is sped up by addition of threads. I’m not going to do the usual meaningless microbenchmarks of adding ten million objects to a set, because no-one ever does that in real code.

The point is that doing stuff in the background is great for exactly one thing: getting that stuff off of the UI thread. For any other putative benefits, you need to measure. Perhaps threading will speed it up. Perhaps there aren’t any good concurrent algorithms. Perhaps the scheduling overhead will get in the way.
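A minimal sketch of “measure, don’t assume”: time the same hypothetical workUnit() done serially and with dispatch_apply(), and see which wins on the hardware you care about.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

static void workUnit(size_t i); // your real per-item work, defined elsewhere

static void compareSerialAndConcurrent(size_t count)
{
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (size_t i = 0; i < count; i++) workUnit(i);
    CFAbsoluteTime serialTime = CFAbsoluteTimeGetCurrent() - start;

    start = CFAbsoluteTimeGetCurrent();
    dispatch_apply(count,
                   dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   ^(size_t i) { workUnit(i); });
    CFAbsoluteTime concurrentTime = CFAbsoluteTimeGetCurrent() - start;

    NSLog(@"serial: %.3fs, concurrent: %.3fs", serialTime, concurrentTime);
}
```

Don’t be surprised if, for small or heavily contended work items, the concurrent number is the bigger one.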

And of course you need to measure performance in an approximation to the customer’s environment. Your 12-core Mac Pro probably runs your multithreaded code in a different way than your user’s MacBook Air (especially if the Pro has a spinning disk). And the iPad is nothing like the simulator, of course.

Speed, memory, work: choose any two

You can make it faster and use less RAM by not doing as much. You can make it faster and do the same amount of work by caching. You can use less RAM and do the same amount of work by reducing the working set. Only very broken code can gain advances in speed and memory use while not changing the outcome.
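To make the caching leg concrete, here’s a memoisation sketch (ResultCache is illustrative, not a real API) that spends RAM via NSCache to avoid repeating CPU work:

```objc
#import <Foundation/Foundation.h>

@interface ResultCache : NSObject
- (id)resultForKey:(id <NSCopying>)key compute:(id (^)(void))compute;
@end

@implementation ResultCache
{
    NSCache *_cache;
}

- (id)init
{
    if ((self = [super init])) _cache = [[NSCache alloc] init];
    return self;
}

- (id)resultForKey:(id <NSCopying>)key compute:(id (^)(void))compute
{
    id result = [_cache objectForKey:key];
    if (result == nil) {
        result = compute();                   // pay the CPU cost once...
        [_cache setObject:result forKey:key]; // ...and the RAM cost thereafter
    }
    return result;
}
@end
```

The working-set leg works the other way around: process a stream in chunks rather than loading the lot, and you give back RAM at the cost of speed or convenience.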

Measure early and measure often

As I said at the top, there’s no point baking the app and then sprinkling it with performance fairy dust at the end. You need to know what you’re aiming for, whether it’s achievable, and how it’s going throughout development.

You must have some idea of what constitutes acceptable performance, so devise tests that discover whether the app is meeting those performance requirements. Run these tests periodically throughout development. If you find that a recent change slowed things down, or caused too much memory to be used, now is a good time to fix that. This is where that simulation comes in useful, so you can get an idea about the system’s overall performance even before you’ve written it all.
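Such a test needn’t be fancy. A sketch, assuming a hypothetical runScenario() that drives the simulated or real components, with the budget taken from your performance requirements:

```objc
#import <Foundation/Foundation.h>
#import <stdio.h>

static void runScenario(void); // hypothetical: exercises the pipeline under test

int main(void)
{
    @autoreleasepool {
        const CFAbsoluteTime budgetSeconds = 2.0; // the agreed requirement
        CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
        runScenario();
        CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - start;
        if (elapsed > budgetSeconds) {
            fprintf(stderr, "FAIL: scenario took %.2fs against a budget of %.2fs\n",
                    elapsed, budgetSeconds);
            return 1;
        }
        printf("PASS: %.2fs\n", elapsed);
    }
    return 0;
}
```

Memory budgets can be checked the same way, e.g. by reading the process’s resident size after the run.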

On phone support scams and fake AV

A couple of weeks ago, I posted on Twitter about a new scam:

Heard about someone who was phoned by a man “from Windows” who engineered his way into remote access to the mark’s computer.

Fast-forward to now: the same story has finally been picked up by the security vendors and the mainstream media. This means it’s probably time to go into more depth.

I heard a first-hand account of the scam. The victim is the kind of person who shouldn’t be expected to understand IT security – a long-distance lorry driver who uses his computer for browsing, e-mail, and that sort of thing. As he tells it, the person called, saying they were from Windows and that they had discovered his computer was infected. He was given instructions to give the caller remote access to help clean up the computer.

With remote access, the caller was able to describe some of the problems the victim was having with his computer, while taking control to “fix them”. The caller eventually discovered that the victim’s anti-virus was out of date, and that if he gave the caller his payment information he could get new software for £109. This is when the victim hung up; however, his computer has not booted properly since then.

I think my audience here is probably tech-savvy enough not to need warning about scams like this, and to understand that the real damage was done even before any discussion of payments was made (hint: browser form-auto-fill data). It’s not the scam itself I want to focus on, but our reaction.

Some people I have told this story to in real life (it does happen) have rolled their eyes, and said something along the lines of “well of course the users are the weakest link” in a knowing way. If that’s true, why rely on the users to make all the security decisions? Why leave it to them to decide what’s legitimate and what’s scammy, as was the case here? Why is the solution to any problem to shovel another bucketload of computer knowledge on them and hope that it sticks, as Sophos and the BBC have tried in the articles above?

No. This is not a solution to anything. No matter how loudly you shout about how that isn’t how Microsoft does business, someone who says he is from Microsoft will phone your users up and tell them that it is.

This is the same problem facing anti-virus vendors trying to convince us not to get fooled by FakeAV scams. Vendor A tells us to buy their product instead of Vendor B’s, because it’s better. So, is Vendor A the FakeAV pedlar, or B? Or is it both? Or neither? You can’t tell.

It may seem that this is a problem that cannot be solved in technology, that it relies on hard-wired behaviour of us bald apes. I don’t think that’s so. I think that it would be possible to change the way we, legitimate software vendors, interact with our users, and the way they interact with our software, such that an offline scam like this would never come to pass. A full discussion would fill a whole whitepaper that I haven’t written yet. However, to take the most extreme point from it, the one I know you’re going to loathe, what if our home computers were managed remotely by the vendors? Do most users really need complete BIOS and kernel level access to their kit? Really?

Look for the whitepaper sometime in the new year.

On free Mac Anti-Virus

On Tuesday, my pals at my old stomping ground Sophos launched the free home edition of their Mac product. I’ve been asked by several people what makes it tick, so here’s Mac Anti-Virus In A Nutshell.

Sophos Anti-Virus for Mac

What is the AV doing?

So anti-virus is basically a categorisation technology: you look at a file and decide whether it’s bad. The traditional view people have of an AV engine is that there’s a huge table of file checksums, and the AV product just compares every file it encounters to every checksum and warns you if it finds a match. That’s certainly how it used to work around a decade ago, but even low-end products like ClamAV don’t genuinely work this way any more.

Modern Anti-Virus starts its work by classifying the file it’s looking at. This basically means deciding what type of file it is: a Mac executable, a Word document, a ZIP, etc. Some of these are actually containers for other file types: a ZIP obviously contains other files, but a Word document contains sections with macros in them, which might be interesting. A Mac fat file contains one or more executable files, each of which contains various data and program segments. Even a text file might actually contain a shell script (which could contain a perl script as a here doc), and so on. But eventually the engine will have classified zero or more parts of the file that it wants to inspect.

Because the engine now knows the type of the data it’s looking at, it can be clever about what tests it applies. So the engine contains a whole barrage of different tests, but still runs very quickly because it knows when any test is necessary. For example, most AV products now, including Sophos’s, can actually run x86 code in an emulator or sandbox, to see whether it would try to do something naughty. But it doesn’t bother trying to do that to a JPEG.
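As an illustrative toy, far simpler than a real engine (which recurses into containers and knows hundreds of types), classification can start with magic numbers:

```objc
#import <Foundation/Foundation.h>

typedef enum {
    FileKindUnknown,
    FileKindZIP,  // container: unpack and classify the contents
    FileKindFat,  // Mac fat executable: worth emulating
    FileKindJPEG  // no point running the x86 emulator on this
} FileKind;

static FileKind classifyFile(NSString *path)
{
    NSFileHandle *fh = [NSFileHandle fileHandleForReadingAtPath:path];
    NSData *head = [fh readDataOfLength:4];
    if ([head length] < 4) return FileKindUnknown;

    const uint8_t *b = [head bytes];
    if (b[0] == 'P' && b[1] == 'K') return FileKindZIP;    // ZIP local header
    if (b[0] == 0xCA && b[1] == 0xFE && b[2] == 0xBA && b[3] == 0xBE)
        return FileKindFat;                                // fat binary magic
    if (b[0] == 0xFF && b[1] == 0xD8) return FileKindJPEG; // JPEG SOI marker
    return FileKindUnknown;
}
```

The engine then dispatches accordingly: unpack the ZIP and classify each member, run the emulator over the fat binary’s code, and skip the expensive tests for the JPEG.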

That sounds slow.

And the figures seem to bear that out: running a scan via the GUI can take hours, or even a day. A large part of this is due to limitations on the hard drive’s throughput, exacerbated by the fact that there’s no way to ask a disk to come up with a file access strategy that minimises seek time (time that’s effectively wasted while the disk moves its heads and platters to the place where the file is stored). Such a thing would mean reading the whole drive catalogue (its table of contents), and thinking for a while about the best order to read all of the files. Besides, such strategies fall apart when one of the other applications needs to open a file, because the hard drive has to jump away and get that one. So as this approach can’t work, the OS doesn’t support it.

On a Mac with a solid state drive, you actually can get to the point where CPU availability, rather than storage throughput, is the limiting factor. But surely even solid state drives are far too slow compared with CPUs, and the Anti-Virus app must be quite inefficient to be CPU-limited? Not so. Of course, there is some work that Sophos Anti-Virus must be doing in order to get worthwhile results, so I can’t say that it uses no CPU at all. But having dealt with the problem of hard drive seeking, we now meet the UBC.

The Unified Buffer Cache is a place in memory where the kernel holds the content of recently accessed files. As new files are read, the kernel throws away the contents of old files and stores the new ones in the cache. Poor kernel. It couldn’t possibly know that this scanner is just going to do some tests on the file then never look at it again, so it goes to a lot of effort swapping contents around in its cache that will never get used. This is where a lot of the time ends up.
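There are ways for a scanner to be kinder. On Mac OS X a process can ask for its reads to bypass the buffer cache by setting F_NOCACHE on the file descriptor; here’s a sketch (I’m not claiming this is what Sophos does):

```objc
#import <fcntl.h>
#import <unistd.h>
#import <sys/types.h>

// Read up to len bytes from path without asking the kernel to cache them.
static ssize_t readFileUncached(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    fcntl(fd, F_NOCACHE, 1); // don't evict other apps' data for this one-shot read
    ssize_t got = read(fd, buf, len);
    close(fd);
    return got;
}
```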

On not wasting all that time

This is where the on-access scanner comes in. If you look at the Sophos installation, you’ll see an application at /Library/Sophos Anti-Virus/InterCheck.app – this is a small UNIX tool that includes a kernel extension to intercept file requests and test the target files. If it finds an infected file, it stops the operating system from opening it.

Sophos reporting a threat.

To find out how to do this interception, you could do worse than look at Professional Cocoa Application Security, where I talk about the KAUTH (Kernel AUTHorisation) mechanism in Chapter 11. But the main point is that this approach – checking files when you ask for them – is actually more efficient than doing the whole scan. For a start, you’re only looking at files that are going to be needed anyway, so you’re not asking the hard drive to go out of its way and prepare loads of content that isn’t otherwise being used. InterCheck can also be clever about what it does: for example, there’s no need to scan the same file twice if it hasn’t changed in the meantime.
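For a flavour of the mechanism, here’s a bare-bones sketch of a KAUTH vnode-scope listener; fileIsInfected() is a hypothetical hook, and a real product does far more bookkeeping (typically handing the actual scanning off to a user-space daemon):

```objc
#include <sys/param.h>
#include <sys/vnode.h>
#include <sys/kauth.h>

static kauth_listener_t gListener;

extern int fileIsInfected(vnode_t vp, vfs_context_t ctx); // hypothetical scan hook

static int scanCallback(kauth_cred_t cred, void *idata, kauth_action_t action,
                        uintptr_t arg0, uintptr_t arg1,
                        uintptr_t arg2, uintptr_t arg3)
{
    vfs_context_t ctx = (vfs_context_t)arg0;
    vnode_t vp = (vnode_t)arg1;

    if ((action & KAUTH_VNODE_READ_DATA) && fileIsInfected(vp, ctx))
        return KAUTH_RESULT_DENY;  // stop the file being opened
    return KAUTH_RESULT_DEFER;     // let the normal permission checks decide
}

// In the kext's start routine:
//     gListener = kauth_listen_scope(KAUTH_SCOPE_VNODE, scanCallback, NULL);
// and in its stop routine:
//     kauth_unlisten_scope(gListener);
```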

OK, so it’s not a resource hog. But I still don’t need anti-virus.

Not true. This can best be described as anecdotal, but of all the people who told me they had run a scan since the free Sophos product became available, around 75% reported that it had detected threats. These were mainly Windows executables attached to mail, but it’s still good to detect and destroy those so they don’t get onto your Boot Camp partition or somebody else’s PC.

There definitely is a small, but growing, pile of malware that really does target Macs. I was the tech reviewer for Enterprise Mac Security, and for the chapter on malware my research turned up tens of different strains: mainly Trojan horses (as on Windows), some OpenOffice macros, and some web-based threats. And that was printed well before Koobface was ported to the Mac.

Alright, it’s free, I’ll give it a go. Wait, why is it free?

Well here I have to turn to speculation. If your reaction to my first paragraph was “hang on, who is Sophos?”, then you’re not alone. Sophos is still a company that only sells to other businesses, and that means that the inhabitants of the Clapham Omnibus typically haven’t heard of them. Windows users have usually heard of Symantec via their Norton brand, McAfee and even smaller outfits like Kaspersky, so those are names that come up in the board room.

That explains why they might release a free product, but not this one. Well, now you have to think about what makes AV vendors different from one another, and really the answer is “not much”. They all sell pretty much the same thing; occasionally one of them comes up with a new feature, but that gap usually closes quite quickly.

Cross-platform support is one area that’s still open, surprisingly. Despite the fact that loads of the vendors (and I do mean loads: Symantec, McAfee, Trend Micro, Sophos, Kaspersky, F-Secure, Panda and Eset all spring to mind readily) support the Mac and some other UNIX platforms, most of these are just checkbox products that exist to prop up their feature matrix. My suspicion is that by raising the profile of their Mac offering, Sophos hopes to become the cross-platform security vendor. And that makes giving their Mac product away for free more valuable than selling it.