When security procedures go bad

My password with my bank may as well be “I can’t remember, can we go through the security questions please?” That’s my answer so many times when they ask, and every time it gets me in via a slightly tedious additional verification step. Losing customers probably represents a greater financial risk to them than fraud on any individual account, so they don’t seem to take the password thing too seriously.

“You could simply do X” costs more

Someone always says it. “Could you just add this?” or “I don’t think it would be too hard to…” or “if somebody else changes these two simple things, someone could create a completely bug-compatible, scale-compatible implementation of this other, undocumented service”…wait, what?

Many of us are naturally optimistic people. We believe that the problems that befall others, or that we’ve experienced before, will not happen this time. That despite the last project suffering from “the code getting messier and messier”, we’ll do it right this time.

Optimism’s great. It tricks us into trying to solve difficult problems. It convinces us that the solution is “just around the corner”, so we should persevere. The problems start to arise when we realise that everyone else is optimistic, too—and that optimism is contagious. If you’re asked to give a drive-by estimate of how hard something is, or how long it will take, you’ll give an answer that probably doesn’t take into account all the problems that might arise. But now two of you believe in this optimistic estimate: after all, you’re a smart person, and you’re trusted to give good estimates.

We need to be careful when talking to people who aren’t developers to make it clear that there’s no such thing as “simply” in most software systems. That “simply” adding a field brings with it all sorts of baggage: placing the field in an aesthetically pleasing fashion across multiple localised user interfaces, localising the field, building the user experience of interacting with the field and so on. That using the value from the field could turn it from a complicated problem into a complex problem, particularly if the field is just selecting between multiple implementations of what may even be multiple interfaces. That just adding this field brings not only work, but additional risk. That these are just the problems we could think of up front; there are often more that get uncovered as we begin to shave the yak.

But clearly we also need to bear in mind the problems we’ve faced and continue to face when talking to each other. We should remember that the last thing we tried to simply do ended up chasing a rabbit down a hole. If I don’t think that I can “simply” do something without unexpected complexity and risk, I should not expect that others can “simply” do it either.

How to version a Mach-O library

Yes, it’s the next instalment of “cross-platform programming for people who don’t use Macs very much”. You want to give your dynamic library a version number, probably of the format major.minor.patchlevel. Regardless of marketing concerns, this helps with dependency management if you choose a version convention such that binary-compatible revisions of the libraries can be easily discovered. What could possibly go wrong?

The linker will treat your version number in the following way (from the APSL-licensed ld64/ld/Options.cpp) if you’re building a 32-bit library:

//
// Parses number of form X[.Y[.Z]] into a uint32_t where the nibbles are xxxx.yy.zz
//
uint32_t Options::parseVersionNumber32(const char* versionString)
{
	uint32_t x = 0;
	uint32_t y = 0;
	uint32_t z = 0;
	char* end;
	x = strtoul(versionString, &end, 10);
	if ( *end == '.' ) {
		y = strtoul(&end[1], &end, 10);
		if ( *end == '.' ) {
			z = strtoul(&end[1], &end, 10);
		}
	}
	if ( (*end != '\0') || (x > 0xffff) || (y > 0xff) || (z > 0xff) )
		throwf("malformed 32-bit x.y.z version number: %s", versionString);

	return (x << 16) | ( y << 8 ) | z;
}

and like this if you’re building a 64-bit library (I’ve corrected an obvious typo in the comment here):

//
// Parses number of form A[.B[.C[.D[.E]]]] into a uint64_t where the bits are a24.b10.c10.d10.e10
//
uint64_t Options::parseVersionNumber64(const char* versionString)
{
	uint64_t a = 0;
	uint64_t b = 0;
	uint64_t c = 0;
	uint64_t d = 0;
	uint64_t e = 0;
	char* end;
	a = strtoul(versionString, &end, 10);
	if ( *end == '.' ) {
		b = strtoul(&end[1], &end, 10);
		if ( *end == '.' ) {
			c = strtoul(&end[1], &end, 10);
			if ( *end == '.' ) {
				d = strtoul(&end[1], &end, 10);
				if ( *end == '.' ) {
					e = strtoul(&end[1], &end, 10);
				}
			}
		}
	}
	if ( (*end != '\0') || (a > 0xFFFFFF) || (b > 0x3FF) || (c > 0x3FF) || (d > 0x3FF)  || (e > 0x3FF) )
		throwf("malformed 64-bit a.b.c.d.e version number: %s", versionString);

	return (a << 40) | ( b << 30 ) | ( c << 20 ) | ( d << 10 ) | e;
}

The specific choice of bit widths in both variants is weird (why would you have more major versions than patchlevel versions?) and the move from 32-bit to 64-bit makes no sense to me at all. Nonetheless, there’s a general rule:

Don’t use your SCM revision number in your version numbering scheme.

The rule of thumb is that the major version can always be less than 65536, each minor version can always be less than 256, and you can always supply up to two minor version components. Trying to supply a version number that doesn’t fit in the bitfields defined here is a linker error, and you will not go into (address) space today.
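
To make that concrete, here’s a hypothetical attempt to embed an SCM revision in the minor field (the library name is invented; the error text follows the throwf in parseVersionNumber32 above, since 70000 overflows the eight-bit y field):

clang -dynamiclib -current_version 1.70000 -o libthing.dylib thing.c
ld: malformed 32-bit x.y.z version number: 1.70000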

Anyone Can Write A Manifesto And You Can Too!™

Over a small number of years, I have helped to write some software. During this time I have come to value:

That is, while the things on the right are sometimes the means, the thing on the left is always the end.

Detecting overflows, undefined behaviour and other nasties

You will remember that a previous post discussed what happens when you add one to an integer, and that the answer isn’t always obvious. Indeed, the answer isn’t always defined.

As it happens, there are plenty of weird cases that crop up when working with C and languages like it. You can’t expect a boolean to be YES or NO. You can’t assume that an enum variable only holds values from the enumeration. You can’t assume that you know how long an array is, even if the caller told you. Just as adding one is subtle, so is dividing by minus one.

In each of these cases[*]—and others—what you should actually do is to check that the input to an operation won’t cause a problem before doing the operation:

#include <assert.h>
#include <limits.h>

int safe_add(int n1, int n2) {
  if(n2 > 0) assert(n1 <= INT_MAX - n2); //or throw a floating point exception or otherwise signal not to use the result
  if(n2 < 0) assert(n1 >= INT_MIN - n2);
  return n1 + n2;
}

But who does that? Thankfully, the compiler writers do. Coming up in a future release of clang is a collection of sanitisers that insert runtime checks for the things described above. If you’re the kind of person who writes assertions like the above in your code, you can swap all that for sanitisers enabled in your debug builds. If you’re not the kind of person who writes those assertions, you probably should enable these sanitisers, then go and find out where else you should be adding assertions.
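
Outside of an IDE, enabling them might look something like this (myapp is a placeholder; -fsanitize=undefined is the umbrella flag for the undefined behaviour checks, and -fsanitize=integer adds the integer-specific ones):

clang -g -fsanitize=undefined -fsanitize=integer -o myapp myapp.c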

In code that deals with input from other processes, machines or the outside world, you could consider enabling sanitisers even in release builds. They’ll cause your app to report where it encounters overflows, underflows, and other potential security problems. If you don’t think that’s a good option, you should be writing explicit checks for bad data, with application-specific failure behaviour.
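
As a sketch of what such an explicit check might look like (checked_add is a name invented for this example; it reports failure to the caller instead of asserting):

#include <limits.h>
#include <stdbool.h>

// Returns false rather than computing an overflowing sum, so the
// caller can run its own application-specific failure behaviour.
bool checked_add(int n1, int n2, int *sum) {
  if((n2 > 0 && n1 > INT_MAX - n2) ||
     (n2 < 0 && n1 < INT_MIN - n2)) {
    return false;
  }
  *sum = n1 + n2;
  return true;
}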

So, how does this work? Compiling with the sanitiser options inserts checks of the sort shown above into the compiled code. These checks are evaluated at runtime (sort of; for array bounds checking, the size of the array must be known when compiling but the check is still done at runtime) and the process prints a helpful message if the checked condition fails. Let’s look at an example!

#include <stdio.h>
#include <limits.h>

int main(int argc, char *argv[]) {
	printf("%d + %d = %d\n", INT_MAX, 1, INT_MAX + 1);
	return 0;	
}

Compiling that with default settings “works”, but results in undefined behaviour:

clang -o mathfail mathfail.c
./mathfail
2147483647 + 1 = -2147483648

So let’s try to insert some sanity!

clang -fsanitize=integer -o mathfail mathfail.c
./mathfail
mathfail.c:5:47: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
2147483647 + 1 = -2147483648

Another example: dividing by zero.

#include <stdio.h>

int main(int argc, char *argv[]) {
	int x=2, y=0;
	printf("x/y = %d\n", x/y);
	return 0;
}

clang -fsanitize=integer -o mathfail mathfail.c
./mathfail
mathfail.c:5:24: runtime error: division by zero

I wonder how many of the programs I’ve written in the past would trigger sanity failures with these new tools. I wonder how many of those are still in use.

[*] With the exception of booleans. As Mark explains in the linked post, you can always compare a boolean to 0, which is the only value that means false.

An open letter to Xcode

The post below has been filed verbatim as an Apple Developer Tools bug report with ID 13051064.

Dear Xcode,

imagine that you had a combine harvester. Only, this combine harvester, instead of having a hopper into which the winnowed wheat is poured, has a big quern stone. It grinds the wheat into a flour, which is poured into a big mixer with some water and a bit of yeast. The mixture from here then finds its way into a big oven.

While all of this is going on, the other side of the combine harvester is actually a platform hosting some cows and a milking machine. The gathered milk is churned by the same device that turns the quern for grinding the flour.

As you have no doubt concluded yourself, Xcode, such a machine would be a great help in producing bread and butter. But let me tell you to what this parable alludes: Apple’s bread and butter is its systems of electronic devices and first- and third-party software. You, Xcode, are the souped-up combine harvester; the thing that makes it possible to rapidly turn the ingredients of Apple’s ecosystem into the products that everyone desires. You free both Apple’s people and the people who make things with Apple’s things from the drudgery of threshing and winnowing, and let them concentrate on what they want to make and how people would want to interact with those things. Do some people want artisanal loaves while others want simple pre-sliced packages? Xcode, you help people help both of these people.

Here’s what I think. I think you know this, Xcode. I think you, and the product people, and the developers who make you, all know that you are a key piece of machinery in producing Apple’s bread and butter. I worry though that other people at Apple, particularly some of the managers, do not see this. I think that they see you as a tool for internal use and a small number of external customers; a group that is not the primary focus for Apple’s products. These same people would see the combine harvester not as the greatest labour-saving device of the twentieth century, but as a niche instrument only of interest to combine harvester operators.

I respect you, Xcode. You may not know that, because I joke about you a lot on Twitter. What you have to understand is that for an English man to make a joke about one of his friends, it means that he really respects and has affection for that friend. It’s that respect I have for you, and for the people that make you, that means I think I can tell you what follows and you’ll take it in the intended spirit: as advice, not as an insult.

Xcode, I need you to do a couple of things. One of them you’ll like, one of them you won’t. I’ll start with the one you’ll like: I’m not a fan of the shit sandwich mode of delivering criticism. So here goes. Xcode, I need you and the people who make you to go around the campus at Apple and tell everyone you meet about the combine harvester story. Particularly senior management. Let everyone see that you are not some toy app used by a few edge-case and highly demanding users, but are in fact a critical component in the machinery that makes iPhones what they are, that makes iPads what they are, that makes Macs what they are. Remind the people who’re focused on iBooks Author as a key part of Apple’s education strategy that you help to make iBooks Author. Remind the people who want to build even better iOS releases that you are helping to build those releases. That when someone says “there’s an app for that”, it’s because of you. Help everyone to realise that the better you are, Xcode, the better nearly everything else that Apple does will become.

You’ll find it easier to convince some people than others, I’m sure. I expect that Craig Federighi has strong memories of using you when you were still Project Builder. I expect that Tim Cook may never have launched or even installed you. I don’t promise that telling the combine harvester story will be easy, but I do promise that it will boost your esteem, and that of the people who work on you.

So that was the one I think you’ll like, Xcode. Here is the other one; the one I doubt you’ll like so much. I mentioned Project Builder in the last paragraph: Xcode, I’m sorry to have to tell you that you are no longer Project Builder. How do I mean? I’m certainly not talking about your outward appearance: Project Builder was a quirky adolescent who couldn’t do anywhere near as much as you can. What I’m saying is that you’re no longer the cool startup whose goal is to change the world of developer tools. We’ve tried a whole bunch of different machinery and we’ve settled on the souped-up combine harvester. What we need you to be is a better souped-up combine harvester.

That’s not to say that innovation in developer tools should die completely. There will be new Project Builders. Someone will invent a new way of building software that’s completely out of left field, and plenty of people will find that new way better than the current way. That’ll be really cool. Maybe Apple will buy that company, or license their technology, so that you can have a go with it. That would also be cool.

What I’m saying, Xcode, is that you’re mature and grown up and people respect you for that. Please stop having these mid-life crises like you did at version 4 where you suddenly change how everything is done. Your work now is in incremental enhancement, not in world-changing revolution. People both at Apple and outside have come to expect you to be dependable, reliable and comfortable. You may think that’s boring. It’s not! Remember all of those things that exist because of you, all of those people who are delighted by what you have helped create. Just bear in mind also that when it comes time for Xcode 5, people will want a better Xcode, not a replacement for Xcode.

Think about the apps that are made at Apple now. What could make it a bit faster to make every view? Or make regressions a bit easier to detect and fix? What errors do developers at Apple see, with what frequency? How could you reduce those errors or make them quicker to diagnose? There’s an old story that Steve Jobs wanted the boot time of the Macintosh to be as fast as possible, and he thought about it in terms of the number of lifetimes that would be wasted staring at the boot screen. You may now be thinking about the number of lifetimes spent writing code, but I want you to think bigger than that: think about the exponentially larger number of lifetimes being spent waiting for those apps to ship. That extra month where 100 million people waited for the new iTunes; could a better Xcode have cut that time down?

Listen, Xcode, this is going to sound weird. I mean, you barely know me, but I’m talking like we’re best friends and I’m holding some kind of intervention. But here’s how I want you to see it, and it’s based on the combine harvester story. I don’t know whether you have a bonus or incentive scheme at Apple, but if you do then ask them to make this change. Xcode, your bonus should not be based on shipping Xcode. That would be like paying a combine harvester for harvesting; it completely misses the point. The point of harvesting is to make things like bread. Your bonus should be based on shipping every other software product Apple makes. Maybe even the third-party apps, if you can work out a fair way to measure that.

With more sincerity than this blog usually evinces,

Graham.

How big is an integer?

In the beginning, when all was without form and void, Kernighan and Ritchie created char. And they said, “let it be of a size chosen by the compiler, guaranteed to be large enough to hold one character from the execution character set.” And so it was, and they decreed that whatever the size of this char, the compiler would call its size 1.

Right, that’s enough silly voice. There were also other types of integer: short, int, long, long long, and pointers. The point is that on any system, you could find out how big one of these numbers is (using the compiler’s sizeof() feature) but that size depended on the system you were compiling for. Assuming that sizeof(char)==1 is OK, but assuming that sizeof(int)==4 will lead to trouble: it’s 2 on some systems and 8 on some others, for example.
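
You can see this for yourself: a throwaway program like the one below prints different numbers on different platforms (only char’s size is guaranteed).

#include <stdio.h>

int main(void) {
	// Every size except char's is implementation-defined.
	printf("char=%zu short=%zu int=%zu long=%zu long long=%zu\n",
	       sizeof(char), sizeof(short), sizeof(int),
	       sizeof(long), sizeof(long long));
	return 0;
}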

C also provides the typedef feature, which lets you give new names to existing types. Plenty of API designers use typedef to rename integer types to give some clue as to their meaning; so you’ll see size_t used to describe the size of something, ptrdiff_t to express the difference between two pointers, and so on.

Leaving the size of the various types undefined gives plenty of flexibility to implementors. A compiler for a given platform can choose to create ints that are the same size as a CPU register, or the maximum size transferable on the data bus in one load operation. It gives benefits to well-written software, which can be ported to hardware with different data characteristics just by recompiling. It also causes some problems for programmers whose software needs to talk to, well, anything else.

Imagine two computers communicating over a network. One of them wants to send an integer to the other, and the program represents the integer as an int. Well the receiving computer could have a different idea of how big an int is. Maybe the sender puts four bytes onto the network, but the receiver waits forever because it wants eight bytes. Alternatively, maybe the sender delivers eight bytes, the first four of which are incorrectly used as the integer, and the next four remain in the queue to be incorrectly used for the next value required.

The same problem can occur with files, too. Imagine that my app writes an int to disk. My customer then upgrades their computer, and my same app running on a different architecture tries to read the int back in. Does it get the same value? I’ve even seen this problem with two processes on the same computer, where one was a 64-bit kernel talking to a 32-bit user process. [N.B. a related problem is that every process needs to agree on which byte goes where in multi-byte integers; a problem not considered here].

Clearly there’s a need for integer types that are of a stable size, guaranteed to remain the same whatever architecture the software is running on. The inttypes.h or stdint.h headers, introduced in C99 (so well over a decade ago), provide these (and more). If the target environment is capable of providing an integer type that uses exactly eight bits, you can access that as int8_t (uint8_t for unsigned integers). Whether or not this is available, the smallest type that holds at least eight bits is called int_least8_t. The integer type that holds at least eight bits and is fastest for the computer to handle can also be used, as int_fast8_t. Standard implementations should provide these types for 8, 16, 32 or 64 bit integers, and may provide types for other sizes too.
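
As an illustration (the function and wire format here are invented for the example, not a standard API): picking an exact-width type and a byte order means both sides of the conversation agree on the size, which avoids the problem from the network example above.

#include <stdint.h>
#include <stdio.h>

// Writes exactly four bytes, most significant first, whatever
// sizeof(int) happens to be on either machine.
void write_count(FILE *f, uint32_t count) {
	uint8_t bytes[4] = {
		(uint8_t)(count >> 24),
		(uint8_t)(count >> 16),
		(uint8_t)(count >> 8),
		(uint8_t)count,
	};
	fwrite(bytes, sizeof bytes, 1, f);
}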

The point of all of this is that while there are guaranteed-size integer types available, anything that isn’t obviously of a specific size should be treated as if it’s of unknown size. Take, for example, NSInteger. It and the unsigned NSUInteger type were introduced by Apple to provide source code compatibility between 32 and 64-bit Cocoa API code, while also expanding the values used and returned by the API on wider platforms.

This could have been done by keeping the API as it was, and changing the size of int on 64-bit Cocoa from 4 bytes to 8. This would’ve been a poor choice, because plenty of code that assumes (wrongly) that sizeof(int)==4 would have broken. Most other 64-bit environments provide eight byte longs and pointers and four-byte ints, and Apple chose to follow suit for better compatibility.

Instead, NSInteger’s underlying type depends on the architecture you’re compiling for. Currently, all Apple’s 32-bit platforms define it as int, and the 64-bit platforms define it as long. The end result is that while an NSInteger is guaranteed to be big enough to hold the length of an NSArray or an NSString, it isn’t guaranteed to be the same size as someone else’s NSInteger. Some compatibility issues still remain, and failing to deal with them can lead to some subtle bugs that only manifest themselves in particular situations.
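
The underlying definition is roughly this (simplified; the real NSObjCRuntime.h tests a few more conditions than just __LP64__):

#if __LP64__
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif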

How to excel at IDE design

When people have the “which IDE is best” argument, what they’re actually discussing is “which slightly souped-up monospace text editor with a build button do you like using”. Eclipse, Xcode, IntelliJ, Visual Studio…all of these tools riff on the same design—letting you see the source code and change the source code. As secondary effects you can also do things like build the source code, run the built product, and debug it.

The most successful IDE in the world, I would contend (and then wave my hands unconvincingly when anyone asks for data), is one that’s not designed like that at all. It’s the one that is used by more non-software specialists than any of those named above. The one that doesn’t require you to practice being an IDE user for years before you get any good. The one that business analysts, office assistants, accountants and project managers alike all turn to when they need their computer to run through some custom algorithm.

The IDE whose name appears in the title of this post.

Now what makes a spreadsheet better as a development environment is difficult to say; I’m unaware of anyone having researched it. But what makes it a different development environment from an IDE can be clearly seen by using each of them.

In a spreadsheet, it’s the inputs and results that are front-and-centre in the presentation, not the intermediate stuff that gets you from one to the other. You can test your “code” by typing in a different input and watching the results change in front of you. You can see intermediate results, not by breaking and stepping through, or putting in a log statement then switching to the log view, but by breaking the algorithm up into smaller steps (or functions or procedures, if you want to call them that). You can then visualise how these intermediate results change right alongside the inputs and outputs. That’s quicker feedback than even REPLs can offer.

Many spreadsheet users naturally adopt a “test-first” approach; they create inputs for which they know what the results should be and make successively better attempts to build a formula that achieves these results. And, of course, interesting visualisations like graphs are available (though the quality does vary between products). Drawing a graph in Xcode is…challenging (and yes, nerds, I know about CorePlot).

All of this is not to say that spreadsheets are the future of IDEs. In many ways spreadsheets are more accessible as IDEs, and it’d be good to study and learn from the differences and incorporate some of them into the things that are the future of IDEs.

Supporting both ARC and MRC build settings

Let’s face it, people don’t read `README`s. If you write library code that people are going to use in their own projects, you can’t rely on that bit at the bottom of the documentation that tells them to set -fobjc-arc on your files once they’ve dropped them into their project. You can rely on all the issues that get reported about memory leaks :-).

The actual solution

Your project should build a library (static by necessity on iPhone; there are other options on the Mac) so that developers can just add that one library target as a build dependency, and drop the headers into their own projects.

The result is that your memory management is now hidden behind the object boundary and the naming conventions of your methods. You should probably still use manual reference counting if you want people who’ve already written apps to be able to link against your code without problems, because there are still apps out there that target versions of iOS that can’t link ARCified objects. Either way, an app will be able to link your library whether or not it is ARCified itself.

The other solution

Sometimes you find code that developers are supposed to integrate by dropping the source files into their targets. This is worse than providing a static library: now you’ve made the developer care about the internals of your code – the compiler flags you need to set become something they have to deal with in their target’s build settings. This includes the setting for whether automatic reference counting is enabled.

…unless you support both possibilities. I’ve used the macros defined below to use the same code with both automatic and manual reference counting compiler settings. This code included Core Foundation bridged objects, so this isn’t just “the trivial case” (whatever that is).

#if __has_feature(objc_arc)
# define FZARelease(obj)
# define FZAAutorelease(obj) (obj)
# define FZARetain(obj) (obj)
#else
# define FZARelease(obj) [(obj) release]
# define FZAAutorelease(obj) [(obj) autorelease]
# define FZARetain(obj) [(obj) retain]
#endif
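
As a hypothetical example of the macros in use (the class, ivar and method are invented for illustration), a setter written like this compiles correctly under both memory models. Retaining the new value before releasing the old one keeps the manual expansion safe even when both are the same object:

- (void)setName:(NSString *)name {
	NSString *newName = FZARetain(name); // plain assignment under ARC
	FZARelease(_name);                   // compiles away under ARC
	_name = newName;
}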

Objective-C garbage collection

I haven’t had a need to test how that interacts with garbage collection, or build code that works in all three environments. However, if you already wrote your code to support (rather than require) GC, and you don’t rely on CFMakeCollectable, this collection of macros at least won’t make anything worse.

Automate all the server Objective-C!

I decided it was time to stop writing WebObjects/GNUstepWeb code, and write some code that would make it easier to write WO/GSW code. With that in mind I replaced my previous component generator with a more robust generator.

I also wrote and published some git hooks for working on these projects. The pre-commit hook just runs ‘make’ and doesn’t let you commit if you can’t build: we’ll look at testing in a later post (you may not know this, but I’ve done a thing or two with Objective-C unit tests).
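
A minimal sketch of that hook (the whole file lives at .git/hooks/pre-commit; a non-zero exit is what stops the commit):

#!/bin/sh
# Abort the commit if the project doesn't build.
make || {
	echo "Build failed; commit aborted." >&2
	exit 1
}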

The post-commit hook launches the direct connect app, so you can have a box that’s always running the latest version for testing. You’d want to do something similar, though really not the same, for a production box: as well as being sensitive to database and web server adaptor[*] configurations, you’d want to be stricter about when you restart the server, and may have some app-specific work to do like triggering cleanup code. Besides which, it makes more sense to do that in a post-update hook.
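
And a hypothetical sketch of the post-commit version, with TestApp standing in for the real product name and launch path:

#!/bin/sh
# Rebuild, then restart the direct connect app so the box
# serves the latest commit.
make || exit 0
killall TestApp 2>/dev/null
./TestApp.gswa/TestApp &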

[*] While a GSW app does include a web server, it’s common to restrict access to that server just to the internal network. Externally you have a “normal” web server like Apache or Nginx, with an adaptor that knows how to parse GSW URLs and redirect requests to the correct application.