On NSNull as an anti-pattern

All this talk about type-safe collections may leave you thinking: but what about NSNull? Let’s say you have an array that only accepts objects conforming to MyProtocol. You can’t add +[NSNull null] to it, because it doesn’t implement the protocol. So haven’t I just broken mutable arrays?

Let’s be clear: NSNull is a nasty hack. The original inventors of Foundation wanted to provide a variadic initialiser and factory for collection classes, but rather than doing +[NSArray arrayWithCount: (unsigned)count objects: (id)firstObject, ...] they created +[NSArray arrayWithObjects: (id)firstObject, ...]. That meant they needed a special value to flag the end of the list, and they chose nil. That meant you couldn’t put nil into an array, because such an array could not be constructed using the +arrayWithObjects: style. Therefore they decided to provide a new “nothing” placeholder, and created NSNull.

NSNull makes client code warty. If you were permitted to put nil into a collection, you could just do this:

for (id <MyProtocol>foo in myFoos) {
  [foo doSomethingInteresting];
}

but if you use NSNull, you get to write this (or a close variant):

for (id <MyProtocol>foo in myFoos) {
  if ([foo conformsToProtocol: @protocol(MyProtocol)]) {
    [foo doSomethingInteresting];
  }
}

Nasty. You’ve actually got to perform the test that the protocol conformance is supposed to address, or something that gets you the same outcome.

I’d prefer to use a different pattern, common in languages where nil or its equivalent cannot be messaged, known as the Null Object Pattern. To be clear, all you do is implement a class that conforms to the protocol (or extends the superclass, if that’s what you’re up to) but doesn’t do anything interesting. If it’s interrogated for data, it just returns 0, NO or whatever is relevant. If it’s asked to do work, it just returns. In short, it can be used in the same way as a real instance but does nothing, just as a placeholder ought to behave. So we might do this:

@interface NullMyProtocolConformer: NSObject <MyProtocol> { }
@end

@implementation NullMyProtocolConformer

- (void)doSomethingInteresting { }

@end

Now we can go back to the first version of our loop iteration, and keep our type-conformance tests in our collection classes. Anywhere you might want to put NSNull, you just stuff an instance of NullMyProtocolConformer.

On type safety and making it harder to write buggy code

Objective-C’s duck typing system is both a blessing and a curse. A blessing, in that it’s amazingly flexible. A curse, in that such flexibility can lead to some awkward problems.

Something that typically happens in dealing with data from a property list, JSON or other similar format is that you perform some operation on an array of strings, only to find out that one of those strings was actually a dictionary. Boom – unrecognised selector sent to instance of NSCFDictionary. Unfortunately, in this case the magic smoke is escaping a long way from the gun barrel – we get to see what the problem is but not what caused it. The stack trace in the bug report only tells us what tried to use the collection that may have been broken a long time ago.

The easiest way to deal with bugs is to have the compiler catch them and refuse to emit any executable until they’re fixed. We can’t quite do that in Objective-C, at least not for the situation described here. That would require adding generics or a similar construction to the language definition, and providing classes to support such a construction. However we can do something which gets us significantly better runtime error diagnosis, using the language’s introspection facilities.

Imagine a mutable collection that knew exactly what kinds of objects it was supposed to accept. If a new object is added that is of that kind, then fine. If a new object of a different kind is added, then boom – invalid argument exception, crash. Only this time, the application crashes where we broke it not some time later. Actually, don’t imagine such a collection, read this one. Here’s the interface:

//
//  GLTypesafeMutableArray.h
//  GLTypesafeMutableArray
//
//  Created by Graham Lee on 24/05/2010.
//  Copyright 2010 Thaes Ofereode. All rights reserved.
//

#import <Cocoa/Cocoa.h>

@class Protocol;

/**
 * Provides a type-safe mutable array collection.
 * @throws NSInvalidArgumentException if the type safety is violated.
 */
@interface GLTypesafeMutableArray : NSMutableArray {
@private
    Class elementClass;
    Protocol *elementProtocol;
    CFMutableArrayRef realContent;
}

/**
 * The designated initialiser. Returns a type-safe mutable array instance.
 * @param class Objects added to the array must be an instance of this Class.
 *              Can be Nil, in which case class membership is not tested.
 * @param protocol Objects added to the array must conform to this Protocol.
 *                 Can be nil, in which case protocol conformance is not tested.
 * @note It is impossible to set this object's parameters after initialisation.
 *       Therefore calling -init will throw an exception; this initialiser must
 *       be used.
 */
- (id)initWithElementClass: (Class)class elementProtocol: (Protocol *)protocol;

/**
 * The class of which all added elements must be a kind, or Nil.
 */
@property (nonatomic, readonly) Class elementClass;

/**
 * The protocol to which all added elements must conform, or nil.
 */
@property (nonatomic, readonly) Protocol *elementProtocol;

@end

Notice that the class doesn’t allow you to build a type-safe array then set its invariants, nor can you change the element class or protocol after construction. This choice is deliberate: imagine if you could create an array to accept strings, add strings then change it to accept arrays. Not only could you then have two different kinds in the array, but the array’s API couldn’t tell you about both kinds. Also notice that the added elements can either be required to be of a particular class (or subclasses), or to conform to a particular protocol, or both. In theory it’s always better to define the protocol than the class, in practice most Objective-C code including Cocoa is light in its use of protocols.

The implementation is then pretty simple, we just provide the properties, initialiser and the NSMutableArray primitive methods. The storage is simply a CFMutableArrayRef.

//
//  GLTypesafeMutableArray.m
//  GLTypesafeMutableArray
//
//  Created by Graham Lee on 24/05/2010.
//  Copyright 2010 Thaes Ofereode. All rights reserved.
//

#import "GLTypesafeMutableArray.h"
#import <objc/Protocol.h>
#import <objc/runtime.h>

@implementation GLTypesafeMutableArray

@synthesize elementClass;
@synthesize elementProtocol;

- (id)init {
    @throw [NSException exceptionWithName: NSInvalidArgumentException
                                   reason: @"call initWithClass:protocol: instead"
                                 userInfo: nil];
}

- (id)initWithElementClass: (Class)class elementProtocol: (Protocol *)protocol {
    if (self = [super init]) {
        elementClass = class;
        elementProtocol = protocol;
        realContent = CFArrayCreateMutable(NULL,
                                           0,
                                           &kCFTypeArrayCallBacks);
    }
    return self;
}

- (void)dealloc {
    CFRelease(realContent);
    [super dealloc];
}

- (NSUInteger)count {
    return CFArrayGetCount(realContent);
}

- (void)insertObject:(id)anObject atIndex:(NSUInteger)index {
    if (elementClass != Nil) {
        if (![anObject isKindOfClass: elementClass]) {
            @throw [NSException exceptionWithName: NSInvalidArgumentException
                                           reason: [NSString stringWithFormat: @"Added object is not a kind of %@",
                                                    NSStringFromClass(elementClass)]
                                         userInfo: nil];
        }
    }
    if (elementProtocol != nil) {
        if (![anObject conformsToProtocol: elementProtocol]) {
            @throw [NSException exceptionWithName: NSInvalidArgumentException
                                           reason: [NSString stringWithFormat: @"Added object does not conform to %s",
                                                    protocol_getName(elementProtocol)]
                                         userInfo: nil];
        }
    }
    CFArrayInsertValueAtIndex(realContent,
                              index,
                              (const void *)anObject);
}

- (id)objectAtIndex:(NSUInteger)index {
    return (id)CFArrayGetValueAtIndex(realContent, index);
}

@end

Of course, this class isn’t quite production-ready: it won’t play nicely with toll-free bridging[*], isn’t GC-ready, and doesn’t supply any versions of the convenience constructors. That last point is a bit of a straw man though because the whole class is a convenience constructor in that it’s a realisation of the Builder pattern. If you need an array of strings, you can take one of these, tell it to only accept strings then add all your objects. Take a copy at the end and what you have is a read-only array that definitely only contains strings.

So what we’ve found here is that we can use the facilities provided by the Objective-C runtime to move our applications’ problems earlier in time, moving bug discovery from when we try to use the buggy object to when we try to create the buggy object. Bugs that are discovered earlier are easier to track down and fix, and are therefore cheaper to deal with.

[*]OK, so it is compatible with toll-free-bridging. Thanks to mike and mike for making me look that up…it turns out that CoreFundation just does normal ObjC message dispatch if it gets something that isn’t a CFTypeRef. Sadly, when I found that out I discovered that I had already read and commented on the post about bridging internals…

Careful how you define your properties

Spot the vulnerability in this Objective-C class interface:

@interface SomeParser : NSObject {
  @private
	NSString *content;
}
@property (nonatomic, retain) NSString *content;
- (void)beginParsing;
//...
@end

Any idea? Let’s have a look at a use of this class in action:

SomeParser *parser = [[SomeParser alloc] init];
NSMutableString *myMutableString = [self prepareContent];
parser.content = myMutableString;
[parser beginParsing];
[self modifyContent];

The SomeParser class retains an object that might be mutable. This can be a problem if the parser only functions correctly when its input is invariant. While it’s possible to stop the class’s API from mutating the data – perhaps using the State pattern to change the behaviour of the setters – if the ivar objects are mutable then the class cannot stop other code from making changes. Perhaps the string gets truncated while it’s being parsed, or valid data is replaced with invalid data while the parser is reading it.

If a class needs an instance variable to remain unmodified during the object’s lifetime (or during some lengthy operation), it should take a copy of that object. It’s easy to forget that in cases like strings and collections where the type of the ivar is immutable, but mutable subclasses exist. So to fix this parser:

@property (nonatomic, copy) NSString *content;

You could also make the property readonly and provide an -initWithContent: constructor, which takes a copy that will be worked on.

But with collection class properties these fixes may not be sufficient. Sure, you definitely get an immutable collection, but is it holding references to mutable elements? You need to check whether the collection class you’re using support shallow or deep copying—that is, whether copying the collection retains all of the elements or copies them. If you don’t have deep copying but need it, then you’ll end up having to implement a -deepCopy method yourself.

Note that the above discussion applies not only to collection classes, but to any object that has other objects as ivars and which is either itself mutable or might have mutable ivars. The general expression of the problem is fairly easy to express: if you don’t want your properties to change, then take copies of them. The specifics can vary from case to case and, as ever, the devil’s in the detail.

Why OS X (almost) doesn’t need root any more

Note: this post was originally written for the Mac Developer Network.

In the beginning, there was the super-user. And the super-user was root.

When it comes to doling out responsibility for privileged work in an operating system, there are two easy ways out. Single-user operating systems just do whatever they’re told by whoever has access, so anyone can install or remove software or edit configuration. AmigaDOS, Classic Mac OS and MS-DOS all took this approach.

The next-simplest approach is to add multiple users, and let one of them do everything while all the others can do nothing. This is the approach taken by all UNIX systems since time immemorial – the root user can edit all files, set access rights for files and devices, start network services on low-numbered ports…and everyone else can’t.

The super-user approach has obvious advantages in a multi-user environment over the model with no privilege mechanism – only users who know how to log in as root can manage the computer. In fact it has advantages in a single-user environment as well: that one user can choose to restrict her own privileges to the times when she needs them, by using a non-privileged account the rest of the time.

It’s still a limited mechanism, in that it’s all-or-nothing. You either have the permission to do everything, or you don’t. Certain aspects like the ability to edit files can be delegated, but basically you’re either root or you’re useless. If you manage to get root – by intention or by malicious exploitation – you can do anything on the computer. If you exploit a root-running network service you can get it to load a kernel extension: not because network services need to load kernel extensions, but because there is nothing to stop root from doing so.

And that’s how pretty much all UNIX systems, including Mac OS X, work. Before getting up in arms about how Apple disabled root in OS X, remember this: they didn’t disable root, they disabled the account’s password. You can’t log in to a default OS X installation as root (though you can on Mac OS X Server). All of the admin facilities on Mac OS X are implemented by providing access to the monolithic root account – running a software update, configuring Sharing services, setting the FileVault master password all involve gaining root privilege.

The way these administrative features typically work is to use Authorization Services, and the principle of least privilege. I devoted a whole chapter to that in Professional Cocoa Application Security so won’t go into too much detail here, the high-level view is that there are two components, one runs as the regular user and the other as root. The unprivileged part performs an authorisation test and then, at its own discretion, decides whether to call the privileged helper. The privileged part might independently test whether the user application really did pass the authorisation test. The main issue is that the privileged part still has full root access.

So Authorization Services gives us discretionary access control, but there’s also a useful mandatory test relevant to the super-user. You see, traditional UNIX tests for whether a user is root by doing this:

if (process.p_euid == 0) {

Well, Mac OS X does do something similar in parts, but it actually has a more flexible test in places. There’s a kernel authorisation framework called kauth – again, there’s a chapter in PCAS on this so I don’t intend to cover too much detail. It basically allows the kernel to defer security policy decisions to callbacks provided by kernel extensions, one such policy question is “should I give this process root?”. Where the kernel uses this test, the super-user access is based not on the effective UID of the calling process, but on whatever the policy engine decides. Hmm…maybe the policy engine could use Authorization Services? If the application is an installer, and it has the installer right, and it’s trying to get root access to the filesystem, then it’s allowed.

Apple could then do away with monolithic root privileges completely, allowing the authorisation policy database to control who has privileged access for what tasks with which applications. The advantage is that if a privileged process ever gets compromised, the consequences for the rest of the OS are reduced.

On improved tool support for Cocoa developers

I started writing some tweets, that were clearly taking up too much room. They started like this:

My own thoughts: tool support is very important to good software engineering. 3.3.1 is not a big inhibitor to novel tools. /cc @rentzsch

then this:

There’s still huge advances to make in automating design, bug-hunting/squashing and traceability/accountability, for instance.

(The train of thought was initiated by the Dog Spanner’s [c4 release]; post.)

In terms of security tools, the Cocoa community needs to catch up with where Microsoft are before we need to start wondering whether Apple might be holding us back. Yes, I have started working on this, I expect to have something to show for it at NSConference MINI. However, I don’t mind whether it’s you or me who gets the first release, the important thing is that the tools should be available for all of us. So I don’t mind sharing my impression of where the important software security engineering tools for Mac and iPhone OS developers will be in the next few years.

Requirements comprehension

My first NSConference talk was on understanding security requirements, and it’s the focus of Chapter 1 of Professional Cocoa Application Security. The problem is, most of you aren’t actually engineers of security requirements, you’re engineers of beautiful applications. Where do you dump all of that security stuff while you’re focussing on making the beautiful app? It’s got to be somewhere that it’s still accessible, somewhere that it stays up to date, and it’s got to be available when it’s relevant. In other words, this information needs to be only just out of your way. A Pages document doesn’t really cut it.

Now over in the Windows world, they have Microsoft Threat Modeling Tool, which makes it easy to capture and organise the security requirements. But stops short of providing any traceability or integration with the rest of the engineering process. It’d be great to know how each security requirement impacts each class, or the data model, etc.

Bug-finding

The Clang analyser is just the start of what static analysis can do. Many parts of Cocoa applications are data-driven, and good analysis tools should be able to inspect the relationship between the code and the data. Other examples: currently if you want to ensure your UI is hooked up properly, you manually write tests that inspect the outlets, actions and bindings you set up in the XIB. If you want to ensure your data model is correct, you manually write tests to inspect your entity descriptions and relationships. Ugh. Code-level analysis can already reverse-engineer test conditions from the functions and methods in an app, they ought to be able to use the rest of the app too. And it ought to make use of the security model, described above.

I have recently got interested in another LLVM project called KLEE, a symbolic execution tool. Current security testing practices largely involve “fuzzing”, or choosing certain malformed/random input to give to an app and seeing what it does. KLEE can take this a step further by (in effect) testing any possible input, and reporting on the outcomes for various conditions. It can even generate automated tests to make it easy to see what effect your fixes are having. Fuzzing will soon become obsolete, but we Mac people don’t even have a good and conventional tool for that yet.

Bug analysis

Once you do have fuzz tests or KLEE output, you start to get crash reports. But what are the security issues? Apple’s CrashWrangler tool can take a stab at analysing the crash logs to see whether a buffer overflow might potentially lead to remote code execution, but again this is just the tip of the iceberg. Expect KLEE-style tools to be able to report on deviations from expected behaviour and security issues without having to wait for a crash, just as soon as we can tell the tool what the expected behaviour is. And that’s an interesting problem in itself, because really the specification of what you want the computer to do is your application’s source code, and yet we’re trying to determine whether or not that is correct.

Safe execution

Perhaps the bitterest pill to swallow for long time Objective-C programmers: some time soon you will be developing for a managed environment. It might not be as high-level as the .Net runtime (indeed my money is on the LLVM intermediate representation, as hardware-based managed runtimes have been and gone), but the game has been up for C arrays, memory dereferencing and monolithic process privileges for years. Just as garbage collectors have obsoleted many (but of course not all) memory allocation problems, so environment-enforced buffer safety can obsolete buffer overruns, enforced privilege checking can obsolete escalation problems and so on. We’re starting to see this kind of safety retrofitted to compiled code using stack guards and the like, but by the time the transition is complete (if it ever is), expect your application’s host to be unrecognisable to the app as an armv7 or x86_64, even if the same name is still used.

On localisation and security

Hot on the heels of Uli’s post on the problems of translation, I present another problem you might encounter while localising your code. This is a genuine bug (now fixed, of course) in code I have worked on in the past, only the data has been changed to protect the innocent.

We had a crash in the following line:

NSString *message = [NSString stringWithFormat:
	NSLocalizedString(@"%@ problems found", @"Discovery message"),
	problem];

Doesn’t appear to be anything wrong with that, does there? Well, as I say, it was a crasher. The app only crashed in one language though…for purposes of this argument, we’ll assume it was English. Let’s have a look at English.lproj/Localizable.strings:

/* Discovery message */
"%@ problems found" = "%@ found in %@";

Erm, that’s not so good. It would appear that at runtime, the variadic method +[NSString stringWithFormat: (NSString *)fmt, ...] is expecting two arguments to follow fmt, but only passed one, so it ends up reading its way off the end of the stack. That’s a classic format string vulnerability, but with a twist: none of our usual tools (by which I mean the various -Wformat flags and the static analyser) can detect this problem, because the format string is not contained in the code.

This problem should act as a reminder to ensure that the permissions on your app’s resources are correct, not just on the binary—an attacker can cause serious fun just by manipulating a text file. It should also suggest that you audit your translators’ work carefully, to ensure that these problems don’t arise in your app even without tampering.