On designing collections

Introduction

This post explores the pros and the cons of following the design rule “Objects responsible for collections of other objects should expose an interface to the collection, not the collection itself”. Examples and other technical discussion is in Objective-C, assuming Foundation classes and idioms.

Example Problem

Imagine you were building Core Data today. You get to NSEntityDescription, which encapsulates the information about an entity in the Managed Object Model including its attributes and relations, collectively “properties”. You have two options:

  1. Let client code have access to the actual collection of properties.
  2. Make client code work with the properties via API on NSEntityDescription or abstract collection interfaces.

In reality, NSEntityDescription does both, but not with the same level of control. As external clients of the class we can’t see whether the collection objects are internal state of the entity description or are themselves derived from its real state, although this LGPL implementation from GSCoreData does return its real instance variable in one case. However, this implementation is old enough not to show another aspect of Apple’s class: their entity description class itself conforms to NSFastEnumeration.

How to give access to the actual collection

This is the easiest case.

@implementation NSEntityDescription
{
    NSArray *_properties;
}

- (NSArray *)properties
{
    return _properties;
}

@end

A common (but trivial) extension to this is to return an autoreleased copy of the instance variable, particularly in the case where the instance variable itself is mutable but clients should be given access to a read-only snapshot of the state. A less-common technique is to build a custom subclass of NSArray using an algorithm specific to this application.

How to provide abstract collection access

There are numerous ways to do this, so let’s take a look at a few of them.

Enumerator

It doesn’t matter how a collection of things is implemented, if you can supply each in turn when asked you can give out an enumerator. This is how NSFileManager lets you walk through the filesystem. Until a few years ago, it’s how NSTableView let you see each of the selected columns. It’s how PSFeed works, too.

The good news is, this is usually really easy. If you’re already using a Foundation collection class, it can already give you an appropriate NSEnumerator.

- (NSEnumerator *)propertyEnumerator
{
     return [_properties objectEnumerator];
}

You could also provide your own NSEnumerator subclass to work through the collection in a different way (for example if the collection doesn’t actually exist but can be derived lazily).

Fast enumeration

This is basically the same thing as “Enumerator”, but has different implementation details. In addition, the overloaded for keyword in Objective-C provides a shorthand syntax for looping over collections that conform to the NSFastEnumeration protocol.

Conveniently, NSEnumerator conforms to the protocol so it’s possible to go from “Enumerator” to “Fast enumeration” very easily. All of the Foundation collection classes also implement the protocol, so you could do this:

- (id <NSFastEnumeration>)properties
{
    return _properties;
}

Another option—the one that NSEntityDescription actually uses—is to implement the -countByEnumeratingWithState:options:count: method yourself. A simple implementation passes through to a Foundation collection class that already gets the details right. There are a lot of details, but a custom implementation could do the things a custom NSEnumerator subclass could.

Object subscripting

If you’ve got a collection that’s naturally indexed, or naturally keyed, you can let people access that collection without giving them the specific implementation that holds the collected objects. The subscripting methods let you answer questions like “what is the object at index 3?” or “which object is named ‘Bob’?”. As with “Fast enumeration” there is syntax support in the language for conveniently using these features in client code.

- (id)objectAtIndexedSubscript: (NSUInteger)idx
{
    return _properties[idx];
}

Key-Value Coding

Both ordered and unordered collections can be hidden behind the to-many Key-Value Coding accessors. These methods also give client code the opportunity to use KVC’s mutable proxy collections, treating the real implementation (whatever it is) as if it were an NSMutableArray or NSMutableSet.

- (NSUInteger)countOfProperties
{
    return [_properties count];
}

- (NSPropertyDescription *)objectInPropertiesAtIndex: (NSUInteger)index
{
    return _properties[index];
}

- (NSArray *)propertiesAtIndexes: (NSIndexSet *)indexes
{
    return [_properties objectsAtIndexes: indexes];
}

Block Application (or Higher-Order Messaging)

You could decide not to give out access to the collection at all, but to allow clients to apply work to the objects in that collection.

- (void)enumeratePropertiesWithBlock: (void (^)(NSPropertyDescription *propertyDescription))workBlock
{
    NSUInteger count = [_properties count];
    dispatch_apply(count, _myQueue, ^(size_t index) {
        workBlock(_properties[index]);
    });
}

Higher-order messaging is basically the same thing but without the blocks. Clients could supply a selector or an invocation to apply to each object in the collection.

Visitor

Like higher-order messaging, Visitor is a pattern that lets clients apply code to the objects in a collection regardless of the implementation or even logical structure of the collection. It’s particularly useful where client code is likely to need to maintain some sort of state across visits; a compiler front-end might expose a Visitor interface that clients use to construct indices or symbol tables.

- (void)acceptPropertyVisitor: (id <NSEntityDescriptionVisitor>)aVisitor
{
    for (NSPropertyDescription *property in _properties)
    {
        [aVisitor visit: property];
    }
}

Benefits of Hiding the Collection

By hiding the implementation of a collection behind its interface, you’re free to change the implementation whenever you want. This is particularly true of the more abstract approaches like Enumerator, Fast enumeration, and Block Application, where clients only find out that there is a collection, and none of the details of whether it’s indexed or not, sparse or not, precomputed or lazy and so on. If you started with an array but realise an indexed set or even an unindexed set would be better, there’s no problem in making that change. Clients could iterate over objects in the collection before, they can now, nothing has changed—but you’re free to use whatever algorithms make most sense in the application. That’s a specific example of the Principle of Least Knowledge.

With the Key-Value Coding approach, you may be able to answer some questions in your class without actually doing all the work to instantiate the collected objects, such as the Key-Value Coding collection operators.

Additionally, there could be reasons to control use of the collected objects. For mutable collections it may be better to allow Block Application than let clients take an array copy which will become stale. Giving out the mutable collection itself would lead to all sorts of trouble, but that can be avoided just with a copy.

Benefits of Providing the Collection

It’s easier. Foundation already provides classes that can do collections and all of the things you might want to do with them (including enumeration, KVC, and application) and you can just use those implementations without too much work: though notice that you can do that while still following the Fast Enumeration pattern.

It’s also conventional. Plenty of examples exist of classes that give you a collection of things they’re responsible for, like subviews, child view controllers, table view columns and so on (though notice that the table view previously gave an Enumerator of selected columns, and lets you Apply Block to row views). Doing the same thing follows a Principle of Least Surprise.

When integrating your components into an app, there may be expectations that a given collection is used in a context where a Foundation collection class is expected. If your application uses an array controller, you need an array: you could expect the application to provide an adaptor, or you could use Key-Value Coding and supply the proxy array, but it might be easiest to just give the application access to the collection in the first place.

Finally, the Foundation collection classes are abstract, so you still get some flexibility to change the implementation by building custom subclasses.

Conclusion

There isn’t really a “best” way to do collections, there are benefits and limitations of any technique. By understanding the alternatives we can choose the best one for any situation (or be like NSEntityDescription and do a bit of each). In general though, giving interfaces to get or apply work to elements of the collection gives you more flexibility and control over how the collection is maintained, at the cost of being more work.

Coda

“More than one thing” isn’t necessarily a collection. It could be better modelled as a source, that generates objects over time. It could be a cursor that shows the current value but doesn’t remember earlier ones, a bit like an Enumerator. It could be something else. The above argument doesn’t consider those cases.

About Graham

I make it faster and easier for you to create high-quality code.
This entry was posted in Foundation, OOP, software-engineering. Bookmark the permalink.