On the lesser presentations

Advice on presentations, including that given on this blog, is often geared toward the “showbusiness” presentation. We’re usually talking about the big conference talk or product launch, where you can afford to put in the time to make a good, slick performance: a few days of preparation for a half-hour talk is not unheard of.

Not every presentation fits that mould. There are plenty where putting so much time into the presentation would be harmful, but where is the guidance on constructing those presentations?

The minor event

If you spent a few days preparing for a sprint-end demo, or reporting back to your team on some study you did, you’d significantly harm your productivity on the rest of your job. In these contexts, you want to spend a small amount of time building your talk, but you still want to put on a good show: to make your team and your stakeholders feel happy about the work you did, or to make the case persuasively for the tool or technique you studied.

As such, in these cases I still build an outline for my talk outside of the presentation software, and construct my slides to follow the outline. I’ve used OmniOutliner in the past and use Emacs org-mode now; you could use a list of bullets in Markdown or a pen and a sheet of paper. The tool doesn’t matter. What matters is that you get what you want to say structured in one place, and build the slides that support the presentation separately.

I try to keep these slides text-free, particularly if the presentation is short, so that people don’t get distracted reading them. If I’m reporting on progress, then screenshots of some progress dashboard make for quickly constructed slides. My current team shows its burndown every sprint end: that’s a quick screen capture that tells the story of the last two weeks. If there’s some headline figure (26 stories delivered; 80% of MVP scope complete; 2 new people on the team) then a slide containing that number makes a good backdrop to talking about the subject.

The recurrent deck

The antithesis of the conference keynote presentation style, the recurrent deck is a collection of slides you’ll use over and over again. Your approach to integrating with third-party APIs, your software architecture philosophy, your business goals for 2018…you’ll need to present these many times, in different contexts.

More to the point, other people will want to present them, too: someone in sales will answer a question about integration by using your integration slides. Your department director will present your team’s goals in the context of her department’s goals. And this works the other way: you can use your CEO’s slide on product strategy to help situate your team goals.

So throw out everything you learned about crafting the slides to fit the story. What you’re doing here is coming up with a reusable visual that can support any story related to the same information. I try to make these slides as information-rich as possible, though still diagrammatic rather than textual to avoid the presentation failure mode of reading out the slide. My current diagram tool is Lucidchart, as it’s my company’s standard; I’ve used OmniGraffle and Dia too. Whatever the tool, follow the house style (e.g. colour schemes, fonts, iconography) so that when you mash up your slides and your CEO’s slides, it still looks like a coherent presentation.

I try to make each slide self-contained, because I or someone else might take one to use in a different presentation. A single idea shouldn’t need a six-slide reveal, and a colleague will find it harder to reuse a slide that isn’t self-explanatory.

A frequent anti-pattern in slide design is to include the “page number” on the slide: not only is that information useless in a presentation, but its only likely effect is a continuity error when you drag a few slides from different sources to throw a talk together and don’t renumber them. Or worse, can’t: I’ve been given slides before that are screenshots of whatever slide was originally built, so the number is part of a bitmap.

Good reusable slide libraries will also be a boon in quickly constructing the minor event presentations: “we did this because that” can be told with one novel part (we did this) and one appeal to the library (because that).

Improving a presentation with slides

Take a look at your slides. For each slide, think how you would present the same information if you didn’t have the slide. Practise that, so that you can give the information on the slide without using the slide as an aide memoire. Practise that, until you can introduce that topic, discuss it, and move on to the next without a single reference to the slide. Do the same for each slide.

How will that improve my slides?

It won’t. It will improve your presentation with slides, by turning it into a presentation without slides.

As an optional extra, you could make new slides that support the presentation, but it shouldn’t be necessary.

Dark Silicon

About 10 years ago, we decided that the performance gains in single-core processors that come “for free” with advancing semiconductor processes were slowing down. Many chip makers switched to scaling the number of cores on a die, and promoted parallel programming for their products.

Today I learned that the free multicore lunch is over, too. You can no longer power all of the transistors on a single chip at the same time without exceeding its power budget, so you can no longer get 2× the threads running by doubling the number of cores in your processor.

Separating user interface from work

Here’s a design I’ve had knocking around my head for a while. Prompted by a discussion we had at work a few weeks ago and by Saul Mora’s excellent design patterns talk at QCon, I’ve now built it.

A quick heads-up: currently the logic is all built into a side-project app I’ve been working on, so I don’t have a single project download I can point to. The post here should explain all of the relevant code, which is made available under the terms of the MIT Licence. A reusable component is forthcoming.

Motivation

Remove the Massive View Controller from our applications’ architectures. Push Cocoa, Cocoa Touch, or other frameworks to the edges of our codebase, responsible only for working with the UI. Separate the concerns of user interaction, work scheduling and the actual work.

There are maintainability reasons for doing so. We separate unrelated work into different classes, localising the responsibilities and removing coupling between them. The same code can be used in multiple contexts, because the UI frameworks are decoupled from the work that they’re doing. This is a benefit not only for cross-platform work but also for reusing the same logic in different places in a single user interface.

The separation also makes some performance optimisations possible. With a clear delineation between the user interface code and the work, it’s much easier to understand which parts of the application must be run on the user interface thread and which can be done in other contexts.

Solution

Implement the Message Bus pattern from server applications. In response to a user event, the user interface creates a command object and sends it to a command bus. The command bus picks an appropriate handler, passes it the command and schedules it. The user interface, the work done and the scheduling of that work are therefore all decoupled.
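To make the shape of the pattern concrete before the Objective-C version, here is a minimal sketch in plain C. Everything in it is hypothetical and illustrative rather than part of IKBCommand: the names Command, Handler, CommandBus, bus_register and bus_execute, the fixed-size handler table, and the "add" command that updates a global model value. It only shows the dispatch step, where the bus asks each registered handler whether it accepts a command and hands the command to the first that does. Scheduling is deliberately left out, so the handler runs synchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command: a request to do some work; 'kind' says which work. */
typedef struct {
    const char *kind;
    int argument;
} Command;

/* Hypothetical handler: says whether it accepts a command, and performs it. */
typedef struct {
    bool (*can_handle)(const Command *command);
    void (*execute)(const Command *command);
} Handler;

#define MAX_HANDLERS 16

typedef struct {
    Handler handlers[MAX_HANDLERS];
    size_t count;
} CommandBus;

/* Handlers are registered once, typically at launch. */
void bus_register(CommandBus *bus, Handler handler) {
    assert(bus->count < MAX_HANDLERS);
    bus->handlers[bus->count++] = handler;
}

/* Pass the command to the first handler that accepts it.
   Returns false if no registered handler accepts this command. */
bool bus_execute(const CommandBus *bus, const Command *command) {
    for (size_t i = 0; i < bus->count; i++) {
        if (bus->handlers[i].can_handle(command)) {
            bus->handlers[i].execute(command);
            return true;
        }
    }
    return false;
}

/* Example handler: accumulates "add" commands into a model value. */
int model_total = 0;

bool adder_can_handle(const Command *command) {
    return strcmp(command->kind, "add") == 0;
}

void adder_execute(const Command *command) {
    model_total += command->argument;
}
```

The user interface only ever builds a Command and calls bus_execute; it never learns which handler did the work. In the Objective-C design described here, the bus would hand the handler to a queue instead of calling it directly.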

IKBCommand Class Diagram

Workflow

At application launch, the application accesses the shared IKBCommandBus object:

@interface IKBCommandBus : NSObject

+ (instancetype)applicationCommandBus;
- (void)execute: (id <IKBCommand>)command;
- (void)registerCommandHandler: (Class)handlerClass;

@end

and registers command handlers. Command handlers know what commands they can process, and can tell the bus whether they will accept a given command. Handlers can also be loaded later, for example in Mac applications or server processes when a new dynamic bundle is loaded.

Once the application is running, the command bus can be used by user interface controllers. These controllers are typically UIViewControllers in an iOS app, NSViewControllers, NSDocuments or other objects in a Cocoa app, or maybe something else in other contexts. A controller receives an action related to a user interface event, and creates a specific IKBCommand.

@protocol IKBCommand <NSObject, NSCoding>

@property (nonatomic, readonly) NSUUID *identifier;

@end

Commands represent requests to do specific work, so the controller needs to configure the properties of the command it created based on user input such as the current state of text fields and so on. This is done on the user interface thread to ensure that the controller accesses its UI objects correctly.

The controller then tells the application’s command bus to execute the command. This does not need to be done on the user interface thread. The bus looks up the correct handler:

@interface IKBCommandHandler : NSObject

+ (BOOL)canHandleCommand: (id <IKBCommand>)command;
- (void)executeCommand: (id <IKBCommand>)command;

@end

Then the bus schedules the handler’s executeCommand: method.

Implementation and Discussion

The Command protocol includes a unique identifier and conformance to the NSCoding protocol. This supports the Event Sourcing pattern, in which changes to the application can be stored directly as a sequence of events. Rather than storing the current state in a database, the app could just replay all events it has received when it starts up.

This opens up possibilities including journaling (the app can replay messages it received but didn’t get a chance to complete due to some outage) and syncing (the app can retrieve a set of events from a remote source and play those it hasn’t already seen). A possible extension to the implementation provided here is to use the event store as a natural undo stack, if commands can express how to revert their work. In fact, even if an event can’t be reversed, you can “undo” it by removing it from the event store and replaying the whole log back into the application from scratch.
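The replay, sync and undo-by-removal ideas can be sketched in a few lines of C. This is a sketch under simplifying assumptions, not the IKBCommand implementation: the hypothetical Event type uses an unsigned integer where a real command carries an NSUUID, the whole application state is reduced to one integer, and each event records a delta to that integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event: records a change to the model, not the resulting state. */
typedef struct {
    unsigned id;   /* stands in for the command's unique identifier */
    int delta;
} Event;

#define MAX_EVENTS 64

typedef struct {
    Event events[MAX_EVENTS];
    size_t count;
} EventStore;

void store_append(EventStore *store, Event event) {
    assert(store->count < MAX_EVENTS);
    store->events[store->count++] = event;
}

/* Rebuild the current state from nothing by replaying every event in order. */
int replay(const EventStore *store) {
    int state = 0;
    for (size_t i = 0; i < store->count; i++) {
        state += store->events[i].delta;
    }
    return state;
}

/* "Undo" without a reverse operation: drop the event from the log, then replay.
   Returns 1 if the event was found and removed, 0 otherwise. */
int store_remove(EventStore *store, unsigned id) {
    for (size_t i = 0; i < store->count; i++) {
        if (store->events[i].id == id) {
            for (size_t j = i + 1; j < store->count; j++) {
                store->events[j - 1] = store->events[j];
            }
            store->count--;
            return 1;
        }
    }
    return 0;
}

/* Syncing: append only the remote events this store hasn't already seen. */
void store_merge(EventStore *store, const Event *remote, size_t remote_count) {
    for (size_t i = 0; i < remote_count; i++) {
        int seen = 0;
        for (size_t j = 0; j < store->count; j++) {
            if (store->events[j].id == remote[i].id) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            store_append(store, remote[i]);
        }
    }
}
```

The unique identifier is what makes merging safe: an event received twice, once locally and once from a remote source, is recognised and played only once.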

When a command is received by the bus, it looks through the handlers that have been registered to find one that can handle the command. Then it schedules that handler on a queue.

@implementation IKBCommandBus
{
  NSOperationQueue *_queue;
  NSSet *_handlers;
}

static IKBCommandBus *_defaultBus;

+ (void)initialize
{
  if (self == [IKBCommandBus class])
    {
      _defaultBus = [self new];
    }
}

+ (instancetype)applicationCommandBus
{
  return _defaultBus;
}

- (id)init
{
  self = [super init];
  if (self)
    {
      _queue = [NSOperationQueue new];
      _handlers = [[NSSet set] retain];
    }
  return self;
}

- (void)registerCommandHandler: (Class)handlerClass
{
  _handlers = [[[_handlers autorelease] setByAddingObject: handlerClass] retain];
}

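- (void)execute: (id <IKBCommand>)command
{
  // Take the first registered handler class that accepts this command
  // (NSSet iteration order is unspecified if several would accept it).
  IKBCommandHandler *handler = nil;
  for (Class handlerClass in _handlers)
    {
      if ([handlerClass canHandleCommand: command])
        {
          handler = [handlerClass new];
          break;
        }
    }
  NSAssert(handler != nil, @"No handler defined for command %@", command);
  if (handler == nil)
    {
      return; // assertions can be compiled out of release builds
    }
  // The operation keeps its own reference to the handler, so we can release ours.
  NSInvocationOperation *executeOperation = [[NSInvocationOperation alloc] initWithTarget: handler selector: @selector(executeCommand:) object: command];
  [_queue addOperation: executeOperation];
  [executeOperation release];
  [handler release];
}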

- (void)dealloc
{
  [_queue release];
  [_handlers release];
  [super dealloc];
}

@end

Updating the UI

By the time a command is actually causing code to be run it’s far away from the UI, running a command handler’s method in an operation queue. The application can use the Observer pattern (for example Key Value Observing, or Cocoa Bindings) to update the user interface when command handlers change the data model.
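The observer side can be illustrated with a small C sketch, standing in for Key Value Observing rather than reproducing it. The Model, ObserverFn and FakeLabel names are hypothetical: a command handler changes the model through model_set_value, and every registered observer is called back with the new value, so the handler never touches a view directly.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 8

/* Called after the model changes; 'context' lets each observer carry its own data. */
typedef void (*ObserverFn)(int new_value, void *context);

typedef struct {
    int value;
    ObserverFn observers[MAX_OBSERVERS];
    void *contexts[MAX_OBSERVERS];
    size_t observer_count;
} Model;

void model_add_observer(Model *model, ObserverFn fn, void *context) {
    assert(model->observer_count < MAX_OBSERVERS);
    model->observers[model->observer_count] = fn;
    model->contexts[model->observer_count] = context;
    model->observer_count++;
}

/* Used by a command handler: change the data, then notify every observer. */
void model_set_value(Model *model, int value) {
    model->value = value;
    for (size_t i = 0; i < model->observer_count; i++) {
        model->observers[i](value, model->contexts[i]);
    }
}

/* Hypothetical stand-in for a view: records what it was last told to show. */
typedef struct {
    int displayed;
    int refresh_count;
} FakeLabel;

void label_observer(int new_value, void *context) {
    FakeLabel *label = context;
    label->displayed = new_value;
    label->refresh_count++;
}
```

One thing the sketch glosses over: notifications arrive on whichever thread changed the model, which here would be an operation queue thread, so an observer that updates real UI objects needs to hop back to the main thread first.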

A two-dimensional dictionary

What?

A thing I made has just been open-sourced by my employers at Agant: the AGTTwoDimensionalDictionary works a bit like a normal dictionary, except that the keys are CGPoints, meaning we can find all the objects within a given rectangle.

Why?

A lot of time on developing Discworld: The Ankh-Morpork Map was spent on performance optimisation: there’s a lot of stuff to get moving around a whole city. As described by Dave Addey, the buildings on the map were traced and rendered into separate images so that we could let characters go behind them. This means that there are a few thousand of those little images, and whenever you’re panning the map around the app has to decide which images are visible, put them in the correct place (in three dimensions; remember people can be in front of or behind the buildings) and draw everything.

A first pass involved creating a set containing all of the objects and looping over them to find out which were within the screen region. This was too slow. Implementing this 2-d index instead cut the time to about 20% of the original, for only a few tens of kilobytes more memory, so that’s where we are now. It’s also why the data type doesn’t currently do any rebalancing of its tree: it was already fast enough for the app it was built for. This is a key part of performance work: know which battles are worth fighting. About one month of full-time development went into optimising this app, and it would’ve been more if we hadn’t been measuring where the most benefit could be gained. By the time we started releasing betas, every code change was measured in Instruments before being accepted.
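For comparison, the first-pass approach amounts to something like the following C sketch. The Point and Rect types and the visible_linear function are hypothetical; the original stored objects in a set rather than an array. The point is the cost: every object is examined on every query, so each pan of the map costs time proportional to the total number of objects, however few are on screen.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point;
typedef struct { double x, y, width, height; } Rect;

/* True if p lies inside r, edges included. */
int rect_contains(Rect r, Point p) {
    return p.x >= r.x && p.x <= r.x + r.width &&
           p.y >= r.y && p.y <= r.y + r.height;
}

/* Linear scan: test every object each time the visible region changes.
   Writes the indices of visible points to 'out' (which must have room for
   'count' entries) and returns how many were found. */
size_t visible_linear(const Point *points, size_t count, Rect screen, size_t *out) {
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (rect_contains(screen, points[i])) {
            out[found++] = i;
        }
    }
    return found;
}
```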

Anyway, we’ve open-sourced it so it can be fast enough for your app, too.

How?

There’s a data structure called the multidimensional binary tree or k-d tree, and this dictionary is backed by that data structure. I couldn’t find an implementation of that structure I could use in an iOS app, so I cracked open the Objective-C++ and built this one.
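A 2-d tree small enough to read in one sitting looks roughly like this. It is a minimal C sketch of the general technique, not the AGTTwoDimensionalDictionary source: the KDNode, Box, kd_insert, kd_query and kd_free names are hypothetical. Nodes alternate between splitting on x (even depths) and y (odd depths), and, like the structure described above, the tree is never rebalanced. The range query is where the speed comes from: any subtree lying entirely on the wrong side of a split line is skipped without being visited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y; } Point;
typedef struct { double min_x, min_y, max_x, max_y; } Box;

/* Plain C node: a key, a value and two children; no isa pointer. */
typedef struct KDNode {
    Point key;
    void *value;
    struct KDNode *left, *right;
} KDNode;

/* Insert without rebalancing. Even depths split on x, odd depths on y. */
KDNode *kd_insert(KDNode *node, Point key, void *value, int depth) {
    if (node == NULL) {
        KDNode *fresh = malloc(sizeof *fresh);
        assert(fresh != NULL);
        fresh->key = key;
        fresh->value = value;
        fresh->left = fresh->right = NULL;
        return fresh;
    }
    int use_x = (depth % 2 == 0);
    double k = use_x ? key.x : key.y;
    double split = use_x ? node->key.x : node->key.y;
    if (k < split) {
        node->left = kd_insert(node->left, key, value, depth + 1);
    } else {
        node->right = kd_insert(node->right, key, value, depth + 1);
    }
    return node;
}

/* Append to 'out' the values whose keys lie inside 'box' (edges included).
   'out' must have room for every match; returns the new number found. */
size_t kd_query(const KDNode *node, Box box, int depth, void **out, size_t found) {
    if (node == NULL) return found;
    Point p = node->key;
    if (p.x >= box.min_x && p.x <= box.max_x &&
        p.y >= box.min_y && p.y <= box.max_y) {
        out[found++] = node->value;
    }
    int use_x = (depth % 2 == 0);
    double split = use_x ? p.x : p.y;
    double lo = use_x ? box.min_x : box.min_y;
    double hi = use_x ? box.max_x : box.max_y;
    /* Left subtree holds keys below the split, right holds keys at or above it. */
    if (lo < split) found = kd_query(node->left, box, depth + 1, out, found);
    if (hi >= split) found = kd_query(node->right, box, depth + 1, out, found);
    return found;
}

void kd_free(KDNode *node) {
    if (node == NULL) return;
    kd_free(node->left);
    kd_free(node->right);
    free(node);
}
```

Without rebalancing, the tree's shape depends on insertion order, so a badly ordered load can degrade it towards a list. That is the trade-off accepted above once the structure was already fast enough.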

Objective-C++? Yes. There are two reasons for using C++ in this context: one is that the structure actually does get accessed often enough in the Discworld app that dynamic dispatch all the way down adds a significant time penalty. The other is that the structure contains enough objects that having a spare isa pointer per node adds a significant memory penalty.

But then there’s also a good reason for using Objective-C: it’s an Objective-C app. My controller objects shouldn’t have to be written in a different language just to use some data structure. Therefore I reach for the only application of ObjC++ that should even be permitted to compile: an implementation in one language that exposes an interface in the other. Even the unit tests are written in straight Objective-C, because that’s how the class is supposed to be used.

Surprising ARC performance characteristics

The project I’m working on at the moment has quite tight performance constraints. It needs to start up quickly, do its work at a particular rate and, being an iOS app, there’s a hard limit on how much RAM can be used.

The team’s got quite friendly with Instruments, watching the time profile, memory allocations, thread switches[*] and storage access trying to discover where we can trade one off in favour of another.

[*] this is a topic for a different post, but “dispatch_async() all the things” is a performance “solution” that brings its own problems.

It was during one of these sessions that I noticed a hot loop in the app was spending a lot of time in a C++ collection type called objc::DenseMap. This is apparently used by objc_retain() and objc_release(), the functions used to implement reference counting when Objective-C code is compiled using ARC.

The loop was implemented using the closure interface, -[NSDictionary enumerateKeysAndObjectsUsingBlock:]. Apparently the arguments to a block are strong references, so each was being retained on entering the block and released on return. Multiply by thousands of objects in the collection and tens of iterations per second, and that was a non-trivial amount of time to spend in memory management functions.

I started to think about other data types in which I could express the same collection—is there something in the C++ standard library I could use?

I ended up using a different interface to the same data type – something proposed by my colleague, Mo. Since Cocoa was released, Foundation data types have been accessible via the CoreFoundation C API[**]. The key difference as far as modern applications are concerned is that the C API uses void * to refer to its content rather than id. As a result, and with appropriate use of bridging casts, ARC doesn’t try to retain and release the objects.

[**] I think that Foundation on OPENSTEP was designed in the same way, but that the C API wasn’t exposed until the OS X 10.0 release.

So this:

[myDictionary enumerateKeysAndObjectsUsingBlock: ^(id key, id object, BOOL *stop) {
  //...
}];

became this:

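CFDictionaryRef myCFDictionary = (__bridge CFDictionaryRef)myDictionary;
CFIndex count = CFDictionaryGetCount(myCFDictionary);
// CFDictionaryGetKeysAndValues() fills arrays of const void *.
// Stack arrays are fine for thousands of entries; watch the stack for far more.
const void *keys[count];
const void *values[count];
CFDictionaryGetKeysAndValues(myCFDictionary, keys, values);

for (CFIndex i = 0; i < count; i++)
{
  // Not retained by ARC: the dictionary keeps these alive while it isn't mutated.
  __unsafe_unretained id key = (__bridge id)keys[i];
  __unsafe_unretained id value = (__bridge id)values[i];
  //...
}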

which turned out to be about 12% faster in this case.

I’ll finish by addressing an open question from earlier: when should I consider ditching Foundation/CoreFoundation completely? There are times when it’s appropriate to move away from those data types. Foundation’s adaptive algorithms are very fast a lot of the time, choosing different representations under different conditions, but they aren’t always the best choice.

For loops that enumerate over a collection, like the one investigated in this post, a C++ or C structure representation is a good choice if the loop sends a lot of messages. Hacks like IMP caching can also help, in which this:

for (MyObject *foo in bar)
{
  [foo doThing];
}

becomes this:

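// class_getMethodImplementation() is declared in <objc/runtime.h>.
// Caching is only safe if every object in bar uses MyObject's -doThing
// (no subclass overrides it).
SEL doThingSelector = @selector(doThing);
void (*doThingImp)(id, SEL) = (void (*)(id, SEL))class_getMethodImplementation([MyObject class], doThingSelector);

for (MyObject *foo in bar)
{
  doThingImp(foo, doThingSelector);
}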

If you’ve got lots of instances of a class (tens or hundreds of thousands, say), Objective-C adds a measurable memory cost in the isa pointers (each object contains a pointer to its class) and in the look-aside table that tracks retain counts. Switching to a different representation can regain that space, at the cost of losing dynamic dispatch and reference-counted memory management, whether automatic or manual.
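To put a rough number on the isa part of that cost, here is a C sketch comparing a struct that carries a class pointer, as every Objective-C object does, with one that doesn't. The layouts and names (ObjectLike, PlainPoint, bytes_saved) are simplified assumptions. A real object's footprint also depends on the allocator's rounding, and retain counts held in a side table aren't modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly an object with two float ivars: the class pointer comes first. */
typedef struct {
    const void *isa;
    float x, y;
} ObjectLike;

/* The same data with no per-instance class pointer. */
typedef struct {
    float x, y;
} PlainPoint;

/* Bytes saved by storing 'count' instances as plain structs in one array. */
size_t bytes_saved(size_t count) {
    return count * (sizeof(ObjectLike) - sizeof(PlainPoint));
}
```

On a 64-bit device the difference is one 8-byte pointer per instance, so a hundred thousand instances carry 800,000 bytes of class pointers alone. Storing the structs contiguously also avoids a separate heap allocation per instance.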