On device identifiers

Note: as ever, this blog refrains from commenting on speculation regarding undisclosed product innovations from device providers. This post is about the concept of tracking users via a device identifier. You might find the discussion useful in considering future product directions; that’s fine.

Keeping records of what users are up to has its benefits. As a security boffin, I’m fully aware of the benefits of good auditing: discovering what users (and others) have (or haven’t) done to a valuable system. It also lets developers find out how users are getting on with their application: whether they ignore particular features, or have trouble deciding what to do at points in the app’s workflow.

Indeed, sometimes users actively enjoy having their behaviour tracked. Every browser has a history feature; many games let players see what they’ve achieved and compare it with others. Location games would be pretty pointless if they could only tell me where I am now, not tell the story of where I’ve been.

A whole bunch of companies package APIs for tracking user behaviour in smartphone apps. These are the analytics companies. To paraphrase Scott Adams: analytics is derived from the root word “anal”, and the Greek “lytics” meaning “to pull a business model from”. What they give developers for free is the API to put analytics services in their apps, and the tools to draw useful conclusions from the resulting data.

This is where the fun begins. Imagine that the analytics company uses some material derived from a device identifier (a UDID, IMEI, or some other hardware key) as the database key to associate particular events with users. Now, if the same user uses multiple apps on the same device, even ones by different developers, and they all use that analytics API, then that analytics provider can aggregate the data across all of the apps and build up a bigger picture of that user’s behaviour.

If even one of the apps records the user’s name as part of its analytics, then the analytics company – a company with whom the user has no relationship – gets to associate all of that behaviour with the user’s real name. So, of course, do that company’s customers: remember that the users and developers are provided with their stuff for free, and that businesses have limited tendency toward altruism. The value in an analytics company is its database, so of course it sells that database to those who will buy it, such as advertisers: again, companies with whom the user has no direct relationship.
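To make the aggregation concrete, here is a minimal sketch in Python of what such a provider could do on its side. Everything in it is hypothetical: the event fields, the bundle IDs, the device keys and the user name are invented for illustration and do not describe any real analytics service.

```python
from collections import defaultdict

# Hypothetical events, as a provider might receive them from three unrelated
# apps that all embed the same analytics library. Only the banking app
# happens to record the user's name.
EVENTS = [
    {"device_id": "device-A", "app": "com.example.game", "event": "level_complete"},
    {"device_id": "device-A", "app": "com.example.bank", "event": "login",
     "user_name": "Alice Example"},
    {"device_id": "device-A", "app": "com.example.maps", "event": "search"},
    {"device_id": "device-B", "app": "com.example.game", "event": "level_complete"},
]


def build_profiles(events):
    """Group every event by device key, collecting any names seen along the way."""
    profiles = defaultdict(lambda: {"names": set(), "activity": []})
    for event in events:
        profile = profiles[event["device_id"]]
        profile["activity"].append((event["app"], event["event"]))
        if "user_name" in event:
            profile["names"].add(event["user_name"])
    return dict(profiles)


profiles = build_profiles(EVENTS)
```

Because the device key is shared, the name recorded by one app ends up attached to the activity from all three, even though neither the game nor the maps app ever asked for it.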

People tend to be uneasy about invisible or unknown sharing of their information, particularly when the scope or consequences of such sharing are not obvious up front[*]. The level of identifiable[**] information and scope of data represented by a cross-app analysis of a smartphone user’s behaviour – whether aggregated via the model described above or other means – is downright stalker-ish, and will make users uncomfortable.

One can imagine a scenario where smartphone providers try not to make their users uncomfortable: after all, they are the providers’ bread and butter. So they don’t give developers access to such a “primary key” as has been described here. Developers would then be stuck with generating identifiers inside their apps, so tracking a single user inside a single app would work but it would be impossible to aggregate the data across multiple apps, or the same app across multiple devices.
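As a rough sketch of that situation, assume each app simply generates a random identifier the first time it runs. The class and bundle IDs below are hypothetical, and the identifier is held in memory for brevity where a real app would persist it.

```python
import uuid


class AppInstall:
    """One installation of one app, with an identifier the app made up itself."""

    def __init__(self, bundle_id):
        self.bundle_id = bundle_id
        # Random, so it has no relationship to the hardware or to other apps.
        self.install_id = str(uuid.uuid4())

    def event(self, name):
        return {"install_id": self.install_id, "app": self.bundle_id, "event": name}


game = AppInstall("com.example.game")
maps = AppInstall("com.example.maps")

events = [game.event("level_start"), game.event("level_complete"), maps.event("search")]
distinct_ids = {e["install_id"] for e in events}
```

The two game events share an identifier, so tracking within the app still works, but nothing ties the game's identifier to the maps app's identifier even though both are on the same device.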

Unless, of course, the developer can coerce the user into associating all of those different identifiers with some shared identifier, such as a network service username. But how do you get users to sign up for network services? By ensuring that the service has value for the user. Look at game networks that do associate user activity across multiple apps, like OpenFeint and Game Center: they work because players like seeing what games their friends are playing, and sharing their achievements with other people.
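Here is a minimal sketch of that linking step, assuming a hypothetical network service that records which app-generated identifiers have signed in under which username. The identifiers, bundle IDs and username are invented for illustration.

```python
# Hypothetical events, each tagged with an identifier generated inside the app.
EVENTS = [
    {"install_id": "install-1", "app": "com.example.game", "event": "level_complete"},
    {"install_id": "install-2", "app": "com.example.puzzle", "event": "high_score"},
    {"install_id": "install-3", "app": "com.example.maps", "event": "search"},
]

# The user chose to sign in to the network service from two of the three apps.
SIGN_INS = {"install-1": "player_one", "install-2": "player_one"}


def aggregate_by_account(events, sign_ins):
    """Group events by account; installs that never signed in are left out."""
    by_account = {}
    for event in events:
        account = sign_ins.get(event["install_id"])
        if account is None:
            continue  # no sign-in, so no way to link this install to anyone
        by_account.setdefault(account, []).append((event["app"], event["event"]))
    return by_account


linked = aggregate_by_account(EVENTS, SIGN_INS)
```

Only the apps where the user opted in are aggregated; the maps app's events stay unlinked.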

The conclusion is that, in the no-device-identifier world, it’s still possible to aggregate user behaviour, but only if you exchange that ability for something that the user values. Seems like a fair deal.

[*] My “don’t be a dick” guide to data privacy takes into account the fact that people like sharing information via online services such as Facebook, foursquare etc. but that they want to do so on their own terms. It goes like this: you have no right to anything except what the user told you. You have no right to share anything except what the user told you to share; they will tell you who you may share it with, and can change their minds. The user has a right to know what they’re telling you and how you’re sharing it.

[**] Given UK-sized postal zones, your surname and postcode are sufficient to uniquely identify you. Probably your birthday and postcode would also work. It doesn’t take much information to uniquely identify someone, anyway.

About Graham

I make it faster and easier for you to create high-quality code.


6 Responses to On device identifiers

Richard Buckle says:

2011-08-22 at 00:09

Good analysis, though I think you perhaps meant “entice” rather than “coerce”.

To clarify your second footnote, were you referring to the full postcode or just the first half of it?

Graham says:

2011-08-22 at 09:51

Maybe “entice” is a nicer way of looking at it…. Regarding the postcode, yes we’re talking the full postcode. If you just look at the delivery area, then identification becomes harder for more common surnames (how many people with the surname Lee live in central Oxford? – based on the number of people in the UK with that surname and the size of the city, I’d estimate about 60), however you’re still looking at a few possible people rather than a confusing load.

Jason Fuerstenberg says:

2011-08-22 at 13:43

Your analysis is completely on-point!

Apple benefits from removing access to the UDID as its Twitter and iCloud integration will become the user identifiers of choice for app developers. Even more than regular email addresses which require the user’s input.

A Twitter ID accessed via a permissions alert, naturally, is the easiest way to identify users in the absence of a UDID.

Those of us who use the UDID for non-nefarious purposes (detecting app piracy) will be out of luck though.

Matthew Frederick says:

2011-08-22 at 19:07

The beauty of a hardware-level device identifier is the ability to create sign-in-free effortless preference maintenance that survives both uninstallation-reinstallation as well as device-wipe-but-couldn’t-restore events.

A truly useful alternate to a hardware ID would be an OS-level method that provided a device-specific unique identifier that was associated with an app’s bundle ID and effectively non-reversible (no easily figuring out the hardware ID from the system-generated one, like a secure key generator). It would solve the “problem” for everyone but the cross-app privacy creeps, which is the ideal.

Graham Lee says:

2011-08-22 at 21:14

Matthew, iCloud is designed to support the use case in your first paragraph. Unless there’s something I missed, you should be able to do such preference migration with the iCloud key-value store.

Matthew Frederick says:

2011-08-23 at 01:16

Graham, yes, that may work, but there are a lot of limitations and problems.

The first is that iCloud syncing is linked to Apple IDs and it’s not at all uncommon for multiple family members to use the same ID, sharing music and apps and such. My girlfriend and I use the same ID, for example, but certainly don’t want to share settings.

(Frankly I’m not sure how that’s going to work out with iCloud in general; we may well have to separate accounts, resulting in having to purchase a whole bunch of additional apps; it wouldn’t surprise me if this requirement resulted in many more people disabling iCloud than one would hope.)

Even one person with multiple devices may not want the same settings associated with every device; I use very different settings in a variety of apps on my iPhone and my iPad.

Another issue is that iCloud has to be enabled on the device, as alluded to earlier. There are a number of reasons it wouldn’t be, including user ignorance or, more problematically, enterprise and educational deployments where iCloud syncing is likely undesirable.

Even if it is enabled it’s quite common to use the same Apple ID for multiple devices in both enterprise and educational settings. Sometimes this could be handy but much of the time it’s a deal-killer.

Insisting on identifiable sign-ins solves these problems, I know, but it adds a layer of overhead that currently doesn’t exist, and as every indie dev who sells apps knows, anything that increases friction costs sales.

A hardware-plus-bundle identifier works in all of these situations. If iCloud is appropriate for the app or the deployment then by all means use it. If not then something that provides the same convenience without the privacy implications would sure help if UDIDs go out the window.

FWIW when using UDIDs now I actually use a salted hash of the ID for privacy purposes, so that I’m never transmitting the real thing nor storing it on a server, and use different app-associated salts for different apps. It’d be great if Apple could take care of that for me.
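A minimal sketch of the salted-hash idea described in this comment, in Python. The choice of SHA-256, the function name, the salts and the device ID are all assumptions made for illustration; the comment does not say which hash or salt scheme is actually used.

```python
import hashlib


def app_scoped_id(device_id: str, app_salt: bytes) -> str:
    """Derive a per-app identifier by hashing the device ID with an app-specific salt."""
    return hashlib.sha256(app_salt + device_id.encode("utf-8")).hexdigest()


# Hypothetical device ID and per-app salts.
DEVICE_ID = "0123456789abcdef0123456789abcdef01234567"

game_id = app_scoped_id(DEVICE_ID, b"salt-for-com.example.game")
maps_id = app_scoped_id(DEVICE_ID, b"salt-for-com.example.maps")
```

The same device and salt always produce the same value, so preferences can be found again after a reinstall, while the two apps see different values and the raw device ID never leaves the device.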

