Note: as ever, this blog refrains from commenting on speculation regarding undisclosed product innovations from device providers. This post is about the concept of tracking users via a device identifier. You might find the discussion useful in considering future product directions; that’s fine.
Keeping records of what users are up to has its benefits. As a security boffin, I’m fully aware of the merits of good auditing: discovering what users (and others) have (or haven’t) done to a valuable system. Tracking also lets developers find out how users are getting on with their application: whether they ignore particular features, or have trouble deciding what to do at points in the app’s workflow.
Indeed, sometimes users actively enjoy having their behaviour tracked. Every browser has a history feature; many games let players see what they’ve achieved and compare it with others. Location games would be pretty pointless if they could only tell me where I am now, rather than telling the story of where I’ve been.
A whole bunch of companies package APIs for tracking user behaviour on smartphones. These are the analytics companies. To paraphrase Scott Adams: analytics is derived from the root word “anal”, and the Greek “lytics” meaning “to pull a business model from”. What they give developers for free is an API for putting analytics services into their apps, and the tools to draw useful conclusions from the resulting data.
This is where the fun begins. Imagine that the analytics company uses some material derived from a device identifier (a UDID, IMEI, or some other hardware key) as the database key to associate particular events with users. Now, if the same user runs multiple apps on the same device, even apps from different developers, and they all use that analytics API, then the analytics provider can aggregate the data across all of those apps and build up a bigger picture of that user’s behaviour.
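To make the aggregation concrete, here’s a minimal Python sketch of what such a provider’s back end could do. It assumes, hypothetically, that every event arrives tagged with the reporting app, a key derived from the device identifier (here a SHA-256 hash, one possible reading of “material derived from”), and a description of what the user did. The names `Event`, `hash_device_id` and `build_profiles` are illustrative, not any real analytics SDK.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    app: str         # which app reported the event
    device_key: str  # key derived from the hardware identifier (assumed)
    action: str      # what the user did


def hash_device_id(device_id: str) -> str:
    # Hashing looks like anonymisation, but the same identifier always
    # produces the same digest, so it still works as a cross-app key.
    return hashlib.sha256(device_id.encode("utf-8")).hexdigest()


def build_profiles(events):
    # Group every event by device key, regardless of which app sent it.
    profiles = defaultdict(list)
    for e in events:
        profiles[e.device_key].append((e.app, e.action))
    return dict(profiles)


if __name__ == "__main__":
    key = hash_device_id("hypothetical-udid-1234")
    events = [
        Event("RunTracker", key, "jogged 5km along the river"),
        Event("PubFinder", key, "searched for pubs near the station"),
        Event("ChatApp", key, "signed in as Alice Example"),
    ]
    # One profile containing activity from three unrelated apps.
    for device, history in build_profiles(events).items():
        print(device[:12], history)
```

The point of the sketch is that no cooperation between the app developers is needed: the shared key alone does the joining, and a name reported by any one app ends up attached to the whole history.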
If even one of the apps records the user’s name as part of its analytics, then the analytics company, a company with whom the user has no relationship, gets to associate all of that behaviour with the user’s real name. So, of course, do that company’s customers: remember that users and developers get their stuff for free, and that businesses have a limited tendency toward altruism. An analytics company’s value lies in its database, so naturally it sells that database to whoever will buy it: advertisers, for example, which are again companies with whom the user has no direct relationship.
People tend to be uneasy about invisible or unknown sharing of their information, particularly when the scope or consequences of such sharing are not obvious up front[*]. A cross-app analysis of a smartphone user’s behaviour, whether aggregated via the model described above or by other means, contains so much identifiable[**] information across such a broad scope that it is downright stalker-ish, and it will make users uncomfortable.
One can imagine a scenario where smartphone providers try not to make their users uncomfortable: after all, users are the providers’ bread and butter. So they don’t give developers access to the kind of “primary key” described here. Developers would then be stuck generating identifiers inside their apps: tracking a single user within a single app would still work, but it would be impossible to aggregate data across multiple apps, or across the same app on multiple devices.
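For contrast, here’s a sketch of that per-app world. It assumes each app generates a random identifier on first launch and keeps it in its own private storage, modelled here as a plain dictionary; `AppInstall` and `group_by_key` are hypothetical names. The same person on the same device shows up under a different key in each app, so grouping by key no longer joins their behaviour.

```python
import uuid
from collections import defaultdict


class AppInstall:
    """One app installed on one device, with its own private storage."""

    def __init__(self, app_name: str):
        self.app_name = app_name
        self._storage = {}  # stands in for the app's sandboxed preferences

    def install_id(self) -> str:
        # Generate a random identifier the first time, then reuse it.
        # Nothing derived from the hardware goes into it.
        if "install_id" not in self._storage:
            self._storage["install_id"] = str(uuid.uuid4())
        return self._storage["install_id"]


def group_by_key(events):
    # The same grouping an analytics back end would perform.
    grouped = defaultdict(list)
    for key, app, action in events:
        grouped[key].append((app, action))
    return dict(grouped)


if __name__ == "__main__":
    runner = AppInstall("RunTracker")
    pubs = AppInstall("PubFinder")
    events = [
        (runner.install_id(), "RunTracker", "jogged 5km"),
        (runner.install_id(), "RunTracker", "jogged 3km"),
        (pubs.install_id(), "PubFinder", "searched for pubs"),
    ]
    # Two profiles, one per app, rather than one per person.
    print(len(group_by_key(events)))
```

Tracking within one app still works because each install reuses its own identifier; it’s only the join across apps that disappears. Reinstalling the app, or installing it on a second device, would also produce a fresh identifier.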
Unless, of course, the developer can coerce the user into associating all of those different identifiers with some shared identifier, such as a network service username. But how do you get users to sign up for network services? By ensuring that the service has value for the user. Look at game networks that do associate user activity across multiple apps, like OpenFeint and Game Center: they work because players like seeing what games their friends are playing, and sharing their achievements with other people.
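The opt-in linking can be sketched in the same style. This assumes, hypothetically, that a game network keeps a table mapping each app’s install identifier to a username, but only for players who chose to sign in. `link_profiles` and the table layout are illustrative only, not OpenFeint’s or Game Center’s actual API.

```python
from collections import defaultdict


def link_profiles(events, sign_ins):
    """Merge per-app events into per-user profiles where possible.

    events:   list of (install_id, app, action) tuples
    sign_ins: dict mapping install_id -> username, present only for
              installs whose user chose to sign in to the network
    """
    linked = defaultdict(list)
    unlinked = defaultdict(list)
    for install_id, app, action in events:
        username = sign_ins.get(install_id)
        if username is not None:
            # The user opted in, so this app's activity joins their profile.
            linked[username].append((app, action))
        else:
            # No shared identifier: the data stays siloed in one app.
            unlinked[install_id].append((app, action))
    return dict(linked), dict(unlinked)


if __name__ == "__main__":
    events = [
        ("id-1", "PuzzleGame", "finished level 10"),
        ("id-2", "RacingGame", "won a race"),
        ("id-3", "CardGame", "lost a hand"),
    ]
    sign_ins = {"id-1": "alice", "id-2": "alice"}  # id-3 never signed in
    linked, unlinked = link_profiles(events, sign_ins)
    print(linked)
    print(unlinked)
```

The aggregation only reaches as far as the sign-ins do: the player who never signed in stays a single-app silo.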
The conclusion is that, in a world without device identifiers, it’s still possible to aggregate user behaviour, but only in exchange for something that the user values. That seems like a fair deal.
[*] My “don’t be a dick” guide to data privacy takes into account the fact that people like sharing information via online services such as Facebook and foursquare, but want to do so on their own terms. It goes like this: you have no right to anything except what the user told you. You have no right to share anything except what the user told you to share; they will tell you who you may share it with, and can change their minds. The user has a right to know what they’re telling you and how you’re sharing it.
[**] Given UK-sized postal zones, your surname and postcode are sufficient to uniquely identify you. Your birthday and postcode would probably work too. It doesn’t take much information to uniquely identify someone.