How hard is it to ’de-anonymize’ cellphone data?

A new formula that characterizes the privacy afforded by large, aggregate data sets may be discouraging, but could help sharpen policy discussion. The proliferation of sensor-studded cellphones could lead to a wealth of data with socially useful applications - in urban planning , epidemiology, operations research and emergency preparedness, among other things. Of course, before being released to researchers, the data would have to be stripped of identifying information. But how hard could it be to protect the identity of one unnamed cellphone user in a data set of hundreds of thousands or even millions? According to a paper appearing this week in Scientific Reports , harder than you might think. Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them. In other words, to extract the complete location information for a single person from an "anonymized" data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts.
account creation

TO READ THIS ARTICLE, CREATE YOUR ACCOUNT

And extend your reading, free of charge and with no commitment.



Your Benefits

  • Access to all content
  • Receive newsmails for news and jobs
  • Post ads

myScience