Kamalika Chaudhuri: Quantifying the Price of Privacy

20 June 2012

The data avalanche brought about by the digital revolution has made it possible to harness vast datasets for everything from statistical analysis to teaching machines to recognize patterns and respond in 'intelligent' ways. But much of this data comes from humans, and many of those humans expect their data to remain private. Preserving this privacy, however, is not always easy, says University of California, San Diego Computer Science Professor Kamalika Chaudhuri.

"Suppose you have some sensitive data, such as genomic data that you've gathered from patients, and now you want to compute some statistics on that data to develop some kind of prediction algorithm," she explains. "For example, you could be analyzing certain features of patients in order to predict whether they might develop a certain disease."

"With most data-based research, so long as the patients' names, addresses and some other identifying information are removed, the data is considered private," adds Chaudhuri, an affiliate of the UC San Diego division of the California Institute for Telecommunications and Information Technology (Calit2), where she worked as a researcher at Calit2's Information Theory and Applications Center. "But with datasets that have a lot of features, this is not the case," she notes, "particularly when they are based on small sample sizes."

Privacy researchers have discovered that with a little prior knowledge it is possible to 'reverse-engineer' the statistics obtained from such data to determine who the patients are, thus compromising their privacy.
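To see how a little prior knowledge can undo an "anonymous" statistic, consider the simplest kind of reverse-engineering, a differencing attack. The sketch below is a toy illustration under stated assumptions, not the specific attacks studied by the researchers. The study, its participants, their 0/1 marker values and the helper names `released_count` and `infer_target` are all hypothetical. A count over a small study is published with no names attached. An attacker who already knows every participant's status except one can recover that last person's value exactly by subtraction.

```python
def released_count(records: dict[str, int]) -> int:
    """The aggregate an analyst might publish: how many participants carry a marker."""
    return sum(records.values())


def infer_target(published: int, known: dict[str, int]) -> int:
    """Differencing step: subtract everything the attacker already knows."""
    return published - sum(known.values())


# Hypothetical toy study: 1 = carries a sensitive genetic marker, 0 = does not.
study = {"alice": 1, "bob": 0, "carol": 1, "dave": 0, "erin": 1}
published = released_count(study)  # only this number is released, no names

# Assumed prior knowledge: the attacker knows every status except Erin's.
known = {name: value for name, value in study.items() if name != "erin"}
print(infer_target(published, known))  # 1, i.e. Erin's private value
```

In a small study an attacker can more plausibly learn the status of most participants, and each person's record moves the published statistic noticeably. Both factors make this kind of inference easier.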