Anonymising personal data ‘not enough to protect privacy’, shows UCLouvain’s new study

  • Current methods for anonymising data leave individuals at risk of being re-identified, according to new research from the University of Louvain (UCLouvain) and Imperial College London
  • This research is published in Nature Communications


Nature article: https://www.nature.com/articles/s41467-019-10933-3

Luc Rocher, researcher at the Department of Mathematical Engineering, University of Louvain (UCLouvain): mobile on request, luc.rocher [at] uclouvain (p) be
Yves-Alexandre de Montjoye, assistant professor at the Department of Computing and the Data Science Institute, Imperial College London: yvesalexandre [at] demontjoye (p) com

With the first large fines for breaching the EU’s General Data Protection Regulation (GDPR) upon us, researchers from the University of Louvain and Imperial College London have shown how even anonymised datasets can be traced back to individuals using machine learning.

The researchers say their paper, published today, demonstrates that allowing data to be used (to train AI algorithms, for example) while preserving people’s privacy requires much more than simply adding noise, sampling datasets, or applying other de-identification techniques. They have also published a demonstration tool that allows people to understand just how likely they are to be traced, even if the dataset they are in is anonymised and only a small fraction of it is shared.

Companies and governments both routinely collect and use our personal data. Our data and the way it is used are protected under laws such as the GDPR or the US’s California Consumer Privacy Act (CCPA).

Data is ‘sampled’ (only a fraction of it is released) and anonymised, which includes stripping out identifying characteristics such as names and email addresses, so that individuals cannot, in theory, be identified. After this process, the data is no longer subject to data protection regulations, so it can be freely used and sold to third parties such as advertising companies and data brokers.
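The anonymisation steps described above can be illustrated with a minimal toy sketch, assuming a tiny invented dataset. It drops names and emails, optionally releases a random sample, and then counts how many records are still unique on the attributes that remain (here a hypothetical postcode, birth year and gender). The helper names `strip_identifiers`, `sample_rows` and `fraction_unique` are illustrative only, and this simple uniqueness count is not the machine-learning method used in the study.

```python
import random
from collections import Counter

# Hypothetical toy dataset: all names, emails and attribute values are invented.
RECORDS = [
    {"name": "Alice", "email": "alice@example.com", "zip": "1348", "birth_year": 1985, "gender": "F"},
    {"name": "Bob", "email": "bob@example.com", "zip": "1348", "birth_year": 1985, "gender": "M"},
    {"name": "Carol", "email": "carol@example.com", "zip": "1348", "birth_year": 1990, "gender": "F"},
    {"name": "Dan", "email": "dan@example.com", "zip": "1000", "birth_year": 1985, "gender": "M"},
    {"name": "Eve", "email": "eve@example.com", "zip": "1000", "birth_year": 1985, "gender": "M"},
    {"name": "Fay", "email": "fay@example.com", "zip": "1000", "birth_year": 1972, "gender": "F"},
]

# Direct identifiers removed by naive de-identification (an assumption for this sketch).
DIRECT_IDENTIFIERS = frozenset({"name", "email"})


def strip_identifiers(rows, direct=DIRECT_IDENTIFIERS):
    """Return copies of the rows with the direct-identifier fields dropped."""
    return [{k: v for k, v in row.items() if k not in direct} for row in rows]


def sample_rows(rows, fraction, seed=0):
    """Release a random subset containing roughly `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def fraction_unique(rows, quasi_identifiers):
    """Share of rows whose combination of quasi-identifier values occurs exactly once."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


if __name__ == "__main__":
    anonymised = strip_identifiers(RECORDS)
    print(fraction_unique(anonymised, ["zip"]))                          # 0.0
    print(fraction_unique(anonymised, ["zip", "birth_year", "gender"]))  # 4 of 6 rows unique
    released = sample_rows(anonymised, 0.5)
    # Uniqueness within a released sample does not by itself prove uniqueness
    # in the full population; it only shows who stands out in the sample.
    print(len(released), fraction_unique(released, ["zip", "birth_year", "gender"]))
```

In this toy data, postcode alone singles out nobody, but combining three ordinary attributes leaves four of the six records unique even though names and emails are gone. Adding an attribute can only split groups of matching records, never merge them, which is one intuition for why richer datasets are harder to anonymise.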

The new research shows that once bought, the data can often be reverse-engineered using machine learning to re-identify individuals, despite the anonymisation techniques. This could