AI algorithms can support medical personnel in diagnosing illnesses. However, training these algorithms requires access to medical data, a precious resource that warrants careful protection. A team of researchers at the Technical University of Munich (TUM) has developed a technology that protects patients’ personal data while algorithms are being trained. It is now being used for the first time in an algorithm that identifies pneumonia in paediatric x-ray images.
Digital medicine is opening up entirely new possibilities. For example, it can help detect tumors at an early stage. But the effectiveness of new AI algorithms depends on the quantity and quality of the data used to train them.
To maximize the data pool, it is customary to share patient data between clinics by sending copies of databases to the clinics where the algorithm is being trained. For data protection purposes, the data usually undergo anonymization and pseudonymization, a practice that has drawn criticism. "These processes have often proven inadequate in terms of protecting patients’ health data," says Daniel Rueckert, Alexander von Humboldt Professor of Artificial Intelligence in Healthcare and Medicine at TUM.
To address this problem, an interdisciplinary team at TUM has worked with researchers at Imperial College London and the non-profit OpenMined to develop a unique combination of AI-based diagnostic processes for radiological image data that safeguards data privacy. In a paper published in Nature Machine Intelligence, the team has now presented a successful application: a deep learning algorithm that helps classify types of pneumonia in x-rays of children.
"We have tested our models against specialized radiologists. In some cases the models showed comparable or better accuracy in diagnosing various types of pneumonia in children," says Prof. Marcus R. Makowski, the Director of the Department of Diagnostic and Interventional Radiology at the Klinikum rechts der Isar of TUM.
"To keep patient data safe, it should never leave the clinic where it is collected," says project leader and co-first author Georgios Kaissis of the TUM Institute of Medical Informatics, Statistics and Epidemiology. "For our algorithm we used federated learning, in which the deep learning algorithm is shared, not the data. Our models were trained in the various hospitals using the local data and then returned to us. Thus, the data owners did not have to share their data and retained complete control," says co-first author Alexander Ziller, a researcher at the Institute of Radiology.
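The federated learning idea described above (share the model, not the data) can be illustrated with a minimal federated averaging (FedAvg-style) loop. This is a hedged sketch under stated assumptions, not the team's implementation: the "model" is a toy two-parameter linear regression rather than a deep network, the three "hospitals" hold synthetic data, and the names `local_training`, `federated_averaging` and `make_hospital` are hypothetical. The point it shows is that the raw records are only read inside `local_training`; the coordinator only ever receives and averages weights.

```python
import random
from typing import List, Tuple

# Hypothetical toy model: weights [w, b] of a 1-D linear regression y = w*x + b.
Weights = List[float]
Dataset = List[Tuple[float, float]]


def local_training(weights: Weights, data: Dataset, lr: float = 0.01, epochs: int = 20) -> Weights:
    """Train a copy of the global model on one hospital's data with plain SGD.

    The (x, y) records are only read here; just the updated weights are returned.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one record
            w -= lr * 2 * err * x   # gradient of squared error w.r.t. w
            b -= lr * 2 * err       # gradient of squared error w.r.t. b
    return [w, b]


def federated_averaging(global_weights: Weights, hospitals: List[Dataset], rounds: int = 10) -> Weights:
    """FedAvg-style loop: ship the model out, train locally, average the returned weights."""
    for _ in range(rounds):
        # Each hospital trains its own copy; only weights come back to the coordinator.
        updates = [local_training(list(global_weights), data) for data in hospitals]
        sizes = [len(data) for data in hospitals]
        total = sum(sizes)
        # Weight each hospital's update by its number of local records.
        global_weights = [
            sum(n * u[i] for n, u in zip(sizes, updates)) / total
            for i in range(len(global_weights))
        ]
    return global_weights


def make_hospital(n: int) -> Dataset:
    """Synthetic local dataset drawn from y = 3x + 1 plus small Gaussian noise."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in xs]


if __name__ == "__main__":
    random.seed(0)
    hospitals = [make_hospital(n) for n in (30, 50, 20)]
    w, b = federated_averaging([0.0, 0.0], hospitals)
    print(f"learned w={w:.2f}, b={b:.2f}")  # expected to land close to w=3, b=1
```

Weighting each update by local dataset size is one common design choice in federated averaging; it keeps a small clinic from pulling the shared model as hard as a large one, although real deployments may weight updates differently.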
To prevent the institutions where the algorithm was trained from being identified, the team applied another technique: secure aggregation. "We combined the algorithms in encrypted form and only decrypted them after they were trained with the data of all participating institutions," says Kaissis. Finally, to ensure "differential privacy", that is, to prevent data on individual patients from being extracted from the data records, the researchers applied a third technique while training the algorithm. "Ultimately, statistical correlations can be extracted from the data records, but not the contributions of individual persons," says Kaissis.
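The two further safeguards can be sketched in the same hedged way. The code below is an illustration under assumptions, not the team's actual code. For secure aggregation it swaps in additive secret sharing modulo a large prime, one common way to combine model weights so that only their sum is ever revealed. The modulus `Q`, the fixed-point scale `SCALE` and all function names are hypothetical choices. For differential privacy it shows the clip-then-add-Gaussian-noise step used in DP-SGD; the privacy accounting that turns `noise_multiplier` into a formal privacy guarantee is deliberately omitted.

```python
import math
import random
from typing import List

# Assumed parameters for the secret-sharing sketch.
Q = 2**61 - 1      # large prime modulus for share arithmetic
SCALE = 10**6      # fixed-point scale used to encode real-valued weights as integers


def encode(v: float) -> int:
    """Map a real number to a fixed-point integer modulo Q."""
    return round(v * SCALE) % Q


def decode(x: int) -> float:
    """Inverse of encode: values above Q // 2 represent negative numbers."""
    if x > Q // 2:
        x -= Q
    return x / SCALE


def share(secret: int, n_parties: int) -> List[int]:
    """Split a secret into n additive shares; any n-1 of them look uniformly random."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def secure_sum(weight_vectors: List[List[float]], n_parties: int = 3) -> List[float]:
    """Sum the institutions' weight vectors while only ever revealing the total."""
    dim = len(weight_vectors[0])
    # party_totals[p][i]: running sum of the shares of coordinate i held by party p.
    party_totals = [[0] * dim for _ in range(n_parties)]
    for vec in weight_vectors:
        for i, v in enumerate(vec):
            for p, s in enumerate(share(encode(v), n_parties)):
                party_totals[p][i] = (party_totals[p][i] + s) % Q
    # Combining the parties' totals reconstructs only the aggregate, not any single input.
    return [decode(sum(t[i] for t in party_totals) % Q) for i in range(dim)]


def dp_noisy_gradient(per_example_grads: List[List[float]], clip_norm: float,
                      noise_multiplier: float, rng: random.Random) -> List[float]:
    """DP-SGD-style step: clip each record's gradient, sum, add Gaussian noise, average."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * factor for x in g])   # bounds any one record's influence
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm            # noise scaled to the clipping bound
    return [(s + rng.gauss(0, sigma)) / len(per_example_grads) for s in summed]


if __name__ == "__main__":
    # Three institutions' (hypothetical) model weights; only their sum is recovered.
    print(secure_sum([[0.5, -1.25], [1.0, 2.0], [-0.25, 0.5]]))  # prints [1.25, 1.25]
    print(dp_noisy_gradient([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0,
                            noise_multiplier=1.1, rng=random.Random(0)))
```

In this single-process sketch one program plays every role; in a real deployment each party would be a separate server that sees only its own column of shares, so no party learns any individual institution's weights. Clipping matters because it bounds how much any single record can move the gradient, which is what lets a fixed amount of Gaussian noise mask an individual contribution while aggregate statistical patterns survive.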
"Our methods have been applied in other studies," says Daniel Rueckert. "But we have not yet seen large-scale studies using real clinical data. Through the targeted development of technologies and the cooperation between specialists in informatics and radiology, we have succeeded in training models that deliver precise results while meeting high standards of data protection and privacy."
Rickmer Braren, deputy director of the Department of Diagnostic and Interventional Radiology, notes: "It is often claimed that data protection and the utilization of data must always be in conflict. But we are now proving that this does not have to be true." The scientists add that their method can be applied to other types of medical data besides x-rays, for example speech and text.
The combination of the latest data protection processes will also facilitate cooperation between institutions, as the team showed in a paper published in Nature Machine Intelligence in 2020. According to Braren, their privacy-preserving AI method can overcome ethical, legal and political obstacles, paving the way for the widespread use of AI. This is enormously important for research into rare diseases.
The scientists are convinced that their technology, by safeguarding patients’ privacy, can make an important contribution to the advancement of digital medicine. "To train good AI algorithms, we need good data," says Kaissis. "And we can only obtain these data by properly protecting patient privacy," adds Rueckert. "This shows that, with data protection, we can do much more for the advancement of knowledge than many people think."
Kaissis, G. A.; Ziller, A.; Makowski, M. R.; Rueckert, D.; Braren, R. et al. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nature Machine Intelligence (2021). DOI: 10.1038/s42256-021-00337-8
Kaissis, G. A.; Makowski, M. R.; Rueckert, D.; Braren, R. et al. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence 2, 305-311 (2020). DOI: 10.1038/s42256-020-0186-1