Netflix-style algorithm builds blueprint of cancer genomes

The science behind your Netflix viewing habits could soon be used to guide doctors in managing cancer, according to new research co-led by UCL scientists and funded by Cancer Research UK and Cancer Grand Challenges.

In the study, an international team of scientists used artificial intelligence (AI) to investigate and categorise the size and scale of DNA changes across the genome - a cell’s complete genetic code - when cancer starts and grows.

Using AI, the scientists have identified 21 common faults that occur to the structure, order and number of copies of DNA present when cancer starts and grows. These common faults, called copy number signatures, could help guide doctors to treatments which reflect the characteristics of the tumour. 

When you watch Netflix, data are generated about the type of films and TV series you watch, how frequently you watch them and whether you give them a "thumbs up" or "thumbs down". Netflix uses an algorithm to analyse this massive amount of data, find patterns in the content you watch and then recommend new films and TV series when you scroll through Netflix.

A team of researchers led by Dr Nischalan Pillay (UCL Cancer Institute) and Dr Ludmil Alexandrov (University of California, San Diego) built a similar algorithm which can sift through thousands of lines of genomic data and pick out common patterns in how the chromosomes organise and arrange themselves. The algorithm can then categorise the patterns that emerge and help scientists establish the types of faults that can occur in cancer.

Using the algorithm, the scientists looked for patterns in the fully sequenced genomes from 9,873 patients with 33 different types of cancer. The algorithm identified 21 common faults to the structure and number of chromosomes in tumours and categorised them into different "genres" called copy number signatures. 

The 21 copy number signatures will now be used to create a blueprint that researchers can use to assess how aggressive the cancer will be, find its weak spots and design new treatments for it.

Dr Ludmil Alexandrov, Associate Professor at UC San Diego and co-lead author of the study, said: "Cancer is a complex disease, but we’ve demonstrated that there are remarkable similarities in the changes to chromosomes that happen when it starts and how it grows. 

"Just as Netflix can predict which shows you’ll choose to binge watch next, we believe that we will be able to predict how your cancer is likely to behave, based on the changes its genome has previously experienced. 

"We want to get to the point where doctors can look at a patient’s fully sequenced tumour and match the key features of the tumour against our blueprint for genomic faults. Armed with that information, we believe that doctors will be able to offer better and more personalised cancer treatment in the future."

The scientists previously studied how these large-scale genomic faults occur in sarcoma, and wanted to find ways to study these changes across different types of cancer. 

The algorithm, run using software called SigProfilerExtractor, which was developed by Dr Alexandrov, uses complex maths to scan sequencing data from cancer patients and identify common patterns in how the chromosomes are reorganised in different types of cancer.
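
For readers curious about the maths, signature extraction of this kind is commonly built on non-negative matrix factorisation (NMF), which splits a table of counts into a small set of recurring patterns ("signatures") and the amount of each pattern in each tumour ("exposures"). The following is a minimal sketch under that assumption, not the SigProfilerExtractor implementation: the toy count table, the function names and the choice of two signatures are all hypothetical and for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Distance between the count table v and its reconstruction w x h."""
    wh = matmul(w, h)
    total = sum((v[i][j] - wh[i][j]) ** 2
                for i in range(len(v)) for j in range(len(v[0])))
    return total ** 0.5


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factorise v (features x tumours) into k signatures and their exposures.

    Uses the standard multiplicative update rules, which keep every entry
    non-negative. This is an illustrative choice of solver.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # Update exposures: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # Update signatures: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


# Hypothetical toy data: 4 copy number feature categories (rows) counted in
# 5 tumours (columns), built by mixing two made-up patterns.
true_signatures = [[8, 0], [2, 1], [0, 6], [1, 3]]
true_exposures = [[3, 0, 1, 2, 0], [0, 4, 1, 2, 5]]
counts = matmul(true_signatures, true_exposures)

signatures, exposures = nmf(counts, k=2)
print("reconstruction error:", frobenius_error(counts, signatures, exposures))
```

Each column of `signatures` is one recovered pattern and each column of `exposures` says how strongly each pattern appears in one tumour. Real tools add further steps, such as choosing the number of signatures and checking that the result is stable, which this sketch leaves out.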

The scientists further investigated the copy number signatures which most strongly affected outcomes for cancer patients. Of the 21 signatures identified by the algorithm, the scientists found that tumours where the chromosomes have shattered and reformed (known as chromothripsis) were associated with the worst survival outcomes. For example, the study found that patients with glioblastoma, an aggressive type of brain tumour, had worse survival outcomes if their tumour had undergone chromothripsis. On average, glioblastoma patients without chromothripsis survived 6 months longer than glioblastoma patients whose tumours had chromothripsis.

The scientists hope that they will be able to refine the algorithm to enable doctors to find out how your cancer is likely to behave, based on the genetic traits it acquired when it started and the genetic changes it picks up as it grows. 

Co-lead author, Dr Nischalan Pillay, Associate Professor in Sarcoma and Genomics at UCL, said: "To stay one step ahead of cancer, we need to anticipate how it adapts and changes. 

"Mutations are the key drivers of cancer, but a lot of our understanding is focused on changes to individual genes in cancer. We’ve been missing the bigger picture of how vast swathes of genes can be copied, moved around or deleted without catastrophic consequences for the tumour.

"Understanding how these events arise will help us regain an advantage over cancer. Thanks to advances in genome sequencing, we can now see these changes play out across different cancer types and figure out how to respond effectively to them."

The scientists have made SigProfilerExtractor and other software tools used in the study freely available to other scientists, so that they can use the algorithm to build their own Netflix-style libraries of chromosome changes from DNA, based on data obtained from sequencing tumours.

First author, Dr Christopher Steele (UCL Cancer Institute), added: "We believe that making these powerful computing tools free to other scientists will accelerate progress towards a personalised cancer blueprint for patients, giving them the best chances of survival."

Michelle Mitchell, Chief Executive of Cancer Research UK, said: "Cancer Research UK has been at the forefront of understanding the genetics behind cancer. As we celebrate our 20th anniversary, it’s amazing news that we can build a blueprint across multiple tumour types that might help researchers predict how a cancer will behave and show ways in how we can tackle it with more precise treatments. 

"This research is another brilliant result for Cancer Grand Challenges, which was set up to tackle some of the biggest challenges in cancer research."

 

 
