If you could pick an image to be preserved for thousands of years, what would it be? A picture of your family, an endangered landscape, a page of poetry, or a snapshot that sends a message to the future?
Researchers from the Molecular Information Systems Lab at the University of Washington and Microsoft are looking to collect 10,000 original images from around the world to preserve them indefinitely in synthetic DNA manufactured by Twist Bioscience. DNA holds promise as a revolutionary storage medium that lasts much longer and is many orders of magnitude denser than current technologies.
The team has already encoded important compositions in DNA molecules, including The Universal Declaration of Human Rights, the top 100 books of Project Gutenberg, songs from the Montreux Jazz Festival and an OK Go video.
The #MemoriesInDNA Project invites the public to submit original photographs that they’d like to see preserved in DNA for millennia. The images - which can be uploaded at the project website - will be encoded in synthetic DNA and made available to researchers worldwide. The researchers also are encouraging people to share their images on social media with the hashtag #MemoriesInDNA and include a story about why the photograph or video is important to them.
"It’s your turn to show us what should be preserved in DNA forever," said Luis Ceze, professor in the UW’s Paul G. Allen School of Computer Science & Engineering. "We want people to go out and take a picture of something that they want the world to remember - it’s a fun opportunity to send a message to future generations and help our research in the process."
DNA data storage has emerged as a potential solution to bridge the growing gap between the amount of digital data generated today - by everything from commercial video to space imagery to medical records - and our ability to affordably and efficiently store that data.
Unlike data centers, which require acres of land and account for nearly 2 percent of the total electricity consumption in the United States, DNA molecules can store information millions of times more compactly. The basic process converts the strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences - adenine, guanine, cytosine and thymine. It employs synthetic DNA molecules created in a lab, not living DNA.
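As a rough illustration of that conversion, the sketch below maps every two bits to one of the four bases and back again. The specific mapping table and the function names are assumptions made for this example; the team's actual encoding scheme is not described here and would likely add error correction and avoid problematic sequences.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases under this toy mapping, so a two-byte message yields an eight-base strand.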
The team of UW computer scientists and electrical engineers, in collaboration with Microsoft researchers and working with Twist Bioscience, holds the current world record for the amount of data stored in DNA. So far they have been able to encode photographic images and video in DNA and retrieve and convert those individual molecular "files" back into digital data.
Their next challenge involves exploring how to perform meaningful data processing directly in DNA - without having to convert the images back into their electronic form.
"Let’s suppose you have a trillion images encoded in DNA and want to find all the photographs that have a red car in them, or to find out whether a person’s face exists in those images," said Ceze. "We want to be able to do that information processing in DNA directly - to search in a smart way and make the molecules themselves carry out that computer vision work."
The team will encode approximately 10,000 of the crowdsourced images in manufactured snippets of DNA. The researchers’ approach to searching images directly in DNA relies on the fact that certain nucleotides stick to others - A binds to T and C binds to G.
They can introduce into the solution strips of DNA that contain a coded "query" - essentially, a string of complementary DNA that causes all photographs with a red car or certain facial features or whatever meets the criteria of the query to bind to it. By attaching magnetic nanoparticles to the query DNA, they can use a magnet to pull out all the similar images that have stuck to it.
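The base-pairing idea behind such a query can be sketched in a few lines. This is a deliberately idealized model, not the researchers' method: it assumes a query "binds" only when its exact reverse complement appears in a strand, whereas real hybridization tolerates mismatches and depends on chemistry. The pool contents and function names are hypothetical.

```python
# Watson-Crick pairing: A binds to T, and C binds to G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the sequence that would pair with `strand`, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def binds(query: str, strand: str) -> bool:
    """Idealized binding: the query's exact reverse complement occurs in the strand."""
    return reverse_complement(query) in strand


def pull_out(query: str, pool: dict[str, str]) -> list[str]:
    """Return the names of all strands in the pool that the query sticks to."""
    return [name for name, strand in pool.items() if binds(query, strand)]


if __name__ == "__main__":
    # Hypothetical pool: image names mapped to made-up strands.
    pool = {"photo_a": "TTCGTTAA", "photo_b": "GGGGCCCC"}
    print(pull_out("AACG", pool))  # ['photo_a']
```

Here `pull_out` plays the role of the magnet: it collects every strand the query has attached to and leaves the rest behind.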
"It is thrilling to bring computer science and molecular biology together in this project," said Microsoft senior researcher and collaborator Karin Strauss. "There has been amazing progress recently in both areas and, when combined, they can be very powerful in tackling problems created by the massive amounts of data we’ve been generating."
"Having a set of diverse images from around the world will help us invent new ways to make molecules work with each other to carry out these computations directly," said Microsoft partner architect and collaborator Douglas Carmean.
The team will employ machine learning to devise methods to map and encode all the visual features contained in a photograph - such as colors, curves, lines and objects - in DNA. The main challenge is doing that in a way that allows scientists to extract similar things and perform meaningful data processing.
"We will use neural networks to explore ways to classify visual patterns in the images and video that we encode in DNA," said Georg Seelig, UW associate professor of electrical engineering and in the Allen School. "For example, are there more red cars than blue cars in a photograph? Or are there people riding bicycles?"
"With proof-of-concept achieved for DNA as a digital data storage media, we are working to drive down the cost of synthesizing DNA to enable its potential as a widely-available commercial solution for the growing body of precious data in digital format, such as archival data, financial and health record backups, and all long-term data retention where current media is not practical," said Emily M. Leproust, CEO of Twist Bioscience. "MemoriesInDNA is a fabulous project to showcase the technological, scientific and cultural importance of DNA worldwide and we look forward to our role in this historic event."
#MemoriesInDNA will provide an important library of images to be encoded in a separately funded project supported by the Defense Advanced Research Projects Agency (DARPA) Molecular Informatics program. UW was recently awarded $6.3M to accelerate the pace at which data can be encoded in DNA, and to develop new capabilities to process this data through image search and classification. The work will build the foundation on which UW can advance its next-generation work in molecular information processing.
To be included in the DNA image collection, photographs cannot be copyrighted by any other party and must be free of violent or inappropriate content. The image dataset will be preserved in DNA indefinitely and shared with researchers worldwide. For more details about how to upload and share images, visit the #MemoriesInDNA Project website.