Harnessing DNA for data storage
Posted on Jan 30, 2013 in Science
For most of the time that life has existed, DNA has served one single purpose – the storage of the information needed to create a new organism. Using only four “letters” in its alphabet, DNA encodes the blueprints for life. Now, however, we have begun harnessing the tremendous data-storing capabilities of DNA to store whatever we desire.
Scientists from the European Bioinformatics Institute in the United Kingdom have developed a method to reliably encode arbitrary information into strands of DNA. Unlike previous methods, this new method is far less error-prone and has the potential to be cost-effective. To demonstrate the method’s effectiveness, the scientists stored five files: all of Shakespeare’s sonnets; a copy of Watson and Crick’s landmark paper on DNA structure; a photograph of the European Bioinformatics Institute; a clip from Martin Luther King Jr.’s famous “I have a dream” speech; and the source file of a program used in the experiment.
Previous experiments directed at storing information as DNA have used a binary system; A and G are 0, and C and T are 1. Thus data was represented exactly as it would be on a computer. This method, however, did not address the problem of long stretches of a single base, such as GGGGGGG, which are often not sequenced properly.
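To make the problem concrete, here is a minimal Python sketch of such a binary mapping. It assumes the simplest possible choice, always writing 0 as A and 1 as C; the function names are hypothetical, and this is an illustration rather than the code used in any earlier experiment.

```python
import re

ZERO_BASES = "AG"  # either base reads back as bit 0
ONE_BASES = "CT"   # either base reads back as bit 1

def bits_to_dna(bits):
    """Write each bit as a base (assumption: always the first base of its pair)."""
    return "".join(ZERO_BASES[0] if bit == "0" else ONE_BASES[0] for bit in bits)

def dna_to_bits(dna):
    """Read a strand back into a string of bits."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)

def longest_run(dna):
    """Length of the longest stretch of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", dna)), default=0)

if __name__ == "__main__":
    strand = bits_to_dna("0000000011")
    print(strand, longest_run(strand))
```

With this mapping, any file that contains a long run of zero bits turns into a long run of A, which is exactly the kind of stretch that is hard to sequence.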
The researchers at the European Bioinformatics Institute solved this problem by breaking away from the traditional binary language of computing to adopt a trinary (base 3) storage system. Though DNA can technically store information in a quaternary (base 4) system, giving up one of the four choices at each position makes it possible to guarantee that no base ever appears twice in a row: each base-3 digit is written as one of the three bases that differ from the base before it.
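One way a base-3 code can rule out repeated bases is to let each digit select among the three bases that differ from the previous one. The Python sketch below illustrates that idea. The lookup order, the starting base, the fixed six digits per byte and the function names are all assumptions made for illustration, not the researchers' published tables.

```python
BASES = "ACGT"

def int_to_trits(value, width):
    """Convert a non-negative integer to a fixed-width list of base-3 digits."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]

def trits_to_dna(trits, prev="A"):
    """Encode base-3 digits so that no base repeats its predecessor."""
    strand = []
    for trit in trits:
        # The three candidates are the bases that differ from the previous one.
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)

def dna_to_trits(dna, prev="A"):
    """Invert trits_to_dna, given the same (assumed) starting base."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

if __name__ == "__main__":
    # A byte needs six base-3 digits, since 3**5 = 243 < 256 <= 3**6.
    trits = int_to_trits(0, 6) + int_to_trits(255, 6)
    print(trits_to_dna(trits))
```

Even a run of identical digits comes out as an alternating strand, so the long single-base stretches that trouble sequencing machines cannot occur.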
After files were converted to the trinary system, they were written into DNA. Instead of using a long single strand of DNA for each file, as one would naively expect, the files were broken into smaller, easier-to-synthesize chunks of DNA. In addition to the information, each strand also included: parity information for error detection; a file ID; and data that indicated which part of the overall file the strand encoded. The data carried by the strands also overlapped, which provided redundancy.
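The fragment layout described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the fragment length, the step between fragments, the single parity bit and all function names are hypothetical stand-ins, not the parameters of the actual experiment.

```python
def parity_of(chunk):
    """A single parity bit over the chunk (an assumed, very simple check)."""
    return sum(ord(ch) for ch in chunk) % 2

def make_fragments(payload, file_id, chunk_len=8, step=4):
    """Cut payload into overlapping chunks tagged with file ID, index and parity."""
    fragments = []
    start = 0
    while True:
        chunk = payload[start:start + chunk_len]
        fragments.append({
            "file_id": file_id,
            "index": start // step,
            "chunk": chunk,
            "parity": parity_of(chunk),
        })
        if start + chunk_len >= len(payload):
            break
        start += step
    return fragments

def reassemble(fragments, step=4):
    """Rebuild the payload, skipping fragments whose parity check fails."""
    recovered = {}
    for frag in fragments:
        if parity_of(frag["chunk"]) != frag["parity"]:
            continue  # damaged fragment: rely on overlapping neighbours instead
        offset = frag["index"] * step
        for i, ch in enumerate(frag["chunk"]):
            recovered.setdefault(offset + i, ch)
    length = max(recovered) + 1 if recovered else 0
    missing = [pos for pos in range(length) if pos not in recovered]
    if missing:
        raise ValueError(f"unrecoverable positions: {missing}")
    return "".join(recovered[pos] for pos in range(length))

if __name__ == "__main__":
    frags = make_fragments("ACGTACGTACGTACG", file_id=1)
    # Losing the middle fragment is harmless because its neighbours overlap it.
    print(reassemble([frags[0], frags[2]]))
```

Because each position is covered by more than one fragment, a fragment that is lost or fails its check can be ignored as long as its neighbours survive.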
This method worked almost flawlessly. Only one error was encountered, and the missing information was reconstructed thanks to the redundancy of the storage method.
The beauty of DNA storage, of course, is density. Whereas conventional data storage devices, such as 3.5-inch hard drives, can hold perhaps two or three terabytes, DNA can fit over 2 petabytes into a single gram (1 petabyte = 1024 terabytes). Under the right conditions, DNA molecules can also survive for millennia, making DNA the ultimate long-term storage medium.
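To put those figures side by side, the short Python calculation below works out how many hard drives it would take to match one gram of DNA. It assumes 3 terabytes per drive and exactly 2 petabytes per gram, the round numbers quoted above; the variable names are illustrative.

```python
import math

TB_PER_PB = 1024        # conversion used in the text
DNA_PB_PER_GRAM = 2     # "over 2 petabytes" per gram, rounded down
DRIVE_TB = 3            # upper end of "two or three terabytes"

dna_tb_per_gram = DNA_PB_PER_GRAM * TB_PER_PB
drives_per_gram = math.ceil(dna_tb_per_gram / DRIVE_TB)

print(f"One gram of DNA holds about {dna_tb_per_gram} TB, "
      f"roughly {drives_per_gram} drives of {DRIVE_TB} TB each.")
```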
There is one drawback to DNA storage: speed. Whereas hard drive read speeds can reach hundreds of megabytes per second, it takes hours, if not days, to sequence and reconstruct data from DNA strands. Though that makes DNA impractical for day-to-day use, large data stores that are infrequently accessed could benefit tremendously. The petabytes of data produced by the Large Hadron Collider, for example, might go unexamined for months or years. Storing such data in DNA could eventually prove much more economical.
Researchers see the possibility of this storage method becoming economically viable in the near future. Perhaps, soon, we will start backing up data to DNA, and instead of relying on hard drives and magnetic tape, we could carry all of our information in a few grams of DNA. However, don’t expect to have petabytes of storage in your smartphone any time soon.
Tianjiao Zhang
Staff Writer
tzhang@uab.edu



