MP3 Files Written as DNA — Data of Nearly Entire Library of Congress Stored in a Single Gram

Written by on January 27, 2013 in Technology with 0 Comments

MP3 files written as DNA with storage density of 2.2 petabytes per gram

Researchers used trinary to take advantage of DNA’s four bases.

 

DNA-storage_2460302bBy  – ArsTechnica

It’s easy to get excited about the idea of encoding information in single molecules, which seems to be the ultimate end of the miniaturization that has been driving the electronics industry. But it’s also easy to forget that we’ve been beaten there—by a few billion years. The chemical information present in biomolecules was critical to the origin of life and probably dates back to whatever interesting chemical reactions preceded it.

It’s only within the past few decades, however, that humans have learned to speak DNA. Even then, it took a while to develop the technology needed to synthesize and determine the sequence of large populations of molecules. But we’re there now, and people have started experimenting with putting binary data in biological form. Now, a new study has confirmed the flexibility of the approach by encoding everything from an MP3 to the decoding algorithm into fragments of DNA. The cost analysis done by the authors suggest that the technology may soon be suitable for decade-scale storage, provided current trends continue.

Trinary encoding

Computer data is in binary, while each location in a DNA molecule can hold any one of four bases (A, T, C, and G). Rather than using all that extra information capacity, however, the authors used it to avoid a technical problem. Stretches of a single type of base (say, TTTTT) are often not sequenced properly by current techniques—in fact, this was the biggest source of errors in the previous DNA data storage effort. So for this new encoding, they used one of the bases to break up long runs of any of the other three.

(To understand how this could work practically, let’s say the A, T, and C encoded information, while G represents “more of the same.” If you had a run of four A’s, you could represent it as AAGA. But since the G doesn’t encode for anything in particular, TTGT can be used to represent four T’s. The authors’ system was more complicated, but did manage to keep extended single-base runs from happening.)

That leaves three bases to encode information, so the authors converted their information into trinary. In all, they encoded a large number of works: all 154 Shakespeare sonnets, a PDF of a scientific paper, a photograph of the lab some of them work in, and an MP3 of part of Martin Luther King’s “I have a dream” speech. For good measure, they also threw in the algorithm they use for converting binary data into trinary.

Once in trinary, the results were encoded into the error-avoiding DNA code described above. The resulting sequence was then broken into chunks that were easy to synthesize. Each chunk came with parity information (for error correction), a short file ID, and some data that indicates the offset within the file (so, for example, that the sequence holds digits 500-600). To provide an added level of data security, 100-bases-long DNA inserts were staggered by 25 bases so that consecutive fragments had a 75-base overlap. Thus, many sections of the file were carried by four different DNA molecules.

And it all worked brilliantly—mostly. For most of the files, the authors’ sequencing and analysis protocol could reconstruct an error-free version of the file without any intervention. One, however, ended up with two 25-base-long gaps, presumably resulting from a particular sequence that is very difficult to synthesize. Based on parity and other data, they were able to reconstruct the contents of the gaps, but understanding why things went wrong in the first place would be critical to understanding how well suited this method is to long-term archiving of data.

Long-term storage

In general, though, the DNA was very robust. The authors simply dried it out before shipping it to a lab in Germany (with a layover in the UK), where it was decoded. Careful storage in a cold, dry location could keep it viable for much, much longer. The authors estimate their storage density was about 2.2 Petabytes per gram, and that it included enough DNA to recover the data about ten additional times.

Read the full article

GaiamTV_Bnr_600x76

Tags: , , , , , ,

Subscribe

If you enjoyed this article, subscribe now to receive more just like it.

Subscribe via RSS Feed Connect on YouTube
Add Comment Register



Leave a Reply

Your email address will not be published. Required fields are marked *



FAIR USE NOTICE. Many of the stories on this site contain copyrighted material whose use has not been specifically authorized by the copyright owner. We are making this material available in an effort to advance the understanding of environmental issues, human rights, economic and political democracy, and issues of social justice. We believe this constitutes a 'fair use' of the copyrighted material as provided for in Section 107 of the US Copyright Law which contains a list of the various purposes for which the reproduction of a particular work may be considered fair, such as criticism, comment, news reporting, teaching, scholarship, and research. If you wish to use such copyrighted material for purposes of your own that go beyond 'fair use'...you must obtain permission from the copyright owner.

The information on this site is provided for educational and entertainment purposes only. It is not intended as a substitute for professional advice of any kind. Conscious Life News assumes no responsibility for the use or misuse of this material. Your use of this website indicates your agreement to these terms.

Paid advertising on Conscious Life News may not represent the views and opinions of this website and its contributors. No endorsement of products and services advertised is either expressed or implied.
Top
Close
Please support ConsciousLifeNews
Like us on Facebook