Since time immemorial, mankind has wanted to record information and share it for later use. First came cave paintings and symbols. Then we invented alphabets, ideograms, numbers and other symbols. Using these, books were written and stored for future generations on palm leaves, papyrus sheets or paper. The invention of printing brought the Gutenberg revolution, making it easy to produce multiple copies and spreading education to millions of people.
Printed books occupy space. Libraries and archives are bursting at the seams. Enter the computer age and digitization, which uses the binary code of zeros and ones (0, 1) to represent letters and other such symbols and reads them as on-off electrical signals. This has made electronic storage possible, cutting down the size and space needed for ‘hard copies’. Integrated circuits, processors and related electronic wizardry have shrunk computers and storage devices from room size to fingernail size.
But even so, the amount of information that must be kept on any given ‘hard drive’ (from a printed book to an Amazon or Kindle e-book, or from the Encyclopaedia Britannica to Google) is growing exponentially. “That means the cost of storage is rising but our budgets are not”, as Dr. Nick Goldman of the European Bioinformatics Institute at Hinxton, UK told The Economist (in its January 26, 2013 issue). Goldman, together with four colleagues at Hinxton and two from Agilent Technologies, California, U.S., decided to use DNA (yes, the molecule which stores the code that makes life possible) as the information storage device, rather than electronics. Their paper, titled “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA”, was published in the journal Nature two weeks ago (doi:10.1038/nature11875).
Indeed, the question should be ‘why not DNA?’. It is a long chain built from an alphabet of four letters (chemical units called bases, referred to as A, G, C and T) strung together in sequence, similar to what the English language does with its 26 letters and punctuation marks, or what digital computers do with zeros and ones in chosen sequences. DNA has been used since life was born over 2 billion years ago to store and transfer information right through evolution. It is small in size: the entire information content of a human is stored in a sequence of 3 billion bases (A, G, C and T), packed into a cell nucleus only a few microns across (a micron is a thousandth of a millimetre). It is stable and has an admirable shelf life. People have isolated DNA from the bones of dinosaurs dead about 65 million years ago, read the sequence of bases in it and understood much information about the animal. The animal (shall we say the ‘host’ of the DNA) is long since dead but the information lives on.
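To make the analogy between binary code and the four-base alphabet concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base (00 = A, 01 = C, 10 = G, 11 = T). That mapping and the function names are illustrative assumptions, not the encoding used by Goldman and colleagues.

```python
# Illustrative mapping only: two bits per base. This is an assumption made
# for the sketch, not the encoding described in the Nature paper.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_dna(text: str) -> str:
    """Encode text as UTF-8 bytes, then write every pair of bits as one base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna: str) -> str:
    """Reverse the mapping: bases to bits, bits to bytes, bytes to text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def capacity_in_bytes(num_bases: int) -> int:
    """Raw capacity of a sequence at two bits per base (eight bits per byte)."""
    return num_bases * 2 // 8


if __name__ == "__main__":
    encoded = text_to_dna("Hi")
    print(encoded)                # the word "Hi" written in bases
    print(dna_to_text(encoded))   # decoded back to the original text
    print(capacity_in_bytes(3_000_000_000))  # a 3-billion-base sequence
```

Under this mapping the word “Hi” becomes the eight bases CAGACGGC, and a 3 billion base sequence holds about 750 megabytes of raw data. A practical scheme would also need error correction and would avoid long runs of the same base, neither of which this sketch attempts.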
DNA is thus a long-lived, stable and easily synthesized storage medium. While current electronic storage devices require active, continued maintenance and regular transfer from one storage medium to the next (punched cards to magnetic tapes to floppy disks to CDs…), DNA-based storage needs no active maintenance. Just store it in a cool, dark and dry place!
The Goldman group is not the first to think of DNA as a storage device. Dr. E.B. Baum proposed building an associative memory vastly larger than the brain in 1995; Dr. C.T. Clelland and others ‘hid’ messages in DNA microdots in 1999; J.P.L. Cox wrote in 2001 on long-term data storage in DNA; Ailenberg and Rotstein came up with a coding method for archiving text, images and music characters in DNA; and in 2012 Church, Gao and Kosuri discussed next-generation digital information storage in DNA.