Data storage & DNA
06 July 2018
By: Jo Technology
In 2017 alone, we produced more data than had been produced in all of human history up to that point. We've even had to invent terms like zettabyte (one sextillion bytes) just to be able to discuss the amount of data we're creating. As you can imagine, it all requires a lot of storage. That's why tech giants like Google and Facebook have to maintain enormous server farms, otherwise known as data centres, all over the world. These warehouse-like spaces are filled from top to bottom with aisles of servers that are very costly to run, both financially and environmentally.
Our data production doesn't look set to slow down any time soon, which raises the very real question of how we're going to shoulder the load. And as the years slip by and data needs to be preserved for future access, how can we guarantee that nothing will be lost when our current forms of data storage (from CDs to hard disk drives to flash storage) all degrade over time?
It wouldn't be the first time humankind has turned to nature for inspiration and guidance when faced with a challenge. In 1964, American cyberneticist Norbert Wiener and Russian physicist Mikhail Neiman independently put forward the idea of radically miniaturising the recording, storage and retrieval of digital information by using DNA. Storing digital data in DNA might at first seem like an enormous leap, but it seems far less inconceivable when you think of DNA as the hard disk drive of a living organism. In the same way that a computer uses binary code (sequences and patterns of 1s and 0s) to encode data, the microscopic thread-like chains of DNA contain coded chemical instructions for the growth, development, functioning and reproduction of all known living organisms. DNA is built from four nucleotides (a kind of organic molecule), usually known by the first letters of their bases: A (adenine), G (guanine), T (thymine) and C (cytosine). The key has been to devise a system that translates binary code into a sequence written with these four letters.
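To make the translation idea concrete, here is a minimal Python sketch of the most direct scheme: since there are four bases, each one can stand for a pair of bits. The particular pairing (00 to A, 01 to C, 10 to G, 11 to T) and the names `encode` and `decode` are hypothetical choices for illustration, not code used by any of the researchers mentioned here, and the sketch ignores practical concerns such as error correction and splitting data into short strands.

```python
# Hypothetical mapping of each pair of bits to one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA sequence, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)  # 8 bits per byte
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse encode(): each base back to two bits, then 8 bits per byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sequence = encode(b"Hi")
print(sequence)          # prints CAGACGGC
print(decode(sequence))  # prints b'Hi'
```

The two bytes of `b"Hi"` (01001000 01101001) become the eight bases CAGACGGC, so every byte costs four nucleotides in this scheme.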
A landmark demonstration of storing digital data in DNA came in 2012, when American geneticist, molecular engineer and chemist George Church and his colleagues at Harvard University encoded a 53,000-word book, along with other types of files, in thousands of short pieces of DNA. Their system used a "simple code where bits were mapped one-to-one with bases", but since then Nick Goldman and his team at the European Bioinformatics Institute have created a more efficient and less error-prone way to store digital data in DNA.
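Church's one-to-one code is usually described as storing one bit per base, with A or C standing for 0 and G or T standing for 1, which leaves a choice of base for every bit. The Python sketch below assumes that description and uses the spare choice in one simple, hypothetical way: it picks whichever base avoids repeating the previous one, since long runs of the same base are harder to synthesise and sequence accurately. The function names are illustrative only, and this is not the team's actual software.

```python
# Assumed one-bit-per-base scheme: A or C mean 0, G or T mean 1.
ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")


def encode_one_bit_per_base(bits: str) -> str:
    """Map each '0'/'1' character to one base, never repeating the previous base."""
    sequence = []
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError(f"not a bit: {bit!r}")
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        base = choices[0]
        # Use the alternative base if the preferred one would repeat the last base.
        if sequence and sequence[-1] == base:
            base = choices[1]
        sequence.append(base)
    return "".join(sequence)


def decode_one_bit_per_base(dna: str) -> str:
    """Read each base back as a single bit."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


print(encode_one_bit_per_base("00110"))  # prints ACGTA
```

Storing only one bit per base halves the density of a two-bits-per-base mapping; the flexibility it buys is the freedom to steer away from troublesome sequences.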
To test their method, they encoded, amongst other things, all 154 of Shakespeare's sonnets and a 26-second audio clip of Martin Luther King's "I Have a Dream" speech. They sent the sequences off to a company that synthesised the DNA, which was then couriered back to Goldman in a test tube. He initially thought the process had failed, because the tube seemed to be empty: the DNA was the dust-like smudge at the bottom. The team then had the DNA sequenced and converted back into binary code, and the final data was bit-for-bit perfect.
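Goldman's team is generally described as writing data in base 3 and then choosing each base according to the one before it, so that the same base never appears twice in a row. The Python sketch below illustrates that rotating idea under loudly simplifying assumptions: it writes each byte as six base-3 digits (3^6 = 729 covers all 256 byte values) instead of the Huffman code the team actually used, assumes an imaginary base "A" before the first one, and leaves out the overlapping, redundant DNA fragments a real system relies on. The names `encode_rotating` and `decode_rotating` are hypothetical; this sketches the technique, it does not reproduce their method.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # simplification: 3**6 = 729 >= 256, so six base-3 digits fit any byte


def byte_to_trits(value: int) -> list[int]:
    """Write a byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def trits_to_byte(trits: list[int]) -> int:
    """Rebuild a byte from its base-3 digits."""
    value = 0
    for trit in trits:
        value = value * 3 + trit
    return value


def encode_rotating(data: bytes, start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base."""
    previous, sequence = start, []
    for byte in data:
        for trit in byte_to_trits(byte):
            previous = BASES[(BASES.index(previous) + 1 + trit) % 4]
            sequence.append(previous)
    return "".join(sequence)


def decode_rotating(dna: str, start: str = "A") -> bytes:
    """Recover each trit from the step between consecutive bases."""
    previous, trits = start, []
    for base in dna:
        trit = (BASES.index(base) - BASES.index(previous) - 1) % 4
        if trit == 3:  # only happens when a base repeats, which the encoder never produces
            raise ValueError("repeated base: sequence was not produced by this encoder")
        trits.append(trit)
        previous = base
    return bytes(trits_to_byte(trits[i:i + TRITS_PER_BYTE])
                 for i in range(0, len(trits), TRITS_PER_BYTE))


sequence = encode_rotating(b"Sonnet 18")
print(decode_rotating(sequence))  # prints b'Sonnet 18'
```

Because a repeated base can never come out of the encoder, the decoder can treat one as a sign of a read or write error rather than silently returning wrong data.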
The most obvious advantage of DNA data storage is its density: a single gram of DNA can store up to 215 million gigabytes of data. "All the information in the world could be encoded and stored in DNA, and it would fit in the back of an SUV," says Nick Goldman. But besides being ultra-compact, DNA is also extremely durable. Kept in the right conditions, namely somewhere cool, dark and dry, it can last hundreds of thousands of years, and, unlike cassettes and CDs, DNA will never become obsolete.
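The density figure is easier to grasp with a little arithmetic. This Python sketch assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes) and takes the 215-million-gigabytes-per-gram figure at face value to estimate how much DNA one zettabyte would need. The function name is hypothetical, and the estimate ignores the extra DNA any practical system spends on redundancy and error correction.

```python
BYTES_PER_GIGABYTE = 10**9        # decimal gigabyte
BYTES_PER_ZETTABYTE = 10**21      # one sextillion bytes
GIGABYTES_PER_GRAM = 215 * 10**6  # the quoted upper bound for one gram of DNA


def grams_of_dna_needed(zettabytes: float) -> float:
    """Estimate the grams of DNA needed to hold the given number of zettabytes."""
    bytes_per_gram = GIGABYTES_PER_GRAM * BYTES_PER_GIGABYTE  # 2.15e17 bytes
    return zettabytes * BYTES_PER_ZETTABYTE / bytes_per_gram


print(f"{grams_of_dna_needed(1):,.0f} g per zettabyte")  # prints "4,651 g per zettabyte"
```

At roughly 4.7 kg of DNA per zettabyte, even many zettabytes would weigh no more than a few passengers, which helps explain Goldman's SUV comparison.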
We can only contemplate DNA as the data storage of the future because we've made incredible progress in our ability to synthesise and sequence DNA. Today, a human genome can be sequenced in a matter of hours, whereas it used to take years. The real challenge, as Goldman explained in a talk at the World Economic Forum in 2015, lies in synthesising, or "writing", the DNA: "Humans are not good yet at writing the first copy of DNA." It's an extremely complex, slow and expensive process, so much so that there currently isn't enough money on Earth to store all our data in DNA.
While it might not be a reality yet, Microsoft has set itself the goal of having an operational DNA storage system running in one of its data centres by around 2020. Once the price drops and the process becomes easier and more accessible, DNA storage may one day be available to everyone. Perhaps then we'll look back on our current storage systems and scoff, the way we now do at the large, cumbersome machines that were our very first computers.