Researchers Store Full Computer Operating System on DNA

Proving that everything old is new again, researchers are now storing data on the oldest information storage solution there is: DNA. A pair of researchers at Columbia University and the New York Genome Center (NYGC) have come up with a technique to store massive amounts of data on DNA. The result, according to study coauthor Yaniv Erlich, is the "highest-density data-storage device ever created."

DNA is the perfect storage medium, the researchers say: it's ultra-compact and, according to a news release from Columbia, can last hundreds of thousands of years if kept cool and dry.

"DNA won't degrade over time like cassette tapes and CDs, and it won't become obsolete—if it does, we have bigger problems," Erlich, a computer science professor at Columbia Engineering, said in a statement.

Erlich and his colleague Dina Zielinski, an associate scientist at NYGC, successfully encoded six files into DNA: a full computer operating system, the 1895 French film Arrival of a Train at La Ciotat, a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon.

They first compressed the files into a master file and split the data into short strings of binary code, made up of ones and zeros. Next, "using an erasure-correcting algorithm called fountain codes, they randomly packaged the strings into so-called droplets, and mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T," according to the release.

They wound up with a digital list of 72,000 DNA strands and sent it as a text file to Twist Bioscience, a San Francisco DNA synthesis startup that specializes in turning digital data into biological data.
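
The encoding pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' software: the bit-to-base mapping (00 to A, 01 to C, 10 to G, 11 to T), the 4-byte segments, the 4-byte seed prefix on each strand, and the uniform choice of how many segments each droplet combines are all assumptions, and every function name is hypothetical. Fountain codes in the Luby transform family draw that number from a soliton distribution instead, and the published method also screened strands for biochemical constraints such as GC content and long runs of a single base, which this sketch omits.

```python
import random

# ASSUMPTION: an illustrative 2-bit-per-base mapping.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def split_segments(data: bytes, seg_len: int) -> list[bytes]:
    """Split data into fixed-length segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % seg_len)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """XOR a seed-determined random subset of segments into one droplet payload."""
    rng = random.Random(seed)
    # ASSUMPTION: uniform degree; LT-style codes draw it from a soliton distribution.
    degree = rng.randint(1, len(segments))
    indices = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for i in indices:
        for j, byte in enumerate(segments[i]):
            payload[j] ^= byte
    return seed, bytes(payload)


def bytes_to_dna(data: bytes) -> str:
    """Map every 2 bits to one nucleotide, i.e. 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def droplet_to_dna(seed: int, payload: bytes, seed_bytes: int = 4) -> str:
    """Prefix the seed so a decoder can regenerate the droplet's segment indices."""
    return bytes_to_dna(seed.to_bytes(seed_bytes, "big") + payload)


if __name__ == "__main__":
    segments = split_segments(b"Hello, DNA storage!", seg_len=4)
    # Generate more droplets than segments to add redundancy.
    strands = [droplet_to_dna(*make_droplet(segments, seed)) for seed in range(8)]
    for strand in strands[:3]:
        print(strand)  # each strand is 32 bases: (4 seed + 4 payload bytes) * 4
```

Each droplet is the XOR of a random subset of segments, and the seed stored at the front of the strand is enough to regenerate which subset was used. Producing more droplets than there are segments adds redundancy, so the file can in principle be rebuilt even if some strands are lost.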

"Two weeks later, they received a vial holding a speck of DNA molecules," the school wrote. "To retrieve their files, they used modern sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary. They recovered their files with zero errors."
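
Reading the files back reverses the process. The Python sketch below is illustrative, not the researchers' decoder: it assumes the same hypothetical 2-bit mapping used for encoding (00 to A, 01 to C, 10 to G, 11 to T), error-free reads, and that each droplet's segment indices are already known (in a fountain-code design they would be regenerated from the seed carried in the strand). It shows peeling decoding, the standard iterative decoder for LT-style fountain codes; the function names are hypothetical.

```python
# ASSUMPTION: inverse of an illustrative 00->A, 01->C, 10->G, 11->T mapping.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bytes(strand: str) -> bytes:
    """Translate a read back to bytes, 4 nucleotides per byte (assumes no read errors)."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def peel_decode(droplets, n_segments):
    """Recover segments from (segment_indices, payload) droplets by peeling.

    Repeatedly XORs already-known segments out of each droplet; a droplet left
    covering exactly one unknown segment then holds that segment verbatim.
    Returns the segments in order, or None if the droplets are not sufficient.
    """
    recovered = {}
    # Copy inputs so the caller's droplets are not modified.
    pending = [(set(indices), bytearray(payload)) for indices, payload in droplets]
    changed = True
    while changed and len(recovered) < n_segments:
        changed = False
        for indices, payload in pending:
            # XOR out segments that are already known.
            for i in [i for i in indices if i in recovered]:
                for j, byte in enumerate(recovered[i]):
                    payload[j] ^= byte
                indices.discard(i)
            if len(indices) == 1:
                (k,) = indices
                recovered[k] = bytes(payload)
                indices.clear()
                changed = True
    if len(recovered) < n_segments:
        return None
    return [recovered[i] for i in range(n_segments)]


if __name__ == "__main__":
    a, b, c = b"AB", b"CD", b"EF"
    droplets = [({0, 1}, xor_bytes(a, b)), ({1}, b), ({1, 2}, xor_bytes(b, c))]
    print(peel_decode(droplets, 3))  # [b'AB', b'CD', b'EF']
    print(dna_to_bytes("ACGT"))      # b'\x1b'
```

Peeling only makes progress while some droplet covers a single unknown segment; when it stalls, the function returns None and more droplets, meaning more sequenced strands, are needed.
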

The researchers say this strategy allows up to 215 petabytes of data to be stored in a single gram of DNA. The technique comes at a high cost, however, so don't expect it to go mainstream any time soon: the researchers spent $7,000 to synthesize the 2 MB of data and another $2,000 to read it.
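
The cost and density figures above reduce to simple arithmetic. The Python sketch below uses only the numbers quoted in the article, treats the payload as exactly 2 MB, and assumes decimal units (1 PB = 10^15 bytes); the constant and function names are illustrative.

```python
# Figures quoted in the article; DATA_MB treats "2 MB" as exactly 2 (an approximation).
SYNTHESIS_USD = 7_000
SEQUENCING_USD = 2_000
DATA_MB = 2

# Reported density in bytes, ASSUMING decimal units (1 PB = 10**15 bytes).
BYTES_PER_GRAM = 215 * 10**15


def usd_per_mb(total_usd: int, megabytes: int) -> float:
    """Average cost per megabyte for a fixed total spend."""
    return total_usd / megabytes


def bytes_per_microgram(bytes_per_gram: int) -> int:
    """Scale a per-gram density down to one microgram (10**-6 g)."""
    return bytes_per_gram // 10**6


if __name__ == "__main__":
    print(f"write: ${usd_per_mb(SYNTHESIS_USD, DATA_MB):,.0f} per MB")   # write: $3,500 per MB
    print(f"read:  ${usd_per_mb(SEQUENCING_USD, DATA_MB):,.0f} per MB")  # read:  $1,000 per MB
    print(f"{bytes_per_microgram(BYTES_PER_GRAM) / 10**9:.0f} GB per microgram")  # 215 GB per microgram
```

At these rates, writing and then reading back each megabyte costs about $4,500, which is the cost barrier that keeps the technique out of the mainstream for now.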