iso1337 5 years ago

From the paper:

“Our system’s write-to-read latency is approximately 21 h. The majority of this time is taken by synthesis, viz., approximately 305 s per base, or 8.4 h to synthesize a 99-mer payload and 12 h to cleave and deprotect the oligonucleotides at room temperature. After synthesis, preparation takes an additional 30 min, and nanopore reading and online decoding take 6 min.”
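
The quoted numbers do add up; a quick sanity check (all figures from the paper, in hours):

```python
synthesis = 305 * 99 / 3600  # ~8.4 h to synthesize the 99-mer payload
cleave = 12                  # cleave and deprotect at room temperature
prep = 0.5                   # preparation
read = 6 / 60                # nanopore reading and online decoding
print(round(synthesis + cleave + prep + read, 1))  # ~21.0 h
```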

Also the amount of data that can be written is generally very small for oligo synthesis. The highest-throughput methods are microarrays from Agilent and others (not used in this paper). You can buy about 32 megabits for $6000 (http://www.customarrayinc.com/oligos_main.htm). The actual cost of synthesis is maybe $2000 as a guess.
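
For scale, that works out to roughly $1500 per megabyte written, taking my $6000 figure at face value (a guess, not a vendor quote):

```python
bits = 32e6                 # ~32 megabits per array
megabytes = bits / 8 / 1e6  # = 4 MB
print(6000 / megabytes)     # ~1500 dollars per MB
```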

So currently DNA storage would be good for small datasets that you would like to store for a long time. Physical density is useless most of the time if you can’t efficiently generate a lot of data to begin with.

We would need a several-orders-of-magnitude increase in write throughput, which is slowly being worked on. People typically like to compare synthesis costs against Moore's law, using Carlson curves ( http://www.synthesis.cc/synthesis/2016/03/on_dna_and_transis... ).

However, there hasn't been as much progress in synthesis as in sequencing. Why? My theory is that there's not a very big market for DNA synthesis, so the big investments needed haven't really been there. Maybe storage on DNA could be that market, but it would need to show quick and easy wins (e.g., stepping stones of practicality like what early integrated circuits had).

  • toufka 5 years ago

    From a biology perspective, the value of DNA synthesis comes in plateaus. (I’m curious about analogies to other commodities?)

    If you can build small runs (<50 base pairs) you can make small mutations to DNA, or read particular sections of DNA. If you can make DNA larger than the average protein (~2000 bp) you can invent new proteins from scratch rather than modify existing protein sequences. If you can make DNA longer than a plasmid (~10,000 bp) you can creatively invent a minimal viable, replicable, and deliverable unit (a plasmid). If you can do millions of base pairs you get to chromosomes and can invent eukaryote-transmissible storage.

    But until you leap those plateaus, you can likely saturate the intervening market. So even if there’s massive pent-up theoretical demand for the wholesale invention of genes, there’s no way to really demonstrate it in the current market.

    This inability to estimate demand may make it a tricky spot to invest in.

    • iso1337 5 years ago

      The issue is also that we don't know what to write. I'm being a bit extreme here, but not being able to do rational design limits things mostly to: 1) build a lot of variants using some educated guessing, 2) test those variants, 3) plug those results into some model and predict what to build next.

      So the value of each individual variant is relatively low, since you don't know if it will work or not. Plus a lot of other investments have to be made into the toolchain (developing assays for the thing you actually care about is not easy).

      In your plateaus, the possibility space explodes with each step function. We are already at the first rung of the ladder: building many 50 bp variants. Generally, you'd want to be able to conduct an experimental cycle with 1000s of variants, and at the upper end you'd want to pay $5000-10000 per cycle. There is demand for making protein-length sequences, but the price the market is willing to pay is probably closer to $10/gene (a fully customizable 2 kbp) than the current price.

      Another thing is that not all sequences are created equal. Some sequences just won't synthesize well (maybe there's secondary structure), so to limit costs one ought to limit the number of retries per sequence.

    • waynecochran 5 years ago

      Can the replication process that occurs when a cell divides be used for data copying? It seems like this, if doable, would be a huge boon for DNA storage.

      • iso1337 5 years ago

        It’s already used: look up the polymerase chain reaction (PCR).

        Copying isn’t as interesting as de novo synthesis: being able to specify an arbitrary sequence and have it synthesized.

itchyjunk 5 years ago

I find a few different articles about digital data storage in DNA, but they don't seem to tell me why more data can be stored in DNA than in a classical medium. Maybe I am not phrasing the question right.

""DNA can store digital information in a space that is orders of magnitude smaller than datacenters use today.""

Would this DNA be kept in the same conditions as a datacenter, or does it need something more?

  • mbreese 5 years ago

    Not only is DNA chemically stable (it can last for years) and internally redundant, it is also remarkably compact. DNA is an evolutionarily optimized data storage medium.

    To put it into context, each cell in your body contains 6 billion base pairs of data (two copies of your genome). Each base is one of 4 bits, so that’s 4^6000000000 of data in each cell. Your body has ~37 trillion cells [1]. A person is about the size of a rack (well, maybe 10-15U by volume), so that’s 3.7e13 * 4^6e9 bits per rack.

    A petabyte is 8e15 bits.

    That’s a lot of data storage capacity in a small space. Moreover, there is the potential for introducing more synthetic bases to increase the 4 to 6.

    https://www.ncbi.nlm.nih.gov/m/pubmed/23829164/

    • hn_throwaway_99 5 years ago

      Minor correction: each base pair is one of 4 values (A-T, T-A, G-C, C-G), so each base pair is equivalent to 2 bits of data, half of what you quoted.

      • mbreese 5 years ago

        Yes, I misworded it and gave the wrong numbers. I gave the number of possible combinations in a genome, which is 4^6e9. That’s obviously not the number to compare (and a slightly embarrassing mistake).

        With 6 gigabases per cell and 2 bits per base, the storage capacity is 12 gigabits (1.5 gigabytes) per cell. And with 37 trillion cells (3.72e13) in a human body, that’s 3.72e13 cells * 1.5e9 bytes per cell, which is 5.58e22 bytes per person, or roughly 56 zettabytes. This seems like a more reasonable number.
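
        For anyone who wants to re-run the corrected arithmetic (same assumptions as above: 2 bits per base, 6e9 bases per cell, 3.72e13 cells):

        ```python
        bits_per_cell = 2 * 6e9              # 12 gigabits
        bytes_per_cell = bits_per_cell / 8   # 1.5e9 bytes = 1.5 GB
        cells = 3.72e13
        print(f"{cells * bytes_per_cell:.2e} bytes")  # ~5.58e+22, i.e. ~56 ZB
        ```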

      • faissaloo 5 years ago

        8 if you include artificial base pairs (P, Z, B, S)

    • dooglius 5 years ago

      The number of bits is the logarithm of possible cases; there are only two bits needed to describe the four bases, and only 12 billion bits needed to describe the 4^(6e9) possible configurations of data.
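
      Spelled out with the numbers above: log2(4^(6e9)) = 6e9 * log2(4) = 1.2e10 bits, i.e. 12 gigabits.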

    • modzu 5 years ago

      so in the matrix we'll really be hard drives, not batteries

      • dluan 5 years ago

        Kind of like bags of bitcoin really.

      • jonplackett 5 years ago

        Favourite comment in a long time

    • yread 5 years ago

      Moreover, a cell mostly contains stuff other than the nucleus, so that's another factor of 10 or so. And the human body has a lot of 'empty' space: lungs, stomach, veins, collagen, bones, ...

    • gpm 5 years ago

      Another minor correction: Each base pair is one of 4 things, so 2 bits of information. Each base is half of a base pair so can really only claim 1 bit of information.

      Of course even when you said base you did the math with # of base pairs, so it doesn't really matter.

      • hn_throwaway_99 5 years ago

        No, that is not correct. While each base is half of a base pair, each pair is NOT reversible. At each position along one strand you have 4 options, so each base encodes 2 bits. Even though A always pairs with T and G with C, which strand each base is on matters, so in other words if you have pair "A-T" followed by "G-C" that is very different from "T-A" followed by "C-G".

        • gpm 5 years ago

          You have a problem with your counting. Each location can be in one of four possible states:

          T-A

          A-T

          G-C

          C-G

          That location, using 2 bases, encodes 4 possible values, i.e. 2 bits. I.e. 1 bit per base.

  • fxfan 5 years ago

    It's mentioned in the article before the ATGC paragraph

    • itchyjunk 5 years ago

      ""Under the right conditions, DNA can last much longer than current archival storage technologies that degrade in a matter of decades. Some DNA has managed to persist in less than ideal storage conditions for tens of thousands of years in mammoth tusks and bones of early humans, and it should have relevancy as long as people are alive.""

      But that doesn't explain storage density. And biological DNA has cellular mechanisms that help it survive, whereas these synthetic DNA strands might be standalone.

      • tomatotomato37 5 years ago

        One other suspect thing about that is that the DNA in a body is essentially replicated across every single cell, which means the DNA from 10,000 years ago may have survived as a matter of statistics over many copies rather than any individual molecule's durability.

mikerg87 5 years ago

>The team from the Molecular Information Systems Lab has already demonstrated that it can store cat photographs, great literary works

The meme is true

sidcool 5 years ago

What are the potential applications of this tech?

KallDrexx 5 years ago

Does anyone know how they ensure read order? Since it's a fluid that's moving around, I'm having a hard time grasping how they make sure they read the DNA in the same order it was written in.

  • mbreese 5 years ago

    You wouldn’t. The read order would be encoded in the “file format”. Kind of like a Unicode byte order mark.

    You’d have to deconvolute the signal first, then computationally determine the strand you actually read.
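
    A toy sketch of what such a format could look like, with made-up field widths (real schemes also add primer sites and error correction; this only illustrates the index-prefix idea):

    ```python
    # Toy "file format": prefix each chunk with its index so strands can be
    # re-ordered after sequencing, whatever order they come back in.
    BASES = "ACGT"  # 2 bits per base

    def to_bases(value: int, n_bases: int) -> str:
        # Most-significant base first
        return "".join(BASES[(value >> (2 * i)) & 3] for i in reversed(range(n_bases)))

    def encode(data: bytes, chunk: int = 16, idx_bases: int = 8) -> list:
        strands = []
        for i in range(0, len(data), chunk):
            block = data[i:i + chunk].ljust(chunk, b"\0")
            # 8-base (16-bit) index, then a 64-base (16-byte) payload
            strands.append(to_bases(i // chunk, idx_bases)
                           + to_bases(int.from_bytes(block, "big"), 4 * chunk))
        return strands

    # Strands come back in arbitrary order; sorting on the fixed-width index
    # prefix restores it ('A' < 'C' < 'G' < 'T' lexicographically).
    print(sorted(encode(b"hello world, written in DNA")))
    ```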

  • shpongled 5 years ago

    DNA polymerase "reads" in one direction, so you prime the polymerase with a short DNA sequence (a primer) that is complementary to, and located 5' of (i.e., before), the section you want to read.

  • iso1337 5 years ago

    Related is querying: you can synthesize another pair of oligos to selectively amplify part of the DNA that you wrote and sequence only that part. Look up PCR if you are interested.

  • adjkant 5 years ago

    Reading this made me wonder whether DNA storage will borrow a lot of concepts from SIMD.

jasonhansel 5 years ago

Wouldn't any other copolymer work equally well for this purpose? (I know nothing about chemistry or biology, so this is probably a stupid question.)

  • mxwsn 5 years ago

    In principle sure, as long as you use a different polymer with similar biochemical stability, but a lot of money and tech development has gone into reading and synthesizing DNA in particular. Why reinvent the wheel, so to speak.

    • shpongled 5 years ago

      Not just tech... life has already provided us with optimized tools for reading, writing, translating, and correcting DNA. The chemical properties of DNA are well known, and it's an incredibly stable molecule.

      Doesn't really make sense to switch to a different copolymer.

    • xvilka 5 years ago

      In fact it might be even better with DNA, since you can use a much bigger "alphabet" for this purpose than nature gave us, which would increase storage density.
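
      Worth noting the gain is logarithmic in alphabet size, though. A rough quantification (assuming a perfect encoding):

      ```python
      import math
      for letters in (4, 6, 8):
          print(letters, round(math.log2(letters), 2), "bits/base")
      # 4 -> 2.0, 6 -> 2.58, 8 -> 3.0: ~30-50% more density, not orders of magnitude
      ```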

      • iso1337 5 years ago

        There are already xenobases being developed, e.g., by Romesberg's group. Those add extra bases beyond the 4-5 seen in nature.

        • xvilka 5 years ago

          I know and was referring to them.

    • jasonhansel 5 years ago

      Good point! Makes perfect sense.

amelius 5 years ago

I hope they write the data after performing encryption, so hackers can't find biological exploits should they exist.

social_quotient 5 years ago

Is it feasible (in the future) for the human body to host these DNA storage media/devices?

  • iso1337 5 years ago

    Maybe in the distant future.

    There are really three steps: writing, storage, reading.

    Writing: There are dreams of using biology to write DNA (currently we use chemical synthesis). See the San Diego startup Molecular Assemblies.

    That could miniaturize a big part of it and not require your body to host bottles of nasty chemicals. Maybe you could find clever ways of siphoning the chemical building blocks and ATP energy from the host.

    Storage and retrieval: maybe just have bioinert containers implanted. You’d like to be able to reuse them so ideally there would be a robust flushing mechanism.

    Reading: the MinION reader from Oxford Nanopore that this paper uses is about the size of a USB stick. The technology inside is a hybrid of electronics and protein. This technology will undoubtedly get better and smaller, as sequencing has a lot of R&D money going into it.

  • shpongled 5 years ago

    I wouldn't put exogenous nonsense DNA into my body. That's just asking for a bad time.

    • gdy 5 years ago

      Computer viruses are about to converge with the biological ones.

vcdimension 5 years ago

This could bring a whole new meaning to the words "computer virus".

bsaul 5 years ago

There are so many science fiction novels to write using this as a starting point...

  • harmful_stereo 5 years ago

    Yeah, like owing "blood rent" to the owners of proprietary works, because carriers pass the copyrighted material to their offspring. Or land rights, or special social privilege. Or the answer keys for professional and entrance examinations. Or classified data. Suddenly you have to have your genome sequenced to walk through the airport, and documented status is conferred by blood.

    I'm not being a luddite here on purpose, but over long time scales there's a tremendous potential for this kind of technology to push towards a kind of class differentiated society in the way most of us would despise.

    Some technologies are leveling, like roads or mass transit or vaccines or industrially produced consumables. I don't see public institutions putting libraries in the seeds of apple trees as a civilizational fail-safe, whether in centrally planned economies or democracies. But maybe you could get an ethnostate like Israel to include the Talmud in your cells or your microbiome when you settle on occupied land. The best case scenario with body horror is that it becomes like tattoos. I await the forthcoming Atwood book with that slightly alarmist slant.

    • pm90 5 years ago

      We were supposed to have annihilated all of human life in a nuclear firestorm. But somehow we did manage to survive. Sure, the Great Firewall and China's potent surveillance system does alarm me greatly. But as long as the people who create these technologies also work in a society which places responsible limitations on their use, we might just be able to get to being an interplanetary species at some point.

      Once humanity becomes capable of living and thriving on other planets or in outer space, without the mother planet, things will start getting really interesting.

      • hodgesrm 5 years ago

        I would not rule out nuclear catastrophe just yet. The technology has only been available for 75 years. So far most of the weapons have been in the hands of nations with stable command and control structures. That may not be the case in the next few decades.