Recently, Shen Yue’s team and collaborators published a cover article in Nature Computational Science. It presents a high-density, high-stability codec for converting bits to bases in DNA information storage, and the team experimentally validated it for information storage both in vivo and in vitro.
As one of the main directions at the intersection of biotechnology and information technology, DNA storage has received widespread attention in recent years. The codec of DNA storage, that is, the conversion between bits and bases, is one of its most important steps: it not only determines the efficiency of information conversion (information density) but also directly affects the stability and reliability of the stored information. Since 2012, codec development has focused mainly on raising information density, while technical compatibility and the stable recovery of the original information have not yet been considered comprehensively. Before 2017, codecs had not achieved full technical compatibility, and the GC content of the generated sequences still depended heavily on the distribution of 0s and 1s in the original data. In 2017, the DNA Fountain code developed by a research team at Columbia University in the United States largely solved this problem, but the channel-coding technique it applies directly has a strong preference for certain data types, so the risk of failed data recovery is higher in practical storage applications.
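Why a naive codec ties GC content to the input data can be seen with a toy example. The sketch below assumes a hypothetical fixed table that maps every 2 bits to one base; it is an illustration only, not any published codec. Because the table is fixed, the base composition of the output simply mirrors the bit patterns of the input.

```python
# Hypothetical fixed 2-bit-to-1-base table; illustrative only, not a published codec.
NAIVE_MAP = {"00": "A", "01": "C", "10": "G", "11": "T"}


def naive_encode(bits: str) -> str:
    """Translate an even-length bit string into bases, two bits at a time."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(NAIVE_MAP[bits[i:i + 2]] for i in range(0, len(bits), 2))


def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


# All-zero input yields only A (0% GC); repeating "0110" yields only C and G (100% GC).
print(gc_content(naive_encode("00" * 8)))    # 0.0
print(gc_content(naive_encode("0110" * 4)))  # 1.0
```

Both inputs are equally valid data, yet one produces a sequence with no G or C at all and the other a sequence made only of G and C, which is the kind of dependence the codecs described here set out to remove.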
Shen Yue in the laboratory. (Photo courtesy of the research team)
Shen Yue, the paper’s corresponding author, told China Science Daily that to solve this problem, the team drew inspiration from the DNA double-strand model and from the Chinese idea of yin and yang as the unity of opposites, and applied it to the DNA codec. On the one hand, two different sets of rules each convert one of two binary segments “one-to-one”, and the base that satisfies both rules is taken as the final result, so that two independent pieces of binary information are merged into a single DNA sequence. On the other hand, a screening mechanism uses preset filters to discard sequences that are poorly compatible with existing synthesis and sequencing technologies. Depending on how the rules are combined, the system offers a total of 1,536 different coding-rule combinations, greatly broadening its range of application scenarios.
Through theoretical derivation and simulated encoding of files of different data types, the researchers also showed that the system markedly improves the stability of data recovery while maintaining information density: the average recovery rate of stored data is nearly two orders of magnitude higher than that of current DNA Fountain codes.
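The rule combination described above can be sketched in Python. This is a simplified reconstruction under stated assumptions, not the authors’ code: we assume a yang rule splits the four bases into two pairs (one pair encodes bit 0, the other bit 1), and a yin rule, chosen separately for each possible preceding base, splits them so that each yin group holds exactly one base from each yang group. Under these assumptions exactly one base satisfies both rules at every position, and the counts work out to 6 × 256 = 1,536. The seed base `A`, the function names, and the screening thresholds (GC content 40 to 60%, homopolymer runs of at most 3) are hypothetical choices for illustration; the screen here only checks a single candidate sequence.

```python
from itertools import combinations, product

BASES = "ACGT"


def yang_rules():
    """Each yang rule picks two bases for bit 0; the other two encode bit 1 (6 rules)."""
    rules = []
    for zero in combinations(BASES, 2):
        one = tuple(b for b in BASES if b not in zero)
        rules.append((frozenset(zero), frozenset(one)))
    return rules


def yin_rules(yang):
    """Assumed yin rules: for each possible previous base, split the bases so that each
    yin group holds one base from each yang group (4 options per previous base)."""
    zero, one = (sorted(group) for group in yang)
    options = []
    for z, o in product(zero, one):  # 2 x 2 = 4 valid splits
        yin0 = frozenset({z, o})
        options.append((yin0, frozenset(BASES) - yin0))
    # Independent choice for each of the 4 previous bases: 4**4 = 256 rules per yang rule
    return [dict(zip(BASES, choice)) for choice in product(options, repeat=4)]


def encode(bits_a, bits_b, yang, yin, prev="A"):
    """Merge two equal-length bit strings into one DNA sequence; prev is an assumed seed."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have equal length")
    seq = []
    for a, b in zip(bits_a, bits_b):
        (base,) = yang[int(a)] & yin[prev][int(b)]  # exactly one base satisfies both rules
        seq.append(base)
        prev = base
    return "".join(seq)


def decode(seq, yang, yin, prev="A"):
    """Recover both bit strings: yang group gives bit a, yin group of prev gives bit b."""
    bits_a, bits_b = [], []
    for base in seq:
        bits_a.append("0" if base in yang[0] else "1")
        bits_b.append("0" if base in yin[prev][0] else "1")
        prev = base
    return "".join(bits_a), "".join(bits_b)


def passes_screen(seq, gc_range=(0.4, 0.6), max_run=3):
    """Hypothetical filter: GC content within a window and no long homopolymer runs."""
    if not seq:
        return False
    gc = sum(b in "GC" for b in seq) / len(seq)
    longest = run = 1
    for prev_base, base in zip(seq, seq[1:]):
        run = run + 1 if base == prev_base else 1
        longest = max(longest, run)
    return gc_range[0] <= gc <= gc_range[1] and longest <= max_run


yang = yang_rules()[0]
yin = yin_rules(yang)[0]
dna = encode("0110", "1010", yang, yin)
print(dna, decode(dna, yang, yin), passes_screen(dna))  # CGTA ('0110', '1010') True
print(sum(len(yin_rules(y)) for y in yang_rules()))  # 1536
```

Because the yin rule depends on the previous base, the same pair of input bits can map to different bases at different positions, which is one way two independent bit streams can share a single sequence while still decoding unambiguously.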
Ping Qian in the laboratory. (Photo courtesy of the research team)
Ping Qian, the paper’s co-first author and an assistant researcher at the Shenzhen Huada Life Science Research Institute, told reporters that using cells to store DNA information has also received great attention in recent years. The team therefore stored data encoded with the system in yeast cells and tested how stably it could be recovered after passaging. The results showed that, with the yeast strain as the carrier, the data could still be fully recovered after more than 1,000 generations of passaging. The information density achieved is close to the theoretical limit of the physical information density of natural DNA molecules: about 432.2 EB (exabytes) of information can be stored per gram of DNA.
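For a sense of scale, a theoretical ceiling of this kind can be estimated with simple arithmetic. The sketch below assumes an average mass of about 330 g/mol per single-stranded nucleotide and at most 2 bits per nucleotide; these are common rule-of-thumb values, not the parameters used in the paper, so the result is only an order-of-magnitude check against the reported 432.2 EB per gram.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
EB = 1e18  # one exabyte in bytes (decimal prefix)


def bytes_per_gram(bits_per_nt: float, grams_per_mol_nt: float = 330.0) -> float:
    """Upper-bound capacity of single-stranded DNA in bytes per gram.

    grams_per_mol_nt is an assumed average nucleotide mass, not a value from the paper.
    """
    nucleotides_per_gram = AVOGADRO / grams_per_mol_nt
    return nucleotides_per_gram * bits_per_nt / 8


print(f"{bytes_per_gram(2.0) / EB:.1f} EB/g at 2 bits per nucleotide")  # 456.2 EB/g ...
```

Under these assumptions the ceiling is roughly 456 EB per gram, the same order as the reported figure; the exact number depends on the assumed nucleotide mass and on how many bits per nucleotide a codec actually achieves.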
The study develops a new coding method for DNA storage and offers 1,536 different coding-rule combinations, providing an important tool for diverse applications of DNA storage; it is expected to contribute to research on new media for the long-term storage of massive amounts of data. (Source: China Science Daily, Tian Ruiying)
Related paper information: