Ulsan Univ. Repository Thesis General Graduate School Computer Engineering & Information Technology 2. Theses (Ph.D)

Research on Constrained and Error Correction Codes for DNA Storage

Abstract: Driven by the growing demand for data storage, DNA storage systems have begun to attract considerable attention as a next-generation storage technology because of their high density and longevity. DNA storage encodes binary information in DNA strands, which are composed of DNA sequences and primers. However, insertion, deletion, and substitution errors occur during DNA synthesis and sequencing and are a common obstacle to DNA storage. Reducing the error rate and correcting the errors that arise during synthesis and sequencing are therefore essential to guarantee reliable data storage. Once the DNA strands are stored in a DNA pool, efficient random access to the desired information presents an additional obstacle.

To reduce error rates, a common approach is to impose constraints on the stored DNA strands, such as requiring them to be GC-balanced and to satisfy a homopolymer run-length constraint. For error correction, error correction codes are employed to improve the reliability of the DNA synthesis and sequencing processes. In addition, the primers in DNA strands address the random-access problem in DNA storage.
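The two constraints named above can be illustrated with a minimal Python sketch. It only checks whether a given strand satisfies them; it is not the encoding method proposed in the thesis. The function names, the run-length limit of 3, and the exact 50% GC target are assumptions chosen for illustration.

```python
from itertools import groupby


def gc_fraction(strand: str) -> float:
    """Return the fraction of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return sum(base in "GC" for base in strand) / len(strand)


def max_run_length(strand: str) -> int:
    """Return the length of the longest homopolymer run (repeated base)."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return max(len(list(group)) for _, group in groupby(strand))


def satisfies_constraints(strand: str, max_run: int = 3,
                          gc_tolerance: float = 0.0) -> bool:
    """Check GC balance (within gc_tolerance of 0.5) and the run-length limit.

    max_run and gc_tolerance are illustrative defaults, not values
    taken from the thesis.
    """
    balanced = abs(gc_fraction(strand) - 0.5) <= gc_tolerance
    return balanced and max_run_length(strand) <= max_run
```

For example, `satisfies_constraints("ACGTACGT")` holds because half of the bases are G or C and no base repeats, whereas `"AAAACGCG"` is GC-balanced but fails because of the run of four A's.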

This thesis proposes a novel code construction method based on the weight distribution of the data and introduces a specific encoding process for the balanced and imbalanced data parts, which enables the efficient construction of GC-balanced DNA codes. To minimize errors in DNA storage processes, we also propose a new nonbinary systematic code that corrects a single insertion/deletion and satisfies the maximum run-length constraint, together with its encoding algorithm. Finally, to address efficient primer design for random access in synthesized DNA strands, we propose a code design that combines weakly mutually uncorrelated codes with the maximum run-length constraint. We further explore weakly mutually uncorrelated codes that satisfy the maximum run-length constraint in combination with additional constraints, such as being almost balanced and having a large Hamming distance, which are also useful for random access in DNA storage systems.
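As a rough illustration of the weakly mutually uncorrelated property mentioned above, the sketch below tests a candidate primer set against the definition commonly used in the literature: a code is κ-weakly mutually uncorrelated if no proper prefix of length at least κ of any codeword appears as a suffix of any codeword, including itself. The abstract does not state the definition, so this reading is an assumption, as are the function name and the requirement of equal-length codewords. The sketch is a brute-force verifier, not the thesis's construction.

```python
def is_weakly_mutually_uncorrelated(code: list[str], kappa: int) -> bool:
    """Brute-force check of the kappa-WMU property for equal-length codewords.

    Assumed definition: no proper prefix of length >= kappa of any
    codeword equals a suffix of any codeword (itself included).
    """
    if not code or kappa < 1:
        raise ValueError("need a non-empty code and kappa >= 1")
    n = len(code[0])
    if any(len(word) != n for word in code):
        raise ValueError("all codewords must have the same length")

    for a in code:
        # Proper prefixes have lengths kappa, ..., n - 1.
        for length in range(kappa, n):
            prefix = a[:length]
            if any(b.endswith(prefix) for b in code):
                return False
    return True
```

With the hypothetical set `["ACGA", "TTCG"]`, the check fails for κ = 1 because the prefix `A` of `ACGA` is also its suffix, but it passes for κ = 2 because no prefix of length 2 or 3 matches any suffix.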