Deep sequencing

Depth (coverage) in DNA sequencing refers to the number of times a nucleotide is read during the sequencing process. Deep sequencing indicates that the total number of bases read is many times larger than the length of the sequence under study. Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence.

Depth can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as N × L / G. For example, a hypothetical genome of 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2× redundancy (8 × 500 / 2,000 = 2). This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called coverage). High coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships between such quantities.
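The depth formula above can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this example. The second function estimates the fraction of the genome covered by at least one read as 1 − e^(−c), which is an assumption taken from the idealized Lander–Waterman (Poisson) model of uniformly random read placement; real data usually deviate from it.

```python
import math


def sequencing_depth(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average depth c = N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * mean_read_length / genome_length


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads start uniformly at random along the genome, so the
    per-base read count is approximately Poisson with mean `depth`.
    """
    return 1.0 - math.exp(-depth)


# The worked example from the text: G = 2,000 bp, N = 8 reads, L = 500 nt.
depth = sequencing_depth(num_reads=8, mean_read_length=500, genome_length=2000)
print(depth)                                       # 2.0
print(round(expected_fraction_covered(depth), 3))  # 0.865
```

Under this simplified model, 2× depth would still leave roughly 13.5% of bases unread, which is one reason higher depths are preferred in practice.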

Sometimes a distinction is made between sequence coverage and physical coverage. Sequence coverage is the average number of times a base is read (as described above). Physical coverage is the average number of times a base is either read or spanned by mate-paired reads.[1]
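The difference between the two measures can be sketched in Python as follows. The function names and the numbers are hypothetical. The sketch assumes that every read pair has the same read length and that "insert size" means the full distance from the outer end of one mate to the outer end of the other, so that each pair spans that many bases.

```python
def sequence_coverage(num_pairs: int, read_length: int, genome_length: int) -> float:
    """Average number of times a base is actually read.

    Each pair contributes two reads of `read_length` sequenced bases.
    """
    return 2 * num_pairs * read_length / genome_length


def physical_coverage(num_pairs: int, insert_size: int, genome_length: int) -> float:
    """Average number of times a base is read or spanned by a pair.

    Each pair covers the whole fragment between its outer ends,
    including the unsequenced gap between the two mates.
    """
    return num_pairs * insert_size / genome_length


# Illustrative values only: 1 million pairs, 100 nt reads,
# 3,000 bp inserts, 100 Mbp genome.
print(sequence_coverage(1_000_000, 100, 100_000_000))    # 2.0
print(physical_coverage(1_000_000, 3_000, 100_000_000))  # 30.0
```

With long inserts, physical coverage can therefore be much higher than sequence coverage for the same number of reads.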

The term "deep" has been used for a wide range of depths (>7×), and the newer term "ultra-deep" has appeared in the scientific literature to refer to even higher coverage (>100×).[2][3]

Even though the sequencing accuracy for each individual nucleotide is very high, the very large number of nucleotides in the genome means that if an individual genome is sequenced only once, there will be a significant number of sequencing errors. Furthermore, single-nucleotide polymorphisms (SNPs) that are individually rare are collectively common. Hence, to distinguish between sequencing errors and true SNPs, it is necessary to increase the sequencing accuracy even further by sequencing individual genomes a large number of times.
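The argument above can be made concrete with a small Python sketch. All numbers are assumptions chosen for illustration (a genome of 3 billion bases and a per-base error rate of 0.1%), and the model is deliberately simple: errors are treated as independent between reads, and the function names are hypothetical.

```python
import math


def expected_errors(genome_length: int, error_rate: float) -> float:
    """Expected number of miscalled bases when every base is read once."""
    return genome_length * error_rate


def prob_at_least_k_errors(depth: int, k: int, error_rate: float) -> float:
    """Probability that at least k of `depth` reads at one site are wrong.

    Binomial tail under the assumption of independent errors. It is an
    upper bound on the chance that k reads agree on the same wrong base,
    because the erroneous reads need not show the same substitution.
    """
    return sum(
        math.comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


# Assumed values: 3e9 bases, 0.1% per-base error rate.
print(expected_errors(3_000_000_000, 0.001))  # about 3 million errors at 1x

# At 30x depth, how often would three or more reads be wrong at one site?
print(prob_at_least_k_errors(depth=30, k=3, error_rate=0.001))
```

At a single pass, millions of errors are expected, whereas at 30× the chance that several reads are wrong at the same position is on the order of a few in a million. Requiring a variant to be supported by multiple reads is what lets deep sequencing separate true SNPs from errors.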

Deep sequencing of transcriptome or RNA

Deep sequencing of the transcriptome, also known as RNA-Seq, provides both the sequence and frequency of RNA molecules that are present at any particular time in a specific cell type, tissue or organ. Counting the number of mRNAs that are encoded by individual genes provides an indicator of protein-coding potential, a major contributor to phenotype.[4] Improving methods for RNA sequencing is an active area of research, in terms of both experimental[5] and computational methods.

References

  1. Meyerson, M.; Gabriel, S.; Getz, G. (2010). "Advances in understanding cancer genomes through second-generation sequencing". Nature Reviews Genetics 11 (10): 685–696. doi:10.1038/nrg2841. PMID 20847746.
  2. Ajay SS, Parker SC, Abaan HO, Fajardo KV, Margulies EH (September 2011). "Accurate and comprehensive sequencing of personal genomes". Genome Res. 21 (9): 1498–505. doi:10.1101/gr.123638.111. PMC 3166834. PMID 21771779.
  3. Mirebrahim, Hamid; Close, Timothy J.; Lonardi, Stefano (2015-06-15). "De novo meta-assembly of ultra-deep sequencing data". Bioinformatics 31 (12): i9–i16. doi:10.1093/bioinformatics/btv226. ISSN 1367-4803. PMID 26072514.
  4. Hampton M, Melvin RG, Kendall AH, Kirkpatrick BR, Peterson N, Andrews MT (2011). "Deep sequencing the transcriptome reveals seasonal adaptive mechanisms in a hibernating mammal". PLOS ONE 6 (10): e27021. doi:10.1371/journal.pone.0027021. PMC 3203946. PMID 22046435.
  5. Heyer EE, Ozadam H, Ricci EP, Cenik C, Moore MJ (2015). "An optimized kit-free method for making strand-specific deep sequencing libraries from RNA fragments". Nucleic Acids Res. 43 (1): e2. doi:10.1093/nar/gku1235. PMID 25505164.
This article is issued from Wikipedia - version of the Sunday, January 24, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.