DNA Patterns
DNA patterns are graphs of DNA or RNA sequences. Various functional structures such as promoters and genes, or larger structures like bacterial or viral genomes, can be analyzed using DNA patterns.[1][2]
Method
The technique was described in 2012 by Paul Gagniuc and Constantin Ionescu-Tirgoviste.[3] They adapted algorithms from cryptography and optical character recognition to make their graphs. To graph a DNA pattern, two values are calculated from a sliding window that is "circulated" over the DNA sequence: the kappa index of coincidence and the total percentage of cytosine plus guanine, (C + G)%. Each window contributes one point to the graph. The kappa index of coincidence measures the degree of organization or randomness of a sequence.
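As a sketch of the two quantities computed per window, assuming the usual shift-and-compare form of the kappa index of coincidence (the window is compared with copies of itself shifted by every possible offset, and the percentages of matching positions are averaged), they can be written for a window $s_1 s_2 \dots s_N$ as follows. The symbols $N$, $n_C$, $n_G$, $u$, $i$ and $\delta$ are notation introduced here for illustration and are not taken from the cited papers.

```latex
\[
(C+G)\% = 100 \cdot \frac{n_C + n_G}{N},
\qquad
\kappa = \frac{1}{N-1} \sum_{u=1}^{N-1}
         \left( \frac{100}{N-u} \sum_{i=1}^{N-u} \delta(s_i, s_{i+u}) \right),
\qquad
\delta(a,b) =
\begin{cases}
1 & \text{if } a = b \\
0 & \text{otherwise}
\end{cases}
\]
```

Here $n_C$ and $n_G$ are the numbers of cytosines and guanines in the window, and $u$ is the shift. Under these assumptions both values lie between 0 and 100, so each window maps to a single point in a square plot.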
Such two-dimensional patterns can be analyzed by their shape and density (using optical character recognition algorithms) and by the trend line of the points. Within a pattern, long homopolymeric tracts are plotted in the upper part of the pattern (relative to the nucleotide frequency of the entire sequence), and short tandem tracts are plotted in the middle. As the homopolymeric tracts become shorter (down to di- or trinucleotide formations), the kappa value decreases, and the point is again placed in the middle of the pattern but lower on the Y-axis. All values generated by the same repetitive sequence fall on exactly the same point of the pattern. With a step of one base, the total number of points in the pattern is promoter length - sliding window length + 1.
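The behaviour described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the shift-and-compare form of the kappa index of coincidence, and the names `cgPercent`, `kappa` and `patternPoints`, the 12-base window and the test sequences are all hypothetical choices made for illustration.

```javascript
// Percentage of C and G among the bases of a window.
function cgPercent(seq) {
  const s = seq.toUpperCase();
  let cg = 0;
  for (const base of s) {
    if (base === "C" || base === "G") cg += 1;
  }
  return (100 * cg) / s.length;
}

// Kappa index of coincidence (assumed form): compare the window with
// itself at every shift u and average the percentages of matches.
function kappa(seq) {
  const s = seq.toUpperCase();
  const maxShift = s.length - 1;
  let total = 0;
  for (let u = 1; u <= maxShift; u++) {
    const overlap = s.length - u; // positions that can be compared
    let matches = 0;
    for (let i = 0; i < overlap; i++) {
      if (s[i] === s[i + u]) matches += 1;
    }
    total += (100 * matches) / overlap;
  }
  return total / maxShift;
}

// One point per complete window: x is (C + G)%, y is kappa.
function patternPoints(seq, windowSize, step = 1) {
  const points = [];
  for (let u = 0; u + windowSize <= seq.length; u += step) {
    const w = seq.slice(u, u + windowSize);
    points.push({ x: cgPercent(w), y: kappa(w) });
  }
  return points;
}

console.log(kappa("AAAAAAAAAAAA").toFixed(2)); // homopolymeric tract: 100.00
console.log(kappa("ATATATATATAT").toFixed(2)); // dinucleotide repeat: 45.45
console.log(kappa("ATGATGATGATG").toFixed(2)); // trinucleotide repeat: 27.27

// Every window over a pure "AC" repeat yields the same (x, y) pair.
const points = patternPoints("AC".repeat(20), 12);
const distinct = new Set(points.map((p) => p.x + "," + p.y));
console.log(distinct.size); // 1
```

In this sketch the homopolymeric tract reaches the maximum kappa value, the dinucleotide and trinucleotide repeats fall progressively lower, and all windows taken from the same repetitive sequence collapse onto one point, which is consistent with the description above.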
Current uses and discoveries
Using the DNA pattern method, two major observations have been made regarding gene promoters:
1) Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters.[3]
2) Structural properties of gene promoters show more than two phenotypes of diabetes.[5] Here, the distribution of DNA patterns shows a clustering of gene promoters by phenotype (Figure 2).
Example
Human INS (insulin) gene promoter ranging from -499b to 100b, relative to the TSS (transcription start site).
>gi|224514737|ref|NT_009237.18|:c2122939-2121009 H.sapiens INS gene region, 500 bases upstream of TSS:
GGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGG GACAGGGGTCCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGG GTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGCAGCGCAAAG AGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTGGAAAGTGGCC CAGGTGAGGGCTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAG GGACAGGTCTGGCCACCGGGCCCCTGGTTAAGACTCTAATGACCCGCTGG TCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGC ACCAGGGAAATGGTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCG ACCCCCCCACCCCAGGCCCTAATGGGCCAGGCGGCAGGGGTTGAGAGGTA GGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCCAGCAGCCCTC
Gagniuc JavaScript implementation
The implementation below produces the DNA pattern of the insulin gene promoter (Figure 1):
<canvas id="canvas" height="333" width="309"></canvas>
<script>
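function Pattern(sequence) {
    var canvas = document.getElementById('canvas');
    var ctx = canvas.getContext('2d');
    var img = new Image();
    // Register the handler before setting src so the load event is not missed.
    img.onload = function() {
        var window_size = 30;   // sliding window length, in bases
        var step = 1;           // bases between consecutive windows
        var sliding_window;
        var xx;
        var yy;
        var xxq = canvas.width;
        var yyq = canvas.height;
        // Draw the blank background image.
        ctx.drawImage(img, 0, 0);
        for (var u = 0; u <= sequence.length - window_size; u += step)
        {
            sliding_window = sequence.substr(u, window_size);
            // X-axis: (C + G)% of the window, scaled to the canvas width.
            xx = (xxq / 100) * CG(sliding_window);
            // Y-axis: kappa index of coincidence; higher values are nearer the top.
            yy = yyq - ((yyq / 100) * IC(sliding_window));
            // Draw each point as a short horizontal stroke.
            ctx.beginPath();
            ctx.moveTo(xx, yy);
            ctx.lineTo(xx + 4, yy);
            ctx.stroke();
        }
    };
    img.src = 'pattern.png';
}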
Pattern("GGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTGTGGGGATAGG" +
"GTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGCAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTG" +
"GTCTAATGTGGAAAGGGCCCAGGTGAGGGCTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCT" +
"GGTTAAGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGCACCAGGGAAATGGTCCGGA" +
"AATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCTAATGGGCCAGGCGGCAGGGGTTGAGAGGTAGGGGAGATGGGCT" +
"CTGAGACTATAAAGCCAGCGGGGGCCCAGCAGCCCTC");
function CG (sequence)
{
    sequence = sequence.toLowerCase();
    var a = 0;
    var t = 0;
    var c = 0;
    var g = 0;
    // Count each base; characters other than a, t, g and c are ignored.
    for (var u = 0; u < sequence.length; u++)
    {
        var nucleo = sequence.substr(u, 1);
        if (nucleo == "a") {a = a + 1;}
        if (nucleo == "t") {t = t + 1;}
        if (nucleo == "g") {g = g + 1;}
        if (nucleo == "c") {c = c + 1;}
    }
    // (C + G)% among the counted bases, as a string with two decimals.
    return ((100 / (c + g + t + a)) * (c + g)).toFixed(2);
}
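function IC (sequence)
{
    var S1 = sequence.toLowerCase();
    var max = S1.length - 1;   // largest possible shift
    var total = 0;
    var count = 0;
    for (var u = 1; u <= max; u++)
    {
        // s2 is S1 shifted left by u positions (string indexes start at 0).
        var s2 = S1.substr(u);
        // Count the positions at which S1 and its shifted copy agree.
        for (var i = 0; i < s2.length; i++)
        {
            if (S1.substr(i, 1) == s2.substr(i, 1)) {
                count = count + 1;
            }
        }
        // Add the percentage of matching positions for this shift.
        total = total + (count / s2.length * 100);
        count = 0;
    }
    // Average over all shifts, as a string with two decimals.
    return (total / max).toFixed(2);
}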
// Helper that writes every window to the page; Pattern does not call it.
function Sliding_Windw(sequence)
{
    var window_size = 30;
    var step = 1;
    for (var u = 0; u <= sequence.length - window_size; u += step)
    {
        document.writeln(sequence.substr(u, window_size) + "<br>");
    }
}
</script>
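Note: A blank PNG image named pattern.png must be present in the same folder as the script/HTML file; the points are drawn only after this image has loaded. The canvas is 309 pixels wide and 333 pixels high.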
External links
- DNA patterns & application
- DNA patterns and their analysis (22,000 gene promoters)
- PromKappa 2.0 (DNA patterns in Java)
- PromKappa V3.0 Java (uses the DNA pattern method)
- PromKappa software for DNA pattern generation and analysis
- Decoding gene promoters via DNA patterns
- XBIOINFORMATICS blog
References
- Gagniuc, P; C.I. Tirgoviste (2012). "An evolutionary perspective on adiponectin and insulin gene promoters". Adipobiology 4: 111–115.
- Gagniuc, Paul; et al. (2012). "DNA Patterns and Evolutionary Signatures Obtained Through KAPPA Index of Coincidence" (PDF). Revue roumaine des sciences techniques Série Électrotechnique et Énergétique 57: 100–109. Retrieved 1 February 2013.
- Gagniuc, Paul; Ionescu-Tirgoviste, Constantin (1 January 2012). "Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters" (PDF). BMC Genomics 13: 512. doi:10.1186/1471-2164-13-512. PMC 3549790. PMID 23020586.
- Gagniuc, Paul; Ionescu-Tirgoviste, Constantin (2012-09-28). "Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters". BMC Genomics 13 (1): 512. doi:10.1186/1471-2164-13-512. ISSN 1471-2164. PMC 3549790. PMID 23020586.
- Ionescu-Tîrgovişte, Constantin; Gagniuc, Paul Aurelian; Guja, Cristian. "Structural Properties of Gene Promoters Highlight More than Two Phenotypes of Diabetes". PLOS ONE 10 (9). doi:10.1371/journal.pone.0137950. PMC 4574929. PMID 26379145.