Guharay, S., Hunt, B. R., Yorke, J. A., White, O. R.
Correlations in DNA Sequences Across the Three Domains of Life
Physica D-Nonlinear Phenomena. 2000 Nov 15; 146(1): 388-396.
We report statistical studies of the correlation properties of approximately 7500 gene sequences, covering coding (exon) and non-coding (intron) sequences for DNA and primary amino acid sequences for proteins, across all three domains of life: Eukaryotes (cells with nuclei), Prokaryotes (bacteria) and Archaea (archaebacteria). Mutual information function, power spectrum and Hölder exponent analyses show that the exons have somewhat greater correlation content than the introns studied. These results are further confirmed with hypothesis testing. While approximately 30% of the Eukaryote coding sequences show distinct correlations above the noise threshold, this is true for only approximately 10% of the Prokaryote and Archaea coding sequences. For protein sequences, we observe correlation lengths similar to those of "random" sequences. (C) 2000 Elsevier Science B.V. All rights reserved.
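As an illustration of the first analysis named in the abstract, a mutual information function for a symbol sequence can be sketched as below. This is a minimal plug-in (frequency-count) estimate in bits, not the authors' procedure: the function name `mutual_information`, the choice of estimator, and the toy sequence are all assumptions made for the example, and no noise threshold or significance test is included.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Plug-in estimate (in bits) of the mutual information between
    symbols that are k positions apart in seq.

    Hypothetical helper for illustration only; the abstract does not
    specify the estimator the authors used.
    """
    n = len(seq) - k  # number of symbol pairs at separation k
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")

    first = seq[:n]   # symbols at positions i
    second = seq[k:]  # symbols at positions i + k

    pair_counts = Counter(zip(first, second))
    first_counts = Counter(first)
    second_counts = Counter(second)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), with the marginals taken over the same windows
        ratio = p_ab * n * n / (first_counts[a] * second_counts[b])
        mi += p_ab * math.log2(ratio)
    return mi


if __name__ == "__main__":
    toy = "ACGT" * 50  # strictly periodic toy sequence, not real data
    for k in range(1, 9):
        print(k, round(mutual_information(toy, k), 4))
```

Plotting this quantity against the separation k gives the mutual information function; for real sequences, values would need to be compared against a finite-size noise level (for example from shuffled sequences) before being read as correlation.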