Prediction of transcription terminators in bacterial genomes
Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL
This study describes an algorithm that finds rho-independent transcription terminators in bacterial genomes and evaluates the accuracy of its predictions. The algorithm identifies terminators by searching for a common mRNA motif: a hairpin structure followed by a short uracil-rich region. Two scores are computed for each candidate terminator: an energy score that reflects hairpin stability, and a tail score based on the number of U nucleotides and their proximity to the stem. A confidence value can then be assigned to each terminator by analyzing candidate terminators found both within and between genes, taking the energy and tail scores into account. The confidence is an empirical estimate of the probability that the sequence is a true terminator. The algorithm was used to conduct a comprehensive analysis of 12 bacterial genomes to identify likely rho-independent transcription terminators. Four of these genomes (Deinococcus radiodurans, Escherichia coli, Haemophilus influenzae and Vibrio cholerae) were found to have large numbers of rho-independent terminators. Among the other genomes, most appear to have no transcription terminators of this type, the notable exception being Thermotoga maritima. A set of 131 experimentally determined E. coli terminators was used to evaluate the sensitivity of the method, which ranges from 89% (at a false positive rate of 2%) to 98% (at a false positive rate of 18%).
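The two scoring ideas above can be illustrated with a small Python sketch. It is not the authors' published method: the decay weights (0.9 after a U, 0.6 after any other base), the 15-nucleotide tail window, the bin widths, and the helper names `tail_score`, `confidence`, `Candidate` and `confidence_by_bin` are all hypothetical choices made for illustration. The tail score rewards U nucleotides and weights those closer to the stem more heavily. The confidence estimate assumes that candidates found inside genes are background false positives, so their per-nucleotide density predicts how many intergenic candidates are spurious.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical parameters chosen for illustration; not the published values.
U_DECAY = 0.9        # weight multiplier applied at each U
NON_U_DECAY = 0.6    # steeper multiplier applied at each non-U
TAIL_LENGTH = 15     # number of nucleotides 3' of the stem to examine


def tail_score(tail: str, length: int = TAIL_LENGTH) -> float:
    """Score the region just downstream (3') of the hairpin stem.

    Each U adds the current weight. The weight decays at every position,
    and decays faster after a non-U, so U's near the stem count for more.
    DNA input ('T') is treated as RNA ('U').
    """
    score = 0.0
    weight = 1.0
    for base in tail[:length].upper().replace("T", "U"):
        if base == "U":
            weight *= U_DECAY
            score += weight
        else:
            weight *= NON_U_DECAY
    return score


def confidence(inter_hits: int, inter_len: int,
               intra_hits: int, intra_len: int) -> float:
    """Estimate the fraction of intergenic candidates that are real.

    Assumption: candidates found inside genes are false positives, so
    their density per nucleotide estimates the background rate, which
    is then scaled to the total length of the intergenic regions.
    """
    if inter_hits == 0:
        return 0.0
    expected_false = intra_hits * (inter_len / intra_len)
    return max(0.0, 1.0 - expected_false / inter_hits)


@dataclass
class Candidate:
    energy: float      # assumed: hairpin free energy (more negative = more stable)
    tail: float        # value returned by tail_score()
    intergenic: bool   # True if the candidate lies between genes


def confidence_by_bin(cands, inter_len, intra_len,
                      energy_step=2.0, tail_step=0.5):
    """Group candidates by (energy, tail) score bin; assign each bin a confidence."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [intergenic, intragenic]
    for c in cands:
        key = (math.floor(c.energy / energy_step), math.floor(c.tail / tail_step))
        counts[key][0 if c.intergenic else 1] += 1
    return {key: confidence(inter, inter_len, intra, intra_len)
            for key, (inter, intra) in counts.items()}
```

As a usage example with made-up numbers, suppose a score bin holds 200 intergenic and 50 intragenic candidates, and intergenic regions cover a quarter of the length of coding regions. Then `confidence(200, 1_000_000, 50, 4_000_000)` expects 12.5 of the 200 intergenic hits to be spurious and returns 0.9375. Binning by both scores lets confidence rise with hairpin stability and tail quality without assuming a particular functional form for how the two combine.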