Haft, D. H., Basu, M. K.
Biological Systems Discovery In Silico: Radical SAM Protein Families and Their Target Peptides for Post-translational Modification
J Bacteriol. 2011 Apr 08;
Data mining methods in bioinformatics and comparative genomics commonly rely on working definitions of protein families from prior computation. Partial Phylogenetic Profiling (PPP), by contrast, optimizes family sizes during its searches for the co-occurring protein families that serve different roles in the same biological system. In a large-scale investigation of the incredibly diverse radical SAM enzyme superfamily, PPP aided in building a collection of 68 TIGRFAMs hidden Markov models (HMMs) that define non-overlapping and functionally distinct subfamilies. Many identify radical SAM enzymes as molecular markers for multi-component biological systems; HMMs defining their partner proteins were also constructed. Newly found systems include five groupings of protein families in which at least one marker is a radical SAM enzyme while another, encoded by an adjacent gene, is a short peptide predicted to be its substrate for post-translational modification. The most prevalent, found in over 125 genomes, features a peptide we designate SCIFF (Six Cysteines in Forty-Five residues) and is conserved throughout the class Clostridia, a distribution inconsistent with putative bacteriocin activity. A second novel system features a tandem pair of putative peptide-modifying radical SAM enzymes associated with a highly divergent family of peptides in which the only clearly conserved feature is a run of His-Xaa-Ser repeats. A third system pairs a radical SAM domain peptide maturase with selenocysteine-containing targets, suggesting a new biological role for selenium. These and several additional novel maturases that co-occur with predicted target peptides share an additional C-terminal 4Fe4S-binding domain with PqqE, the subtilosin A maturase AlbA, and the predicted mycofactocin and nif11-class peptide maturases, as well as with activators of anaerobic sulfatases and quinohemoprotein amine dehydrogenases.
Radical SAM enzymes with this additional domain, as detected by TIGR04085, significantly outnumber lantibiotic synthases and cyclodehydratases combined in reference genomes while being highly enriched for members whose apparent targets are small peptides. Interpretation of comparative genomics evidence suggests unexpected (non-bacteriocin) roles for natural products from several of these systems.