Carlton, J. M., Muller, R., Yowell, C. A., Fluegge, M. R., Sturrock, K. A., Pritt, J. R., Vargas-Serrato, E., Galinski, M. R., Barnwell, J. W., Mulder, N., Kanapin, A., Cawley, S. E., Hide, W. A., Dame, J. B.
Profiling the Malaria Genome: a Gene Survey of Three Species of Malaria Parasite With Comparison to Other Apicomplexan Species
Mol Biochem Parasitol. 2001 Dec 01; 118(2): 201-10.
We have undertaken the first comparative pilot gene discovery analysis of approximately 25000 random genomic and expressed sequence tags (ESTs) from three species of Plasmodium, the infectious agent that causes malaria. A total of 5482 genome survey sequences (GSSs) and 5582 ESTs were generated from mung bean nuclease (MBN) and cDNA libraries, respectively, of the ANKA line of the rodent malaria parasite Plasmodium berghei, and 10874 GSSs were generated from MBN libraries of the Salvador I and Belem lines of Plasmodium vivax, the most geographically widespread human malaria pathogen. These tags, together with 2438 Plasmodium falciparum sequences present in GenBank, were used to perform first-pass assembly and transcript reconstruction and to create non-redundant consensus sequence datasets. The datasets were compared against public protein databases, and more than 1000 putative new Plasmodium proteins were identified based on sequence similarity. Homologs of previously characterized Plasmodium genes were also identified, increasing the number of P. vivax and P. berghei sequences in public databases at least 10-fold. Comparative studies with other species of Apicomplexa identified homologs of possible therapeutic or diagnostic value. A gene prediction program, Phat, was used to predict probable open reading frames for proteins in all three datasets. Predicted and non-redundant BLAST-matched proteins were submitted to InterPro, an integrated database of protein domains, signatures and families, for functional classification. Thus, a partial predicted proteome was created for each species. This first comparative analysis of Plasmodium protein-coding sequences represents a valuable resource for further studies on the biology of this important pathogen.
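As a quick check on the figure of "approximately 25000" tags, the per-source counts reported in the abstract can be summed directly. The short Python sketch below does only that arithmetic; the dictionary labels are illustrative names chosen here, not identifiers from the study.

```python
# Tag counts as reported in the abstract; the labels are illustrative only.
tag_counts = {
    "P. berghei GSSs (MBN libraries)": 5482,
    "P. berghei ESTs (cDNA libraries)": 5582,
    "P. vivax GSSs (MBN libraries)": 10874,
    "P. falciparum sequences (GenBank)": 2438,
}

# Sum all sources to compare against the "approximately 25000" figure.
total = sum(tag_counts.values())

for source, count in tag_counts.items():
    # Show each source's count and its share of the combined dataset.
    print(f"{source}: {count} ({count / total:.1%})")

print(f"Total tags: {total}")
```

The sum is 24376 sequences, consistent with the rounded figure given in the abstract.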