An optimized protocol for analysis of EST sequences.
The vast body of Expressed Sequence Tag (EST) data in the public databases provides an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of genomic sequences. We have developed a rigorous protocol for reconstructing the sequences of transcribed genes from EST and gene sequence fragments. A key element in developing this protocol has been the evaluation of a number of sequence assembly programs to determine which most faithfully reproduce...
Gene index analysis of the human genome estimates approximately 120,000 genes.
Although sequencing of the human genome will soon be completed, gene identification and annotation remains a challenge. Early estimates suggested that there might be 60,000-100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome,...
Cloning and characterization of HARP/SMARCAL1: a prokaryotic HepA-related SNF2 helicase protein from human and mouse.
The SNF2 gene family consists of a large group of proteins involved in transcriptional regulation, maintenance of chromosome integrity, and various aspects of DNA repair. We cloned a novel SNF2 family human cDNA with sequence identity to the Escherichia coli RNA polymerase-binding protein HepA and named it the human HepA-related protein (HHARP/SMARCAL1). In addition, the mouse ortholog (Mharp/Smarcal1) was cloned, and the Caenorhabditis elegans ortholog (CEHARP) was identified in the GenBank...
A comprehensive BAC resource.
The Human Genome Project has generated extensive map and sequence data for a large number of Bacterial Artificial Chromosome (BAC) clones. In order to maximize the efficient use of the data and to minimize the redundant work for the research community, The Institute for Genomic Research (TIGR) comprehensive BAC resource (cBACr) (http://www.tigr.org/tdb/BacResource/BAC_resource_intro.html) was built as an expansion of the TIGR human BAC ends database. This resource collects, integrates and...
Sequence evaluation of four pooled-tissue normalized bovine cDNA libraries and construction of a gene index for cattle.
An essential component of functional genomics studies is the sequence of DNA expressed in tissues of interest. To provide a resource of bovine-specific expressed sequence data and facilitate this powerful approach in cattle research, four normalized cDNA libraries were produced and arrayed for high-throughput sequencing. The libraries were made with RNA pooled from multiple tissues to increase efficiency of normalization and maximize the number of independent genes for which sequence data were...
The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species.
While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and...
Exploring the transcriptome of the malaria sporozoite stage.
Most studies of gene expression in Plasmodium have been concerned with asexual and/or sexual erythrocytic stages. Identification and cloning of genes expressed in the preerythrocytic stages lag far behind. We have constructed a high quality cDNA library of the Plasmodium sporozoite stage by using the rodent malaria parasite P. yoelii, an important model for malaria vaccine development. The technical obstacles associated with limited amounts of RNA material were overcome by PCR-amplifying the...
The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant.
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The...
High-resolution BAC-based map of the central portion of mouse chromosome 5.
The current strategy for sequencing the mouse genome involves the combination of a whole-genome shotgun approach with clone-based sequencing. High-resolution physical maps will provide a foundation for assembling contiguous segments of sequence. We have established a bacterial artificial chromosome (BAC)-based map of a 5-Mb region on mouse Chromosome 5, encompassing three gene families: receptor tyrosine kinases (Pdgfra-Kit-Kdr), nonreceptor protein-tyrosine type kinases (Tec-Txk), and type-A...
Profiling the malaria genome: a gene survey of three species of malaria parasite with comparison to other apicomplexan species.
We have undertaken the first comparative pilot gene discovery analysis of approximately 25,000 random genomic and expressed sequence tags (ESTs) from three species of Plasmodium, the infectious agent that causes malaria. A total of 5482 genome survey sequences (GSSs) and 5582 ESTs were generated from mung bean nuclease (MBN) and cDNA libraries, respectively, of the ANKA line of the rodent malaria parasite Plasmodium berghei, and 10,874 GSSs generated from MBN libraries of the Salvador I and...