A rapid retrieval tool for operating on large, flat archive files.
Computer-aided sequencing and analysis facilities need to efficiently search flat archive files. Retrieval by e-mail or network server connections can become impractical in cases where large numbers of selected entries need to be accessed. Public versions of these archives can be retrieved via ftp and installed on a local hard disk as an alternative to network-based retrieval. After installation, a scheme is required for rapid access of the archive that is consistent with the other production...
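One common way to get rapid local access to a flat archive is to scan it once, record the byte offset of each entry, and then seek directly to an entry on request. The sketch below illustrates that general idea only; it is not the scheme the abstract goes on to describe. The record layout (an `ID ` line naming each entry and a `//` terminator line) and the names `build_offset_index` and `fetch_entry` are assumptions made for illustration.

```python
import io


def build_offset_index(handle, id_prefix=b"ID ", end_marker=b"//"):
    """Scan a binary flat file once and map entry id -> byte offset of its record.

    Assumed layout (hypothetical): each record contains a line starting with
    `id_prefix` and ends with a line consisting of `end_marker`.
    """
    index = {}
    record_start = handle.tell()
    while True:
        line_start = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(id_prefix):
            fields = line[len(id_prefix):].split()
            if fields:
                index[fields[0].decode()] = record_start
        if line.strip() == end_marker:
            # The next record begins right after the terminator line.
            record_start = handle.tell()
    return index


def fetch_entry(handle, index, entry_id, end_marker=b"//"):
    """Seek straight to one record and return its text, terminator included."""
    handle.seek(index[entry_id])
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line)
        if line.strip() == end_marker:
            break
    return b"".join(lines).decode()


# Tiny in-memory archive standing in for a large file on local disk.
demo_archive = io.BytesIO(b"ID A1\nSQ acgt\n//\nID B2\nSQ ttga\n//\n")
demo_index = build_offset_index(demo_archive)
demo_entry = fetch_entry(demo_archive, demo_index, "B2")
```

The index is built in a single pass and each lookup costs one seek plus the read of a single record, so retrieval time does not grow with archive size. In practice the index would be saved to disk and rebuilt whenever the archive is updated.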
Toward a cDNA map of the human genome.
Advances in the Human Genome Project are shaping the strategies for identifying the 50,000-100,000 human genes. High-resolution genetic maps of the human genome combined with sequencing herald an era of rapid regional definition of disease genes. However, only once their chromosome band location is known will the systematic partial sequencing of thousands of random cDNA clones provide the reagents for the rapid assessment of the genes responsible for the inherited disorders. We now present an...
Identification of new Schistosoma mansoni genes by the EST strategy using a directional cDNA library.
A directional size-selected cDNA library constructed from Schistosoma mansoni (Sm) adult worm RNA was used for the generation of expressed sequence tags (EST). From one or both ends of 429 distinct cDNA clones, 607 EST were obtained. Of these, only 16% were previously known Sm genes. More than 22% of the clones had matches with entries for other organisms in the databases. These new Sm genes constituted a broad range of transcripts distributed among cytoplasmic structural and regulatory...
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase...
Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence.
In an effort to identify new genes and analyse their expression patterns, 174,472 partial complementary DNA sequences (expressed sequence tags (ESTs)), totalling more than 52 million nucleotides of human DNA sequence, have been generated from 300 cDNA libraries constructed from 37 distinct organs and tissues. These ESTs have been combined with an additional 118,406 ESTs from the database dbEST, for a total of 83 million nucleotides, and treated as a shotgun sequence assembly project. The...
A gene map of the human genome.
The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing...
The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same...
Methanococcus jannaschii genome: revisited.
Analysis of genomic sequences is necessarily an ongoing process. Initial gene assignments tend (wisely) to be on the conservative side (Venter, 1996). The analysis of the genome then grows in an iterative fashion as additional data and more sophisticated algorithms are brought to bear on the data. The present report is an emendation of the original gene list of Methanococcus jannaschii (Bult et al., 1996). By using a somewhat more updated database and more relaxed (and operator-intensive)...
Use of the complete genome sequence information of Haemophilus influenzae strain Rd to investigate lipopolysaccharide biosynthesis.
The availability of the complete 1.83-megabase-pair sequence of the Haemophilus influenzae strain Rd genome has facilitated significant progress in investigating the biology of H. influenzae lipopolysaccharide (LPS), a major virulence determinant of this human pathogen. By searching the H. influenzae genomic database with sequences of known LPS biosynthetic genes from other organisms, we identified and then cloned 25 candidate LPS genes. Construction of mutant strains and characterization of...
Serial analysis of gene expression: ESTs get smaller.
Measuring gene expression on a global scale has been one of the vexing problems of cell biology. Velculescu et al. (1) recently proposed a system for identifying gene expression levels based on very short sequence tags (about nine base pairs) located at a specific site within a gene transcript. By coupling the strategy to current automated sequencing machines and the large expressed sequence tag databases, it should be possible to follow changes in gene expression for large numbers of genes...
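The tag idea can be illustrated in a few lines: take a fixed-length tag from a defined position in each transcript, then count how often each tag occurs. This is a minimal sketch under stated assumptions, not the published protocol. The abstract says only that the tag is about nine base pairs long and sits at a specific site. Anchoring it immediately after the 3'-most CATG is an assumption added here, and the names `extract_tag` and `count_tags` are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"   # assumed anchoring site; not stated in the abstract
TAG_LENGTH = 9    # "about nine base pairs" per the abstract


def extract_tag(transcript, anchor=ANCHOR, tag_length=TAG_LENGTH):
    """Return the tag immediately 3' of the last anchor site, or None.

    None is returned when the transcript has no anchor site or when fewer
    than `tag_length` bases follow the last one.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(transcripts):
    """Tally tags across transcripts; the counts serve as a proxy for expression level."""
    counts = Counter()
    for transcript in transcripts:
        tag = extract_tag(transcript)
        if tag is not None:
            counts[tag] += 1
    return counts


# Toy input: two copies of one transcript species and one copy of another.
demo_counts = count_tags([
    "ttCATGAAAAAAAAAcc",
    "ggCATGAAAAAAAAAtt",
    "CATGCCCCCCCCC",
])
```

Because each transcript contributes a single short tag, a single sequencing read can sample many transcripts, and the tag counts can then be looked up against existing expressed sequence tag databases to identify the corresponding genes.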