Plant database resources at The Institute for Genomic Research.
With the completion of the genome sequences of the model plants Arabidopsis and rice, and the continuing sequencing efforts of other economically important crop plants, an unprecedented amount of genome sequence data is now available for large-scale genomics studies and analyses, such as the identification and discovery of novel genes, comparative genomics, and functional genomics. Efficient utilization of these large data sets is critically dependent on the ease of access and organization of...
Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues.
A database containing mapped partial cDNA sequences from Caenorhabditis elegans will provide a ready starting point for identifying nematode homologues of important human genes and determining their functions in C. elegans. A total of 720 expressed sequence tags (ESTs) have been generated from 585 clones randomly selected from a mixed-stage C. elegans cDNA library. Comparison of these ESTs with sequence databases identified 422 new C. elegans genes, of which 317 are not similar to any sequences...
A quality control algorithm for DNA sequencing projects.
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole-library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known...
Chromosomal distribution of 320 genes from a brain cDNA library.
We have determined the chromosomal assignment of 320 brain-expressed genes by studying the segregation of polymerase chain reaction (PCR) products in human-rodent somatic cell hybrids and by genetically mapping polymorphic cDNAs using the CEPH (Centre d'Etude du Polymorphisme Humain) reference pedigrees and database. These mapped genes can serve as markers on the physical map of the human genome and as candidate disease gene loci. Distribution of these genes to the human...
Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library.
A human infant brain cDNA library, made specifically for the production of expressed sequence tags (ESTs), was evaluated by partial sequencing of over 1,600 clones. Advantages of this library, constructed for EST sequencing, include the use of directional cloning, size selection, very low numbers of mitochondrial and ribosomal transcripts, short poly(A) tails, few non-recombinants and a broad representation of transcripts. 37% of the clones were identified by matches to over 320 different genes...
3,400 new expressed sequence tags identify diversity of transcripts in human brain.
We present the results of partial sequencing of over 3,400 expressed sequence tags (ESTs) from human brain cDNA clones, which raises to about 6,000 the number of distinct brain-expressed genes represented by ESTs. By choosing clones in an unbiased manner, it is possible to construct a profile of the transcriptional activity of the brain at different stages. Proteins that make up the cytoskeleton are the most abundant; however, a large variety of regulatory proteins are...
Mutation of a mutL homolog in hereditary colon cancer.
Some cases of hereditary nonpolyposis colorectal cancer (HNPCC) are due to alterations in a mutS-related mismatch repair gene. A search of a large database of expressed sequence tags derived from random complementary DNA clones revealed three additional human mismatch repair genes, all related to the bacterial mutL gene. One of these genes (hMLH1) resides on chromosome 3p21, within 1 centimorgan of markers previously linked to cancer susceptibility in HNPCC kindreds. Mutations of hMLH1 that...
Analysis of the complete genome of smallpox variola major virus strain Bangladesh-1975.
We analyzed the 186,102 base pairs (bp) that constitute the entire DNA genome of a highly virulent variola virus isolated from Bangladesh in 1975. The linear, double-stranded molecule has relatively small (725 bp) inverted terminal repeat (ITR) sequences containing three 69-bp direct repeat elements, a 54-bp partial repeat element, and a 105-base telomeric end-loop that can be maximally base-paired to contain 17 mismatches. Proximal to the right-end ITR sequences are another seven 69-bp...
Analysis of gene expression by tissue and developmental stage.
High-throughput sequencing of cDNAs from multiple tissue- and stage-specific libraries is an efficient method for characterizing gene expression by tissue and developmental stage. When combined with functional information derived from the systematic study of transcription factors, signal transducers, and other regulatory molecules in model systems, data from expressed sequence tag projects provide an increasingly detailed picture of gene expression and its regulation. Understanding this picture...
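As a rough illustration of how EST counts from tissue-specific libraries can be turned into expression profiles, the sketch below tallies each gene's share of a library's ESTs. The records, gene labels, and the `expression_profiles` helper are hypothetical, and simple relative frequency is an assumed normalization, not necessarily the one used in the work summarized above.

```python
from collections import Counter, defaultdict

# Hypothetical EST records: (library/tissue label, gene assigned by sequence match).
# These values are illustrative only, not data from any of the studies listed.
ests = [
    ("brain", "ACTB"), ("brain", "TUBB"), ("brain", "ACTB"), ("brain", "GFAP"),
    ("liver", "ALB"), ("liver", "ALB"), ("liver", "ACTB"), ("liver", "APOA1"),
]

def expression_profiles(records):
    """Return {tissue: {gene: fraction of that tissue's ESTs}}.

    Assumes clones were picked at random, so a gene's share of a library's
    ESTs serves as a rough proxy for its relative transcript abundance.
    """
    counts = defaultdict(Counter)
    for tissue, gene in records:
        counts[tissue][gene] += 1
    profiles = {}
    for tissue, genes in counts.items():
        total = sum(genes.values())
        profiles[tissue] = {g: n / total for g, n in genes.items()}
    return profiles

profiles = expression_profiles(ests)
for tissue, prof in sorted(profiles.items()):
    # Most abundant genes first; ties broken alphabetically.
    top = sorted(prof.items(), key=lambda kv: (-kv[1], kv[0]))
    print(tissue, top)
```

Comparing the same gene's share across libraries (here, ACTB in brain versus liver) is the simplest way such counts hint at tissue-specific expression; real analyses would also need to account for library size and sampling error.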