Annotation Methods
Annotation of Wheat Genomic Sequences
Selected BACs proceed through sequencing and closure, and are then ready for annotation. We believe annotation of individual genes is an essential part of the genome project. This permits gene discovery in a systematic, comprehensive and consistent manner. Gene finding and repeat annotation will be done in parallel to maximize identification of true genes and minimize mis-annotation of transposable elements as genes.
Steps Involved in Annotation
The BAC sequence and results of all analyses are stored in our central relational database (Sybase).
Orientation
All sequences are oriented from the SP6 end (base 1) to the T7 end of the vector if the orientation is provided in the GenBank record.
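As an illustration of the orientation step, the following minimal sketch reverse-complements a BAC sequence when its record reports the opposite orientation and leaves it untouched when no orientation is given. The function names and the "SP6->T7" / "T7->SP6" labels are hypothetical and are not taken from the project's pipeline.

```python
# Sketch only: the function names and orientation labels below are
# illustrative assumptions, not the project's actual code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_sp6_to_t7(seq: str, orientation=None) -> str:
    """Return seq so that base 1 lies at the SP6 end.

    orientation is "SP6->T7", "T7->SP6", or None when the record
    provides no orientation, in which case seq is returned unchanged.
    """
    if orientation == "T7->SP6":
        return reverse_complement(seq)
    return seq
```

For example, `orient_sp6_to_t7("AACG", "T7->SP6")` flips the sequence, while a call without an orientation returns the input as-is.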
Repetitive sequences
Wheat repetitive elements were identified and masked with RepeatMasker using several libraries, including RepBase, TREP (the Triticeae Repeat Sequence Database), and the TIGR Oryza Repeat Database.
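RepeatMasker performs the masking itself, so the sketch below is only meant to show the idea of combining repeat hits from several libraries into one masked sequence. It assumes hits are given as 1-based, inclusive (start, end) coordinates; the function names are hypothetical.

```python
# Sketch only: illustrates merging repeat hits from several libraries and
# hard-masking them; this is not RepeatMasker's implementation.
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def hard_mask(seq, intervals):
    """Replace every base inside the given intervals with 'N'."""
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)
```

Masking repeats before gene prediction keeps the gene finders from calling transposable elements as genes, which is the motivation stated in the introduction.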
Gene prediction programs
- FGENESH (monocot)
- Genscan (Maize)
- Genscan+ (Arabidopsis)
- GlimmerHMM (rice)
- tRNAscan-SE, to predict tRNA
Loci and gene model nomenclature
The genes, which are also known as loci or transcriptional units (TUs), are named using the BAC name and a gene number assigned in order along the sequence. For example, for BAC clone 27H32, the first gene, located at bases 10 to 1247, is 27H32.t00001; the second gene, located at bases 1568 to 2700, is 27H32.t00002; and so on. Gene models are named with an "m" in place of the "t" to distinguish them from TUs/loci. To provide a stable identifier for future updates of the annotation, a reduced gene/locus/TU identifier can be used (27H32.1, 27H32.2, etc.).
Example:
Stable Identifier: 27H32.1
Locus or TU: 27H32.t00001
Gene model: 27H32.m00001
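The naming scheme can be sketched as a small function that numbers genes by their start coordinate and emits the three identifier forms. The function name `name_loci` is hypothetical, the five-digit zero padding is inferred from the example above, and the sketch assumes one gene model per locus.

```python
# Sketch only: `name_loci` is an illustrative helper, not project code.
def name_loci(bac, genes):
    """Assign identifiers to genes on one BAC.

    genes is a list of (start, end) coordinates on the BAC sequence.
    Genes are numbered in order of their start coordinate.
    """
    records = []
    for n, (start, end) in enumerate(sorted(genes), start=1):
        records.append({
            "stable_id": f"{bac}.{n}",    # e.g. 27H32.1
            "locus": f"{bac}.t{n:05d}",   # e.g. 27H32.t00001
            "model": f"{bac}.m{n:05d}",   # e.g. 27H32.m00001
            "start": start,
            "end": end,
        })
    return records
```

Calling `name_loci("27H32", [(1568, 2700), (10, 1247)])` assigns 27H32.t00001 to the gene at bases 10 to 1247 and 27H32.t00002 to the gene at bases 1568 to 2700, regardless of input order.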
Functional assignment
Putative functions for the genes were assigned via a combination of BLASTP matches to a non-redundant amino acid database, Pfam trusted cutoff scores, and searches of transcript evidence (ESTs and full-length cDNAs). A table summarizing the putative function assignment guidelines is provided below.
| Putative Function | Match in non-redundant amino acid (nraa) db | Pfam database trusted cutoff score | Wheat ESTs/FL-cDNA alignment | Sample of annotation |
|---|---|---|---|---|
| Known | >90-100% ID, >90-100% length | May be above trusted cutoff, not essential | Optional for annotation | Aquaporin |
| Putative | >45% ID, >50% length | May be above trusted cutoff, not essential | Optional for annotation | chitinase, putative |
| XX-domain containing protein | N/A | Above trusted cutoff | Optional for annotation | WD-domain containing protein |
| Expressed | No similarity detected in nraa, or similarity to protein in nraa is <45% ID and/or <50% coverage, or similarity is to 1) an expressed protein, or 2) a protein with no known | Below trusted cutoff | >95% ID, >70% length of EST | Expressed protein |
| Conserved Hypothetical Protein | >45% ID, >50% length to a protein annotated as hypothetical protein | Below trusted cutoff | <95% ID, <70% length of EST | Conserved hypothetical protein |
| Hypothetical Protein | No match to any db entry >45% ID, >50% length | Below trusted cutoff | <95% ID, <70% length of EST | Hypothetical protein |
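The guidelines in the table can be read as a decision procedure. The sketch below is one possible encoding, not the project's actual code: the `Evidence` fields, the function name, the use of strict inequalities, and the order in which the rules are tried are all assumptions made for illustration. The flag `nraa_hit_is_hypothetical` stands for a best hit that is itself annotated as a hypothetical or expressed protein.

```python
from dataclasses import dataclass


# Sketch only: field names and rule precedence are illustrative assumptions.
@dataclass
class Evidence:
    nraa_pct_id: float = 0.0          # % identity of best nraa BLASTP hit
    nraa_pct_len: float = 0.0         # % length covered by that hit
    nraa_hit_is_hypothetical: bool = False
    pfam_above_trusted_cutoff: bool = False
    est_pct_id: float = 0.0           # % identity of best EST/FL-cDNA alignment
    est_pct_len: float = 0.0          # % of the EST length aligned


def assign_category(ev: Evidence) -> str:
    """Map evidence for one gene to a putative-function category."""
    good_hit = ev.nraa_pct_id > 45 and ev.nraa_pct_len > 50
    est_support = ev.est_pct_id > 95 and ev.est_pct_len > 70

    if good_hit and not ev.nraa_hit_is_hypothetical:
        if ev.nraa_pct_id > 90 and ev.nraa_pct_len > 90:
            return "Known"
        return "Putative"
    if ev.pfam_above_trusted_cutoff:
        return "XX-domain containing protein"
    if est_support:
        return "Expressed"
    if good_hit:
        # Remaining good hits are to hypothetical proteins.
        return "Conserved Hypothetical Protein"
    return "Hypothetical Protein"
```

For instance, `assign_category(Evidence(nraa_pct_id=60, nraa_pct_len=80))` yields "Putative", while a gene with no database match, no Pfam domain, and no transcript support falls through to "Hypothetical Protein".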
Pseudogenes
Pseudogenes were defined as sequences that show evidence of transcription yet have no clear ORF. The sequences of the annotated genes, along with supporting evidence, can also be found on our web site.
Software Links
- FGENESH
- Genscan (Chris Burge, Massachusetts Institute of Technology)
- Genscan+ (Chris Burge, Massachusetts Institute of Technology)
- GlimmerHMM (Salzberg, Pertea, et al., The Institute for Genomic Research)
- tRNAscan-SE (Sean Eddy, Dept. of Genetics, Washington U. School of Medicine)
- dds/gap2, dps/nap (Xiaoqiu Huang, Dept of Computer Science, Michigan Technological University)
- RepeatMasker2 (A.F.A. Smit & P. Green, University of Washington)