METAREP provides a suite of web-based tools that help scientists view, query, browse, and compare metagenomic annotation data derived from ORFs called on metagenomic reads or assemblies.

METAREP supports browsing of functional and taxonomic assignments. Users can filter and refine datasets by specifying individual fields or logical combinations of fields. They can also compare multiple datasets at various functional and taxonomic levels, applying statistical tests and visualizing the results with hierarchical clustering, multidimensional scaling, and heatmaps.
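The field-based filtering described above can be illustrated with a small sketch that composes a Lucene-style query string from field/value clauses. This is only an illustration, not METAREP's own code: the field names (`taxon`, `go_id`, `ec_id`) and the helper functions are hypothetical, and the actual field names and accepted syntax should be taken from the METAREP search page.

```python
def term(field: str, value: str) -> str:
    """Render one field:"value" clause.

    The value is always quoted so that characters with special meaning in
    Lucene syntax (such as the colon in a GO ID) are treated literally.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Combine clauses so that at least one of them must match."""
    return "(" + " OR ".join(clauses) + ")"


def not_(clause: str) -> str:
    """Exclude entries that match the clause."""
    return f"NOT {clause}"


# Hypothetical field names: bacterial entries, excluding eukaryotes,
# that carry either a given GO ID or a given EC ID.
query = all_of(
    term("taxon", "Bacteria"),
    not_(term("taxon", "Eukaryota")),
    any_of(term("go_id", "GO:0008152"), term("ec_id", "1.1.1.1")),
)
print(query)
```

Keeping each clause as a small building block makes it easy to nest AND, OR, and NOT groups without having to track parentheses by hand.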

For each of these features, users can export tab-delimited files for downstream analysis. The web site is optimized to be user-friendly and fast.


  • Handle extremely large datasets. METAREP uses the scalable, high-performance Solr/Lucene search engine (we have indexed 300 million annotation entries, but much larger volumes can be handled, as shown by Hathi Trust).
  • Compare 20+ datasets at the same time. Use various compare options, including statistical tests and plot options, to visualize dataset differences at various taxonomic and functional levels.
  • Apply statistical tests such as METASTATS (White et al.), a modified non-parametric t-test to compare two sample populations (e.g. metagenomics samples from healthy and diseased individuals).
  • Export publication-ready graphics. Export heatmaps, hierarchical clustering, and multi-dimensional scaling plots in PDF format.
  • Analyze KEGG metabolic pathways. Summaries include enzyme highlights on KEGG maps, pathway enzyme distributions, and statistics about pathway coverage at various pathway levels.
  • Search using a SQL-like query syntax. Build your query using 14 different fields that can be combined logically.
  • Drill down into data using METAREP’s NCBI Taxonomy, Gene Ontology, Enzyme Classification, or KEGG Pathway browsers.
  • Install your own METAREP version. Configuration is flexible and centralized, and the METAREP and third-party code base is completely open source.
  • Cross-link function with phylogeny. Slice your data at various taxonomic and/or functional levels. For example, search for all bacteria, exclude eukaryotes, or search for a specific combination of GO or EC ID and taxon.
  • Generic data format. Data types that can be populated include a free text functional description, best BLAST hit information, as well as GO ID, EC ID, and HMMs.
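
The list above describes METASTATS as a modified non-parametric t-test for comparing two sample populations. As a rough illustration of the general idea, the sketch below runs a plain permutation test on a Welch-style t statistic. It is not the METASTATS implementation and makes no claim to match it in detail; the function names and the abundance values are invented for the example.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch-style t statistic for two groups (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling group labels.

    The observed |t| is compared with the |t| values produced by random
    reassignments of the pooled samples to two groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented relative abundances of one feature in two sample populations.
healthy = [0.10, 0.12, 0.11, 0.13, 0.09]
diseased = [0.20, 0.22, 0.19, 0.24, 0.21]
p = permutation_p_value(healthy, diseased)
print(f"permutation p-value: {p:.4f}")
```

Because the null distribution is built from the data by shuffling labels, the test does not assume normally distributed abundances, which is the usual motivation for a non-parametric approach with metagenomic counts.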
