
CABOG
Overview
CABOG (Celera Assembler with Best Overlap Graph) is scientific software for DNA research. It has been a critical component of many genome sequencing projects, and it operates on small genomes, such as those of bacteria, as well as large genomes, such as those of mammals. CABOG is an extension of the Celera Assembler software, which was originally developed at Celera for the 2001 publication of the first draft human genome sequence. The software was released to the public in 2004. Its open source repository on SourceForge is an internet resource for scientists around the world.
CABOG is one of many software programs called genome assemblers. These programs exist to overcome the fundamental limitation of all sequencing machines, namely, that they read out very few DNA letters at a time. Assemblers reconstruct genomes that are billions of letters long from the hundreds of letters per read that modern sequencers provide. The task is often described as a scaled-up version of a family solving a jigsaw puzzle: overlapping reads are the pieces that fit together.
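The jigsaw analogy can be made concrete with a toy example. The sketch below is not CABOG's algorithm. It is a minimal greedy merger, assuming error-free reads from a single strand and no repeated sequence, and the function names and the `min_overlap` parameter are hypothetical. Real assemblers such as CABOG must also cope with sequencing errors, repeats, both DNA strands, and paired-end constraints, none of which this sketch attempts.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` letters exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps enough to merge
        # Join the two pieces, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads drawn from the made-up sequence ATGCGTACGTTAGC.
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGC']
```

Reads that share no sufficiently long overlap are left as separate pieces, which loosely mirrors how an assembly of incomplete data ends up as several contigs rather than one finished sequence.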
The CABOG software was the first to accomplish many scientific goals. It was the first to assemble the genome of a multicellular organism (Drosophila melanogaster, 2000). It was the first to assemble both parental haplotypes of one human genome (J. Craig Venter, 2007). It was the first to assemble environmental sequence from the oceans (Sargasso Sea in 2004 and Global Ocean Sampling in 2007). It was the first to combine reads from first-generation Sanger sequencing machines and second-generation pyrosequencing machines (marine microbes, 2006). Today, CABOG is one of the leading assembly programs for data sets that include paired-end data from the Roche 454 line of sequencing machines.
Algorithms Research & Software Development
The assembly team at JCVI is actively involved in research to expand the realm of what can be learned with modern DNA sequencing technology. The research is focused on extending and improving the CABOG software. It has led to recent publications of methods for combining Sanger and 454 data from the same genome (see 2006 and 2008 publications) and a method for improving a draft assembly with the automatic integration of finishing reads (see 2010 publication). Current research is focused on the integration of data from other sequencing technologies such as Illumina and Pacific Biosciences sequencers. CABOG is well suited for integration of all these technologies when they are applied to large and complex genomes such as those of mammals.
Recent Genome Projects
The JCVI assembly team works with collaborators around the world on important genomes such as:
- Tasmanian devil (Sarcophilus harrisii). Collaborators: Stephan Schuster and Webb Miller at Penn State University, USA working with Vanessa Hayes at JCVI. Publication in press.
- The great ape bonobo (Pan paniscus). Collaborator: Svante Pääbo, Max Planck Institute, Germany. Publication in preparation.
- The malaria mosquito (Anopheles gambiae). Collaborators: Ewen Kirkness at JCVI and many others. Publication: "Widespread Divergence Between Incipient Anopheles gambiae Species Revealed by Whole Genome Sequences" (2010) Science.
- The human body louse (Pediculus humanus). Collaborators: Ewen Kirkness at JCVI and many others. Publication: "Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle" (2010) PNAS.
- A freshwater cnidarian (Hydra). Collaborators: Ewen Kirkness at JCVI and many others. Publication: "The dynamic genome of Hydra" (2010) Nature.
- The causative agent of amoebic dysentery (Entamoeba histolytica). Collaborators: The NIH Genome Sequencing Centers and Lis Caler at JCVI. Publication: "New assembly, reannotation and analysis of the Entamoeba histolytica genome reveal new genomic features and protein content information" (2010) PLoS Neglected Tropical Diseases.
- Cucumber (Cucumis sativus). Collaborator: 454 Life Sciences.
- The cotton bollworm moth (Helicoverpa armigera). Collaborator: Karl Gordon, CSIRO, Australia. Work in progress.
- Brussels sprouts (Brassica oleracea). Collaborator: The Multinational Brassica Genome Project, including Chris Town at JCVI. Work in progress.
- Bacterial isolates from healthy human body sites. Collaborators: The NIH Human Microbiome Project and Karen Nelson at JCVI. Publication: "A Catalog of Reference Genomes from the Human Microbiome" (2011) Science.
- Chicken (Gallus gallus). Collaborator: Wes Warren, Washington University in St. Louis. Publication: "A vertebrate case study of the quality of assemblies" (2011) Genome Biology.
Funding
The National Institute of General Medical Sciences (NIGMS), part of the U.S. National Institutes of Health (NIH), provides critical funding for this research.