JCVI: Research / Projects / CABOG / Overview



CABOG (Celera Assembler with Best Overlap Graph) is scientific software for DNA research. CABOG has been a critical component of many genome sequencing projects. It operates on small genomes, such as those of bacteria, as well as large genomes, such as those of mammals. CABOG is an extension of the Celera Assembler software, which was originally developed at Celera for the 2001 publication of the first draft human genome sequence. The software was released to the public as open source in 2004. Its open source repository on SourceForge is an internet resource for scientists around the world.

CABOG is one of many software programs called genome assemblers. These programs exist to overcome the fundamental limitation of all sequencing machines, namely, that they read out very few DNA letters at a time. Assemblers reconstruct genomes that are billions of letters long from the hundreds of letters per read that modern sequencers provide. The task is often described as a scaled-up version of a family solving a jigsaw puzzle: overlapping reads are fitted together, piece by piece, until the full sequence emerges.
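The jigsaw analogy can be made concrete with a toy example. The Python sketch below is not CABOG's algorithm; it is a minimal greedy merge that assumes error-free reads from a single strand of a sequence with no repeats, and it ignores everything a production assembler must handle (sequencing errors, reverse complements, repeats, paired-end constraints). The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no such overlap reaches `min_len` letters."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap
    until no pair overlaps by at least `min_len` letters."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        # Compare every ordered pair and remember the largest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join
        # Join the pair, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three 8-letter reads sampled from a made-up 16-letter sequence.
reads = ["TGGACCTA", "ACGTTGCA", "TGCATGGA"]
print(greedy_assemble(reads))
```

Comparing every pair of reads, as this sketch does, becomes impractical at the scale of billions of letters, which is one reason real assemblers rely on far more elaborate data structures and heuristics.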

The CABOG software achieved several scientific firsts. It was the first to assemble the genome of a multicellular organism (Drosophila melanogaster, 2000). It was the first to assemble both parental haplotypes of one human genome (J. Craig Venter, 2007). It was the first to assemble environmental sequence from the oceans (Sargasso Sea in 2004 and Global Ocean Sampling in 2007). It was the first to combine reads from first-generation Sanger sequencing machines and second-generation pyrosequencing machines (marine microbes, 2006). Today, CABOG is one of the leading assembly programs for data sets that include paired-end data from the Roche 454 line of sequencing machines.

Algorithms Research & Software Development

The assembly team at JCVI is actively involved in research to expand the realm of what can be learned with modern DNA sequencing technology. The research is focused on extending and improving the CABOG software. It has led to recent publications of methods for combining Sanger and 454 data from the same genome (see 2006 and 2008 publications) and a method for improving a draft assembly through the automatic integration of finishing reads (see 2010 publication). Current research is focused on the integration of data from other sequencing technologies, such as Illumina and Pacific Biosciences sequencers. CABOG is well suited for integration of all these technologies when they are applied to large and complex genomes such as those of mammals.

Recent Genome Projects

The JCVI assembly team works with collaborators around the world on important genomes such as:


The National Institute of General Medical Sciences (NIGMS), part of the U.S. National Institutes of Health (NIH), provides critical funding for this research.
