White Paper

Cloud BioLinux: pre-configured and on-demand high performance computing for the genomics community.

While some mainstream bioinformatics tools, such as BLAST for genomic sequence analysis, are accessible online through NCBI's website (1), most software packages must be downloaded and configured by each individual researcher. Commonly used bioinformatics tools are hard to build and maintain: they are usually distributed as source code and have numerous software library dependencies. This creates a burden, especially for scientists working in smaller institutions and laboratories that lack advanced information technology support. To use software developed as part of funded research, these researchers must provision their own infrastructure and technical expertise. In addition, many bioinformatics workflows involve large datasets, and completing the genomic analysis successfully requires high-performance, specialized computing hardware.

A tool developed specifically to alleviate these computational bottlenecks in research is JCVI's Cloud BioLinux (2), a publicly available virtual machine that runs on cloud computing platforms. Through Cloud BioLinux, JCVI offers the bioinformatics community a computing solution built on a new Science as a Service (ScaaS) model, based on the principles of Software as a Service (SaaS). With this approach, bioinformatics software tools are pre-installed and configured within the virtual machine, and scientists can use them on cloud platforms such as Amazon's EC2 (3) or DOE's planned public cloud (4). In this way, JCVI can give the genomics community equal access to next-generation bioinformatics by offering pre-configured, on-demand high performance computing on cloud computing platforms.

Cloud BioLinux is based on BioLinux 5.0 (5) and provides more than 100 bioinformatics packages. Upon deployment, users have access to a host of software including, for example, BLAST, Glimmer, HMMER, Phylip, RasMol, Genespring, Clustalw, and the EMBOSS collection of utilities. In addition, we installed the Celera Genome Assembler and mpiBLAST, the parallel computing implementation of the BLAST sequence comparison tool. Within the virtual machine we also included scripts that allow push-button creation of parallel computing clusters on cloud platforms, intended for scientists who need to perform large-scale genomic sequence analysis.
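
To illustrate the kind of work a parallel cluster distributes in large-scale sequence analysis, the following minimal Python sketch splits a set of query sequences in FASTA format into batches, one per worker node. It is an illustration only and is not taken from the Cloud BioLinux cluster scripts or from mpiBLAST, which may divide the work differently (for example, by partitioning the database rather than the queries). The function names read_fasta and split_round_robin are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_workers):
    """Deal records out to n_workers batches, one record at a time.

    Hypothetical helper: a real scheduler might balance by sequence
    length instead of by record count.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    batches = [[] for _ in range(n_workers)]
    for i, record in enumerate(records):
        batches[i % n_workers].append(record)
    return batches
```

Each batch could then be written to its own file and handed to a separate node, with the per-node results concatenated afterwards. Round-robin assignment is chosen here only because it is the simplest scheme to reason about.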

Both the official Amazon Web Services blog and the Amazon Developer's newsletter have featured JCVI's Cloud BioLinux (6). Through this exposure, a community of genomics investigators has already been established around this virtual machine (7) and has agreed to use it as the basis for porting more bioinformatics software to cloud computing platforms.

A proposal for a Cloud BioLinux developers' workshop has been accepted as part of the ISMB/BOSC 2010 conference (8,9). The first goal of the workshop is to establish a configuration and software management framework for the bioinformatics software included with Cloud BioLinux. This will give researchers more flexibility to customize the virtual machine for their specific research needs and to port their software to cloud computing platforms. A second goal is to make Cloud BioLinux more accessible to end users and biologists who are not comfortable interacting with Unix servers. To this end, we plan to install and test remote desktop tools such as FreeNX or VNC. These allow easy access to applications running on virtual machines and cloud platforms, and provide an experience identical to that of using a desktop computer. Finally, the performance of bioinformatics applications on compute clouds will be evaluated using new clustering and grid technologies such as MPI, Condor and XtreemOS.

References

(1) http://www.ncbi.nlm.nih.gov/BLAST

(2) http://www.jcvi.org/cms/research/projects/jcvi-cloud-biolinux/overview/

(3) http://aws.amazon.com/ec2

(4) http://newscenter.lbl.gov/press-releases/2009/10/14/scientific-cloud-computing/

(5) http://envgen.nox.ac.uk/tools/bio-linux

(6) http://aws.typepad.com/aws/2009/09/bioinformatics-genomes-ec2-and-hadoop.html

(7) http://lists.cloudbiolinux.com/pipermail/community/2010-January/thread.html

(8) http://www.open-bio.org/wiki/Codefest_2010

(9) http://www.open-bio.org/wiki/BOSC_2010

Contact Cloud Team

Email: cloud(AT)jcvi.org
