PAST PROJECT

Distribution of Pre-configured Data Analysis Pipelines and Viral Genome Sequences from GSC Projects Through Virtual Machine Servers on the Cloud

The proposed project aims to use cloud computing technology to democratize access to the computational resources required for bioinformatic analysis of next-generation sequencing data. Smaller laboratories and institutes find it difficult to use data released by Genome Sequencing Centers, because they usually lack personnel with the technical expertise to install bioinformatic pipelines, or access to the computing clusters required for large-scale data analysis. The goal of this project is to deliver these resources to the genomics community by distributing pre-configured, ready-to-execute viral genomic pipelines on Virtual Machine (VM) servers on the cloud.

The VM servers will be available on the Amazon EC2 cloud service, which provides publicly accessible, high-performance computing clusters that can be rented on demand at low cost. The VM servers were developed on, and fully support, the Eucalyptus cloud platform. This will allow researchers in laboratories worldwide, regardless of institutional, economic or national boundaries, to perform large-scale viral sequence assembly, analysis and annotation. Users will need only a web browser to access the cloud interface and start the VM servers, and can then run the pipelines either on the viral data we will make available on the cloud or on sequences they upload from their own laboratories. JCVI will also make a small-capacity cloud publicly available at no charge for the duration of the project, so that researchers can test the tools in a cloud environment.

The VM servers will contain pre-configured viral pipelines and all software dependencies needed to run them, including the operating system, code libraries and configuration files. Researchers with access to local computing clusters will also have the option to simply download the VM servers and run them in a private cloud, without needing to install any software to set up the pipelines.

Funding

This project has been funded in whole or part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract numbers N01-AI30071 and/or HHSN272200900007C.
