Derek Harkins is a senior bioinformatics analyst in the Infectious Disease Group at the J. Craig Venter Institute (JCVI). Mr. Harkins' primary research interests are metagenomics, functional and structural annotation, automated annotation, microbial genome sequencing, comparative genomics, and the research and development of bioinformatics tools and resources. Prior to joining JCVI, Mr. Harkins served as a consultant for the Genetics Research Group of International Paper Company, working on marker-aided selection. Before that, he was a biologist with the USDA Forest Service, where he led a team working on genetic mapping of white pine blister rust resistance genes in white pines. Earlier, he was a biologist with the USDA Agricultural Research Service, conducting population genetics and disease resistance research. Mr. Harkins received his MS in forestry (molecular biology) and genetics from North Carolina State University and a BS in biology from the University of Miami.

Research Priorities

Metagenomics of the Microbiome in Oral Health and Disease

  • Role: Lead Bioinformaticist
  • Analysis of a large-scale metagenomics project related to association and longitudinal studies of the oral microbiome and the dental caries model of disease.

The Development and Validation of Sequence Subtraction Databases to Improve Virus Discovery Through Next Generation Sequencing

  • Role: Co-investigator/Bioinformaticist
  • Analysis and informatics support for the DHS project.

The J. Craig Venter Institute Genome Center for Infectious Diseases (GCID)

  • Role: Bioinformaticist/Team lead for section 3.1 of the AMR (Anti-Microbial Resistance) supplement

Research and Development of Informatics Tools and Resources

  • Software development and validation of institutional tools and pipelines used in metagenomic research.


Select Publications

Gastrointestinal microbial populations can distinguish pediatric and adolescent Acute Lymphoblastic Leukemia (ALL) at the time of disease diagnosis.
BMC genomics. 2016-08-15; 17.1: 635.
PMID: 27527070
Stool microbiota composition is associated with the prospective risk of Plasmodium falciparum infection.
BMC genomics. 2015-08-22; 16: 631.
PMID: 26296559
A novel method of consensus pan-chromosome assembly and large-scale comparative analysis reveal the highly flexible pan-genome of Acinetobacter baumannii.
Genome biology. 2015-07-21; 16: 143.
PMID: 26195261
Draft Genome Sequence of Enterococcus faecium PC4.1, a Clade B Strain Isolated from Human Feces.
Genome announcements. 2014-02-06; 2.1:
PMID: 24503986
New insights into dissemination and variation of the health care-associated pathogen Acinetobacter baumannii from genomic analysis.
mBio. 2014-01-21; 5.1: e00963-13.
PMID: 24449752
Draft Genome Sequence of Enterococcus faecalis PC1.1, a Candidate Probiotic Strain Isolated from Human Feces.
Genome announcements. 2013-01-01; 1.1:
PMID: 23469340
TIGRFAMs and Genome Properties in 2013.
Nucleic acids research. 2013-01-01; 41.Database issue: D387-95.
PMID: 23197656
Pathogenomic inference of virulence-associated genes in Leptospira interrogans.
PLoS neglected tropical diseases. 2013-01-01; 7.10: e2468.
PMID: 24098822
Draft genome sequence determination for cystic fibrosis and chronic granulomatous disease Burkholderia multivorans isolates.
Journal of bacteriology. 2012-11-01; 194.22: 6356-7.
PMID: 23105085
Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.
PLoS neglected tropical diseases. 2012-01-01; 6.10: e1853.
PMID: 23145189
CharProtDB: a database of experimentally characterized protein annotations.
Nucleic acids research. 2012-01-01; 40.Database issue: D237-41.
PMID: 22140108
Draft genome sequence of Bacteroides vulgatus PC510, a strain isolated from human feces.
Journal of bacteriology. 2011-08-01; 193.16: 4025-6.
PMID: 21622758
Draft genome sequence of Turicibacter sanguinis PC909, isolated from human feces.
Journal of bacteriology. 2011-03-01; 193.6: 1288-9.
PMID: 21183674
Complete genome sequence of the multiresistant taxonomic outlier Pseudomonas aeruginosa PA7.
PloS one. 2010-01-22; 5.1: e8842.
PMID: 20107499
The Protein Naming Utility: a rules database for protein nomenclature.
Nucleic acids research. 2010-01-01; 38.Database issue: D336-9.
PMID: 20007151
Insights into the environmental resistance gene pool from the genome sequence of the multidrug-resistant environmental isolate Escherichia coli SMS-3-5.
Journal of bacteriology. 2008-10-01; 190.20: 6779-94.
PMID: 18708504