GGRaSP (Gaussian Genome Representative Selector with Prioritization) is an R package that reduces a set of genomes to a representative subset, prioritizing genomes of interest to the user while minimizing the loss of genetic variation. GGRaSP also supports unsupervised clustering: it models the genomic relationships with a Gaussian Mixture Model to select an appropriate cluster threshold, which makes it suitable for both generalizable high-throughput use and more dataset-specific analyses.
- Rapidly simplify large datasets containing up to several thousand genomes.
- Optionally run without any a priori knowledge of the shape of the data.
- Generate images, tables, and annotation files that enable detailed analysis of the phylogeny and GGRaSP clusters.
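
The idea of picking prioritized representatives from clusters can be illustrated with a small, language-neutral sketch. The Python below is not GGRaSP code and does not use the GGRaSP API. It assumes a precomputed pairwise distance matrix, a fixed threshold supplied by hand (rather than one chosen by a Gaussian Mixture Model), complete-linkage clustering, and a hypothetical `rank` mapping in which lower numbers mean higher priority. The function names `cluster_by_threshold` and `pick_representatives` are made up for this example.

```python
from itertools import combinations


def cluster_by_threshold(dist, threshold):
    """Complete-linkage agglomerative clustering (illustrative assumption).

    Merges the two closest clusters until the closest remaining pair is
    farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def pick_representatives(dist, clusters, rank=None):
    """Pick one genome per cluster.

    Prefers the best-ranked member (hypothetical `rank` dict, lower is
    better); unranked members fall back to the medoid, i.e. the member
    with the smallest summed distance to the rest of its cluster.
    """
    rank = rank or {}
    representatives = []
    for members in clusters:
        def key(i):
            return (rank.get(i, float("inf")),
                    sum(dist[i][j] for j in members),
                    i)
        representatives.append(min(members, key=key))
    return representatives


# Toy data: genomes 0-2 are closely related, as are genomes 3-4.
dist = [
    [0.00, 0.01, 0.02, 0.20, 0.21],
    [0.01, 0.00, 0.015, 0.20, 0.20],
    [0.02, 0.015, 0.00, 0.22, 0.20],
    [0.20, 0.20, 0.22, 0.00, 0.01],
    [0.21, 0.20, 0.20, 0.01, 0.00],
]

clusters = cluster_by_threshold(dist, threshold=0.05)
unranked = pick_representatives(dist, clusters)
ranked = pick_representatives(dist, clusters, rank={2: 1, 4: 1})
print(clusters)   # [[0, 1, 2], [3, 4]]
print(unranked)   # [1, 3]
print(ranked)     # [2, 4]
```

Without a ranking, each cluster is represented by its medoid. With a ranking, the genomes of interest (2 and 4 here) are kept instead, while the number of representatives stays the same.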
This project has been funded in whole or part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under Award Number U19AI110819.