
Genome-wide prediction and analysis of coding variants
Overview
Identifying the genetic variants implicated in disease is an important step in linking sequence data with new approaches to improving human health. Among the sequence variants currently known to be directly linked with human Mendelian disease, 57% are nonsynonymous mutations that cause a single amino acid substitution in the corresponding protein. An additional 23% of disease variants are small insertions and deletions (indels) in genes. An important problem in human health is therefore the identification of coding variants, both SNPs and indels, that affect protein function and might be involved in disease. To this end, we developed SIFT, an algorithm that predicts whether an amino acid substitution affects protein function. This algorithm, available at the SIFT website, is widely used by the research community and often serves as a benchmark for similar prediction algorithms.
The popularity of SIFT and similar tools underscores the need to analyze coding variants and prioritize those most likely to have a phenotypic effect. Moreover, advances in DNA sequencing technologies are generating large numbers of variants, including SNPs and indels, all of which will require analysis. We are enhancing SIFT's ability to perform large-scale analysis of coding variants. These new features will be incorporated into the SIFT web server to enable genome-wide analysis. Executables and code will also be made freely available to the research community.
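To make the variant categories discussed above concrete, the following is a minimal illustrative sketch of how a single coding change might be sorted into synonymous, nonsynonymous, nonsense, or indel classes using the standard genetic code. It is not SIFT code and does not reproduce SIFT's prediction method; the function name `classify_coding_variant` and its category labels are hypothetical, and the sketch assumes the reference and alternate alleles are supplied as in-frame DNA strings on the coding strand.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_variant(ref: str, alt: str) -> str:
    """Classify a coding change (hypothetical helper, for illustration only).

    Alleles of different length are treated as indels; equal-length
    alleles must be single codons and are compared by encoded amino acid.
    """
    ref, alt = ref.upper(), alt.upper()

    # A length difference means bases were inserted or deleted.
    if len(ref) != len(alt):
        if (len(alt) - len(ref)) % 3 == 0:
            return "in-frame indel"
        return "frameshift indel"

    if len(ref) != 3:
        raise ValueError("equal-length alleles must be single codons")
    if ref == alt:
        return "no change"

    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    # Any other amino acid change (including loss of a stop codon)
    # is reported as nonsynonymous in this simplified scheme.
    return "nonsynonymous"


if __name__ == "__main__":
    for ref, alt in [("GAG", "GTG"), ("GAA", "GAG"), ("TGG", "TGA"), ("GAG", "GAGA")]:
        print(ref, alt, classify_coding_variant(ref, alt))
```

Only the nonsynonymous class corresponds to the single amino acid substitutions that a predictor such as SIFT evaluates; deciding whether a given substitution actually affects protein function is the harder problem and is outside the scope of this sketch.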
Funding
NIH / National Human Genome Research Institute