A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip whole-genome resequencing platform
Pandya GA, Holmes MH, Sunkara S, Sparks A, Bai Y, Verratti K, Saeed K, Venepally P, Jarrahi B, Fleischmann RD, Peterson SN
DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip(R) array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. A set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence were developed. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.