Pang, A. W., Macdonald, J. R., Pinto, D., Wei, J., Rafiq, M. A., Conrad, D., Park, H., Hurles, M., Lee, C., Venter, J. C., Kirkness, E., Levy, S., Feuk, L., Scherer, S. W.
Towards a Comprehensive Structural Variation Map of an Individual Human Genome
Genome Biol. 2010 May 19; 11(5): R52.
ABSTRACT: BACKGROUND: Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertions/deletions (indels), the annotation of larger structural variants (SVs) has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variants (CNVs) and inversions. RESULTS: We combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detected 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by ~1.2% when considering indels/CNVs, 0.1% when considering SNPs and ~0.3% when considering inversions. The structural variants impact 4,867 genes, and >24% of SVs would not be imputed by SNP-association. CONCLUSIONS: Our results indicate that a large number of SVs have gone unreported in the individual genomes published to date. The significant extent and complexity of SV, as well as the growing recognition of its medical relevance, necessitate that it be actively studied in health-related analyses of personal genomes. The new catalogue of SV generated for this genome provides a crucial resource for future comparison studies.