Imputation aware tag SNP selection to improve power for multi-ethnic association studies
Genevieve L. Wojcik,
Alicia R. Martin,
Christopher S Carlson,
Hyun Min Kang,
Carlos D. Bustamante,
Christopher R Gignoux,
Eimear E Kenny
Posted 03 Feb 2017
bioRxiv DOI: 10.1101/105551 (published DOI: 10.1534/g3.118.200502)
Posted 03 Feb 2017
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Consequently, a new generation of genotyping arrays are being developed designed with tag single nucleotide polymorphisms (SNPs) to improve rare variant imputation. Selection of these tag SNPs poses several challenges as rare variants tend to be continentally- or even population-specific and reflect fine-scale linkage disequilibrium (LD) structure impacted by recent demographic events. To explore the landscape of tag-able variation and guide design considerations for large-cohort and biobank arrays, we developed a novel pipeline to select tag SNPs using the 26 population reference panel from Phase 3 of the 1000 Genomes Project. We evaluate our approach using leave-one-out internal validation via standard imputation methods that allows the direct comparison of tag SNP performance by estimating the correlation of the imputed and real genotypes for each iteration of potential array sites. We show how this approach allows for an assessment of array design and performance that can take advantage of the development of deeper and more diverse sequenced reference panels. We quantify the impact of demography on tag SNP performance across populations and provide population-specific guidelines for tag SNP selection. We also examine array design strategies that target single populations versus multi-ethnic cohorts, and demonstrate a boost in performance for the latter can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Finally, we demonstrate the utility of improved array design to provide meaningful improvements in power, particularly in trans-ethnic studies. The unified framework presented will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
- Downloaded 791 times
- Download rankings, all-time:
- Site-wide: 23,970 out of 118,387
- In genomics: 2,372 out of 6,435
- Year to date:
- Site-wide: 76,249 out of 118,387
- Since beginning of last month:
- Site-wide: 65,541 out of 118,387
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!