Subtle stratification confounds estimates of heritability from rare variants
By
Gaurav Bhatia,
Alexander Gusev,
Po-Ru Loh,
Hilary Finucane,
Bjarni J. Vilhjálmsson,
Stephan Ripke,
Schizophrenia Working Group of the Psychiatric Genomics Consortium,
Shaun Purcell,
Eli Stahl,
Mark Daly,
Teresa R. de Candia,
Sang Hong Lee,
Benjamin M. Neale,
Matthew C. Keller,
Noah A. Zaitlen,
Bogdan Pasaniuc,
Nick Patterson,
Jian Yang,
Alkes L. Price
Posted 13 Apr 2016
bioRxiv DOI: 10.1101/048181
Genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease ($h^2$). While considerably more heritability is explained by all genotyped SNPs ($h_g^2$), for most traits, much heritability remains missing ($h_g^2 < h^2$). Rare variants, poorly tagged by genotyped SNPs, are a major potential source of the gap between $h_g^2$ and $h^2$. Recent efforts to assess the contribution of both sequenced and imputed rare variants to phenotypes suggest that substantial heritability may lie in these variants. Here we analyze sequenced SNPs, imputed SNPs and haploSNPs (haplotype variants constructed from within a sample, without using a reference panel) and show that studies of heritability from these variants may be strongly confounded by subtle population stratification. For example, when meta-analyzing heritability estimates from 22 randomly ascertained case-control traits from the GERA cohort, we observe a statistically significant increase in heritability explained by imputed SNPs even after correcting for principal components (PCs) from genotyped (or imputed) SNPs. However, this increase is eliminated when correcting for stratification using PCs from a larger number of haploSNPs. We note that subtle stratification may also impact estimates of heritability from array SNPs, although we find that this is generally a less severe problem. Overall, our results suggest that estimating the heritability explained by rare variants for case-control traits requires exquisite control for population stratification, but current methods may not provide this level of control.
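To make the idea of "correcting for principal components" concrete, the following is a minimal sketch, not the authors' pipeline. It simulates two subpopulations whose phenotype means differ for non-genetic reasons, computes PCs from a standardized genotype matrix, and regresses them out of the phenotype. All names (`top_pcs`, `residualize`), sample sizes, and simulation parameters are assumptions chosen for illustration; the paper's actual analyses use variance-component methods on real cohorts.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Return the top-k sample principal components (n_samples x k)."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]               # drop monomorphic SNPs
    z = (g - g.mean(axis=0)) / g.std(axis=0)  # standardize each SNP column
    # Left singular vectors of the standardized matrix are the sample PCs.
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k]

def residualize(phenotype, covariates):
    """Regress the phenotype on an intercept plus covariates; return residuals."""
    y = np.asarray(phenotype, dtype=float)
    x = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ beta

# Hypothetical simulation: two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 100, 1000
pop = np.repeat([0, 1], n_per_pop)
freq0 = rng.uniform(0.1, 0.9, n_snps)
freq1 = np.clip(freq0 + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
freqs = np.where(pop[:, None] == 0, freq0, freq1)
genotypes = rng.binomial(2, freqs)            # 0/1/2 allele counts

# The phenotype depends on population membership only (pure stratification).
phenotype = 1.0 * pop + rng.normal(0.0, 1.0, pop.size)

pcs = top_pcs(genotypes, k=2)
adjusted = residualize(phenotype, pcs)

corr_before = np.corrcoef(phenotype, pop)[0, 1]
corr_after = np.corrcoef(adjusted, pop)[0, 1]
print(f"phenotype vs. population label: before={corr_before:.3f}, after={corr_after:.3f}")
```

In this toy setting the population structure is strong, so a couple of PCs absorb it. The abstract's point is that for rare, imputed, or haplotype variants the residual structure can be subtle enough that PCs from genotyped SNPs do not fully capture it.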