Rxivist logo

Genome-wide Association Studies (GWAS) result in millions of summary statistics (``z-scores'') for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities that does not require raw genotype data, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype -- the proportion of SNPs (after uniform pruning, so that large LD blocks are not over-represented) likely to be in strong LD with causal/mechanistically associated SNPs -- and predicting the proportion of chip heritability explainable by genome wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N=82,315) and additionally, for purposes of illustration, putamen volume (N=12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We estimate the degree to which effect sizes are over-estimated when based on linear regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 106 and 105. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.

Download data

  • Downloaded 980 times
  • Download rankings, all-time:
    • Site-wide: 13,615 out of 100,715
    • In genetics: 871 out of 5,015
  • Year to date:
    • Site-wide: 74,917 out of 100,715
  • Since beginning of last month:
    • Site-wide: None out of 100,715

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News

  • 20 Oct 2020: Support for sorting preprints using Twitter activity has been removed, at least temporarily, until a new source of social media activity data becomes available.
  • 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
  • 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
  • 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
  • 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
  • 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
  • 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
  • 22 Jan 2019: Nature just published an article about Rxivist and our data.
  • 13 Jan 2019: The Rxivist preprint is live!