Population Stratification at the Phenotypic Variance level and Implication for the Analysis of Whole Genome Sequencing Data from Multiple Studies

By Tamar Sofer, Xiuwen Zheng, Cecelia A Laurie, Stephanie M Gogarten, Jennifer A. Brody, Matthew P. Conomos, Joshua C Bis, Timothy A. Thornton, Adam Szpiro, Jeffrey R O’Connell, Ethan M Lange, Yan Gao, L. Adrienne Cupples, Bruce M Psaty, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Kenneth M Rice

Posted 05 Mar 2020
bioRxiv DOI: 10.1101/2020.03.03.973420

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. Left unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the pooled studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We also illustrate the variance stratification problem, its solutions, and a corresponding diagnostic procedure using data from the Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Program, in association tests for hemoglobin concentration and BMI.
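The dependence on allele frequencies, sample sizes, and phenotypic variances described above can be illustrated with a small calculation. The sketch below is not the authors' method or software; it is a simplified illustration that assumes a single-variant score test with study-specific means, genotypes in Hardy-Weinberg equilibrium, and a null model that wrongly fits one pooled residual variance. The function name `score_variance_ratio` and all input values are hypothetical. It returns the ratio of the score statistic's true variance to the variance the pooled model assumes.

```python
def score_variance_ratio(n, maf, sd):
    """Ratio of true to assumed variance of a score statistic.

    n   : per-study sample sizes
    maf : per-study minor allele frequencies of the tested variant
    sd  : per-study residual standard deviations of the phenotype

    Assumption (illustrative only): the analysis fits one pooled residual
    variance, while the truth is one variance per study.
    """
    # Expected within-study genotype sum of squares under Hardy-Weinberg:
    # n_s * Var(genotype) = n_s * 2 * p_s * (1 - p_s)
    geno_ss = [n_s * 2.0 * p * (1.0 - p) for n_s, p in zip(n, maf)]

    # The pooled model estimates a sample-size-weighted average variance.
    pooled_var = sum(n_s * s ** 2 for n_s, s in zip(n, sd)) / sum(n)

    # True variance weights each study's variance by its genotype information.
    true_var = sum(g * s ** 2 for g, s in zip(geno_ss, sd))
    assumed_var = pooled_var * sum(geno_ss)
    return true_var / assumed_var


# The variant is more common in the high-variance study: ratio above 1.
inflated = score_variance_ratio(n=[1000, 1000], maf=[0.05, 0.01], sd=[2.0, 1.0])

# The variant is more common in the low-variance study: ratio below 1.
deflated = score_variance_ratio(n=[1000, 1000], maf=[0.01, 0.05], sd=[2.0, 1.0])

# Equal variances across studies: the pooled model is correct, ratio of 1.
equal = score_variance_ratio(n=[1000, 1000], maf=[0.05, 0.01], sd=[1.5, 1.5])

print(inflated, deflated, equal)
```

Under these assumptions, a ratio above 1 means the pooled model understates the statistic's variance, so the test is anti-conservative (more false positives); a ratio below 1 means the variance is overstated, so the test is conservative and loses power. Allowing a separate residual variance per study removes the mismatch in this simplified setting.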

Download data

  • Downloaded 190 times
  • Download rankings, all-time:
    • Site-wide: 103,256
    • In genetics: 4,633
  • Year to date:
    • Site-wide: 103,501
  • Since beginning of last month:
    • Site-wide: 103,501
