Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,106 bioRxiv papers from 288,545 authors.

MicrobiomeGWAS: a tool for identifying host genetic variants associated with microbiome composition

By Xing Hua, Lei Song, Guoqin Yu, James J. Goedert, Christian C. Abnet, Maria Teresa Landi, Jianxin Shi

Posted 10 Nov 2015
bioRxiv DOI: 10.1101/031187

The microbiome is the collection of all microbial genes and can be investigated by sequencing highly variable regions of 16S ribosomal RNA (rRNA) genes. Evidence suggests that environmental factors and host genetics may interact to impact human microbiome composition. Identifying host genetic variants associated with human microbiome composition not only provides clues for characterizing microbiome variation but also helps to elucidate biological mechanisms of genetic associations, prioritize genetic variants, and improve genetic risk prediction. Since a microbiota functions as a community, it is best characterized by beta diversity, that is, a pairwise distance matrix. We develop a statistical framework and a computationally efficient software package, microbiomeGWAS, for identifying host genetic variants associated with microbiome beta diversity with or without interacting with an environmental factor. We show that score statistics have positive skewness and kurtosis due to the dependent nature of the pairwise data, which makes P-value approximations based on asymptotic distributions unacceptably liberal. By correcting for skewness and kurtosis, we develop accurate P-value approximations, whose accuracy was verified by extensive simulations. We exemplify our methods by analyzing a set of 147 genotyped subjects with 16S rRNA microbiome profiles from non-malignant lung tissues. Correcting for skewness and kurtosis eliminated the dramatic deviation in the quantile-quantile plots. We provided preliminary evidence that six established lung cancer risk SNPs were collectively associated with microbiome composition for both unweighted (P=0.0032) and weighted (P=0.011) UniFrac distance matrices. In summary, our methods will facilitate analyzing large-scale genome-wide association studies of the human microbiome.

Download data

  • Downloaded 1,135 times
  • Download rankings, all-time:
    • Site-wide: 6,058 out of 65,106
    • In bioinformatics: 1,151 out of 6,446
  • Year to date:
    • Site-wide: 24,058 out of 65,106
  • Since beginning of last month:
    • Site-wide: 12,398 out of 65,106

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News