FUN-LDA: A latent Dirichlet allocation model for predicting tissue-specific functional effects of noncoding variation

By Daniel Backenroth, Zihuai He, Krzysztof Kiryluk, Valentina Boeva, Lynn Pethukova, Ekta Khurana, Angela Christiano, Joseph D Buxbaum, Iuliana Ionita-Laza

Posted 11 Aug 2016
bioRxiv DOI: 10.1101/069229

We describe a new method, FUN-LDA, based on a latent Dirichlet allocation model, for predicting the functional effects of noncoding genetic variants in a cell type- and tissue-specific manner. The method integrates diverse epigenetic annotations for specific cell types and tissues from large-scale epigenomics projects such as ENCODE and Roadmap Epigenomics. Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome. We demonstrate the usefulness of our predictions in several validation experiments. Using eQTL data from several sources, including the Genotype-Tissue Expression project, the Geuvadis project, and the TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in the relevant Roadmap tissues. We further show how these integrated functional scores can be combined with summary statistics from genome-wide association studies to identify the cell or tissue type most likely to be causally implicated in a complex trait, and to estimate a tissue-based correlation matrix of various complex traits. We find large enrichment of heritability in functional components of relevant tissues for various complex traits, with FUN-LDA yielding the highest enrichment estimates relative to existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA to state-of-the-art functional annotation methods such as GenoSkyline, ChromHMM, Segway, and IDEAS, and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In summary, we describe a new approach and perform rigorous comparisons with the most commonly used functional annotation methods, providing a valuable resource for the community interested in the functional annotation of noncoding variants.
Scores for each position in the human genome and for each ENCODE/Roadmap tissue are available from http://www.columbia.edu/~ii2135/funlda.html.
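
To make the notion of eQTLs being "enriched among the predicted functional variants" concrete, here is a minimal sketch of one common way to compute a fold enrichment. It is not taken from the paper: the function name `fold_enrichment`, the toy scores, and the 0.5 cutoff are all hypothetical, and the authors' actual enrichment statistic may differ.

```python
def fold_enrichment(is_functional, is_eqtl):
    """Fold enrichment of eQTLs among predicted functional variants.

    Both arguments are equal-length sequences of booleans, one entry per
    variant. Returns P(functional | eQTL) / P(functional).
    This is an illustrative definition, not necessarily the paper's.
    """
    if len(is_functional) != len(is_eqtl):
        raise ValueError("inputs must have the same length")
    n_total = len(is_functional)
    n_functional = sum(is_functional)
    n_eqtl = sum(is_eqtl)
    if n_total == 0 or n_functional == 0 or n_eqtl == 0:
        raise ValueError("need at least one functional variant and one eQTL")
    # Count variants that are both predicted functional and eQTLs.
    n_both = sum(1 for f, e in zip(is_functional, is_eqtl) if f and e)
    return (n_both / n_eqtl) / (n_functional / n_total)


# Toy per-variant scores for one tissue (made-up numbers, not real data).
scores = [0.9, 0.8, 0.1, 0.2, 0.0, 0.3, 0.1, 0.2]
# Hypothetical cutoff: call a variant "functional" if its score exceeds 0.5.
is_functional = [s > 0.5 for s in scores]
# Toy eQTL labels for the same eight variants.
is_eqtl = [True, False, True, False, False, False, False, False]

enrichment = fold_enrichment(is_functional, is_eqtl)
```

In this toy example, a value above 1 would indicate that eQTLs fall among the predicted functional variants more often than expected from the overall fraction of functional variants.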

Download data

  • Downloaded 1,386 times
  • Download rankings, all-time:
    • Site-wide: 12,044
    • In bioinformatics: 1,491
  • Year to date:
    • Site-wide: 58,931
  • Since beginning of last month:
    • Site-wide: 97,922


