Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics

By Héléna A Gaspar, Gerome Breen

Posted 04 Jul 2018
bioRxiv DOI: 10.1101/362343 (published DOI: 10.1186/s12859-019-2680-1)

Principal component analysis (PCA) is a standard method to correct for population stratification in ancestry-specific genome-wide association studies (GWASs) and is used to cluster individuals by ancestry. Using the 1000 Genomes Project data, we examine how non-linear dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE) or generative topographic mapping (GTM) can provide improved ancestry maps by accounting for a higher percentage of explained variance in ancestry, and how they can help to estimate the number of principal components necessary to account for population stratification. GTM also generates posterior probabilities of class membership, which can be used to assess the probability that an individual belongs to a given population. Unlike t-SNE, GTM can therefore be used for both clustering and classification. This paper is a first application of GTM for ancestry classification models. Our maps and software are available online.
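To make the idea of posterior class probabilities concrete, here is a minimal sketch in plain Python. It is not the authors' software. It assumes a GTM has already been fitted and that each individual comes with a vector of responsibilities over the map nodes (one non-negative weight per node, summing to 1). The function names, the three-node map, and the toy values are hypothetical; the population labels are used only as example class names. The sketch estimates P(class | node) from labelled individuals and then averages it over a new individual's responsibilities, which is one common way to turn a GTM into a classifier.

```python
from collections import defaultdict


def node_class_probabilities(responsibilities, labels):
    """Estimate P(class | node) from labelled training individuals.

    responsibilities: one row per individual, holding that individual's
        responsibilities over the K map nodes (each row sums to 1).
    labels: the population label of each individual.
    Returns a list of K dicts mapping class -> probability.
    """
    n_nodes = len(responsibilities[0])
    # Responsibility mass that each class places on each node.
    mass = defaultdict(lambda: [0.0] * n_nodes)
    for row, label in zip(responsibilities, labels):
        for k, r in enumerate(row):
            mass[label][k] += r

    classes = sorted(mass)
    node_probs = []
    for k in range(n_nodes):
        total = sum(mass[c][k] for c in classes)
        if total > 0:
            node_probs.append({c: mass[c][k] / total for c in classes})
        else:
            # Assumption: a node with no training mass gets a uniform distribution.
            node_probs.append({c: 1.0 / len(classes) for c in classes})
    return node_probs


def class_posterior(row, node_probs):
    """P(class | individual) = sum over nodes of R_k * P(class | node k)."""
    classes = node_probs[0].keys()
    return {c: sum(r * p[c] for r, p in zip(row, node_probs)) for c in classes}


# Hypothetical toy data: four labelled individuals on a three-node map.
train_resp = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.2, 0.8],
    [0.1, 0.1, 0.8],
]
train_labels = ["EUR", "EUR", "EAS", "EAS"]

node_probs = node_class_probabilities(train_resp, train_labels)
posterior = class_posterior([0.7, 0.3, 0.0], node_probs)
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

Because the result is a probability for every population rather than a single hard label, an individual whose responsibilities straddle several regions of the map shows up with a spread-out posterior. A t-SNE embedding alone does not provide this kind of output.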

Download data

  • Downloaded 620 times
  • Download rankings, all-time:
    • Site-wide: 17,108 out of 70,836
    • In bioinformatics: 2,615 out of 6,926
  • Year to date:
    • Site-wide: 32,168 out of 70,836
  • Since beginning of last month:
    • Site-wide: 25,782 out of 70,836
