
Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics

By Héléna A. Gaspar, Gerome Breen

Posted 04 Jul 2018
bioRxiv DOI: 10.1101/362343 (published DOI: 10.1186/s12859-019-2680-1)

Principal component analysis (PCA) is a standard method to correct for population stratification in ancestry-specific genome-wide association studies (GWASs), and it is also used to cluster individuals by ancestry. Using data from the 1000 Genomes Project, we examine how non-linear dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE) and generative topographic mapping (GTM) can provide improved ancestry maps by accounting for a higher percentage of the explained variance in ancestry. We also examine how they can help estimate the number of principal components needed to account for population stratification. GTM also generates posterior probabilities of class membership, which can be used to assess the probability that an individual belongs to a given population. Unlike t-SNE, GTM can be used for both clustering and classification. This paper presents a first application of GTM to ancestry classification models. Our maps and software are available online.
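To make the idea of GTM class-membership probabilities concrete, here is a minimal NumPy sketch. It is not the authors' software. It fits a standard GTM by EM (an RBF-based mapping from a 2-D latent grid to the data space, with a shared Gaussian noise precision `beta`) to per-individual features such as PC scores. It then derives P(population | individual) by weighting node-level class probabilities P(c | k) with each individual's responsibilities over the grid nodes. The grid sizes, RBF width, regularization `alpha`, the simplified `beta` initialization, and the synthetic three-population data are all assumptions chosen for illustration.

```python
import numpy as np


def grid(side):
    """Regular side x side grid of points on [-1, 1]^2."""
    g = np.linspace(-1.0, 1.0, side)
    return np.array([(a, b) for a in g for b in g])


def rbf_design(latent, centers, width):
    """Gaussian RBF basis evaluated at each latent node, plus a bias column."""
    d2 = ((latent[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.exp(-d2 / (2.0 * width ** 2)), np.ones((len(latent), 1))])


def responsibilities(T, Y, beta):
    """R[k, n] = p(node k | individual n) with equal mixing weights; also returns squared distances."""
    d2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # K x N
    logp = -0.5 * beta * d2
    logp -= logp.max(axis=0, keepdims=True)  # numerically stable softmax over nodes
    R = np.exp(logp)
    return R / R.sum(axis=0, keepdims=True), d2


def fit_gtm(T, side=10, rbf_side=4, alpha=1e-3, n_iter=50):
    """Fit a GTM to T (N x D, D >= 3) by EM; returns basis Phi, weights W, precision beta, grid X."""
    N, D = T.shape
    X = grid(side)
    Phi = rbf_design(X, grid(rbf_side), width=2.0 / (rbf_side - 1))
    # PCA initialisation: lay the latent grid on the first two principal axes.
    mean = T.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(T - mean, rowvar=False))
    evals, evecs = evals[::-1], evecs[:, ::-1]  # eigh returns ascending order
    Y0 = X @ (evecs[:, :2] * np.sqrt(evals[:2])).T + mean
    W = np.linalg.lstsq(Phi, Y0, rcond=None)[0]
    beta = 1.0 / max(evals[2], 1e-6)  # simplified: inverse of the third eigenvalue
    for _ in range(n_iter):
        R, _ = responsibilities(T, Phi @ W, beta)  # E-step
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (alpha / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)  # M-step: regularised weights
        _, d2 = responsibilities(T, Phi @ W, beta)
        beta = N * D / (R * d2).sum()  # M-step: noise precision
    return Phi, W, beta, X


def project(T, Phi, W, beta, X):
    """2-D map coordinates: posterior-mean latent position of each individual."""
    R, _ = responsibilities(T, Phi @ W, beta)
    return R.T @ X


def node_class_probs(R, labels, n_classes):
    """P(c | node k) from the responsibility mass each population places on node k."""
    mass = R @ np.eye(n_classes)[labels] + 1e-12  # K x C; tiny floor for empty nodes
    return mass / mass.sum(axis=1, keepdims=True)


def predict_proba(T_new, Phi, W, beta, P_ck):
    """P(c | individual) = sum_k R[k, n] * P(c | k); each row sums to 1."""
    R, _ = responsibilities(T_new, Phi @ W, beta)
    return R.T @ P_ck


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for PC scores of three populations (not 1000 Genomes data).
    centres = rng.normal(scale=5.0, size=(3, 5))
    labels = np.repeat(np.arange(3), 60)
    T = centres[labels] + rng.normal(size=(180, 5))
    Phi, W, beta, X = fit_gtm(T)
    R, _ = responsibilities(T, Phi @ W, beta)
    proba = predict_proba(T, Phi, W, beta, node_class_probs(R, labels, 3))
    print("training accuracy:", (proba.argmax(axis=1) == labels).mean())
```

In this sketch, `project` gives the 2-D map coordinates used for visualization, and `predict_proba` gives per-population membership probabilities. With class priors taken as population frequencies, P(c | k) reduces to each population's share of the responsibility mass on node k. A real evaluation would score held-out individuals rather than the training set.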

Download data

  • Downloaded 734 times
  • Download rankings, all-time:
    • Site-wide: 19,814 out of 94,912
    • In bioinformatics: 2,906 out of 8,837
  • Year to date:
    • Site-wide: 39,920 out of 94,912
  • Since beginning of last month:
    • Site-wide: 29,474 out of 94,912
