Rxivist logo

Population Structure, Stratification and Introgression of Human Structural Variation

By Mohamed A. Almarri, Anders Bergström, Javier Prado-Martinez, Fengtang Yang, Beiyuan Fu, Alistair S Dunham, Yuan Chen, Matthew E Hurles, Chris Tyler-Smith, Yali Xue

Posted 28 Aug 2019
bioRxiv DOI: 10.1101/746172 (published DOI: 10.1016/j.cell.2020.05.024)

Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, yet are still understudied. Here, we present a comprehensive analysis of deletions, duplications, insertions, inversions and non-reference unique insertions in the Human Genome Diversity Project (HGDP-CEPH) panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify in total 126,018 structural variants (25,588 <100 bp in size), of which 78% are novel. Some reach high frequency and are private to continental groups or even individual populations, including a deletion in the maltase-glucoamylase gene MGAM involved in starch digestion, in the South American Karitiana and a deletion in the Central African Mbuti in SIGLEC5, potentially leading to immune hyperactivity. We discover a dynamic range of copy number expansions and find cases of regionally-restricted runaway duplications, for example, 18 copies near the olfactory receptor OR7D2 in East Asia and in the clinically-relevant HCAR2 in Central Asia. We identify highly-stratified putatively introgressed variants from Neanderthals or Denisovans, some of which, like a deletion within AQR in Papuans, are almost fixed in individual populations. Finally, by de novo assembly of 25 genomes using linked-read sequencing we discover 1631 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. These insertions show population structure and some reside in functional regions, illustrating the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.

Download data

  • Downloaded 1,119 times
  • Download rankings, all-time:
    • Site-wide: 16,732
    • In genomics: 1,749
  • Year to date:
    • Site-wide: 48,394
  • Since beginning of last month:
    • Site-wide: 65,348

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News