Population Structure, Stratification and Introgression of Human Structural Variation
Mohamed A. Almarri,
Alistair S Dunham,
Matthew E Hurles,
Posted 28 Aug 2019
bioRxiv DOI: 10.1101/746172 (published DOI: 10.1016/j.cell.2020.05.024)
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, yet are still understudied. Here, we present a comprehensive analysis of deletions, duplications, insertions, inversions and non-reference unique insertions in the Human Genome Diversity Project (HGDP-CEPH) panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify in total 126,018 structural variants (25,588 <100 bp in size), of which 78% are novel. Some reach high frequency and are private to continental groups or even individual populations, including a deletion in the South American Karitiana in the maltase-glucoamylase gene MGAM, which is involved in starch digestion, and a deletion in the Central African Mbuti in SIGLEC5, potentially leading to immune hyperactivity. We discover a dynamic range of copy number expansions and find cases of regionally-restricted runaway duplications, for example, 18 copies near the olfactory receptor OR7D2 in East Asia and in the clinically-relevant HCAR2 in Central Asia. We identify highly-stratified putatively introgressed variants from Neanderthals or Denisovans, some of which, like a deletion within AQR in Papuans, are almost fixed in individual populations. Finally, by de novo assembly of 25 genomes using linked-read sequencing, we discover 1,631 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. These insertions show population structure and some reside in functional regions, illustrating the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
- Downloaded 1,119 times
- Download rankings, all-time:
  - Site-wide: 16,732
  - In genomics: 1,749
- Year to date:
  - Site-wide: 48,394
- Since beginning of last month:
  - Site-wide: 65,348