An open resource of structural variation for medical and population genetics
Ryan L. Collins,
Konrad J. Karczewski,
Laurent C. Francioli,
Amit V. Khera,
Laura D. Gauthier,
Nicholas A. Watts,
Anne H. O’Donnell-Luria,
Matthew R. Stone,
Kristen M. Laricchia,
Genome Aggregation Database Production Team,
Genome Aggregation Database Consortium,
Kent D. Taylor,
Henry J. Lin,
Stephen S. Rich,
Yii-Der Ida Chen,
Jerome I. Rotter,
Benjamin M. Neale,
Mark J. Daly,
Daniel G. MacArthur,
Michael E. Talkowski
Posted 14 Mar 2019
bioRxiv DOI: 10.1101/578674 (published DOI: 10.1038/s41586-020-2287-8)
Structural variants (SVs) rearrange large segments of the genome and can have profound consequences for evolution and human diseases. As national biobanks, disease association studies, and clinical genetic testing grow increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral for interpreting genetic variation. To date, no large-scale reference maps of SVs exist from high-coverage sequencing comparable to those available for point mutations in protein-coding genes. Here, we constructed a reference atlas of SVs across 14,891 genomes from diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 433,371 distinct SVs, including 5,295 multi-breakpoint complex SVs across 11 mutational subclasses, and examples of localized chromosome shattering, as in chromothripsis. The average individual harbored 7,439 SVs, which accounted for 25-29% of all rare protein-truncating events per genome. We found strong correlations between constraint against damaging point mutations and rare SVs that both disrupt and duplicate protein-coding sequence, suggesting intolerance to reciprocal dosage alterations for a subset of tightly regulated genes. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than any effect on noncoding SVs. Finally, we benchmarked carrier rates for medically relevant SVs, finding very large (≥1 Mb) rare SVs in 3.8% of genomes (~1:26 individuals) and clinically reportable incidental SVs in 0.18% of genomes (~1:556 individuals). These data have been integrated directly into the gnomAD browser (<https://gnomad.broadinstitute.org>) and will have broad utility for population genetics, disease association, and diagnostic screening.