Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads
By Mitchell R. Vollger, Glennis A. Logsdon, Peter A. Audano, Arvis Sulovari, David Porubsky, Paul Peluso, Aaron M. Wenger, Gregory T. Concepcion, Zev N. Kronenberg, Katherine M. Munson, Carl Baker, Ashley D. Sanders, Diana Spierings, Peter M. Lansdorp, Urvashi Surti, Michael W. Hunkapiller, Evan E. Eichler
Posted 10 May 2019
bioRxiv DOI: 10.1101/635037
(published DOI: 10.1111/ahg.12364)
Sequencing and assembly of human genomes with long-read technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time and with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, existing assemblers still fall short of fully assembling centromeric DNA and the largest regions of segmental duplication. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
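For readers unfamiliar with the NG50 statistic quoted in the abstract, the following is a minimal sketch of the standard genome-wide definition: the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of an assumed genome size. The function name `ng50` and the toy contig lengths are illustrative assumptions. The abstract's figure is restricted to sequence within 1 Mbp of the centromere, and how the authors computed that restricted value is not described here, so this sketch should not be read as their method.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the contig at which the cumulative length of
    contigs, taken longest first, reaches at least half of genome_size.
    Returns 0 if the assembly never covers half of genome_size.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy contig lengths (hypothetical, not taken from the paper).
toy_contigs = [400, 300, 200, 100]
toy_ng50 = ng50(toy_contigs, genome_size=1000)

# Fold change between the two pericentromeric NG50 values in the abstract.
hifi_ng50_kbp = 480.6
clr_ng50_kbp = 191.5
fold_increase = hifi_ng50_kbp / clr_ng50_kbp

print(toy_ng50)
print(round(fold_increase, 1))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so two assemblies of the same genome can be compared directly even when they differ in total assembled length.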
Download data
- Downloaded 3,919 times
- Download rankings, all-time:
  - Site-wide: 2,815
  - In genomics: 315
- Year to date:
  - Site-wide: 4,002
- Since beginning of last month:
  - Site-wide: 4,478