Haplotype Threading: Accurate Polyploid Phasing from Long Reads
Sven D. Schrinner,
Rebecca Serra Mari,
Julia J. Reimer,
Gunnar W. Klau
Posted 04 Feb 2020
bioRxiv DOI: 10.1101/2020.02.04.933523 (published DOI: 10.1186/s13059-020-02158-1)
Posted 04 Feb 2020
Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. As a highly complex computational problem, polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads using a position-dependent scoring function and (ii) threading the haplotypes through the clusters by dynamic programming. We demonstrate on a simulated data set that this results in accurate haplotypes with switch error rates that are around three times lower than those obtainable by the current state-of-the-art and even around seven times lower in regions of collapsing haplotypes. Using a real data set comprising long and short read tetraploid potato sequencing data we show that WhatsHap polyphase is able to phase the majority of the potato genes after error correction, which enables the assembly of local genomic regions of interest at haplotype level. Our algorithm is implemented as part of the widely used open source tool WhatsHap and ready to be included in production settings.
- Downloaded 1,136 times
- Download rankings, all-time:
- Site-wide: 11,262 out of 103,705
- In bioinformatics: 1,794 out of 9,474
- Year to date:
- Site-wide: 2,363 out of 103,705
- Since beginning of last month:
- Site-wide: 9,621 out of 103,705
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!