
Haplotype Threading: Accurate Polyploid Phasing from Long Reads

By Sven D. Schrinner, Rebecca Serra Mari, Jana Ebler, Mikko Rautiainen, Lancelot Seillier, Julia J. Reimer, Björn Usadel, Tobias Marschall, Gunnar W. Klau

Posted 04 Feb 2020
bioRxiv DOI: 10.1101/2020.02.04.933523 (published DOI: 10.1186/s13059-020-02158-1)

Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. Polyploid phasing is a highly complex computational problem and still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads using a position-dependent scoring function and (ii) threading the haplotypes through the clusters by dynamic programming. We demonstrate on a simulated data set that this results in accurate haplotypes, with switch error rates around three times lower than those obtainable by the current state of the art, and around seven times lower in regions of collapsing haplotypes. Using a real data set comprising long- and short-read sequencing data from tetraploid potato, we show that WhatsHap polyphase is able to phase the majority of the potato genes after error correction, which enables the assembly of local genomic regions of interest at haplotype level. Our algorithm is implemented as part of the widely used open-source tool WhatsHap and is ready to be included in production settings.
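
To make the second stage more concrete, the following is a minimal toy sketch of threading haplotypes through read clusters by dynamic programming. It is not the WhatsHap polyphase implementation: the function name `thread_haplotypes`, the coverage-based cost, and the `switch_cost` parameter are assumptions chosen for illustration, and the read clustering of stage (i) is assumed to have been done already. Each variant position is described by the number of reads per cluster, and the sketch picks one cluster per haplotype so that cluster usage matches relative coverage while cluster switches between neighbouring positions are penalized.

```python
from itertools import product


def thread_haplotypes(coverage, ploidy, switch_cost=1.0):
    """Toy threading of `ploidy` haplotypes through read clusters.

    coverage: list with one non-empty dict per variant position, mapping a
              cluster id to the (positive) number of reads of that cluster
              covering the position. Hypothetical input format.
    Returns one tuple per position holding the cluster id chosen for each
    haplotype.
    """
    if not coverage:
        return []

    def states(cov):
        # Every assignment of one cluster to each haplotype.
        return list(product(sorted(cov), repeat=ploidy))

    def coverage_cost(cov, state):
        # Penalize deviation between the number of haplotypes threaded
        # through a cluster and the number its read coverage suggests.
        total = sum(cov.values())
        return sum(
            abs(state.count(c) - ploidy * cov[c] / total) for c in cov
        )

    def transition_cost(prev, cur):
        # Penalize every haplotype that changes cluster.
        return switch_cost * sum(p != c for p, c in zip(prev, cur))

    # Viterbi-style dynamic programming over positions.
    best = {s: coverage_cost(coverage[0], s) for s in states(coverage[0])}
    back = []
    for cov in coverage[1:]:
        new_best, pointers = {}, {}
        for s in states(cov):
            prev = min(best, key=lambda p: best[p] + transition_cost(p, s))
            new_best[s] = (
                best[prev] + transition_cost(prev, s) + coverage_cost(cov, s)
            )
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final state.
    state = min(best, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In a collapsed region, one cluster carries the reads of several haplotypes. With `thread_haplotypes([{0: 30, 1: 10}], ploidy=4)`, the coverage term in this sketch routes three haplotypes through cluster 0 and one through cluster 1. The state space grows as the number of clusters to the power of the ploidy, so this brute-force enumeration is only meant for small toy inputs.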

Download data

  • Downloaded 1,136 times
  • Download rankings, all-time:
    • Site-wide: 11,262 out of 103,705
    • In bioinformatics: 1,794 out of 9,474
  • Year to date:
    • Site-wide: 2,363 out of 103,705
  • Since beginning of last month:
    • Site-wide: 9,621 out of 103,705
