Genotyping of structural variation using PacBio high-fidelity sequencing

By Zhiliang Zhang, Jijin Zhang, Lipeng Kang, Xuebing Qiu, Beirui Niu, Aoyue Bi, Xuebo Zhao, Daxing Xu, Jing Wang, Changbin Yin, Xiangdong Fu, Fei Lu

Posted 31 Oct 2021
bioRxiv DOI: 10.1101/2021.10.28.466362

Background: Structural variations (SVs) pervade the genome and contribute substantially to the phenotypic diversity of species. However, most SVs have been assayed ineffectively because of the complexity of plant genomes and the limitations of sequencing technologies. Recent advances in third-generation sequencing technologies, particularly PacBio high-fidelity (HiFi) sequencing, which generates reads that are both long and highly accurate, offer an unprecedented opportunity to characterize SVs and reveal their functionality. Because HiFi sequencing is new, it is crucial to evaluate HiFi reads for SV detection before applying the technology at scale.

Results: We sequenced wheat genomes using HiFi and then conducted a comprehensive evaluation of SV detection using mainstream long-read aligners and SV callers. The results showed that the accuracy of SV discovery depends more on the aligner than on the caller. Among aligners, pbmm2 and NGMLR provided the most accurate results for detecting deletions and insertions, respectively. Among SV callers, cuteSV and SVIM achieved the best performance. We demonstrated that combining these aligners and callers is optimal for SV detection. Furthermore, we evaluated the impact of sequencing depth on the accuracy of SV detection. The results showed that low-coverage HiFi sequencing can generate high-quality SV genotypes.

Conclusions: This study provides a robust benchmark of SV discovery with HiFi reads, showing the remarkable potential of long-read sequencing for investigating structural variations in plant genomes. The high-accuracy SV discovery achieved with low-coverage HiFi sequencing indicates that skim HiFi sequencing is an ideal approach for studying structural variations at the population level.
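The abstract does not say how accuracy was scored, so the following is only a minimal sketch of one common way to benchmark an SV call set against a truth set: match calls to truth SVs, then compute precision, recall, and F1. The `SV` record, the matching rule, and the thresholds `max_dist` and `min_size_ratio` are all hypothetical choices made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start coordinate of the variant
    svtype: str   # e.g. "DEL" or "INS"
    length: int   # SV length in bp (positive)


def is_match(call, truth, max_dist=500, min_size_ratio=0.7):
    """Hypothetical rule: same chromosome and type, nearby start, similar size."""
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.pos - truth.pos) > max_dist:
        return False
    small, large = sorted((call.length, truth.length))
    return large > 0 and small / large >= min_size_ratio


def benchmark(calls, truths, **kwargs):
    """Greedy one-to-one matching, then precision, recall, and F1."""
    unmatched = list(truths)
    tp = 0
    for call in calls:
        for truth in unmatched:
            if is_match(call, truth, **kwargs):
                unmatched.remove(truth)  # each truth SV is matched at most once
                tp += 1
                break
    fp = len(calls) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Running `benchmark` once per aligner and caller combination, and again on call sets produced from downsampled reads, would give the kind of comparison across tools and sequencing depths that the abstract describes, under the assumptions stated above.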

Download data

  • Downloaded 443 times
  • Download rankings, all-time:
    • Site-wide: 95,529
    • In bioinformatics: 9,405
  • Year to date:
    • Site-wide: 17,435
  • Since beginning of last month:
    • Site-wide: 21,445
