Robust Benchmark Structural Variant Calls of An Asian Using the State-of-Art Long Fragment Sequencing Technologies

By Xiao Du, Lili Li, Fan Liang, Sanyang Liu, Wenxin Zhang, Shuai Sun, Yuhui Sun, Fei Fan, Linying Wang, Xinming Liang, Weijin Qiu, Guangyi Fan, Ou Wang, Weifei Yang, Jiezhong Zhang, Yuhui Xiao, Yang Wang, Depeng Wang, Shoufang Qu, Fang Chen, Jie Huang

Posted 12 Aug 2020
bioRxiv DOI: 10.1101/2020.08.10.245308

The importance of structural variants (SVs) for human phenotypes and diseases is now widely recognized. Although a variety of SV detection platforms and strategies with differing sensitivity and specificity have been developed, few benchmarking resources are available to confidently assess their performance in biological and clinical research. To facilitate the validation and application of these approaches, we established an Asian reference material comprising defined benchmark regions and high-confidence SV calls. We built a high-confidence callset of 8,938 SVs in an EBV-immortalized B-lymphocyte cell line by integrating the results of four alignment-based SV callers [applied to 109x PacBio continuous long reads (CLR), 22x PacBio circular consensus sequencing (CCS) reads, 104x Oxford Nanopore long reads, and 114x optical maps (Bionano)] with one de novo assembly-based SV caller applied to CCS reads. A total of 544 randomly selected SVs were validated by PCR and Sanger sequencing, demonstrating the robustness of our SV calls. By further incorporating trio-binning-based haplotype assemblies, we established an SV benchmark for identifying false negatives and false positives by constructing continuous high-confidence regions (CHCRs), which cover 1.46 Gb and contain 6,882 SVs supported by at least one diploid haplotype assembly. Establishing high-confidence SV calls for a benchmark sample characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical diagnosis.

Competing Interest Statement

The authors have declared no competing interest.
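The abstract describes using the CHCRs to identify false positives and false negatives when a new callset is compared against the benchmark. The Python sketch below illustrates that kind of region-restricted comparison under stated assumptions; it is not the authors' pipeline. The `SV` record, the helper names, and the matching rule (same chromosome and type, start positions within 500 bp, size similarity of at least 0.7) are all hypothetical choices for illustration. Dedicated benchmarking tools such as Truvari apply more complete matching criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Simplified SV record (hypothetical; real callsets are read from VCF files)."""
    chrom: str
    pos: int      # start coordinate
    end: int      # end coordinate; equal to pos for insertions
    svtype: str   # e.g. "DEL" or "INS"
    svlen: int    # absolute SV length in bp


def in_regions(sv, regions):
    """True if the SV lies entirely inside one (chrom, start, end) region."""
    return any(sv.chrom == c and s <= sv.pos and sv.end <= e for c, s, e in regions)


def matches(a, b, max_dist=500, min_size_sim=0.7):
    """Hypothetical rule: same chrom and type, nearby start, similar length."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    largest = max(a.svlen, b.svlen)
    size_sim = min(a.svlen, b.svlen) / largest if largest else 1.0
    return size_sim >= min_size_sim


def evaluate(query, benchmark, regions):
    """Count TP/FP/FN, considering only calls fully inside the confident regions."""
    query = [sv for sv in query if in_regions(sv, regions)]
    benchmark = [sv for sv in benchmark if in_regions(sv, regions)]
    used = set()  # indices of benchmark calls already matched (one-to-one)
    tp = 0
    for q in query:
        # Greedy: take the first unmatched benchmark call that satisfies the rule.
        for i, b in enumerate(benchmark):
            if i not in used and matches(q, b):
                used.add(i)
                tp += 1
                break
    fp = len(query) - tp       # query calls with no benchmark partner
    fn = len(benchmark) - tp   # benchmark calls the query missed
    precision = tp / (tp + fp) if query else 0.0
    recall = tp / (tp + fn) if benchmark else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Toy example with made-up coordinates.
regions = [("chr1", 1_000, 50_000)]
benchmark = [
    SV("chr1", 10_000, 10_300, "DEL", 300),
    SV("chr1", 20_000, 20_000, "INS", 150),
]
query = [
    SV("chr1", 10_050, 10_340, "DEL", 290),  # matches the benchmark deletion
    SV("chr1", 30_000, 30_500, "DEL", 500),  # no partner: false positive
    SV("chr2", 5_000, 5_200, "DEL", 200),    # outside the regions: ignored
]
result = evaluate(query, benchmark, regions)
print(result)  # {'tp': 1, 'fp': 1, 'fn': 1, 'precision': 0.5, 'recall': 0.5}
```

Restricting both callsets to the confident regions before matching is what allows unmatched calls to be counted as errors rather than as gaps in the benchmark. The greedy first-match loop keeps any benchmark call from being credited twice; a production comparison would more likely choose the best-scoring partner for each call.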

Download data

  • Downloaded 699 times
  • Download rankings, all-time:
    • Site-wide: 55,072
    • In genomics: 4,020
  • Year to date:
    • Site-wide: 64,375
  • Since beginning of last month:
    • Site-wide: 49,274
