Discovery of large genomic inversions using pooled clone sequencing

By Marzieh Eslami Rasekh, Giorgia Chiatante, Mattia Miroballo, Joyce Tang, Mario Ventura, Chris T Amemiya, Evan E. Eichler, Can Alkan

Posted 11 Feb 2015
bioRxiv DOI: 10.1101/015156 (published DOI: 10.1186/s12864-016-3444-1)

Genomic structural variation takes many forms, which can be broadly classified as copy number variation (CNV) and balanced rearrangements. Although many algorithms are now available that aim to characterize CNVs, discovery of balanced rearrangements (inversions and translocations) remains an open problem. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, which reduce the mappability of short reads. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, they are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high-throughput sequencing (HTS) technologies. Here we propose to characterize large genomic inversions using a sequencing method (Kitzman et al., 2011) originally developed to improve haplotype resolution. This method, called pooled clone sequencing, merges the advantages of the clone-based sequencing approach with the speed and cost efficiency of HTS technologies. Using data generated with the pooled clone sequencing method, we developed a novel algorithm, dipSeq, to discover large inversions (>500 Kbp). We show the power of dipSeq first on simulated data, and then apply it to the genome of a HapMap individual (NA12878). We were able to accurately discover all previously known and experimentally validated large inversions in the same genome. We also identified a novel inversion and confirmed it using fluorescent in situ hybridization. Availability: Implementation of the dipSeq algorithm is available at https://github.com/BilkentCompGen/dipseq

Download data

  • Downloaded 1,384 times
  • Download rankings, all-time:
    • Site-wide: 8,086 out of 103,749
    • In bioinformatics: 1,363 out of 9,474
  • Year to date:
    • Site-wide: 87,784 out of 103,749
  • Since beginning of last month:
    • Site-wide: 93,164 out of 103,749
