
Evaluating approaches to find exon chains corresponding to long reads

By Anna Kuosmanen, Veli Mäkinen

Posted 27 Jul 2016
bioRxiv DOI: 10.1101/066241

Transcript prediction can be modelled as a graph problem in which exons are nodes and reads spanning two or more exons are represented as exon chains. PacBio third-generation sequencing produces significantly longer reads than earlier second-generation technologies, which gives valuable information about longer exon chains in the graph. However, because of the high error rates of third-generation sequencing, aligning long reads correctly around splice sites is challenging. Incorrect alignments introduce spurious nodes and arcs into the graph, which in turn lead to incorrect transcript predictions. We survey several approaches to finding the exon chains that correspond to long reads in a splicing graph, and we experimentally evaluate these methods on simulated data so that sensitivity and precision can be measured. Our experiments show that short reads from second-generation sequencing can significantly improve exon chain correctness in two ways: by error-correcting the long reads before the splicing graph is created, or by building a splicing graph from the short reads onto which the long-read alignments are then projected. We also measure the memory and time consumption of the individual modules, and show that accurate exon chains lead to significantly more accurate transcript predictions.
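The graph formulation and the sensitivity/precision evaluation described above can be illustrated with a minimal sketch. It assumes exons are identified by (start, end) coordinate pairs, that an exon chain is the ordered tuple of exons a read spans, and that a predicted chain counts as correct only if it exactly matches a true chain. The function names `build_splicing_graph` and `chain_sensitivity_precision` are hypothetical, and this is not the authors' pipeline or evaluation code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair and an
# exon chain is the ordered tuple of exons that one read spans.
Exon = Tuple[int, int]
Chain = Tuple[Exon, ...]


def build_splicing_graph(chains: List[Chain]) -> Dict[Exon, Set[Exon]]:
    """Exons become nodes; each consecutive exon pair in a chain becomes an arc."""
    graph: Dict[Exon, Set[Exon]] = {}
    for chain in chains:
        for exon in chain:
            graph.setdefault(exon, set())  # register every exon as a node
        for u, v in zip(chain, chain[1:]):
            graph[u].add(v)  # arc for the splice from exon u to exon v
    return graph


def chain_sensitivity_precision(
    true_chains: List[Chain], predicted_chains: List[Chain]
) -> Tuple[float, float]:
    """Compare distinct chains by exact match (an assumed criterion)."""
    truth, predicted = set(true_chains), set(predicted_chains)
    correct = len(truth & predicted)
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    e1, e2, e3 = (100, 200), (300, 400), (500, 600)
    shifted = (298, 400)  # e2 with a misplaced splice site, e.g. from a misaligned read
    true_chains = [(e1, e2, e3), (e1, e3)]
    predicted_chains = [(e1, e2, e3), (e1, shifted, e3)]

    graph = build_splicing_graph(predicted_chains)
    print(sorted(graph))  # the spurious node (298, 400) appears next to the true exons
    print(chain_sensitivity_precision(true_chains, predicted_chains))  # (0.5, 0.5)
```

In this toy example, a single misplaced splice site both adds a spurious node (with its arcs) to the graph and turns one correct chain into an incorrect one, halving both sensitivity and precision. Real evaluations may use looser matching, for example tolerating small coordinate differences at splice sites.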

Download data

  • Downloaded 322 times
  • Download rankings, all-time:
    • Site-wide: 59,422 out of 103,802
    • In bioinformatics: 6,495 out of 9,474
  • Year to date:
    • Site-wide: 93,816 out of 103,802
  • Since beginning of last month:
    • Site-wide: 73,561 out of 103,802
