UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers

By Serghei Mangul, Sarah Van Driesche, Lana S. Martin, Kelsey C Martin, Eleazar Eskin

Posted 25 Jan 2017
bioRxiv DOI: 10.1101/103267

Every sequencing library contains duplicate reads. While many duplicates arise during polymerase chain reaction (PCR) amplification ("technical duplicates"), some derive from multiple identical fragments of mRNA present in the original lysate ("biological duplicates"). Unique Molecular Identifiers (UMIs) are random oligonucleotide sequences that make it possible to distinguish technical duplicates from biological ones. Here we report the development of UMI-Reducer, a new computational tool for distinguishing PCR duplicates from biological duplicates. UMI-Reducer uses the UMI and the mapping position of each read to identify and collapse reads that are technical duplicates. The remaining reads, which represent true biological molecules, are then used to estimate mRNA abundance in the original lysate without PCR duplicate bias. This strategy is particularly useful for libraries made from small amounts of starting material, which typically require additional PCR cycles and are therefore most prone to PCR duplicate bias.
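The collapsing idea described in the abstract can be illustrated with a minimal Python sketch. This is not UMI-Reducer's actual implementation: the `Read` record, the `collapse_duplicates` function, the use of strand in the grouping key, and the exact-match comparison of UMIs are all assumptions made for illustration. Reads that share both a mapping position and a UMI are treated as technical duplicates and reduced to one representative, while reads that share a position but carry different UMIs are kept as biological duplicates.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for a mapped read (illustration only)."""
    name: str
    chrom: str
    pos: int      # mapping start position
    strand: str   # "+" or "-"; including strand in the key is an assumption
    umi: str      # UMI sequence attached to the read


def collapse_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, pos, strand, UMI) combination.

    Reads with the same mapping position and the same UMI are assumed to be
    PCR (technical) duplicates. Reads with the same position but different
    UMIs are assumed to come from distinct molecules and are all kept.
    """
    representatives = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        # setdefault stores only the first read seen for each key.
        representatives.setdefault(key, read)
    return list(representatives.values())


if __name__ == "__main__":
    example = [
        Read("r1", "chr1", 100, "+", "ACGT"),
        Read("r2", "chr1", 100, "+", "ACGT"),  # same position and UMI as r1
        Read("r3", "chr1", 100, "+", "TTGA"),  # same position, different UMI
        Read("r4", "chr1", 250, "+", "ACGT"),  # same UMI, different position
    ]
    kept = collapse_duplicates(example)
    print([r.name for r in kept])  # ['r1', 'r3', 'r4']
```

In this sketch the number of reads kept at a position approximates the number of distinct original molecules, which is the quantity an abundance estimate would then be based on. A practical tool would also need to handle sequencing errors within UMIs, which exact matching ignores.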

Download data

  • Downloaded 2,587 times
  • Download rankings, all-time:
    • Site-wide: 4,791
    • In bioinformatics: 532
  • Year to date:
    • Site-wide: 37,052
  • Since beginning of last month:
    • Site-wide: 23,917

