Accurate Determination of Bacterial Abundances in Human Metagenomes Using Full-length 16S Sequencing Reads

By Fanny Perraudeau, Sandrine Dudoit, James H. Bullard

Posted 04 Dec 2017
bioRxiv DOI: 10.1101/228619

DNA sequencing of PCR-amplified marker genes, especially but not limited to the 16S rRNA gene, is perhaps the most common approach for profiling microbial communities. Because of the technological constraints of commonly available DNA sequencing platforms, these approaches usually produce short reads from a narrow, targeted variable region, with a corresponding loss of taxonomic resolution relative to the full-length marker gene. We use Pacific Biosciences single-molecule, real-time circular consensus sequencing to sequence amplicons spanning the entire length of the 16S rRNA gene. However, this sequencing technology suffers from a high sequencing error rate that must be addressed to take full advantage of the longer sequences. Here, we present a method that models the sequencing error process with a generalized pair hidden Markov chain model and uses it to estimate bacterial abundances in microbial samples. We demonstrate, with simulated and real data, that our model and its associated estimation procedure give accurate estimates at the species (or subspecies) level and are more flexible than existing methods such as SImple Non-Bayesian TAXonomy (SINTAX).
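The general idea of scoring noisy reads against reference 16S sequences with a pair HMM and then estimating abundances can be illustrated with a small sketch. This is not the authors' generalized pair HMM: it assumes a textbook three-state pair HMM (Match, Insertion, Deletion) without explicit begin/end states, hypothetical parameter values (`delta`, `eps`, `p_match`) that are not estimated from PacBio data, and a standard expectation-maximization (EM) step for mixture proportions. The function names `pair_hmm_likelihood` and `estimate_abundances` are illustrative only.

```python
def pair_hmm_likelihood(read, ref, delta=0.02, eps=0.3, p_match=0.97):
    """Forward-algorithm likelihood of `read` given reference `ref` under a
    simplified three-state pair HMM (hypothetical parameters):
    delta = gap-open probability, eps = gap-extend probability,
    p_match = probability a matched position is read correctly."""
    n, m = len(read), len(ref)
    # f*[i][j]: probability of read[:i] aligned to ref[:j], ending in that state
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]  # Match/mismatch
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]  # extra read base (insertion)
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]  # skipped ref base (deletion)
    fM[0][0] = 1.0  # treat the start as a Match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                emit = p_match if read[i - 1] == ref[j - 1] else (1 - p_match) / 3
                fM[i][j] = emit * ((1 - 2 * delta) * fM[i - 1][j - 1]
                                   + (1 - eps) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                # an inserted base is drawn uniformly from the four nucleotides
                fX[i][j] = 0.25 * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                # a deletion consumes a reference base without emitting a read base
                fY[i][j] = delta * fM[i][j - 1] + eps * fY[i][j - 1]
    return fM[n][m] + fX[n][m] + fY[n][m]


def estimate_abundances(reads, refs, n_iter=100):
    """EM for mixture proportions: each read is assumed to come from
    reference k with probability pi[k]; L[r][k] = P(read r | reference k)."""
    K = len(refs)
    L = [[pair_hmm_likelihood(r, t) for t in refs] for r in reads]
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        counts = [0.0] * K
        for row in L:
            # E-step: posterior probability that this read came from each reference
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # no reference explains this read; skip it
            for k in range(K):
                counts[k] += weights[k] / total
        norm = sum(counts)
        if norm == 0.0:
            break
        # M-step: new proportions are the normalized expected read counts
        pi = [c / norm for c in counts]
    return pi


if __name__ == "__main__":
    refs = ["ACGTACGTAC", "TTGCATTGCA"]
    reads = ["ACGTACGTAC", "ACGTACCTAC", "TTGCATTGCA"]  # second read has one error
    print(estimate_abundances(reads, refs))  # approximately [2/3, 1/3]
```

Because the forward algorithm sums over all alignments rather than taking the single best one, a read with an error still contributes most of its weight to the correct reference. For full-length reads of roughly 1,500 bases, a practical version would likely work in log space and prune references with negligible likelihood.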

Download data

  • Downloaded 1,551 times
  • Download rankings, all-time:
    • Site-wide: 10,921
    • In microbiology: 608
  • Download rankings, year to date:
    • Site-wide: 39,114
  • Download rankings, since beginning of last month:
    • Site-wide: 36,567
