Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling

By Garold Fuks, Michael Elgart, Amnon Amir, Amit Zeisel, Peter J. Turnbaugh, Yoav Soen, Noam Shental

Posted 06 Jun 2017
bioRxiv DOI: 10.1101/146738 (published DOI: 10.1186/s40168-017-0396-x)

Background: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. Next-generation sequencing methods have increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite short, requiring the researcher to choose a subset of the gene to sequence (typically 16-33% of the total length). Thus, many bacteria may share the same amplified region, and the resolution of profiling is inherently limited. Platforms that offer ultra-long read lengths, whole-genome shotgun sequencing approaches, and computational frameworks formerly suggested by us and by others all offer ways to circumvent this problem, yet each suffers from shortcomings. There is a need for a simple, low-cost 16S rRNA gene-based profiling approach that harnesses short reads to cover a much larger portion of the gene, allowing high resolution even in harsh conditions of low bacterial biomass and fragmented DNA.

Results: This manuscript proposes the Short MUltiple Regions Framework (SMURF), a method that combines sequencing results from different PCR-amplified regions to provide one coherent profile. The de facto amplicon length is the total length of all amplified regions, thus providing much higher resolution than current techniques. Computationally, the method solves a convex optimization problem that allows extremely fast reconstruction and requires only moderate memory. We demonstrate the increase in resolution by in silico simulations and by profiling two mock mixtures and real-world biological samples. Reanalyzing a mock mixture from the Human Microbiome Project achieved about a two-fold improvement in resolution when combining two independent regions. Using a custom set of six primer pairs spanning about 1200 bp (80%) of the 16S rRNA gene, we achieved a ~100-fold improvement in resolution compared to a single region on a mock mixture of common human gut bacterial isolates. Finally, profiling a Drosophila melanogaster microbiome using the set of six primer pairs provided a ~100-fold increase in resolution, thus enabling efficient downstream analysis.

Conclusions: SMURF enables identification of near full-length 16S rRNA gene sequences in microbial communities, with resolution superior to that of current techniques. It may be applied to standard sample preparation protocols with very few modifications. SMURF also paves the way to high-resolution profiling of low-biomass and fragmented DNA, e.g., in the case of formalin-fixed and paraffin-embedded samples, fossil-derived DNA, or DNA exposed to other degrading conditions. The approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons, e.g., in Multilocus Sequence Typing (MLST).
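The resolution argument in the abstract (bacteria that share one amplified region may still differ in another) can be illustrated with a toy sketch. This is not SMURF's reconstruction algorithm, which the abstract describes only as a convex optimization problem. The taxon names, region labels, sequences, and the `indistinguishable_groups` helper are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: each taxon has a short made-up sequence per region.
# Real 16S rRNA regions are hundreds of bases long; these are illustrative only.
TOY_TAXA = {
    "taxon_a": {"V1": "ACGT", "V2": "GGCA"},
    "taxon_b": {"V1": "ACGT", "V2": "GGTT"},
    "taxon_c": {"V1": "TTGA", "V2": "GGCA"},
    "taxon_d": {"V1": "TTGA", "V2": "GGCA"},
}


def indistinguishable_groups(taxa, regions):
    """Group taxa whose sequences are identical across all given regions.

    Taxa in the same group cannot be told apart using only those regions.
    """
    groups = defaultdict(list)
    for name, seqs in taxa.items():
        # The combined signature is the tuple of sequences over all regions.
        signature = tuple(seqs[region] for region in regions)
        groups[signature].append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for regions in (["V1"], ["V2"], ["V1", "V2"]):
        print(regions, indistinguishable_groups(TOY_TAXA, regions))
```

In this toy data, either region alone leaves two groups of taxa, while the combined signature separates three. Taxa identical in every amplified region (here `taxon_c` and `taxon_d`) remain ambiguous regardless of how the regions are combined.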

Download data

  • Downloaded 983 times
  • Download rankings, all-time:
    • Site-wide: 12,312 out of 92,758
    • In bioinformatics: 1,957 out of 8,685
  • Year to date:
    • Site-wide: 55,330 out of 92,758
  • Since beginning of last month:
    • Site-wide: 47,223 out of 92,758
