Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
Background: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. The use of next-generation sequencing methods has increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite short, requiring the researcher to choose a subset of the gene to sequence (typically 16-33% of the total length). Thus, many bacteria may share the same amplified region and the resolution of profiling is inherently limited. Platforms that offer ultra long read lengths, whole genome shotgun sequencing approaches, and computational frameworks formerly suggested by us and by others, all allow different ways to circumvent this problem yet suffer various shortcomings. There is need for a simple and low cost 16S rRNA gene based profiling approach that harnesses the short read length to provide a much larger coverage of the gene to allow for high resolution, even in harsh conditions of low bacterial biomass and fragmented DNA. Results: This manuscript suggests Short MUltiple Regions Framework (SMURF), a method to combine sequencing results from different PCR-amplified regions to provide one coherent profiling. The de facto amplicon length is the total length of all amplified regions, thus providing much higher resolution compared to current techniques. Computationally, the method solves a convex optimization problem that allows extremely fast reconstruction and requires only moderate memory. We demonstrate the increase in resolution by in silico simulations and by profiling two mock mixtures and real-world biological samples. Reanalyzing a mock mixture from the Human Microbiome Project achieved about two-fold improvement in resolution when combing two independent regions. Using a custom set of six primer pairs spanning about 1200bp (80%) of the 16S rRNA gene we were able to achieve ~100 fold improvement in resolution compared to a single region, over a mock mixture of common human gut bacterial isolates. Finally, profiling of a Drosophila melanogaster microbiome using the set of six primer pairs provided a ~100 fold increase in resolution, and thus enabling efficient downstream analysis. Conclusions: SMURF enables identification of near full-length 16S rRNA gene sequences in microbial communities, having resolution superior compared to current techniques. It may be applied to standard sample preparation protocols with very little modifications. SMURF also paves the way to high-resolution profiling of low-biomass and fragmented DNA, e.g., in the case of Formalin-fixed and Paraffin-embedded samples, fossil-derived DNA or DNA exposed to other degrading conditions. The approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons, e.g., in Multilocus Sequence Typing (MLST).
- Downloaded 983 times
- Download rankings, all-time:
- Site-wide: 12,312 out of 92,758
- In bioinformatics: 1,957 out of 8,685
- Year to date:
- Site-wide: 55,330 out of 92,758
- Since beginning of last month:
- Site-wide: 47,223 out of 92,758
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!