Metagenomics Strain Resolution on Assembly Graphs

By Christopher Quince, Sergey Nurk, Sebastien Raguideau, Robert James, Orkun S Soyer, J. Kimberly Summers, Antoine Limasset, A. Murat Eren, Rayan Chikhi, Aaron E. Darling

Posted 07 Sep 2020
bioRxiv DOI: 10.1101/2020.09.06.284828

We introduce a novel bioinformatics pipeline, STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo when multiple metagenome samples from the same community are available. STRONG performs coassembly, followed by binning into metagenome-assembled genomes (MAGs), but uniquely it stores the coassembly graph prior to simplification of variants. This enables the subgraphs for individual single-copy core genes (SCGs) in each MAG to be extracted. It can then thread back reads from the samples to compute per-sample coverages for the unitigs in these graphs. These graphs and their unitig coverages are then used in a Bayesian algorithm, BayesPaths, that determines the number of strains present, their sequences or haplotypes on the SCGs, and their abundances in each of the samples. Our approach both avoids the ambiguities of read mapping and allows more of the information on co-occurrence of variants in reads to be utilised than if variants were treated independently, whilst at the same time exploiting the correlation of variants across samples that occurs when they are linked in the same strain. We compare STRONG to the current state of the art on synthetic communities and demonstrate that we can recover more strains, more accurately, and with a realistic estimate of uncertainty deriving from the variational Bayesian algorithm employed for the strain resolution. On a real anaerobic digestor time series we obtained strain-resolved SCGs for over 300 MAGs that, for abundant community members, match those observed from long Nanopore reads.

Competing Interest Statement

The authors have declared no competing interest.
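
The abstract's idea that unitig coverages across samples carry information about strain abundances can be illustrated with a toy example. This is a minimal sketch and not the BayesPaths algorithm: it assumes (as a simplification of ours) that each unitig's coverage in a sample is the sum of the abundances of the strains whose paths traverse it, and it assumes the strain paths are already known. The names `eta`, `gamma`, and `coverage`, and all the numbers, are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 unitigs in one SCG subgraph, 2 strains, 3 samples.
# eta[v, g] = 1 if the path of strain g passes through unitig v.
# (Assumed known here; the real method has to infer the paths.)
eta = np.array([
    [1, 1],  # shared unitig, present in both strains
    [1, 0],  # variant unitig carried only by strain 0
    [0, 1],  # variant unitig carried only by strain 1
    [1, 1],  # shared unitig
], dtype=float)

# gamma[g, s] = coverage depth of strain g in sample s (made-up values).
gamma = np.array([
    [10.0, 2.0, 5.0],
    [3.0, 8.0, 5.0],
])

# Assumed linear model: a unitig's coverage in a sample is the sum of the
# abundances of the strains that traverse it.
coverage = eta @ gamma

# With the paths fixed, strain abundances follow from ordinary least squares.
gamma_hat, *_ = np.linalg.lstsq(eta, coverage, rcond=None)
gamma_hat = np.clip(gamma_hat, 0.0, None)  # abundances cannot be negative
```

In this noise-free toy case the two variant unitigs pin down each strain's abundance in every sample. With real data the paths, the number of strains, and the abundances are all unknown at once, which is the problem the abstract says the variational Bayesian algorithm addresses.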

Download data

  • Downloaded 634 times
  • Download rankings, all-time:
    • Site-wide: 27,724 out of 105,462
    • In bioinformatics: 3,729 out of 9,474
  • Year to date:
    • Site-wide: 5,821 out of 105,462
  • Since beginning of last month:
    • Site-wide: 2,349 out of 105,462
