Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT

By F.A. Bastiaan von Meijenfeldt, Ksenia Arkhipova, Diego D. Cambuy, Felipe H. Coutinho, Bas E. Dutilh

Posted 24 Jan 2019
bioRxiv DOI: 10.1101/530188 (published DOI: 10.1186/s13059-019-1817-x)

Current-day metagenomics increasingly requires taxonomic classification of long DNA sequences and metagenome-assembled genomes (MAGs) of unknown microorganisms. We show that the standard best-hit approach often leads to classifications that are too specific. We present tools to classify high-quality metagenomic contigs (Contig Annotation Tool, CAT) and MAGs (Bin Annotation Tool, BAT), and thoroughly benchmark them with simulated metagenomic sequences that are classified against a reference database from which related sequences are progressively removed, thereby simulating increasingly unknown queries. We find that query sequences are correctly classified at low taxonomic ranks if closely related organisms are present in the reference database, while classifications are made at higher taxonomic ranks when closely related organisms are absent, thus avoiding spurious classification specificity. In a real-world challenge, we applied BAT to over 900 MAGs from a recent rumen metagenomics study and classified 97% consistently with prior phylogeny-based classifications, but in a fully automated fashion.
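To illustrate why a best-hit classification can be too specific, here is a minimal Python sketch that contrasts it with a more conservative assignment. It is not the CAT or BAT algorithm, which this abstract does not describe in detail. It assumes a generic strategy of keeping only the lineage shared by all near-top hits. The function names, the `score_fraction` parameter, and the alignment scores are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# A hit is (alignment score, lineage ordered from root to the most specific rank).
Hit = Tuple[float, Sequence[str]]


def best_hit(hits: List[Hit]) -> List[str]:
    """Return the full lineage of the single highest-scoring hit."""
    return list(max(hits, key=lambda h: h[0])[1])


def common_lineage(hits: List[Hit], score_fraction: float = 0.9) -> List[str]:
    """Return the lineage prefix shared by all hits scoring close to the top hit.

    score_fraction is a hypothetical cutoff: hits scoring at least this
    fraction of the top score are considered equally plausible.
    """
    top = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits if score >= score_fraction * top]
    shared: List[str] = []
    # Walk down the ranks and stop at the first rank where the kept hits disagree.
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Toy hits with made-up scores; the query has no close relative in the database.
enterobacteriaceae = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                      "Enterobacterales", "Enterobacteriaceae"]
hits: List[Hit] = [
    (950.0, enterobacteriaceae + ["Escherichia", "Escherichia coli"]),
    (930.0, enterobacteriaceae + ["Salmonella", "Salmonella enterica"]),
    (400.0, ["Bacteria", "Firmicutes", "Bacilli"]),
]

print(best_hit(hits)[-1])        # species-level call from the top hit alone
print(common_lineage(hits)[-1])  # family-level call shared by the near-top hits
```

In this toy case the best hit yields a species-level label, while the shared-lineage assignment stops at the family, which is the kind of higher-rank classification the abstract describes for queries without close relatives in the reference database.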

Download data

  • Downloaded 1,062 times
  • Download rankings, all-time:
    • Site-wide: 15,529 out of 117,931
    • In bioinformatics: 1,967 out of 9,553
  • Year to date:
    • Site-wide: 29,724 out of 117,931
  • Since beginning of last month:
    • Site-wide: 35,986 out of 117,931
