Oral Presentation Australian Microbial Ecology Conference 2024

Bin Chicken: targeted coassembly recovers 24,000 novel species (#90)

Samuel TN Aroney 1, Rhys JP Newell 1, Gene W Tyson 1, Ben J Woodcroft 1
  1. Queensland University of Technology, Brisbane, Queensland, Australia

Recovery of microbial genomes from metagenomic datasets (as metagenome-assembled genomes, MAGs) has provided genomic representation for ~200,000 species from diverse environmental and clinical systems. However, low-abundance microorganisms are often missed because sequencing depth is insufficient to enable genome recovery. Coassembly of multiple samples can facilitate recovery of these microorganisms by increasing their effective sequencing depth, but for large multi-sample metagenomic datasets it is unclear which metagenomes to coassemble. The choice can be made through metadata (e.g., all samples from a particular lake) or by comparing samples based on k-mer signatures of entire metagenomes. While these methods achieve some success, they are biased toward dominant microorganisms whose MAGs have already been recovered.

Here we present Bin Chicken, an algorithm that substantially improves genome recovery by coassembling reads pooled from strategically chosen metagenomes. Bin Chicken chooses groups of samples to analyse together based on shared marker gene sequences derived from raw reads. Marker gene sequences that are divergent from known reference genomes can be further prioritised, providing an efficient means of recovering highly novel genomes. Applying Bin Chicken to public metagenomes and coassembling 800 sample groups recovered 77,562 microbial genomes, including the first genomic representatives of 7 phyla, 50 classes, and 24,028 species. De novo tree inference revealed that the addition of the novel species increased the known phylogenetic diversity of Bacteria by 12% and Archaea by 18% (with 12% and 22% species growth, respectively). The known phylogenetic diversity of 35 phyla was increased by more than 25%. These genomes expand the genomic tree of life and uncover a wealth of novel microbial lineages for further research.
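The sample-selection idea described above can be illustrated with a toy sketch. This is not Bin Chicken's implementation: the function name rank_sample_groups, the sample names, and the marker sequence identifiers are all hypothetical, and the scoring (a simple count of shared marker sequences absent from reference genomes) is an assumed stand-in for whatever criteria the tool actually applies.

```python
from itertools import combinations


def rank_sample_groups(sample_markers, known_markers, group_size=2, top_n=3):
    """Rank candidate coassembly groups by shared novel marker sequences.

    sample_markers: dict mapping sample name -> set of marker sequence IDs
                    (hypothetical IDs standing in for read-derived sequences).
    known_markers:  set of marker sequence IDs already found in reference genomes.
    """
    # Keep only sequences that no reference genome accounts for.
    novel = {name: seqs - known_markers for name, seqs in sample_markers.items()}

    scored = []
    for group in combinations(sorted(novel), group_size):
        # Sequences present in every sample of the group would gain
        # effective depth if the group's reads were pooled.
        shared = set.intersection(*(novel[name] for name in group))
        if shared:
            scored.append((len(shared), group))

    # Most shared novel sequences first; ties broken by sample names.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Toy input: four samples and two sequences already represented by references.
samples = {
    "lake_A": {"otu1", "otu2", "otu3", "otu7"},
    "lake_B": {"otu2", "otu3", "otu4"},
    "soil_C": {"otu1", "otu5"},
    "soil_D": {"otu3", "otu5", "otu6"},
}
known = {"otu1", "otu4"}

ranked = rank_sample_groups(samples, known)
for score, group in ranked:
    print(score, group)
```

In this toy input, lake_A and lake_B rank first because they share two sequences with no reference genome, whereas a metadata-only grouping would pair them without regard to whether their shared content is novel.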