Seminar information

Location: Roma

Date: 10/11/2023, 11:00 - 12:00

Speaker: Fabio Cumbo

Towards revealing the hidden diversity of the microbial dark matter and its relation to host health and environmental factors 

Metagenomics allows the study not only of well-characterized microbes but also of a large number of cultivation-recalcitrant ones. Recently, metagenomic assembly has been applied to reconstruct genomes from metagenomes, which can then be analyzed with techniques commonly used for whole-genome sequenced isolates. Metagenome-assembled genomes (MAGs) have paved the way for comparative genomics studies of cultivable and uncultivable microbes at strain-level resolution, including yet-to-be-characterised ones. Together with the growing number of publicly available metagenomic datasets, this offers an unprecedented opportunity to assemble de novo a large catalog of microbial genomes. However, a systematic procedure for organizing and processing hundreds of thousands of MAGs, together with all the genomes obtained by isolate sequencing and the metadata describing their relation to host health and environmental factors, is currently lacking. To address this problem, we developed MetaSBT, a scalable framework that automatically organizes microbial genomes from isolate sequencing and MAGs into sequence-consistent clusters at different taxonomic levels using Sequence Bloom Trees, a data structure able to efficiently index and search massive amounts of sequences. In particular, the framework is able to (i) index thousands of microbial genomes at different taxonomic levels; (ii) dynamically define the boundaries of every cluster of genomes as the minimum and maximum number of k-mers in common between all the genomes in that cluster; and (iii) update the database with new reference genomes and MAGs to expand the genetic diversity and, according to the cluster boundaries, define new clusters that are genetically distant from every reference genome in the database. These new clusters define previously unknown microbial species that should be prioritized for further investigation of the microbial dark matter.
Crucially, this framework has the potential to guide most of the metagenomic profiling techniques developed by other research groups and could serve as a curated set of high-quality MAGs and corresponding metadata. This would allow new microbial species to be discovered across studies, environments, and hosts. As a base for metagenomic profilers, it would represent the first step toward more accurate updates of most of the meta-analyses performed on metagenomic samples related to a wide range of pathological conditions. Re-analysing public studies with this framework and database has the potential to uncover microbial species previously undetected in metagenomic samples and potentially relevant to the origin and progression of specific diseases.
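
As an illustration of the cluster boundaries mentioned in point (ii) of the abstract, the following minimal Python sketch computes the minimum and maximum number of k-mers shared between genomes in a cluster and checks whether a new genome falls within them. It is not MetaSBT's implementation: it uses plain Python sets instead of Sequence Bloom Trees, it assumes that "k-mers in common" means pairwise shared k-mer counts, and the function names (`kmers`, `cluster_boundaries`, `fits_cluster`), the toy sequences, and the assignment rule are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_boundaries(genomes, k):
    """Return (min, max) number of k-mers shared by any pair of genomes.

    Assumption: boundaries are taken over pairwise comparisons within
    the cluster; the abstract does not spell out the exact definition.
    """
    sets = [kmers(g, k) for g in genomes]
    shared = [len(a & b) for a, b in combinations(sets, 2)]
    return min(shared), max(shared)


def fits_cluster(candidate, genomes, k):
    """Check whether a candidate shares at least the cluster's minimum
    number of k-mers with its closest member (hypothetical rule)."""
    lower, _upper = cluster_boundaries(genomes, k)
    cand = kmers(candidate, k)
    best = max(len(cand & kmers(g, k)) for g in genomes)
    return best >= lower


# Toy cluster of three short sequences (made-up data, k = 3).
cluster = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
bounds = cluster_boundaries(cluster, 3)          # (2, 4)
close = fits_cluster("ACGTACGG", cluster, 3)     # within the boundaries
distant = fits_cluster("TTTTTTTT", cluster, 3)   # outside: candidate new cluster
```

Under these assumptions, a genome that falls outside the boundaries of every existing cluster would be a candidate for a new cluster, which mirrors step (iii) of the abstract. At real scale, exact sets would be replaced by Bloom filter based indexes such as the Sequence Bloom Trees named in the abstract.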