April 28, 2026

Mapping the microbial communities beneath our feet and inside our guts

MGnify Genomes offers new ways to explore microbial communities in soil, marine sediments, and the gut

Microbial communities are akin to bustling cities filled with millions of microbes. Credit: Karen Arnott/EMBL-EBI

If you could shrink yourself to the size of a microbe, you would discover that every handful of soil and every bucket of seawater is akin to a bustling city filled with millions of microbes. Until recently, most of these ‘microbial metropolises’ were impossible to map because the majority of microbes cannot be cultured in the lab.

This makes it extremely challenging for researchers to reconstruct microbial communities living in specific environments such as soil, water, or the gut. The rise of metagenomic data and advances like MGnify Genomes are shedding light on microbial communities that were previously inaccessible.

What is metagenomics?

Metagenomics is the study of collective genetic material, also known as the metagenome, sampled from specific environments.

EMBL-EBI’s MGnify is one of the world’s largest resources for microbiome data analysis. It also contains richly annotated, microbe-derived genomes, a treasure trove for scientists working in antimicrobial resistance, agritech, or biodiversity.

Lorna Richardson (LR), Coordinator for Microbiome Informatics, and Tatiana Gurbich (TG), Microbial Genomes Project Lead, both part of the MGnify team at EMBL-EBI, share their insights on how metagenomics is helping us understand microbes as individuals and as part of their environments.

Can you tell us about the biome catalogues available in MGnify?

TG: The biome catalogues in MGnify are collections of microbial genomes compiled to represent the known microbes in specific environments, such as ocean sediment, soils, or the digestive systems of humans and animals.

The catalogues include genomes assembled from metagenomic samples, as well as genomes from microbes grown and sequenced individually, which we refer to as isolates. Catalogues also contain information about predicted proteins and genes, along with descriptions of their likely functions, for example, involvement in antimicrobial resistance (AMR).

For species represented by multiple genomes, we generate pangenomes, which can be a useful resource for understanding gene prevalence and diversity within a species.
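As a rough illustration of what gene prevalence within a species means, the sketch below counts how often each gene appears across a set of genomes from one species and labels the most widespread genes as "core". The genome names, gene names, and the 90% core threshold are hypothetical, chosen only for this example; they are not taken from the MGnify pipeline.

```python
from collections import Counter


def gene_prevalence(genomes):
    """Return the fraction of genomes in which each gene is present."""
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    total = len(genomes)
    return {gene: count / total for gene, count in counts.items()}


def core_genes(prevalence, threshold=0.9):
    """Genes found in at least `threshold` of genomes (threshold is an assumption)."""
    return {gene for gene, fraction in prevalence.items() if fraction >= threshold}


# Hypothetical genomes from a single species, each reduced to a set of gene names.
genomes = {
    "genome_1": {"dnaA", "gyrB", "blaTEM"},
    "genome_2": {"dnaA", "gyrB"},
    "genome_3": {"dnaA", "gyrB", "tetM"},
    "genome_4": {"dnaA", "blaTEM"},
}

prevalence = gene_prevalence(genomes)
core = core_genes(prevalence)
accessory = set(prevalence) - core

for gene in sorted(prevalence, key=prevalence.get, reverse=True):
    label = "core" if gene in core else "accessory"
    print(f"{gene}: {prevalence[gene]:.2f} ({label})")
```

In this toy dataset only one gene is present in every genome; the rest vary between genomes, which is the kind of within-species diversity a pangenome is meant to capture.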

Why are biome catalogues interesting?

LR: Many microbes found in metagenomic samples are hard to grow in the lab. As a result, they may not be represented in traditional genomic databases such as EMBL-EBI’s Ensembl. Because metagenomics enables us to reconstruct microbial genomes directly from environmental DNA, we can build biome catalogues that better represent the diversity of microbes found in metagenomic samples.

In nature, microbes live in communities. Scientists can gain new insights by analysing them in isolation as well as within their community. Metagenomics and the MGnify biome catalogues help scientists infer which microbes show up in a sample, how abundant they are, which species coexist, and what the individuals can do.

What do MGnify biome catalogues enable scientists to do?

TG: The MGnify biome catalogues can act as reference datasets. If you have a soil sample that you want to contextualise, you can compare it to relevant MGnify genome catalogues. For example, you can use catalogues to identify what taxa are contained in your metagenomic sample and which gene functional categories are represented. Or, if you’re generating your own metagenome-assembled genomes (MAGs), you can use catalogues to check if you have anything novel. If the genome has already been assembled before, MGnify catalogues can show you what other environments the species has been seen in. All catalogue data can be downloaded for further exploration.
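The novelty check described above can be pictured with a toy comparison. The sketch below is not how MGnify compares genomes. It uses shared k-mers (short subsequences) and Jaccard similarity as a crude stand-in for a real genome-similarity measure. The sequences, names, k-mer length, and novelty threshold are all hypothetical.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared items divided by total distinct items; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_match(query_seq, catalogue, k=4):
    """Return (name, similarity) of the most similar catalogue entry, or (None, 0.0)."""
    query = kmers(query_seq, k)
    best_name, best_score = None, 0.0
    for name, seq in catalogue.items():
        score = jaccard(query, kmers(seq, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical catalogue of species representatives (toy sequences, not real genomes).
catalogue = {
    "species_A": "ATGCGTACGTTAGC",
    "species_B": "TTTTTTTTCCCCCCCC",
}

NOVELTY_THRESHOLD = 0.5  # assumption made for this example only

for label, seq in [("known_mag", "ATGCGTACGTTAGC"), ("new_mag", "GGGGGGGGAAAAAAAA")]:
    name, score = closest_match(seq, catalogue)
    status = "possibly novel" if score < NOVELTY_THRESHOLD else f"matches {name}"
    print(f"{label}: {status} (similarity {score:.2f})")
```

A real analysis would work on whole genomes with a dedicated similarity tool; the point here is only the shape of the question: find the closest catalogue entry, then decide whether it is close enough to count as the same species.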

Who uses the MGnify biome catalogues?

TG: Last year, over 12,000 people from around the world accessed the Genomes section of MGnify. This is a conservative estimate. Some catalogues are of particular interest for agricultural research, where public data are quite scarce.

What biome catalogues are currently available?

LR: The resource is steadily growing. Right now, MGnify represents 18 biomes and over half a million genomes in total. We have several catalogues from human-associated biomes, the largest and most widely used being the human gut catalogue, which contains nearly 300,000 genomes.

We have genomes assembled from rhizosphere soil samples for tomato, corn, and barley cultures, which were generated by the Horizon2020 FindingPheno project. The goal of the project is to understand what drives desirable crop traits and use the information to improve crops. The data generated by the project are openly available.

We also have marine water and marine sediment catalogues developed for the BlueRemediomics project, which harnesses marine microbes to develop high-value, sustainable products and services. There are also several animal-associated catalogues, such as for the pig gut and the cow rumen. We even have a honeybee gut catalogue.

How do you decide what biomes to create catalogues for?

LR: It depends entirely on what data are available. Scientists usually publish the raw genetic data from their studies in public databases such as EMBL-EBI’s European Nucleotide Archive. The MGnify team can then use these raw data, as well as any MAGs shared by scientists, to generate a biome catalogue. Often this is done as part of a project EMBL is involved in – a recent example is the HoloFood project, which explored animal gut microbiomes in farmed animals.

Scientists all over the world are generating MAGs at scale. We always encourage researchers to make data publicly available so the catalogues can best represent all this knowledge, and so the community can benefit from it.

We don’t create biome catalogues as a standard request-based service yet, but anyone who has an interesting MAG dataset can reach out to us by using the MGnify support form to see if we can turn it into a catalogue.

How do you develop biome catalogues?

TG: We start with sequencing data from metagenomic samples of an environment, such as the soil around corn roots, known as the maize rhizosphere, or with genomes that have already been generated from that environment. In the case of raw metagenomic data, we assemble the sequences and generate MAGs.

We process the genomes using a pipeline that performs quality control, clusters the genomes at the species level, selects the best genome as the species representative, and annotates it with functional information about the genes and proteins.
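The first three pipeline steps (quality control, species-level clustering, and choosing a representative) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MGnify pipeline itself. The quality thresholds, the scoring formula, the 95% similarity cut-off, and the precomputed pairwise similarity values are all hypothetical, and the annotation step is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    name: str
    completeness: float   # estimated percent of the genome recovered
    contamination: float  # estimated percent of foreign sequence


def passes_qc(g, min_completeness=50.0, max_contamination=5.0):
    # Hypothetical thresholds, chosen only for illustration.
    return g.completeness >= min_completeness and g.contamination <= max_contamination


def quality_score(g):
    # Hypothetical ranking: reward completeness, penalise contamination.
    return g.completeness - 5.0 * g.contamination


def build_catalogue(genomes, similarity, species_threshold=95.0):
    """Return {representative name: [member names]} for genomes that pass QC.

    Genomes are visited from best to worst score, so the first genome placed
    in each cluster is also its best-scoring member and acts as representative.
    """
    kept = sorted((g for g in genomes if passes_qc(g)), key=quality_score, reverse=True)
    clusters = {}
    for g in kept:
        for rep in clusters:
            if similarity.get(frozenset((g.name, rep)), 0.0) >= species_threshold:
                clusters[rep].append(g.name)
                break
        else:
            clusters[g.name] = [g.name]  # no close representative: new species cluster
    return clusters


# Hypothetical input: four genomes and precomputed pairwise similarity (percent).
genomes = [
    Genome("A", completeness=95.0, contamination=1.0),
    Genome("B", completeness=80.0, contamination=2.0),
    Genome("C", completeness=90.0, contamination=0.5),
    Genome("D", completeness=40.0, contamination=1.0),  # fails the QC sketch
]
similarity = {
    frozenset(("A", "B")): 97.2,
    frozenset(("A", "C")): 82.0,
    frozenset(("B", "C")): 81.5,
}

catalogue = build_catalogue(genomes, similarity)
for representative, members in catalogue.items():
    print(f"representative {representative}: members {members}")
```

Visiting genomes in quality order is a design shortcut: it merges the clustering and representative-selection steps, at the cost of making the result depend on how the score is defined.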

We also gather all proteins and all genes into biome-specific protein and gene catalogues. For each genome in the catalogue, we provide extensive metadata, such as the sample the sequences were taken from, the geographic location, and the quality of the assembly.

How do you update the catalogues to include the latest information?

TG: We have a workflow for adding or updating genomes as new data become available. Over time, the biome catalogues will continue to grow as scientists generate more complete genomes, helping us build an ever clearer map of the microbial life in environments that matter to science and society.

Through our biome catalogues, the ‘microscopic cities’ beneath our feet and within our bodies start to take shape. With each new catalogue, we move closer to understanding the microbes and the remarkable environments they create. This isn’t just science for curiosity’s sake; it’s also essential for understanding and treating disease and developing sustainable products and services.

If you have published a dataset that could potentially become or contribute to a MGnify biome catalogue, please contact the team using the MGnify support form.
