Many bacteria-linked illnesses, such as inflammatory bowel disease or colorectal cancer, are associated with an overgrowth of gut bacteria thought to be bad actors. But when researchers used a machine-learning algorithm to predict the density of microbes, called microbial load, from gut microbiomes, they found that changes in microbial load, rather than the disease itself, could be a driver behind the presence of disease-associated microbial species.
The researchers report November 13, 2024, in the Cell Press journal Cell that differences in a patient’s microbial load, which was found to be influenced by factors including age, sex, diet, country of origin, and antibiotic use, were a key factor for bacterial signatures in fecal samples, even in patients with disease.
“We were surprised to find that many microbial species, previously believed to be associated with disease, were more strongly explained by changes in microbial load,” says Peer Bork of the European Molecular Biology Laboratory (EMBL) Heidelberg, one of the senior authors on the study. “This indicates that these species are mainly associated with symptoms like diarrhea and constipation, rather than being directly linked to the disease conditions themselves.”
Microbial load has long been recognized as an important factor in microbiome research, but large-scale analysis has been limited by the high cost and labor-intensive nature of experimental methods. The investigators overcame this obstacle with a machine-learning approach: they developed a prediction model for fecal microbial load based on the relative microbiome composition and applied it to a large-scale metagenomic dataset to explore how microbial load varies in health and disease.
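To make the idea concrete, a prediction model of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's published method or tool: the article does not name the algorithm used, so ridge regression on centered log-ratio features stands in here, and the data are randomly generated rather than real measurements. The function names (`clr_transform`, `fit_ridge`, `predict_log_load`, `make_synthetic_cohort`) and all parameter values are hypothetical.

```python
import numpy as np

def clr_transform(rel_abund, pseudocount=1e-6):
    """Centered log-ratio transform of relative abundances (rows = samples)."""
    logged = np.log(rel_abund + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def predict_log_load(X, coef, intercept):
    """Predict log10 microbial load from transformed composition features."""
    return X @ coef + intercept

def make_synthetic_cohort(n_samples=300, n_species=20, seed=0):
    """Generate made-up data: compositions plus a load that depends on them."""
    rng = np.random.default_rng(seed)
    rel = rng.dirichlet(np.ones(n_species), size=n_samples)  # rows sum to 1
    weights = rng.normal(0.0, 0.1, n_species)                # assumed species effects
    # 11.0 is an arbitrary baseline for the synthetic log10 load, not a measured value.
    log_load = 11.0 + clr_transform(rel) @ weights + rng.normal(0.0, 0.05, n_samples)
    return rel, log_load

# Train on samples with "measured" load, then predict load for held-out samples
# using only their relative composition.
rel, log_load = make_synthetic_cohort()
train, test = slice(0, 200), slice(200, 300)
coef, intercept = fit_ridge(clr_transform(rel[train]), log_load[train])
pred = predict_log_load(clr_transform(rel[test]), coef, intercept)
r = np.corrcoef(pred, log_load[test])[0, 1]
print(f"Pearson r on held-out synthetic samples: {r:.2f}")
```

The sketch mirrors the workflow described above: a model is trained where microbial load was measured experimentally, then applied to samples for which only sequencing-derived relative abundances exist. Real data would call for a more careful choice of features, model, and validation than this toy setup.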
“Measuring microbial load in fecal samples takes a lot of effort and we were glad to have access to two large metagenomic datasets where the microbial load had been experimentally measured,” says Michael Kuhn, also of EMBL and the other senior author on the study. “With our approach, we want to generalize these data for the benefit of the larger field and with the tools we provide, microbial load can be predicted for all adult human gut microbiome studies.”
The datasets the team generated for the research comprise thousands of metagenomes and experimentally measured microbial loads from the EU-funded GALAXY (Gut-and-Liver Axis in Alcoholic Liver Fibrosis) and the Novo Nordisk Foundation’s MicrobLiver projects. They also used metagenomes and microbial load data from a previously published MetaCardis study population. As exploratory datasets, they used tens of thousands of metagenomes from previous studies, including populations from Japan and Estonia.
The team acknowledges limitations to the work. Because the analysis was based only on associations, they were not able to establish a clear direction of causality, nor could they provide mechanistic insight. Additionally, the method developed only applies to the human gut microbiome: Different training datasets are needed to predict the microbial load in other habitats.
Future research will focus on microbial species that are more directly associated with diseases, independent of microbial load, to better understand their roles in disease etiology and their potential use as biomarkers. Additionally, adapting this prediction model to other environments, such as ocean and soil microbiomes, could provide further insights into microbial ecology on a global scale.