Unraveling the relationship between microorganisms and people with deep learning and Bayesian methods

Disclaimer: machine translated by DeepL which may contain errors.

The Rigakubu News, Nov. 2025.

Research Student Communicates to Faculty

Tung Dang (Special Researcher, Department of Biological Sciences), Tatsuhiko Tsunoda (Professor, Department of Biological Sciences)

A vast number of bacteria inhabit the human body. It is said that the number of bacteria in the gut reaches about 100 trillion—far exceeding the approximately 30 to 40 trillion cells that make up the human body.
If we could skillfully control such bacteria, we might be able to cure certain diseases. However, analyzing the effects that bacteria have on the human body is far from simple.
This is because there are thousands of bacterial species, while the metabolites in the human body that they could potentially influence number in the hundreds. The number of possible interactions formed by combining the two is enormous.
How can we analyze them?

In recent years, it has become clear that microorganisms such as gut bacteria play a major role in human health. Microbes produce chemical compounds known as metabolites, which influence virtually every aspect of the human body—including the immune system, metabolism, and even brain function. Both microbes and their metabolites are extremely diverse, and they interact in complex ways with the many metabolites present in the human body. Traditionally, it has been difficult to predict how microbes affect human metabolites or to identify which bacterial species are key players in these interactions. The challenge lies in extracting only the meaningful relationships from an astronomical number of possible combinations.

Conventional analytical methods often ignore uncertainty and may lead to overly confident conclusions, such as “Bacterium X definitely affects metabolite Y.” If such conclusions are incorrect, subsequent research can be misguided or wasted. To overcome these issues, advanced probabilistic reasoning is required. However, such methods tend to be computationally expensive, meaning that practical application demands approaches that can drastically reduce computation time.

To solve these problems, we developed a new analytical framework called Variational Bayesian Microbiome Multi-Omics (VBayesMM) (Figure). This method combines a deep-learning approach that compresses information with Bayesian inference, which quantifies the certainty or uncertainty of the inferred relationships. In doing so, it enables the extraction of truly important microbe–metabolite interactions. VBayesMM has three key features.

First, by incorporating a specialized probability distribution, known as a spike-and-slab distribution (*1), into its deep-learning component, the model can clearly distinguish the microbes that have strong effects on human metabolites from those that do not.
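To give a feel for how a spike-and-slab prior separates strong effects from negligible ones, here is a minimal one-dimensional sketch. It is a toy illustration, not the VBayesMM implementation: the function names, the microbe labels, and all numerical settings (`noise_var`, `slab_var`, `prior_inclusion`) are assumptions chosen for the example. Each weight is assumed to be either exactly zero (the "spike") or drawn from a wide normal distribution (the "slab"), and the code computes the posterior probability that an observed effect came from the slab.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var` at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def inclusion_probability(effect, noise_var=1.0, slab_var=25.0, prior_inclusion=0.1):
    """Posterior probability that the true weight is non-zero (slab component).

    Assumed toy model: observed effect = weight + Normal(0, noise_var) noise,
    where weight = 0 with probability 1 - prior_inclusion (spike) and
    weight ~ Normal(0, slab_var) with probability prior_inclusion (slab).
    """
    spike = (1.0 - prior_inclusion) * normal_pdf(effect, noise_var)
    slab = prior_inclusion * normal_pdf(effect, noise_var + slab_var)
    return slab / (spike + slab)

# Hypothetical observed effect sizes for three microbes (illustrative values only).
effects = {"microbe_A": 0.2, "microbe_B": 1.0, "microbe_C": 4.5}
for name, effect in effects.items():
    print(name, round(inclusion_probability(effect), 3))
```

In this sketch, small observed effects receive an inclusion probability close to zero, while a large effect receives one close to one, which is the selection behavior described above.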

Second, Bayesian inference allows the method to quantitatively capture uncertainty in the data, thereby enabling uncertainty-aware predictions and increasing the reliability of the analysis.
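As a rough illustration of what an uncertainty-aware result can look like, the following sketch summarizes a posterior distribution as an interval and a probability rather than a yes/no statement. It assumes the posterior of each effect is approximately normal; the association names and the mean and standard deviation values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

def summarize(name, post_mean, post_sd, level=0.95):
    """Summarize an (assumed) normal posterior for one microbe-metabolite effect."""
    posterior = NormalDist(mu=post_mean, sigma=post_sd)
    tail = (1.0 - level) / 2.0
    low = posterior.inv_cdf(tail)          # lower end of the credible interval
    high = posterior.inv_cdf(1.0 - tail)   # upper end of the credible interval
    prob_positive = 1.0 - posterior.cdf(0.0)  # probability that the effect is > 0
    return {"name": name, "interval": (low, high), "prob_positive": prob_positive}

# Two hypothetical associations with the same estimated effect but different certainty.
confident = summarize("microbe_X -> metabolite_Y", post_mean=0.8, post_sd=0.1)
uncertain = summarize("microbe_Z -> metabolite_Y", post_mean=0.8, post_sd=1.0)

for result in (confident, uncertain):
    low, high = result["interval"]
    print(f'{result["name"]}: 95% interval [{low:.2f}, {high:.2f}], '
          f'P(effect > 0) = {result["prob_positive"]:.2f}')
```

Both hypothetical associations have the same point estimate, but only the first has an interval that excludes zero. A method that reports the point estimate alone would treat them as equally convincing.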

Third, by adopting variational inference (*2), a fast computational technique, VBayesMM can perform large-scale analyses within a practical timeframe.
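The idea behind variational inference can be sketched with a deliberately simple example. The code below is not how VBayesMM is implemented; it assumes a toy model (estimating a single unknown mean from simulated data with known noise) and fits a normal approximation q by gradient ascent on the evidence lower bound (ELBO). The data, the prior variance, the step size, and the number of iterations are all assumptions made for illustration. Because this toy model has an exact posterior, the approximation can be checked against it.

```python
import math
import random

random.seed(0)
# Simulated measurements with an unknown mean and known unit noise variance.
data = [random.gauss(2.0, 1.0) for _ in range(20)]
n = len(data)
prior_var = 100.0  # assumed prior: mean ~ Normal(0, prior_var)

def elbo_gradients(m, rho):
    """Gradients of the ELBO for q(mean) = Normal(m, s^2) with s = exp(rho).

    Up to a constant, the ELBO is
      -0.5 * sum((x - m)^2) - 0.5 * n * s^2 - 0.5 * (m^2 + s^2) / prior_var + log(s).
    """
    s2 = math.exp(2.0 * rho)
    grad_m = sum(x - m for x in data) - m / prior_var
    grad_rho = 1.0 - s2 * (n + 1.0 / prior_var)
    return grad_m, grad_rho

# Gradient ascent on the ELBO over the two variational parameters.
m, rho = 0.0, 0.0
step = 0.02
for _ in range(2000):
    grad_m, grad_rho = elbo_gradients(m, rho)
    m += step * grad_m
    rho += step * grad_rho

approx_mean, approx_sd = m, math.exp(rho)

# Exact posterior, available only because this toy model is conjugate.
exact_precision = n + 1.0 / prior_var
exact_mean = sum(data) / exact_precision
exact_sd = 1.0 / math.sqrt(exact_precision)

print("approximate:", approx_mean, approx_sd)
print("exact:      ", exact_mean, exact_sd)
```

In realistic models the exact posterior is not available, which is why optimizing a simpler approximating distribution in this way is attractive for large-scale analyses.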

We evaluated the predictive performance of VBayesMM using datasets from studies on sleep disorders, obesity, and cancer. In all cases, VBayesMM substantially outperformed conventional analytical methods and successfully identified groups of bacteria associated with each disease.

Research aimed at elucidating the effects of microbes—such as gut bacteria—on the human body is still in its early stages. Through this study, we aim to uncover relationships between microbes and human metabolites and to elucidate new connections between microbes and disease, along with the mechanisms underlying them. One essential aspect of such research is acknowledging uncertainty in the analysis and clearly communicating the degree of confidence in the results. Based on the insights gained through methods like VBayesMM, it is expected that we will be able to identify health- or disease-associated bacteria, control them in a personalized manner (personalized medicine and prevention), and even develop new therapeutics (drug discovery).

The results of this research were published as T. Dang et al., Briefings in Bioinformatics 26, bbaf300 (2025).


*1: A statistical method that automatically selects only the truly important elements from large amounts of data.
*2: A method that approximates the posterior distribution (the probability distribution of parameter values updated by obtaining data), which is computationally difficult in Bayesian inference, using a simpler distribution.

(Press release dated July 4, 2025)