Journal of Data Science,
Statistics, and Visualisation

Bridging the inference gap in multimodal variational autoencoders

Authors

Senellart, A., & Allassonnière, S.
DOI:

https://doi.org/10.52933/jdssv.v5i9.160

Keywords:

Contrastive learning, multimodality, normalizing flows, variational autoencoders.

Abstract

From medical diagnosis to autonomous vehicles, many critical applications rely on the integration of multiple heterogeneous data modalities. Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved modalities from observed ones. Recent models using mixture-of-experts aggregation suffer from theoretical limitations that reduce generation quality on complex datasets. In this article, we propose a novel interpretable model able to learn both joint and conditional distributions without introducing mixture aggregation. Our model follows a multistage training process: after learning the joint distribution with variational inference, we learn the conditional distributions using normalizing flows and a new, theoretically grounded objective function. Importantly, we also propose extracting the semantic content shared between modalities in a pre-training stage and incorporating these representations into the inference distributions to enhance generative coherence. Our method achieves state-of-the-art results on several benchmark datasets.
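The abstract outlines a three-stage pipeline: contrastive pre-training of a shared representation, joint training by variational inference, and a normalizing-flow stage for the conditional distributions. The PyTorch sketch below illustrates one way such a pipeline could be wired together. It is only a minimal sketch under stated assumptions: the toy dimensions, MLP architectures, InfoNCE contrastive loss, Gaussian ELBO, and single conditional affine flow layer are all illustrative choices, not the paper's actual models or its theoretically grounded objective.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed toy dimensions for two modalities, the VAE latent space,
# and the shared contrastive representation (all hypothetical).
D1, D2, LATENT, SHARED = 32, 32, 16, 8

def mlp(din, dout):
    """Small two-layer network used for every component in this sketch."""
    return nn.Sequential(nn.Linear(din, 64), nn.ReLU(), nn.Linear(64, dout))

# --- Stage 0: contrastive pre-training of the shared representation ---
enc1_shared, enc2_shared = mlp(D1, SHARED), mlp(D2, SHARED)

def contrastive_loss(x1, x2, temperature=0.1):
    """InfoNCE loss pulling paired modalities together in the shared space."""
    z1 = F.normalize(enc1_shared(x1), dim=-1)
    z2 = F.normalize(enc2_shared(x2), dim=-1)
    logits = z1 @ z2.t() / temperature     # batch-wise similarity matrix
    labels = torch.arange(x1.size(0))      # true pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

# --- Stage 1: joint distribution learned by variational inference -----
# The (frozen) shared representations are appended to the encoder input,
# mirroring the abstract's idea of injecting shared semantic content.
joint_enc = mlp(D1 + D2 + 2 * SHARED, 2 * LATENT)  # mean and log-variance
dec1, dec2 = mlp(LATENT, D1), mlp(LATENT, D2)

def elbo_loss(x1, x2):
    s = torch.cat([enc1_shared(x1), enc2_shared(x2)], dim=-1).detach()
    mu, logvar = joint_enc(torch.cat([x1, x2, s], dim=-1)).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = F.mse_loss(dec1(z), x1) + F.mse_loss(dec2(z), x2)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# --- Stage 2: conditional normalizing flow for z given x1 -------------
# With the VAE frozen, one conditional affine flow layer is fitted by
# maximum likelihood on posterior samples, so that z | x1 can be sampled
# for cross-modal generation. A real flow would stack several layers.
flow_cond = mlp(D1, 2 * LATENT)  # conditioner producing scale and shift

def flow_nll(x1, x2):
    with torch.no_grad():  # targets come from the frozen joint encoder
        s_sh = torch.cat([enc1_shared(x1), enc2_shared(x2)], dim=-1)
        mu, logvar = joint_enc(torch.cat([x1, x2, s_sh], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    scale, shift = flow_cond(x1).chunk(2, dim=-1)
    base = (z - shift) * torch.exp(-scale)  # invert z = base * exp(scale) + shift
    log_base = -0.5 * (base.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
    return -(log_base - scale.sum(dim=-1)).mean()
```

At generation time, under these assumptions, one would sample Gaussian base noise, push it through the fitted flow conditioned on the observed modality x1, and decode the resulting latent with dec2 to produce the unobserved modality.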

Published

2025-11-13

How to Cite

Senellart, A., & Allassonnière, S. (2025). Bridging the inference gap in multimodal variational autoencoders. Journal of Data Science, Statistics, and Visualisation, 5(9). https://doi.org/10.52933/jdssv.v5i9.160

Issue

Vol. 5 No. 9 (2025)

Section

Data Science, Classification, Statistical Learning, and Multidimensional Data Visualisation