Bridging the inference gap in multimodal variational autoencoders
DOI: https://doi.org/10.52933/jdssv.v5i9.160

Keywords: Contrastive learning, multimodality, normalizing flows, variational autoencoders.

Abstract
From medical diagnosis to autonomous vehicles, many critical applications rely on the integration of multiple heterogeneous data modalities. Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved modalities from observed ones. Recent models using mixture-of-experts aggregation suffer from theoretical limitations that reduce generation quality on complex datasets. In this article, we propose a novel interpretable model able to learn both joint and conditional distributions without introducing mixture aggregation. Our model follows a multistage training process: after learning the joint distribution with variational inference, we learn the conditional distributions using normalizing flows and a new, theoretically grounded objective function. Importantly, we also propose extracting the semantic content shared between modalities in a pre-training stage and incorporating these representations into the inference distributions to enhance generative coherence. Our method achieves state-of-the-art results on several benchmark datasets.
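To give a concrete sense of the change-of-variables principle behind the normalizing flows mentioned above, the toy sketch below fits a single affine flow z = a·x + b (base x ~ N(0, 1)) to samples by maximum likelihood. This is only an illustration of the flow objective, not the paper's actual architecture; the target samples, the one-dimensional latent space, and the plain gradient-descent loop are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for latent codes whose conditional
# distribution a flow would have to model.
target = rng.normal(loc=2.0, scale=0.5, size=5000)

# Affine flow z = a*x + b. By the change of variables,
# log p(z) = log N((z - b) / a; 0, 1) - log|a|.
a, b = 1.0, 0.0
lr = 0.05
for _ in range(2000):
    x = (target - b) / a  # inverse flow maps data back to the base
    # Gradients of the mean negative log-likelihood w.r.t. a and b:
    # NLL = 0.5*x**2 + 0.5*log(2*pi) + log|a|, with x = (z - b) / a.
    grad_a = np.mean(-x**2) / a + 1.0 / a
    grad_b = np.mean(-x) / a
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # recovers the target's scale and location
```

Maximizing the flow likelihood drives the inverse-mapped samples toward the base distribution, so the learned (a, b) approach the target's standard deviation and mean; deeper flows generalize this to richly structured conditionals.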
