Journal of Data Science,
Statistics, and Visualisation

Deep dynamic co-clustering of count data streams: application to pharmacovigilance

Authors

  • Giulia Marchello Inria
  • Alexandre Destere Université Côte d’Azur Medical Centre https://orcid.org/0000-0001-6147-9201
  • Marco Corneli Université Côte d’Azur
  • Charles Bouveyron Université Côte d’Azur

DOI:

https://doi.org/10.52933/jdssv.v4i4.106

Keywords:

co-clustering, change-point detection, zero-inflated distribution, online inference, VEM algorithm, data stream

Abstract

Co-clustering is a widely used technique that allows the analysis of complex and high-dimensional data in various domains. However, existing models mostly concentrate on continuous and dense data in fixed-time settings, where cluster assignments remain unchanged over time. In the field of pharmacovigilance, for example, it is crucial to cluster drugs and adverse effects simultaneously and in real time, facilitating the automation of safety signal detection processes. Moreover, traditional co-clustering methods require all the data to be loaded into memory, which can be a challenge for large datasets or even impossible in certain scenarios. The proposed online co-clustering model is designed to overcome this challenge by processing the data incrementally, one step at a time. This work introduces a novel inference process for the latent block model that addresses the challenge of online co-clustering of sparse data matrices. To properly model this type of data, we assume that the observations follow a time- and block-dependent mixture of zero-inflated distributions, thus combining stochastic processes with time-varying sparsity modeling. To detect abrupt changes in the dynamics, we make use of a Bayesian online change point detection method on both cluster memberships and data sparsity estimations. The inference relies on an original variational procedure whose maximization step trains an LSTM neural network in order to solve the dynamical systems. Numerical experiments on simulated datasets demonstrate the effectiveness of the proposed methodology in the context of count data streams. We then fit the model to a large-scale dataset supplied by the Regional Center of Pharmacovigilance of Nice (France), providing meaningful online segmentation of drugs and adverse drug reactions.
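
As an illustration of the zero-inflated distributions mentioned in the abstract, the following minimal sketch evaluates the probability mass function of a zero-inflated Poisson for a single count. The choice of a Poisson base distribution, the function names `zip_pmf` and `zip_mean`, and the parameter names `pi` (probability of a structural zero) and `lam` (Poisson rate) are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(X = k) under a zero-inflated Poisson (illustrative assumption).

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    zero_mass = pi if k == 0 else 0.0
    return zero_mass + (1.0 - pi) * poisson


def zip_mean(pi: float, lam: float) -> float:
    """Expected count: structural zeros scale the Poisson mean by (1 - pi)."""
    return (1.0 - pi) * lam
```

In a time- and block-dependent setting of the kind the abstract describes, `pi` and `lam` would vary with the time step and with the (row cluster, column cluster) block of each cell; here they are fixed scalars to keep the sketch short.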

 

Published

2024-08-12

How to Cite

Marchello, G., Destere, A., Corneli, M., & Bouveyron, C. (2024). Deep dynamic co-clustering of count data streams: application to pharmacovigilance. Journal of Data Science, Statistics, and Visualisation, 4(4). https://doi.org/10.52933/jdssv.v4i4.106