Deep dynamic co-clustering of count data streams: application to pharmacovigilance
DOI:
https://doi.org/10.52933/jdssv.v4i4.106
Keywords:
co-clustering, change-points detection, zero-inflated distribution, online inference, VEM algorithm, data stream
Abstract
Co-clustering is a widely used technique that allows the analysis of complex
and high-dimensional data in various domains. However, existing models
concentrate mostly on continuous, dense data in static settings, where
cluster assignments remain unchanged over time. In pharmacovigilance, for
example, it is crucial to cluster drugs and adverse effects simultaneously and
in real time, which facilitates the automation of safety signal detection.
Moreover, traditional co-clustering methods require all the data to be loaded into
memory, which is challenging for large datasets and impossible in some
scenarios. The proposed online co-clustering model overcomes this
challenge by processing the data incrementally, one step at a time. This work
introduces a novel inference process for the latent block model that addresses the
challenge of online co-clustering of sparse data matrices. To properly model this
type of data, we assume that the observations follow a time- and block-dependent
mixture of zero-inflated distributions, thus combining stochastic processes with
time-varying sparsity modeling. To detect abrupt changes in the dynamics,
we use a Bayesian online change-point detection method on both the
cluster memberships and the data sparsity estimates. The inference relies on an
original variational procedure whose maximization step trains an LSTM neural
network to solve the dynamical systems. Numerical experiments on simulated
datasets demonstrate the effectiveness of the proposed methodology in the
context of count data streams. Then, we fit the model to a large-scale dataset
supplied by the Regional Center of Pharmacovigilance of Nice (France), providing
meaningful online segmentation of drugs and adverse drug reactions.
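
To make the notion of a zero-inflated distribution for count data concrete, here is a minimal sketch in Python. It assumes a zero-inflated Poisson, which is one common choice and is not necessarily the exact distribution used in the paper. The names `zip_pmf`, `zip_log_likelihood`, `pi`, and `lam` are illustrative and do not come from the paper. In the model described above, the sparsity level and the rate would additionally depend on the time step and on the block. Here they are held fixed for simplicity.

```python
import math


def zip_pmf(k, pi, lam):
    """Probability of count k under a zero-inflated Poisson (assumed form).

    pi  : probability of a structural zero (sparsity level), in [0, 1]
    lam : rate of the Poisson component, lam >= 0
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero is either structural or drawn from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, pi, lam):
    """Log-likelihood of independent counts sharing the same (pi, lam)."""
    return sum(math.log(zip_pmf(k, pi, lam)) for k in counts)
```

For instance, `zip_log_likelihood([0, 0, 3, 0, 1], pi=0.4, lam=2.0)` scores a small, mostly-zero block of counts. A plain Poisson with the same rate assigns less mass to zero, which is why the extra `pi` term matters for sparse matrices.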