Simultaneous t-Model-Based Clustering for Time Dependent Data: Application to a Study of the Financial Health of Corporations
Abstract
Student's t mixture model-based clustering is often used as a robust alternative to Gaussian model-based clustering. In this paper, we aim to cluster several datasets simultaneously, rather than a single one as is usual, in a context where the underlying t-populations are not completely unrelated: all individuals are described by the same features, and partitions with identical meaning are expected. Using natural arguments to justify a stochastic linear link between the components of the mixtures associated with each dataset, we propose parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood estimates of the mixture parameters, subject to the linear link constraint, can be easily computed by a GEM algorithm that we describe. We then apply these models to two time-dependent datasets of company financial data, each comprising both healthy and bankrupt companies. Our new models suggest that the hidden structure may be more complex than generally expected, distinguishing three groups: not only two clear groups of healthy and bankrupt companies, but also a third group of companies whose health is unpredictable.