S3.2 Choice of the Number of Component Clusters in Mixture Models by Information Criteria

Christian Olivier
Université de Poitiers, France
Frédéric Jouzel
Université de Rouen, France
Abdelaziz El Matouat
École Normale Supérieure de Fès, Morocco

Abstract: This paper considers the problem of choosing the number of component clusters in the standard mixture of multivariate normal distributions. Choosing the number of clusters in a clustering procedure has been studied before, but it remains an open problem. We propose to use information criteria to solve it in the Gaussian mixture-model approach, which is now a standard approach to clustering. The different criteria are presented and then compared with other well-known criteria on synthetic data sets. Often, the number of clusters k is unknown and must be estimated. A two-stage iterative maximum-likelihood procedure is used as the clustering technique to estimate the parameters of the mixture model. A new criterion is derived and proposed for choosing the number of clusters in the mixture-model context. For comparison, Akaike's information criterion (AIC, 1973) and Rissanen's MDL criterion (1978) are also introduced in the mixture-model context. Numerical examples on simulated normal data sets with a known number of mixture clusters illustrate the significance of our criterion in choosing the number of clusters and the best-fitting model. Experimental results on synthetic mixture data sets demonstrate its efficiency and robustness.
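
As a rough illustration of how an information criterion selects k, the sketch below scores candidate mixture models with the textbook forms of AIC and MDL. It is not the authors' procedure: the new criterion proposed in the paper is not stated in this abstract and is not reproduced here. The function names, the assumption of unconstrained (full) covariance matrices, and the log-likelihood values in the usage example are all hypothetical and chosen only for illustration.

```python
import math


def gmm_free_parameters(k, d):
    """Count free parameters of a k-component, d-dimensional Gaussian
    mixture, assuming unconstrained (full) covariance matrices."""
    weights = k - 1                      # mixing proportions sum to 1
    means = k * d                        # one d-vector per component
    covariances = k * d * (d + 1) // 2   # one symmetric d x d matrix each
    return weights + means + covariances


def aic(log_likelihood, n_params):
    """Akaike's information criterion (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def mdl(log_likelihood, n_params, n_samples):
    """MDL in its common two-part form, scaled by 2 so that it is
    directly comparable with AIC (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_k(log_likelihoods, d, n_samples, criterion="mdl"):
    """Return (best_k, scores) for a dict mapping each candidate k to the
    maximized log-likelihood of the fitted k-component mixture."""
    scores = {}
    for k, ll in log_likelihoods.items():
        p = gmm_free_parameters(k, d)
        scores[k] = aic(ll, p) if criterion == "aic" else mdl(ll, p, n_samples)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical maximized log-likelihoods for k = 1..4 on 500 two-dimensional
# samples. These numbers are made up for illustration, not taken from the paper.
example_ll = {1: -2100.0, 2: -1900.0, 3: -1893.0, 4: -1891.0}
best_aic, _ = select_k(example_ll, d=2, n_samples=500, criterion="aic")
best_mdl, _ = select_k(example_ll, d=2, n_samples=500, criterion="mdl")
```

With these made-up values the two criteria disagree: the heavier penalty in MDL favors the smaller model, while AIC accepts an extra component for a small gain in likelihood. In practice the log-likelihoods would come from a maximum-likelihood fit of each candidate mixture, such as the two-stage procedure mentioned in the abstract.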

Marc Parizeau
5/18/1999