MACs: Multi-Attribute Co-Clusters with High Correlation Information
Accepted by the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2009), Bled, Slovenia, September 7-11, 2009.
Authors
- Kelvin Sim - shsim{at}a-star.i2r.edu.sg - Institute for Infocomm Research, A*STAR, Singapore
- Vivekanand Gopalkrishnan - asvivek{at}ntu.edu.sg
- Hon Nian Chua - hnchua{at}a-star.i2r.edu.sg - Institute for Infocomm Research, A*STAR, Singapore
- See-Kiong Ng - skng{at}a-star.i2r.edu.sg - Institute for Infocomm Research, A*STAR, Singapore
Abstract
In many real-world applications that analyze correlations between
two groups of diverse entities, each group of entities can be characterized
by multiple attributes. As such, there is a need to co-cluster multiple
attributes’ values into pairs of highly correlated clusters. We denote
this co-clustering problem as the multi-attribute co-clustering problem.
In this paper, we introduce a generalization of the mutual information
between two attributes into mutual information between two attribute
sets. The generalized formula enables us to use correlation information
to discover multi-attribute co-clusters (MACs). We develop a novel algorithm,
MACminer, to mine MACs with high correlation information from
datasets. We demonstrate the mining efficiency of MACminer on datasets
with multiple attributes, and show that MACs with high correlation information
have higher classification and predictive power than
MACs generated by alternative high-dimensional data clustering and
pattern mining techniques.
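
To make the idea of mutual information between two attribute sets concrete, here is a minimal sketch in Python. It is not the paper's generalized formula or the MACminer algorithm. It only computes the standard empirical mutual information, and handles attribute sets by treating each set as a single joint variable, which is an assumption made for illustration. The function names, the attribute names a, b, c, and the toy rows are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of raw counts
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def set_mutual_information(rows, left, right):
    """Mutual information between two attribute sets.

    Assumption for illustration: each attribute set is collapsed into one
    joint variable (a tuple of its values). The paper's own generalization
    may be defined differently.
    """
    xs = [tuple(row[a] for a in left) for row in rows]
    ys = [tuple(row[a] for a in right) for row in rows]
    return mutual_information(xs, ys)


# Hypothetical toy data: c is the XOR of a and b.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]

single_a = set_mutual_information(rows, ["a"], ["c"])      # a alone vs c
single_b = set_mutual_information(rows, ["b"], ["c"])      # b alone vs c
pair_ab = set_mutual_information(rows, ["a", "b"], ["c"])  # {a, b} vs c
```

In this toy data neither a nor b alone carries any information about c (0 bits each), while the set {a, b} determines c completely (1 bit). This is the kind of correlation that only becomes visible when attributes are considered as sets rather than one pair at a time.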
Datasets Used