Optimal Discovery of Factors in Relational Data via a Novel Method for Matrix Decomposition

Radim Belohlavek and Vilem Vychodil
SUNY Binghamton

Abstract:

We present a new method for discovering factors in relational data. Relational data is the kind of data we get when collecting information about people, such as what places they visit, what events they participate in, what schools they attended, what background and knowledge they have, etc. Another example is data describing events and their characteristics, organizations and their characteristics, and the like. Mathematically, such data can be represented by a binary or many-valued n x m matrix describing n objects (e.g., people) and m attributes (e.g., characteristics of people). Although the number m of attributes may be large, there may be only a small number of fundamental factors that explain the data and of which the attributes are just particular manifestations. The discovery of such fundamental factors is the subject of our talk.
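To make the matrix view concrete, here is a minimal sketch for the binary case. It assumes, since the abstract does not spell out the decomposition, that "explaining" the data means writing the n x m matrix as a Boolean product of an n x k object-factor matrix and a k x m factor-attribute matrix. The toy data and the names I, A, B, and boolean_product are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: 4 people (rows) described by 5 binary attributes (columns).
I = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]

# Assumed object-factor matrix: A[i][l] = 1 if factor l applies to person i.
A = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]

# Assumed factor-attribute matrix: B[l][j] = 1 if attribute j is a
# manifestation of factor l.
B = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]


def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 iff some factor l
    applies to object i and has attribute j as a manifestation."""
    n, k, m = len(A), len(B), len(B[0])
    return [
        [int(any(A[i][l] and B[l][j] for l in range(k))) for j in range(m)]
        for i in range(n)
    ]


# Under the stated assumption, two factors reproduce all five attributes.
reconstructed = boolean_product(A, B)
print(reconstructed == I)
```

In this toy case, each person is described by two factor memberships instead of five attribute values, and the original matrix is recovered exactly.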

One benefit of factor discovery is that it lets us see the fundamental characteristics behind the data, rather than only the characteristics we observe when collecting information. Another benefit is dimensionality reduction: instead of a possibly large number of original characteristics, we can work with a smaller number of fundamental characteristics, i.e., factors.

We present a method for discovering factors in binary and many-valued data that is optimal in the sense that it yields the smallest possible number of explanatory factors. We discuss its computational aspects and present both illustrative and real-world examples.