Optimal Discovery of Factors in Relational Data via a Novel Method for Matrix Decomposition
Radim Belohlavek and Vilem Vychodil
SUNY Binghamton
Abstract:
We present a new method for discovering factors in relational data.
Relational data is the kind of data we get when collecting information
about people, such as what places they visit, what events they
participate in, what schools they attended, what background and knowledge
they have, etc. Another example is data describing events and their
characteristics, organizations and their characteristics, and the like.
Mathematically, such data can be represented by a binary or many-valued
n x m matrix describing n objects (e.g., people) and m attributes (e.g.,
characteristics of people). Although the number m of attributes may be large,
there might be only a small number of fundamental factors which can explain
the data and of which the attributes are just particular manifestations.
The discovery of such fundamental factors is the subject of our talk.
One benefit of factor discovery is that it allows us to see
fundamental characteristics, instead of the characteristics we observe
when collecting information. Another benefit is reduction of dimensionality:
Instead of a possibly large number of original characteristics, we can
work with a smaller number of fundamental characteristics, i.e., factors.
The method we present for discovering factors in binary and
many-valued data is optimal in that it yields the smallest possible
number of explanatory factors. We discuss the computational
aspects and present illustrative as well as real-world examples.
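
To illustrate how a few factors can explain a larger set of attributes, here is a minimal Python sketch. It covers only the binary case, and it assumes that the decomposition is a Boolean matrix product of an object-factor matrix and a factor-attribute matrix; the abstract itself does not spell out the form of the decomposition. The matrices A and B and the helper boolean_product are hypothetical and serve only as an example.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 if some factor l
    has A[i][l] == 1 and B[l][j] == 1, and 0 otherwise."""
    k = len(B)       # number of factors
    m = len(B[0])    # number of attributes
    return [
        [int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
        for row in A
    ]

# Hypothetical object-factor matrix: 4 objects x 2 factors.
A = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]

# Hypothetical factor-attribute matrix: 2 factors x 5 attributes.
B = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

# The 4 x 5 object-attribute matrix explained by the two factors.
I = boolean_product(A, B)
for row in I:
    print(row)
```

In this toy example the five observed attributes are reproduced exactly from two factors. Finding the smallest number of factors that does this for a given matrix is the problem the talk addresses.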