Computer Science Colloquium: Reexamining artificial intelligence methods to predict missing connections in knowledge graphs

Speaker: Carlos Rivero

Abstract: A knowledge graph describes entities of interest and their connections. Knowledge graphs are at the core of many applications, such as search engines (Google, Bing), virtual assistants (Alexa, Siri), social networks (LinkedIn, Facebook), and product catalogs (Amazon, eBay). Unfortunately, knowledge graphs are typically far from complete due to their unsupervised construction: connections between entities are missing, which hinders their effectiveness in practice. For example, questions to a virtual assistant often cannot be answered completely or accurately.

Artificial intelligence methods aim to predict missing connections by training a model that takes examples of correct (positive) and incorrect (negative) connections as input. A completion model determines whether a new connection is positive and should be added to the graph, or negative and should not be added. Even though many researchers have reported satisfactory results with these methods, recent work suggests that the results were achieved under unrealistic conditions (datasets that contain high data redundancy), and that the accuracy of completion models is actually quite poor. Where do we stand? Is the accuracy achieved by these models on a couple of benchmarking datasets enough to accept or discard proposed artificial intelligence methods?
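To make the idea of positive and negative connections concrete, here is a minimal sketch. It assumes a toy graph of (head, relation, tail) triples, builds negatives by swapping the tail of a positive for another entity (a common convention, not necessarily the one used in the work presented), and scores a triple with the translation distance used by TransE, one of the traditional methods named in this abstract. All entity names, relation names, and helper functions are hypothetical.

```python
import math
import random

# Toy knowledge graph of (head, relation, tail) triples; every name is hypothetical.
TRIPLES = [
    ("alice", "works_at", "rit"),
    ("bob", "works_at", "rit"),
    ("rit", "located_in", "rochester"),
]


def corrupt_tail(triple, entities, known, rng):
    """Build a negative example by replacing the tail with another entity.

    Replacements that reproduce a triple already in the graph are skipped.
    Assumption: anything absent from the graph is treated as negative, even
    though it could be a true connection that is simply missing.
    """
    head, relation, _ = triple
    candidates = [e for e in entities if (head, relation, e) not in known]
    return (head, relation, rng.choice(candidates))


def transe_distance(h, r, t):
    """TransE-style distance ||h + r - t||; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


rng = random.Random(0)
entities = sorted({e for h, _, t in TRIPLES for e in (h, t)})
known = set(TRIPLES)
negatives = [corrupt_tail(triple, entities, known, rng) for triple in TRIPLES]

# Hand-picked 2-D embeddings, for illustration only (a real model learns them).
alice, works_at = (0.0, 0.0), (1.0, 0.0)
rit, rochester = (1.0, 0.0), (1.0, 1.0)
positive_score = transe_distance(alice, works_at, rit)
negative_score = transe_distance(alice, works_at, rochester)
```

With these hand-picked vectors, the positive triple gets a smaller distance than the corrupted one, which is the behavior training tries to produce for learned embeddings.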

In this talk, I will introduce the evaluation protocol currently used to measure the accuracy of completion models. We observe several shortcomings: 1) Metrics borrowed from the information retrieval field are not suitable for evaluating the accuracy of completion models. 2) Benchmarking datasets contain anomalies (data redundancy) that the protocol does not currently account for. 3) Benchmarking datasets have been split randomly, which alters the graph topology, so the training split does not resemble the original graph. Our contributions are as follows: 1) A new metric that is appropriate for our context. 2) An anomaly coefficient that is integrated into the protocol. 3) A downscaling algorithm that generates training splits while preserving graph topology with statistical guarantees. Our experiments over three well-known datasets show that traditional methods (TransD, TransE and TransH) significantly and consistently outperform recent methods. Our results imply that the understanding of the accuracy of completion models is far from perfect, and we call for a complete reexamination of the methods in this area.

Joint work with Iti Bansal and Sudhanshu Tiwari.

Bio: I am an assistant professor in CS@RIT, where I mainly teach database courses. My research interests are graphs and their applications (GOAL Lab: https://www.cs.rit.edu/~crr/goal). I received my PhD from the University of Seville, Spain, where I worked on knowledge translation: automated generation of mappings to exchange knowledge between heterogeneous knowledge graphs. My advisees and I are currently examining the reproducibility, replicability and explainability of knowledge graph completion models.


Contact: Jordan Gates

Event Snapshot
When and Where: October 29, 2020, 12:30 pm - 1:30 pm
Room/Location: Zoom
Who: Open to the Public
Interpreter Requested? No