Two computer science graduates of Rochester Institute of Technology are helping to demystify mathematics with a search engine built specifically for sophisticated math.
David Stalnaker and Nidhin Pattaniyil created the Tangent search engine, a first-of-its-kind tool that lets people search documents using both formulas and text. Experts and non-experts can enter formulas in LaTeX, the popular scientific markup language, or draw them by hand.
For searching math, their results are better than Google’s. At a recent information retrieval competition held in Japan, the Tangent search engine produced the best results for formula search and had the highest percentage of “relevant” hits.
The tool will let math experts and scientists easily search long technical documents using formulas as well as keywords. The RIT team also hopes that non-experts will someday use Tangent to decode unfamiliar notation and learn more about math.
“For many people, visual elements are the anchor for understanding how to organize things, especially with math,” said Richard Zanibbi, associate professor of computer science in RIT’s B. Thomas Golisano College of Computing and Information Sciences. “We can’t just rely on text-based math, we need an intuitive search engine for visual math.”
Tangent began as a continuation of the min project, a tool created in RIT’s Document and Pattern Recognition Lab that makes it easier to include mathematical expressions in search queries. With min, users draw math expressions on a canvas, and the tool converts each drawing to text.
“At first, we had the query sent to a chosen search engine—Wolfram Alpha, Google or Wikipedia,” said Zanibbi. “However, we found that these engines were not providing the most relevant results.”
As part of his master’s project, Stalnaker, a 2013 RIT graduate who now works at Google, began building a better search engine. For formulas, the engine indexes how symbols are visually laid out from left to right. The text retrieval system was built on top of Solr, a popular open source enterprise search platform. The system can index many collections, including Wikipedia and part of the arXiv collection of science articles.
“It is much simpler than most people expect,” said Zanibbi. “You don’t have to encode everything.”
Tangent searches for a formula based on its appearance, Zanibbi explained. For example, people are far more likely to write x² + 1 than 1 + x², so the engine only needs to search for the form people actually write rather than every mathematically equivalent variant.
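The article describes Tangent’s formula index only at a high level: symbols are indexed by how they are laid out, and matching is by appearance. The Python sketch below illustrates that general idea under simplifying assumptions; it is not Tangent’s code. The `Sym` layout tree, the relation codes (`r` for “to the right”, `a` for “above”, i.e. a superscript), the symbol-pair tuples and the Dice overlap score are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sym:
    """One symbol in a simplified layout tree (hypothetical representation)."""
    label: str
    right: Optional["Sym"] = None  # next symbol to the right on the same baseline
    sup: Optional["Sym"] = None    # superscript attached to this symbol


def layout_pairs(root: Sym) -> set[tuple[str, str, str]]:
    """Collect (earlier symbol, later symbol, path) tuples. The path spells out the
    spatial relations between the two: 'r' = to the right, 'a' = above (superscript)."""
    pairs: set[tuple[str, str, str]] = set()

    def visit(node: Sym, before: list[tuple[str, str]]) -> None:
        # Pair this symbol with every symbol that precedes it in the layout.
        for label, path in before:
            pairs.add((label, node.label, path))
        for rel, child in (("r", node.right), ("a", node.sup)):
            if child is not None:
                # Extend each existing path by this relation and start a new one here.
                visit(child, [(lbl, p + rel) for lbl, p in before] + [(node.label, rel)])

    visit(root, [])
    return pairs


def build_index(docs: dict[str, Sym]) -> dict[tuple[str, str, str], set[str]]:
    """Inverted index mapping each layout pair to the ids of formulas containing it."""
    index: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for doc_id, formula in docs.items():
        for pair in layout_pairs(formula):
            index[pair].add(doc_id)
    return index


def search(index, docs: dict[str, Sym], query: Sym) -> list[tuple[str, float]]:
    """Rank stored formulas by Dice overlap between their layout pairs and the query's."""
    q = layout_pairs(query)
    shared: dict[str, int] = defaultdict(int)
    for pair in q:
        for doc_id in index.get(pair, ()):
            shared[doc_id] += 1
    scored = [
        (doc_id, 2 * n / (len(q) + len(layout_pairs(docs[doc_id]))))
        for doc_id, n in shared.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# x² + 1 and the equivalent 1 + x², written as layout trees.
docs = {
    "a": Sym("x", right=Sym("+", right=Sym("1")), sup=Sym("2")),
    "b": Sym("1", right=Sym("+", right=Sym("x", sup=Sym("2")))),
}
index = build_index(docs)
query = Sym("x", right=Sym("+", right=Sym("1")), sup=Sym("2"))
print(search(index, docs, query))  # [('a', 1.0), ('b', 0.2)]
```

Because the sketch compares layout rather than meaning, the commuted form 1 + x² shares only the “2 above x” pair with the query and scores 0.2, while the identically written x² + 1 scores 1.0. No attempt is made to normalize equivalent forms, in line with the point that the engine does not have to encode everything.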
Pattaniyil, a 2014 RIT graduate, extended the project for his master’s thesis. He converted the math index from a memory-based system to a database-based one.
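The article does not say which database Pattaniyil used or how the index was stored. As a hedged illustration of the general move from an in-memory index to a database-backed one, the sketch below keeps index postings in SQLite through Python’s standard `sqlite3` module. The table layout and the `matching_docs` helper are hypothetical; each row pairs two symbols with a short code for the spatial path between them (`r` for “to the right”, `a` for “above”), and a formula’s score here is simply the number of query pairs it shares.

```python
import sqlite3

# ":memory:" keeps this demo self-contained; a file path would store the index on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (s1 TEXT, s2 TEXT, path TEXT, doc_id TEXT)")
# Composite index so lookups by pair avoid scanning the whole table.
conn.execute("CREATE INDEX idx_pair ON postings (s1, s2, path)")

# Hypothetical postings for two formulas: "a" is x² + 1, "b" is 1 + x².
postings = [
    ("x", "+", "r", "a"), ("x", "1", "rr", "a"), ("+", "1", "r", "a"), ("x", "2", "a", "a"),
    ("1", "+", "r", "b"), ("1", "x", "rr", "b"), ("+", "x", "r", "b"),
    ("1", "2", "rra", "b"), ("+", "2", "ra", "b"), ("x", "2", "a", "b"),
]
conn.executemany("INSERT INTO postings VALUES (?, ?, ?, ?)", postings)
conn.commit()


def matching_docs(conn: sqlite3.Connection, query_pairs) -> list[tuple[str, int]]:
    """Return (doc_id, number of shared pairs), best match first."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS q (s1 TEXT, s2 TEXT, path TEXT)")
    conn.execute("DELETE FROM q")
    conn.executemany("INSERT INTO q VALUES (?, ?, ?)", set(query_pairs))
    rows = conn.execute(
        """
        SELECT p.doc_id, COUNT(*) AS shared
        FROM postings AS p
        JOIN q USING (s1, s2, path)
        GROUP BY p.doc_id
        ORDER BY shared DESC, p.doc_id
        """
    )
    return rows.fetchall()


# Layout pairs for the query x² + 1.
query = [("x", "+", "r"), ("x", "1", "rr"), ("+", "1", "r"), ("x", "2", "a")]
print(matching_docs(conn, query))  # [('a', 4), ('b', 1)]
```

An in-memory dictionary is fast but limited by available RAM and lost when the process exits; a database table can grow beyond memory and, when backed by a file, persist across restarts. Whether Tangent made exactly this trade-off is not stated in the source.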
“I also implemented the indexing and retrieval of text and began supporting queries with multiple formulas and text,” said Pattaniyil, who now works for Comcast as a contractor in Philadelphia.
Tangent was tested against seven other search engines at the 11th NII Testbeds and Community for Information access Research (NTCIR) conference, held Dec. 9–12 in Japan. The search engine beat the competition when searching through Wikipedia articles and a collection of 100,000 scientific documents. Tangent produced the highest-rated top-five hits for combined text and math queries, with 92 percent of queries returning relevant results.
Tangent is free to use, and a demonstration of the formula retrieval engine is available at saskatoon.cs.rit.edu/tangent. On average, the engine can retrieve 28,000 relevant results in 2.2 seconds.
“But that is too slow,” Zanibbi said. “We plan to continue scaling our search engine up and make it faster.”
Currently, Zanibbi is working to improve Tangent with Kenny Davila, a Ph.D. student in computing and information sciences, and with collaborators Frank Tompa and Andrew Kane from the University of Waterloo, Canada. Tangent and min are a part of the National Science Foundation-funded project “Combining Algorithms for Recognition and Retrieval of Mathematics.”