COLA Connections Newsletter: November 2014

Dr. Dan Roth’s Research in Computational Linguistics Could Define the Future

Language is fascinatingly human; the way our minds form linguistic patterns and organize our thoughts in order to contextualize and express them is beyond complex. How could we possibly teach non-human entities to understand language effectively? The intricacy, ambiguity, and variability of naturally arising language make it exceedingly difficult for computational systems to interpret. This challenge lies at the center of the field known as computational linguistics, in which the renowned Dr. Dan Roth is a leading figure.

RIT was honored to receive Dr. Roth, who delivered the Distinguished Computational Linguistics Lecture, a joint effort between the College of Liberal Arts and the Golisano College. The event, held October 23, 2014, in the Golisano Auditorium, was enthusiastically received by a packed house of students. Dr. Roth, currently a professor at the University of Illinois at Urbana-Champaign, has published many original works that define and refine the field. His passion for machine learning and natural language processing has led to many tools that are used commercially and throughout the research community. His research will be felt inside this field and applied in countless ways outside of it for generations to come.

Dr. Roth’s lecture comes at a time of exciting expansion in this area in the College of Liberal Arts, which offers curriculum in human language technology and computational linguistics alongside language science. For example, two new professors were brought on this year: Dr. Emily Prud’hommeaux and Dr. Zhong Chen join Dr. Cecilia Ovesdotter Alm in conducting state-of-the-art research in computational linguistics. The expanding curricular offerings are strongly interdisciplinary, combining a technical and computational side with a linguistics side.

Computational linguistics goes hand in hand with artificial intelligence. Machine translation and other language technologies provide opportunities to communicate effectively using machines, including apps on cell phones and other mobile devices, or equipment that takes advantage of linguistic sensing. These concepts were illustrated in Dr. Roth’s lecture, which attracted a large audience that extended well beyond RIT’s computational linguistics community. The field involves mining language and text, and building intelligent systems that make sense of human communication. The goal? To understand language, something that is fundamentally human, in order to automate this process and put it to use in a variety of fields and applications.

When asked why natural language processing is such an opportunity, Dr. Roth said that it is scientifically interesting and challenging to explain why humans can do it so easily. We understand language very quickly and naturally, but that understanding is difficult to automate. This type of challenge is what fuels research and education. It is an invitation; what society could gain from this work is limitless.

One of the chief applications of this field is in the medical domain. Dr. Roth mentioned that roughly a million medical documents are published each year. Currently, no single entity can fathom, contextualize, supervise, or make practical sense of the data these documents hold. Humans can make sense of this type of data, but machines cannot fully do so, and the sheer amount of information is too much for humans themselves to take on. Machine learning seeks to combine the computational power of machines with the way that humans understand data in order to serve greater purposes. A machine could draw on a massive amount of information, such as that found in medical papers, opening up limitless possibilities.

Dr. Roth went on to explain the current limitations: when you search through Google, you are using a very low bandwidth of communication. The input and results are keyword-based. You often can’t ask specific questions and get specific answers. There is much information that you can’t access; it is there, but machines don’t know the right way to access it. “The ability to access information would be significantly different if you could communicate better with natural language [in human-computer interactions],” Dr. Roth said.

Machine learning is all over the internet, whether you realize it or not. Google, Facebook, and Twitter are just a few of the countless services that use it. Another major example of its current usage is in advertising. Companies look at previous communications or activity, such as emails or purchases, and automatically decide which ad would be most effective. Yet another example is the financial domain, where companies can detect fraud by automatically scanning credit card transactions and determining whether they are legitimate. Language translation tools also use components from this field.

Learning from and understanding language is something that makes us innately human, and replicating that ability is the challenge because of the ambiguity and variability in language. Every decision made when receiving information draws on an additional layer of knowledge that does not come with the data. Take sign language interpretation, for example, where the interpreter has to make very rapid decisions all the time. Interpreters bring the human layer of experience to contextualize the information they are receiving and translate it for a new audience. We reason with knowledge. It is easy to build a database, but what we need are machines that can reason and make good use of this data.

“Being able to process natural language automatically is going to be very very useful for society,” Dr. Roth said, but there is still much work to be done. Consider the medical domain again: what if every time someone went to the emergency room, machines were able to meaningfully combine all of that person’s many medical records into one simple, relevant document that better prepared the doctor? Doctors cannot possibly dig through every bit of medical data on a patient at every visit, so something that streamlines the process intelligently could improve the lives of many. In the future, we will be able to communicate more freely with all of the machines around us.

Dr. Roth said he was “very impressed” with what he has seen at RIT. The interdisciplinary nature of the institute drives many interesting projects, from collaborations involving medical data to social science perspectives on the work. The Digital Humanities program under development is just one example of blending Liberal Arts programs with computing technologies.

Overall, what impressed Dr. Roth most about RIT was how the students’ work seems to look outward, to what society can gain from these studies. Young people should be interested in topics like computational linguistics not only because they are challenging, but because they are an investment in the future.