Associate professor studying speech signals

Faculty member’s affinity for language and mathematics is the basis for further developing speech recognition technologies

Ernest Fokoue, associate professor of statistics in RIT’s John D. Hromi Center for Quality and Applied Statistics, is analyzing audio and speech signals to enable better voice recognition technology.

Can a person erase traces of his original language and dialect? Or is a voice as unique as a fingerprint?

Researchers at Rochester Institute of Technology, looking to answer those questions, have started analyzing audio and speech signals to enable better voice recognition technology. The research has applications in speech recognition software development, security recognition, text mining and computer-assisted language learning.

Ernest Fokoue, associate professor of statistics in RIT’s John D. Hromi Center for Quality and Applied Statistics, has begun acquiring multiple unique “voices” for his project, “Statistical Analysis of Audio and Speech Signals.” His work is part of the evolving field of speech processing and is expected to improve the technology behind language processing systems, which combine linguistics and computational techniques so that individual voices can be recognized not only for context but also for specific characteristics such as cultural dialect.

“I’m going to use mathematics and statistical signal processing to emulate a linguist,” he says. “I want to be able to recognize, in the most refined way, the subtle differences between people. If I can characterize completely the signal or voice signature, it would be almost like having your voice as a fingerprint.”

He is currently analyzing data collected from more than 150 subjects at RIT who “voiced” five sentences of varying emotional content. The data for each subject amount to more than 1.5 million pieces of information, including variations of individual voices.

The data collected in projects such as Fokoue’s are considered “big data,” a term for the complex, high-volume information analyzed for consumer market trends, business solutions, and government and military security analyses. He is building a data matrix of the distinguishing characteristics being measured, with the goal of technology that uses math to recognize voices and detect a person’s dialect with the highest possible accuracy.
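To make the idea of a data matrix concrete, the following is a minimal sketch and not Fokoue’s actual method. It assumes each recording is a list of audio samples and reduces it to one row of measured characteristics. The two features used here (mean frame energy and zero-crossing rate), the function names and the synthetic sine tones standing in for recorded voices are all illustrative assumptions; the article does not say which characteristics the project measures.

```python
import math

def synth_tone(freq_hz, sample_rate=8000, duration_s=0.5):
    """Hypothetical stand-in for a recorded utterance: a pure sine tone."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def frame_features(signal, frame_len=400):
    """Return [mean energy, mean zero-crossing rate] over non-overlapping frames.

    These two features are assumptions chosen for illustration only.
    """
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Average squared amplitude of the frame.
        energies.append(sum(x * x for x in frame) / frame_len)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcrs.append(crossings / (frame_len - 1))
    return [sum(energies) / len(energies), sum(zcrs) / len(zcrs)]

def build_data_matrix(recordings):
    """One row per recording, one column per measured characteristic."""
    return [frame_features(signal) for signal in recordings]

# Two synthetic "voices" at different pitches (illustrative data, not real speech).
recordings = [synth_tone(120.0), synth_tone(220.0)]
matrix = build_data_matrix(recordings)
```

In a real study each row would come from a recorded sentence and would hold far more columns, but the layout is the same: rows are recordings and columns are the characteristics compared across speakers.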

“I’m hoping with mathematics I can go down to more subtle things than what the linguist is seeing,” says Fokoue. “Part of this for me is to better understand languages’ classifications, recognition. I want to be able to recognize in the most refined way the subtle differences between people. And if you recognize the components, can you actually learn to erase that?”

While he is not an ethnographer, a researcher of cultural phenomena, Fokoue has an affinity for languages and speaks seven fluently: English, French, Spanish, Italian, German, Russian and Fokoue, pronounced fol-kway, his native language from Cameroon.

Recognizing language characteristics is like distinguishing between the voices of siblings, he says. “I have six brothers and we sound similar but not alike. I want to isolate the commonalities, and go down to the differences; there must be a way to recognize that on a computer.”

Note: Fokoue will continue to collect voice samples through the remainder of the academic year. He can be contacted at epfeqa@rit.edu for more information about his project and to volunteer as part of the research data collection.
