Rochester Institute of Technology researchers are taking on Big Data to reduce the negative impact of faulty measurements in data collected from complex sensing systems.
“Thanks to tremendous developments in hardware, we can now collect, store and process very large amounts of data across many different sensing modalities. However, for these data to be useful, we have to learn how to process them efficiently and reliably, to get as much knowledge out of them as possible,” said Panos Markopoulos.
Markopoulos, an assistant professor in RIT’s Kate Gleason College of Engineering, is developing more reliable data analytics by building new algorithms that automatically reduce the emphasis placed on corrupted or faulty data. The work is funded by a three-year, $499,236 grant from the National Science Foundation’s Office of Advanced Cyberinfrastructure.
As applications such as social networks, health care and computer vision collect ever more data, the need for reliable data analysis grows. Intelligent systems rely on data that span diverse dimensions (such as time, frequency and 3D space) and organize them in multi-dimensional arrays, also referred to as “tensors.” However, many existing data analysis methods are sensitive to faulty measurements and may lead to false conclusions, Markopoulos explained.
“These applications have their foundations in signal processing algorithms,” he said. “Our work is about taking large data sets, analyzing them and extracting knowledge. It is inevitable that some of this data will be faulty, noisy or corrupted and will not represent the system we are trying to understand. Our analysis should be robust against such faulty data. This is the real problem this project tries to solve—to do data analysis in a reliable way.”
The team’s approach will be threefold: improve current algorithms to better analyze data across tensor arrays; develop software that implements the new processing capabilities; and build prototypes that apply the new algorithms to social network analytics, machine learning (training machines to recognize differences between classes of data) and computer vision.
“There is not much research in corruption-resistant analysis of big data sets. This project will set the theoretical and algorithmic foundations for this work,” said Markopoulos, an expert in signal processing and data analysis. He will be working with Andreas Savakis, RIT professor of computer engineering and researcher in computer vision, and Vagelis Papalexakis, a professor of computer science at the University of California, Riverside, who specializes in data mining.
“What I think is most important about this project is that it will give us the opportunity to develop new theory and algorithms. The findings of the project could have a high impact, not only in industry but also on the theory of data analysis,” he said. “This can also have a significant impact in academia, strengthening students’ skills in analyzing, understanding and learning from Big Data.”