Capturing Data Through Image Segmentation

RIT researchers are developing some of the most sophisticated computing technologies to process static and advanced-video images through image segmentation, the partitioning of an image or video stream into pixel sets representing specific objects and object components. Distinguishing those objects and components poses inherent challenges: recognizing textures, color gradation, and object groupings; determining the correct placement of objects; and relying confidently on computer applications to systematically recognize, isolate, and process these and other distinguishing characteristics of objects within images. Image segmentation provides a means for more accurate analysis of images, whether monitoring natural resources adjacent to urban areas for flood prevention, observing buildings or facilities and detecting variations or damage to infrastructure after a natural disaster, or finding anomalies in areas for target detection and surveillance.

Eli Saber, professor of electrical engineering in RIT's Kate Gleason College of Engineering, developed the complex algorithm that drives the segmentation technology. This platform technology is being used across industries from biomedical applications to security and surveillance, from entertainment and resource recovery to advanced printing. The algorithm and subsequent processing technology have been successfully used to improve the analysis of multiple static images, compressing data within the images to build three-dimensional models. Saber and his research team have begun expanding the concept to video imaging, capturing multiple moving images and extracting data.

Saber has collaborated with David Messinger, director of the Digital Imaging and Remote Sensing Laboratory in the Chester F. Carlson Center for Imaging Science, and the two have developed applications to address image segmentation demands utilizing a multidimensional computing algorithm called MAPGSEG (multi-resolution adaptive and progressive gradient-based color image segmentation), developed at RIT in conjunction with Hewlett-Packard Company.

"Partitioning generates a reduced and relevant data set for high-level operations such as rendering, indexing, classification, compression, content-based retrieval, and multimedia applications," says Saber, who leads the Image, Video and Computer Vision Laboratory in the engineering college.

This type of partitioning, or segmentation, comes naturally to humans. The human eye views and distinguishes numerous images daily, and the brain processes the information in real time. Developing a simulated environment that performs similar tasks is the basis of the MAPGSEG algorithm, which can selectively access and manipulate individual content in images based on the desired level of detail. MAPGSEG computationally meets the demands of many practical segmentation applications, offering a reasonable compromise between quality and speed and laying the foundation for fast, intelligent object- and region-based, real-world applications of color imagery, Saber adds.
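To make the gradient-based, multi-resolution idea concrete, here is a minimal sketch under simplified assumptions (not the RIT implementation; the function names, quantile threshold, and downsampling factor are illustrative). It seeds region labels where the color gradient is weak, first on a downsampled copy of the image and then at full resolution:

```python
# Minimal sketch of a gradient-based, multi-resolution segmentation
# (illustrative only; not the MAPGSEG algorithm itself).
import numpy as np
from scipy import ndimage


def color_gradient_magnitude(img):
    """Combined gradient magnitude over the channels of an H x W x 3 float image."""
    grad = np.zeros(img.shape[:2])
    for c in range(img.shape[2]):
        gy = ndimage.sobel(img[..., c], axis=0)
        gx = ndimage.sobel(img[..., c], axis=1)
        grad += gx ** 2 + gy ** 2
    return np.sqrt(grad)


def segment(img, low_grad_quantile=0.3):
    """Label connected low-gradient areas as initial regions; strong
    edges separate distinct labels."""
    grad = color_gradient_magnitude(img)
    smooth = grad < np.quantile(grad, low_grad_quantile)
    labels, n_regions = ndimage.label(smooth)
    return labels, n_regions


# Multi-resolution use: a coarse pass on a downsampled copy gives a
# quick region map; a full-size pass refines it where detail matters.
img = np.random.rand(256, 256, 3)                   # stand-in color image
coarse_labels, _ = segment(ndimage.zoom(img, (0.25, 0.25, 1)))
fine_labels, n = segment(img)
```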

Today, some of those segmentation applications of color imagery are being adapted for biomedical imaging, object recognition, and surveillance. Saber and Messinger are bringing to fruition improvements to, and unique solutions for, these applications and others.

Successful Integration of Biomedical Image Segmentation

MAPGSEG works within CT-scan software to expose tumors, revealing a more accurate reading of actual size, weight, location, and density. The application is integrated into RECIST (Response Evaluation Criteria in Solid Tumors) measurements, which physicians commonly use to mark baseline tumor size and, later, as a comparative measure of the tumor's response to treatment.

Radiologists often look at 100 or more progressive slices of a mass taken by a CT scan. The physician manually compiles the origination and extension points, or vectors, of the tumor found on the slices.

"Our segmentation technology runs in the background; it segments the tumor across the slices, and provides a volume measurement," Saber explains. "That RECIST measurement triggers our segmentation. Think of it as seeding. The seed will grow and spread through the entire image and begins to grow to find the tumor using our algorithm segmentation techniques. It pulls the 'images' out of all the appropriate screens, and compiles them to the full image of the tumor."

The work done by Saber and his research team has not gone unnoticed. He and Sohail Dianat, head of RIT's electrical and microelectronic engineering department, worked with DataPhysics on a project to help improve its next-generation CT-scan technology. The California company announced in 2011 that it would open a new product development facility in Rochester, citing proximity to RIT and to Saber's research activities in image segmentation as one of the reasons for the move.

DataPhysics is only one of several corporate connections Saber has fostered since coming to RIT in 2004. Since that time, he has secured more than $3 million in external funding from companies such as HP, Lenel, Varian Semiconductors, and Ortho-Clinical Diagnostics (a Johnson & Johnson company), a broad representation of industries that are incorporating image segmentation into their respective business applications to better understand image components, image compression, modeling, and imaging search functions.

This is especially true in the analysis of remotely sensed images from satellite and sensor technologies. Developments in these areas have increased the quantity of high-resolution images faster than researchers can process and analyze data manually.

Saber and Messinger are developing advanced intelligence-processing technologies to handle those large volumes of data in a timely manner and to effectively distinguish objects, scale, complexity, and organization using MAPGSEG, their foundational technology for static, two-dimensional image segmentation.

Analyzing Satellite and Advanced Video Images Effectively

With the advent of more capable sensors and unmanned aerial vehicles, images can be acquired at higher rates. Saber and Messinger are exploring the use of topological features to improve classification and detection results and focusing on development of a segmentation methodology to differentiate the unique cues of moving and still objects derived from full-motion video capture.

They received more than $1 million in federal funding this past spring for two separate but related research projects: "Spatiotemporal Segmentation of Full Motion Airborne Video Imagery" and "Hierarchical Representation of Satellite Images with Probabilistic Modeling." The aim is to provide an effective foundation that will assist analysts in tasks such as target detection and recognition, classification, change detection, and multisensor information fusion. Breaking down high-resolution images into groupings based on size and spectral similarity is an essential step in reducing the complexity: the image is first deconstructed by scale and then organized in a hierarchical fashion, Saber explains.
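A toy sketch of that hierarchical grouping step (it clusters region-mean spectra only and ignores the spatial adjacency a real system would enforce; all names and scales are illustrative):

```python
# Hedged sketch: fine regions are grouped by spectral similarity into
# progressively coarser levels, so the scene can be queried at
# whatever scale suits the analyst's task.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def build_hierarchy(region_means):
    """Agglomerative merge tree over per-region mean spectra
    (n_regions x n_bands)."""
    return linkage(region_means, method="ward")


def cut_at_scale(tree, n_groups):
    """Flatten the tree into `n_groups` coarse groups: one cut per
    scale of interest."""
    return fcluster(tree, t=n_groups, criterion="maxclust")


# 200 fine regions described by 8-band mean spectra (synthetic data).
means = np.random.rand(200, 8)
tree = build_hierarchy(means)
coarse = cut_at_scale(tree, 10)    # neighborhood scale, say
finer = cut_at_scale(tree, 50)     # building scale
```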

The process has multiple benefits: it can combine data collected from multiple sensors, it provides a representation that allows a large area to be organized and sorted, and it lets analysts query topological information to examine objects and their corresponding elements, giving contextual information about a scene. It also utilizes probabilistic modeling to quantify, with a high degree of certainty, the quality and reliability of the information.

Partitioning the digital frames into non-overlapping, specific regions or objects (from a three-dimensional viewpoint) and into events or shots (from a sequential perspective) facilitates selective access and manipulation of individual content. This is essential in establishing the foundation for high-level analysis, enhancement, classification, storage, and compression of full-motion video to extract intelligence information, says Sankaranarayanan Piramanayagam, an imaging science doctoral student from Chennai, India, working on the project.

"We treat video as a three-dimensional volume, and partition the data into meaningful spatiotemporal regions or shots. We are trying to extend and optimize the current segmentation framework for full-motion video imagery. We are trying to think like an analyst—figure out a specific or anomalous event, beyond motion detection, from a full-length video sequence," says Piramanayagam.

The overall concept is like taping an entire NFL game, but being able to program a system to isolate and compile only touchdown plays, for example. This simple analogy does not, however, convey the system complexity needed to produce a sophisticated "highlight reel" of data in image form.

The researchers' approach begins with the estimation of motion from the input video, providing appropriate cues for moving versus still objects and for foreground versus background information. Next, a three-dimensional, spectral- or motion-based edge detection method is performed to extract significant gradient information about spectral changes across the volume. More specifically, locations with small gradient magnitudes are grouped together and uniquely labeled to identify a set of seeds that initialize sub-volume formation. The expected outcomes include selective access and manipulation of full-motion video content, rapid analysis and interpretation of full-motion videos, object-oriented, scale-space analysis, and widespread military and geospatial intelligence applications.
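A minimal sketch of that seed-selection step, assuming simple Sobel gradients and an arbitrary quantile threshold in place of whatever the researchers actually use:

```python
# Hedged sketch: treat the clip as a T x H x W volume, take gradients
# along time and space, then group and uniquely label the locations
# with small gradient magnitude as seeds for sub-volume growth.
import numpy as np
from scipy import ndimage


def spatiotemporal_seeds(video, quantile=0.2):
    """video: T x H x W grayscale volume. Label connected groups of
    voxels where the combined spatiotemporal gradient is weakest."""
    gt = ndimage.sobel(video, axis=0)    # temporal gradient (motion cue)
    gy = ndimage.sobel(video, axis=1)    # vertical spatial gradient
    gx = ndimage.sobel(video, axis=2)    # horizontal spatial gradient
    mag = np.sqrt(gt ** 2 + gy ** 2 + gx ** 2)
    low = mag < np.quantile(mag, quantile)    # small-gradient locations
    seeds, n_seeds = ndimage.label(low)       # grouped, uniquely labeled
    return seeds, n_seeds


frames = np.random.rand(30, 120, 160)         # stand-in video clip
seeds, n = spatiotemporal_seeds(frames)
```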

"It all comes down to efficiently handling large amounts of image data collected from satellites and video streams, which are not necessarily big images, but I can collect video for hours," says Messinger, who is also an associate research professor in the Carlson Center and oversees multidisciplinary research using remote-sensing techniques for archaeology and disaster management. He served as an aerospace engineer at Northrop Grumman before coming to RIT in 2002. "You'd like to be able to download the data, have it go into a computer system and have it reducethat eight hours of video down to 20 minutes that somebody has to look at, just the highlights, so they can process the information to make decisions."

Use of video images becomes an additional and important variable in the segmentation processing equation. Computers interpret object information from images and video as a two-dimensional plane, unlike humans, who understand an object's three-dimensional aspects, says Saber.

"Once objects have been identified and indexed, an analyst can target objects rather than individual pixels," he adds. "We struggle in doing the proper video segmentation intelligently. How do computers form this recognition that we as humans have understood for most of our lives? How do you get the computer to recognize images the same as humans would do it? It is a problem that is largely unsolved and difficult."

But it is also a problem the team is solving, producing a knowledge base adaptable for identifying structures and objects of various sizes, shapes, and timescales, Messinger adds.

"It has to be flexible enough to capture all of that information in multiple spatial and temporal scales," he says. "I want to be able to process it to extract information automatically, so I can make the process more efficient for the end user."