
Creating Human-Centered Image Retrieval

Several years ago Anne Haake, professor in RIT's Golisano College of Computing and Information Sciences, undertook a sabbatical at the National Library of Medicine that included user research in Content-Based Image Retrieval (CBIR). The computer-based technique, which catalogs and retrieves images from a database of defined characteristics, is considered a potential technical improvement over current image databases used in medical diagnosis and prognosis.
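As a rough illustration of the general idea behind CBIR, the sketch below ranks stored images by how closely their feature vectors match a query image. It is a minimal example under stated assumptions, not the system described in this article: the three-number feature vectors, the image names, and the `retrieve` function are all hypothetical, and cosine similarity is just one common way to compare features.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def retrieve(query_features, database, top_k=2):
    """Return the ids of the top_k stored images most similar to the query."""
    scored = [
        (cosine_similarity(query_features, features), image_id)
        for image_id, features in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical database: each image is reduced to a short feature vector
# (for example, coarse color proportions). Real systems use richer features.
database = {
    "lesion_a": [0.9, 0.1, 0.0],
    "lesion_b": [0.1, 0.8, 0.1],
    "lesion_c": [0.8, 0.2, 0.0],
}

print(retrieve([1.0, 0.1, 0.0], database))  # ['lesion_a', 'lesion_c']
```

A ranking like this treats every viewer identically, which is the uniformity assumption the research team set out to address.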

"The images taken from an individual patient could be compared to previous images taken by that medical center or other centers around the world," says Haake, who was trained in biology and software development and now studies human-computer interaction and biomedical informatics. "Thousands of images could be handled and analyzed quickly and previous data about how tumors progress or how a particular disease may look at different stages could be easily transmitted to doctors."

However, in reviewing different CBIR systems as part of her NLM research, Haake noted that the technology was hindered because it did not effectively take into account human analysis of the images themselves or the expertise of the analyst during the early design stages.

"CBIR assumes a certain level of uniformity in the viewer, that every person looking at the image sees it the same way, uses the same terminology to describe what they see, and has the same level of expertise," she adds. "However, this does not take into account expert knowledge and past experience, which are crucial in image semantics, making content retrieval and comparison problematic at best and impossible in some cases."

Haake has sought to address this issue by developing specific models of human expertise and tacit knowledge that can increase the overall "intelligence" of CBIR systems.

Through a multidisciplinary research team with imaging scientist Jeff Pelz, computer scientist Pengcheng Shi, dermatologist Cara Calvelli, and computational linguist Cecilia Ovesdotter Alm, Haake is seeking to create a next-generation, human-centered CBIR system that more directly incorporates expert cognitive processes. In addition, the team hopes to better model human-computer interaction to enhance overall system usability.

Enhancing the Human Component

"For many years computer designers argued that the algorithm itself was the end point," says Shi, director of RIT's Ph.D. program in computing and information sciences. "But the community has started to realize that equations themselves are not going to do the job. At the end of the day we need to more carefully consider the human in technology systems such as CBIR to make the data we are producing more useful."

To develop human-centered CBIR, the team has sought to objectify domain knowledge, the means by which experts in the field view, discuss, and analyze information, and incorporate it into the design of the databases and search functions that power CBIR systems.

"Through the use of machine learning, visual perception, and linguistics, we can better account for how people perceive and categorize images and incorporate this data into the computer algorithms used in CBIR systems," notes Shi. "Human-centered CBIR will ultimately produce more robust data that more accurately simulates human analysis."

With support from the National Institutes of Health and the National Science Foundation, the team is currently working to advance image understanding and develop a prototype CBIR system for analyzing dermatology images.

Led by Calvelli, a trained dermatologist who currently serves as an associate professor in the physician assistant program in RIT's College of Health Sciences and Technology, the team is working to analyze and collect expertise from dermatology experts, technicians, and students.

Utilizing probabilistic models and learning algorithms, the data collected will be used to develop multi-modal, interactive content-based image retrieval systems that take into account perceptual learning and cognitive processing in how images are categorized and retrieved. The resulting CBIR system will be more usable, with a more robust search function and enhanced analysis capabilities. More importantly, it will better mirror the knowledge and expertise of the users utilizing the system.

"Through the collection of eye-tracking and linguistic data we can build models that account for the semantic and visual inputs that affect how an expert perceives an image, simulating the uncertainty inherent in human decision-making," Haake says. "In other words, we can make CBIR decisions more 'human.'"

Modeling Sight and Language

The team, with the assistance of undergraduate and graduate students in imaging science and computing and information sciences, utilized a remote, video-based eye-tracking system to discover what dermatologists found perceptually important about images. Sixteen dermatologists viewed skin conditions in 50 different images displayed on a monitor as they explained their diagnoses to physician assistant students. A device recorded the participants' eye movements as they lingered on the critical regions in each image. These data points, referred to as visual Areas of Interest (AOIs), have been shown to provide more objective evaluations of images and reveal the users' cognitive processing.
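The article does not say how the team derived AOIs from the raw eye-tracking recordings. One simple approach, sketched here purely as an assumption, is to divide the image into a grid and keep the cells where the viewer's gaze dwelled longest. The function name, grid size, sampling interval, and dwell threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter


def areas_of_interest(gaze_samples, cell_size=100, sample_ms=20, min_dwell_ms=200):
    """Map grid cells to total dwell time, keeping cells above a threshold.

    gaze_samples: (x, y) pixel positions recorded at a fixed interval.
    cell_size:    width and height of each square grid cell, in pixels.
    sample_ms:    assumed time between consecutive samples, in milliseconds.
    min_dwell_ms: minimum total dwell time for a cell to count as an AOI.
    """
    dwell = Counter()
    for x, y in gaze_samples:
        cell = (int(x // cell_size), int(y // cell_size))
        dwell[cell] += sample_ms
    return {cell: ms for cell, ms in dwell.items() if ms >= min_dwell_ms}


# Hypothetical recording: the viewer lingers on two regions and glances at a third.
samples = [(120, 130)] * 12 + [(450, 80)] * 3 + [(310, 320)] * 10

print(areas_of_interest(samples))  # {(1, 1): 240, (3, 3): 200}
```

The brief glance at cell (4, 0) totals only 60 ms and is dropped, so only the two regions where the viewer lingered remain.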

The eye-tracking data was also used to compare how different physicians analyzed the images, based on expertise, color, contrast, size, shape, and visual versus verbalized conceptual characteristics.

Preliminary results indicate that participants made their decisions based on perceptual information from multiple relevant regions in the image and that people with comparable expertise had similar eye movement patterns. Pelz, co-director of the Multidisciplinary Vision Research Laboratory, says this will enhance the development of workable probabilistic models because it shows there is a correlation between domain knowledge and image perception.
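One way to put a number on "similar eye movement patterns" is to measure how much two viewers' sets of attended regions overlap. The sketch below uses the Jaccard index for this. It is an assumed stand-in, not the comparison method the team used, and the viewers and grid cells are invented for illustration.

```python
def jaccard(regions_a, regions_b):
    """Return the overlap of two region sets: |A and B| / |A or B|."""
    a, b = set(regions_a), set(regions_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical attended grid cells for three viewers of the same image.
expert_1 = {(1, 1), (3, 3), (2, 4)}
expert_2 = {(1, 1), (3, 3), (4, 0)}
novice = {(0, 0), (4, 0)}

print(jaccard(expert_1, expert_2))  # 0.5
print(jaccard(expert_1, novice))    # 0.0
```

In this toy data the two experts share half of their attended regions while the novice shares none with the first expert, the kind of pattern the preliminary results describe.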

"By categorizing the AOIs and perception differences between participants, we can create a set of common markers that can better inform how images are categorized and retrieved," he adds. "This will ultimately allow the CBIR system to 'perceive' the images in the same way they are perceived by the dermatologists using the system."

In addition to the eye-tracking data collection, team member Cecilia Ovesdotter Alm, a linguist and visiting assistant professor of English at RIT, is utilizing the interactions between the experts and their students during the sessions to develop a computational linguistics model related to the image analysis. Alm is mining the physicians' spoken explanations to characterize meaningful language behaviors commonly used in the dermatology community. This linguistic model will be incorporated into the CBIR system.

Alm says, "By fusing data from different sources we can create a more robust system built on the end-user's knowledge."

The team hopes the data collected will enhance general research into visual perception and linguistics, providing additional insights on how the brain processes images and how they are described verbally.

"Most of what we know about visual perception is based on carefully designed experiments performed in the laboratory where conditions can be carefully controlled," Pelz adds. "By analyzing real-time image analysis between doctors and students in the master-apprentice model, we can more accurately assess how perception works in 'real world' environments."

The Potential of CBIR and Machine Learning

Design of the prototype system will be completed over the next year, and Calvelli and her students will assist in evaluating its effectiveness and identifying areas of improvement. The group will also seek to identify potential utilization opportunities for the finished product, including through RIT's partnership with Rochester General Health System (see sidebar).

"It is our hope that this system can ultimately be implemented by laboratories and medical centers to enhance the overall effectiveness of dermatological imaging, and ultimately increase the quality of teaching and medical diagnosis," adds Calvelli.

In addition, Haake sees the current research as a model for similar applications of human-centered CBIR as well as broader efforts to enhance machine learning and human-computer interaction.

"When you are managing thousands of images with complex data points, technology is a necessity to make sense of it all," she adds. "But you cannot become so reliant on the technology that you lose the necessary human touches needed to provide appropriate meaning.

"Human-centered CBIR is one mechanism for restoring some level of human decision-making to the system, and we hope it can serve as a model for improving human-computer interaction in multiple fields and disciplines."