Madison, Wisconsin – How could a few pictures of a dog in the grass illustrate key concepts underlying computer vision, a sophisticated science aimed at teaching machines to perform visual tasks for humans, such as recognizing faces, objects and patterns?
Vikas Singh, assistant professor of biostatistics and medical informatics at the University of Wisconsin School of Medicine and Public Health, and graduate students Maxwell Collins and Jia Xu understand the relationship very well.
They created a fun, interactive application based on their highly academic computer vision research and put it on display at the recent Wisconsin Science Festival. Dozens of grade-school students were drawn to the demonstration, which gave them hands-on experience with a tablet that showed how a computer might see pictures as a human does.
In one example, the youngsters used a finger to swipe a simple line through a dog shown in a picture on the screen. The app used this prompt to isolate the entire dog and remove the background of grass surrounding it. The app then isolated the dog in a second, similar picture.
In stark contrast to the whimsical dog picture, Singh’s computer vision research has been used to advance brain imaging. Collaborations with researchers at the Alzheimer’s Disease Research Center, for example, have resulted in new ways to use multi-modal brain images (PET scans, MRIs and CT scans) to detect Alzheimer’s disease at the earliest stages.
Getting computers to see what humans see might sound like a simple task, but it is not.
“The human brain understands instinctively how to look at an image and see how it breaks up into different objects,” Singh says. “But for computers, that’s extremely difficult. Computer vision researchers have been studying this problem for the past 30 or 40 years.”
The problem is that computers can’t understand the content of an image. Unassisted, they can’t locate objects, distinguish foreground from background or identify boundaries in an image.
“We are still not able to give a computer a picture of a person and tell it to find that person among several images,” Singh says.
So computer scientists have developed a parsing method, called segmentation, that helps a computer identify the various segments that make up an image. It entails assigning a value to every pixel in an image; the computer then treats pixels that share the same value as sharing certain visual characteristics.
With a little input on what needs to be identified, such as the simple line drawn through the dog on the tablet, the computer knows some of the content of the segment that needs to be isolated and can begin the process.
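To make the idea of a user-drawn line guiding segmentation concrete, here is a minimal sketch of one classic seeded technique, region growing. It is a stand-in chosen for illustration, not the UW group's method. It assumes a small grayscale image stored as a list of rows, treats the pixels touched by the swipe as "seeds", and labels every pixel 1 (object) or 0 (background). The function name `region_grow` and the `tolerance` parameter are hypothetical.

```python
from collections import deque


def region_grow(image, seeds, tolerance):
    """Hypothetical seeded region growing: label each pixel 1 (object) or 0 (background).

    image: 2D list of grayscale intensities (a list of rows).
    seeds: (row, col) pixels touched by the user's swipe.
    tolerance: largest allowed difference from the seeds' mean intensity.
    """
    rows, cols = len(image), len(image[0])
    seed_mean = sum(image[r][c] for r, c in seeds) / len(seeds)
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    # The scribbled pixels are trusted as object pixels.
    for r, c in seeds:
        if labels[r][c] == 0:
            labels[r][c] = 1
            queue.append((r, c))
    # Breadth-first growth into 4-connected neighbors that look like the seeds.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0
                    and abs(image[nr][nc] - seed_mean) <= tolerance):
                labels[nr][nc] = 1
                queue.append((nr, nc))
    return labels


# Toy picture: a bright "dog" (values near 200) on dark "grass" (values near 50).
image = [
    [50, 52, 51, 49, 50],
    [50, 200, 205, 50, 48],
    [51, 198, 210, 202, 50],
    [49, 50, 51, 50, 52],
]
scribble = [(1, 1), (2, 2)]  # pixels under the user's swipe
labels = region_grow(image, scribble, tolerance=20)
# labels marks the five bright "dog" pixels with 1 and every "grass" pixel with 0.
```

Real systems replace the single intensity threshold with richer color and texture models and an optimization over the whole image, but the input-output shape is the same: a few user-marked pixels in, one label per pixel out.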
Computers can be more effective at segmenting images if they can work off several loosely related images at once, Singh says.
“If you give me not one image but five, I’ll have a richer understanding of the underlying content of one image,” he explains.
Luckily, he says, images generally are collected in a series, such as the shots from a trip or a set of X-rays.
“Usually, we can find recurring content or shared similarities among them,” he says.
But working with more than one image means the computer must be able to segment them simultaneously. That’s done through a process called co-segmentation.
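One way to picture what "segmenting several images simultaneously" buys is a toy model in which all images share a single foreground appearance model. The sketch below is an illustrative assumption, not the published UW model: it uses plain two-cluster k-means on pixel intensity, starts the foreground estimate from a scribble on the first image only, and re-estimates both clusters from the pooled pixels of every image, so the unscribbled image is segmented by the same learned "dog" model. It ignores spatial neighborhoods entirely. The function name `co_segment` is hypothetical.

```python
def co_segment(images, seeds, iterations=10):
    """Toy joint segmentation with one shared foreground/background intensity model.

    images: list of 2D grayscale images (each a list of rows).
    seeds: (row, col) object pixels scribbled on images[0] only.
    Returns one 0/1 label map per image (1 = shared foreground).
    """
    pooled = [v for img in images for row in img for v in row]
    fg = sum(images[0][r][c] for r, c in seeds) / len(seeds)
    # Initialize background at the pooled intensity least like the scribble.
    bg = max(pooled, key=lambda v: abs(v - fg))
    for _ in range(iterations):
        fg_vals = [v for v in pooled if abs(v - fg) <= abs(v - bg)]
        bg_vals = [v for v in pooled if abs(v - fg) > abs(v - bg)]
        # Re-estimate both models from all images at once; keep old mean if a cluster empties.
        new_fg = sum(fg_vals) / len(fg_vals) if fg_vals else fg
        new_bg = sum(bg_vals) / len(bg_vals) if bg_vals else bg
        if (new_fg, new_bg) == (fg, bg):
            break
        fg, bg = new_fg, new_bg
    # Label every pixel in every image with the nearest shared model.
    return [[[1 if abs(v - fg) <= abs(v - bg) else 0 for v in row] for row in img]
            for img in images]


# Two similar toy pictures; only the first one gets a scribble.
img_a = [[50, 52, 51, 49], [50, 200, 205, 50], [51, 198, 210, 50]]
img_b = [[48, 50, 195, 203], [50, 51, 207, 199], [52, 49, 50, 51]]
labels_a, labels_b = co_segment([img_a, img_b], seeds=[(1, 1)])
```

Because the foreground estimate is refined using bright pixels from both pictures, the second image is segmented without any input of its own, which echoes the tablet demo's step of isolating the dog in a second, similar picture.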
While various groups have been designing mathematical models to capture this process, the UW researchers came up with a faster, more efficient model of their own. It was published this year in the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
The UW School of Medicine and Public Health scientists have been testing their co-segmentation model in clinical situations. They recently used it for identifying similar brain structures, including the corpus callosum, in images taken with a form of MRI called diffusion tensor imaging.
“Ultimately, we are trying to classify things into one or two categories in a few images,” Singh says. “At the core, this is a co-segmentation problem.”
But for the best results, he says, it is easiest to start by testing solutions on natural images, such as a dog in the grass, rather than on more complicated medical images.