Overturning widely held ideas: Visual attention drawn to meaning, not what stands out
Our visual attention is drawn to parts of a scene that have meaning, rather than to those that are salient or "stick out," according to new research from the Center for Mind and Brain at the University of California, Davis. The findings, published Sept. 25 in the journal Nature Human Behaviour, overturn the widely held model of visual attention.
"A lot of people will have to rethink things," said Professor John Henderson, who led the research. "The saliency hypothesis really is the dominant view."
Our eyes perceive a wide field of view in front of us, but we focus our attention on only a small part of this field. How do we decide where to direct our attention, without thinking about it?
The dominant theory in attention studies is "visual salience," Henderson said. Salience means things that "stick out" from the background, like colorful berries on a background of leaves or a brightly lit object in a room.
Saliency is relatively easy to measure. You can map the amount of saliency in different areas of a picture by measuring relative contrast or brightness, for example.
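As a rough illustration of measuring relative contrast, the sketch below scores each cell of a brightness grid by how far it departs from the average of its immediate neighbours. This is a deliberately simplified stand-in, not the saliency model used in the study; the function name `contrast_saliency`, the `radius` parameter and the toy scene are all assumptions made for the example.

```python
def contrast_saliency(image, radius=1):
    """Score each cell by |brightness - mean brightness of its neighbours|.

    `image` is a list of rows of brightness values (hypothetical input format).
    """
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the surrounding cells, clipped at the image border.
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if neighbours:  # a 1x1 image has no neighbours; leave it at 0.0
                local_mean = sum(neighbours) / len(neighbours)
                saliency[r][c] = abs(image[r][c] - local_mean)
    return saliency


# Toy scene: a dim 5x5 room with one brightly lit spot in the middle.
scene = [[0.1] * 5 for _ in range(5)]
scene[2][2] = 1.0
saliency_map = contrast_saliency(scene)
```

In this toy scene the bright centre cell gets the highest score, and cells far from it score close to zero.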
Henderson called this the "magpie theory": our attention is drawn to bright and shiny objects.
"It becomes obvious, though, that it can't be right," he said. Otherwise, we would constantly be distracted.
Making a Map of Meaning
Henderson and postdoctoral researcher Taylor Hayes set out to test whether attention is guided instead by how "meaningful" we find an area within our view. They first had to construct "meaning maps" of test scenes, where different parts of the scene had different levels of meaning to an observer.
To make their meaning map, Henderson and Hayes took images of scenes, broke them up into overlapping circular tiles, and submitted the individual tiles to the online crowdsourcing service Mechanical Turk, asking users to rate the tiles for meaning.
By tallying the votes of Mechanical Turk users they were able to assign levels of meaning to different areas of an image and create a meaning map comparable to a saliency map of the same scene.
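One way to picture the tallying step is sketched below. It assumes each circular tile carries a list of numeric ratings, and that a pixel's meaning is the average of the mean ratings of every tile covering it. The article does not give the rating scale or the exact aggregation rule, so the function `meaning_map`, the tuple layout and the numbers are hypothetical.

```python
def meaning_map(width, height, tiles):
    """Build a per-pixel meaning map from rated, overlapping circular tiles.

    `tiles` is a list of (centre_x, centre_y, radius, ratings) tuples;
    this layout is an assumption made for the example.
    """
    totals = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for cx, cy, radius, ratings in tiles:
        tile_score = sum(ratings) / len(ratings)  # mean rating for this tile
        for y in range(height):
            for x in range(width):
                # A pixel belongs to the tile if it lies inside the circle.
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    totals[y][x] += tile_score
                    counts[y][x] += 1
    # Average over all tiles covering a pixel; uncovered pixels stay at 0.0.
    return [
        [totals[y][x] / counts[y][x] if counts[y][x] else 0.0
         for x in range(width)]
        for y in range(height)
    ]


# Two overlapping tiles with made-up ratings.
tiles = [
    (1, 1, 1, [5, 6, 4]),  # mean rating 5.0
    (2, 1, 1, [1, 2, 3]),  # mean rating 2.0
]
mmap = meaning_map(4, 3, tiles)
```

Pixels covered by both tiles end up with a blended score, which is why overlapping tiles give a smoother map than a non-overlapping grid would.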
Next, they tracked the eye movements of volunteers as they looked at the scene. Those eyetracks gave them a map of what parts of the image attracted the most attention. This "attention map" was closer to the meaning map than the salience map, Henderson said.
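The article does not say how "closer" was measured. A common way to compare two maps of the same size is a Pearson correlation over their cells, sketched here under that assumption. The grids are invented toy values chosen only to exercise the function and do not represent the study's data.

```python
from math import sqrt


def flatten(grid):
    """Turn a list of rows into a single flat list of values."""
    return [value for row in grid for value in row]


def pearson(map_a, map_b):
    """Pearson correlation between two equally sized 2-D maps."""
    xs, ys = flatten(map_a), flatten(map_b)
    if len(xs) != len(ys):
        raise ValueError("maps must have the same number of cells")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / (spread_x * spread_y)


# Invented toy grids: one candidate tracks the attention map, one opposes it.
attention = [[0, 1], [2, 3]]
candidate_a = [[0, 2], [4, 6]]
candidate_b = [[3, 2], [1, 0]]

score_a = pearson(attention, candidate_a)
score_b = pearson(attention, candidate_b)
```

Whichever candidate map yields the higher correlation would be the one that better matches where viewers actually looked.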
In Search of Meaning
Henderson and Hayes don't yet have firm data on what makes part of a scene meaningful, although they have some ideas. For example, a cluttered table or shelf attracted more attention than a highly salient splash of sunlight on a wall. With further work, they hope to develop a "taxonomy of meaning," Henderson said.
Although the research is aimed at a fundamental understanding of how visual attention works, there could be some near-term applications, Henderson said, for example in developing automated visual systems that allow computers to scan security footage or to automatically identify or caption images online.