How our brains can recognize previously unseen scenes, objects or faces in a fraction of a second
At the end of a long day, as we put our feet up, reach for the remote control and begin watching TV, we may find ourselves confronted with images beyond our experience—such as "The Upside-Down," the mysterious parallel dimension inhabited by a tulip-headed monster portrayed in the Netflix show Stranger Things.
This shadowy world holds up a bizarre mirror to our own, showing us a place of endless darkness and decay, where familiar infrastructure is so overgrown with twisted rope-like tendrils and webs of biological matter as to render it almost unrecognizable. And yet, even though those strange images lie in the realm of the unknown, do we struggle to recognize them? No, we do not.
In about a tenth of a second—too quickly for us to even be aware it's happening—our brains figure out what we are seeing and make sense of it.
The extraordinary speed and mastery of interpretation that our brains exercise in such situations are the focus of pioneering research by USC Dornsife vision scientists Irving Biederman and the late Bosco Tjan.
"It's the miracle of pattern recognition," said Biederman, Harold Dornsife Chair in Neurosciences and professor of psychology and computer science. "People can be misled into thinking it's a very easy, simple process because it occurs so quickly and automatically, but the fact is half of our brain is dedicated almost exclusively to vision."
Indeed, Biederman and Tjan's research is focused not on the eye itself—what most people think of when they hear the word 'vision'—but on how the brain achieves vision.
Biederman compares the way the eye works to a camera recording images.
"Like a camera, the eye doesn't know what it's looking at," he said. "It's our brain that interprets the image, not the eye."
Biederman directs the Image Understanding Laboratory, which is researching how a scene, object or face can be recognized in a fraction of a second, even when we have never encountered that image previously.
His own research explores shape recognition, which provides the major entrée to visual cognition—the process of interpreting and understanding what we see.
"Of course, we also get color, texture and movement, but most of what we understand and remember about what we see comes from shape," he said. "A line drawing of a scene tells us pretty much what we want to know. The question is, 'How is that done?' How is it possible to achieve visual understanding of a scene we've never experienced before?"
First, we need to overcome a deceptively complex problem: Our retina is two-dimensional while the world is three-dimensional.
Biederman invites us to think of a chair and imagine looking at it, or indeed trying to draw it, from the most unusual perspectives.
"If we rotate that chair it can present an infinite number of images, many of which—upside down and viewed from below, for instance—we've never experienced before. Yet, with the exception of a few unusual projections of that image, we'll almost always be able to appreciate its three-dimensional shape."
This ability is the miracle of pattern recognition: how we're able to understand scenes never seen before, from viewpoints never viewed before.
"These scenes and objects are projecting images that are completely novel and yet we can instantly make sense of them," Biederman said. "It would seem to be an impossible feat and yet we do it all the time. A child does it and we do it so easily that we're hardly aware that it reflects an extraordinary achievement."
So how do our brains pull it off?
The answer, Biederman says, lies in the brain's ability to decompose complex objects into simple shapes like cylinders, bricks, wedges and cones, which he calls "geons."
"It turns out that you can model most objects in terms of a very small vocabulary of these simple shapes, numbering about 30 or 40," he said.
"If we represent an object we're looking at in terms of geons, then we're able to recognize what the object is from almost any viewpoint." That's because the components—the geons—that make up the object are easily distinguishable from one another regardless of viewpoint.
The characteristics of an object that enable us to do this, which Biederman terms "nonaccidental properties," are small in number. They include points where contours (the lines that mark the edges of an object and form its outline) meet and end, like the corner of a table; whether a contour is straight or curved, like the edge of a door or the outline of a ball; and whether a pair of contours are parallel or converging, like those on an ice cream sandwich or an ice cream cone.
A few exceptions do exist. For instance, a brick and a cylinder look the same if viewed directly from the side. "But even then," Biederman notes, "a slight change in orientation of the brick or the cylinder will tell you, 'That's the cylinder and that's the brick.'"
Ultimately, he says, geons and nonaccidental properties are what enable us to look at a previously unseen abstract sculpture and understand its shape. Our brain breaks the whole down into comprehensible geons and then arrives at an interpretation in terms of nonaccidental properties and vertices. When we cannot represent an object in terms of simple parts, as with a nebulous mass, we will have trouble distinguishing it from another such object when the viewpoint changes.
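For readers who think in code, the idea that a handful of viewpoint-stable cues can separate one geon from another can be sketched as a simple lookup. This is only an illustration, not Biederman's model: the `Part` class, the two yes/no cues and the four-entry table are assumptions made for the example, and the real vocabulary he describes runs to 30 or 40 shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """Hypothetical description of one object part using two cues
    that stay the same from almost any viewpoint."""
    curved_cross_section: bool  # True if the end face has a curved outline
    parallel_sides: bool        # True if the side contours stay parallel


# Illustrative table only: four of the simple shapes named in the article,
# keyed by (curved_cross_section, parallel_sides).
GEON_LABELS = {
    (True, True): "cylinder",
    (True, False): "cone",
    (False, True): "brick",
    (False, False): "wedge",
}


def label_part(part: Part) -> str:
    """Return the geon name whose two cues match the given part."""
    return GEON_LABELS[(part.curved_cross_section, part.parallel_sides)]


def describe(parts: list[Part]) -> list[str]:
    """Label every part of an object, in the order given."""
    return [label_part(p) for p in parts]


# An ice cream cone has a round opening and converging sides;
# an ice cream sandwich has straight edges and parallel sides.
ice_cream_cone = Part(curved_cross_section=True, parallel_sides=False)
ice_cream_sandwich = Part(curved_cross_section=False, parallel_sides=True)
print(describe([ice_cream_cone, ice_cream_sandwich]))
```

Because each cue is a yes/no property of the contours rather than a measurement of the image, the label does not change as the object rotates, which is the point of the nonaccidental properties described above. The sketch deliberately leaves out the relations between parts.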
Mapping the Brain
The region of the cortex that is responsible for this amazing feat of perception is the lateral occipital complex (LOC), an area of the brain at the border between the occipital and temporal lobes, just above and behind the ears. Given an image, the LOC will not only determine the geons that make it up, but also the relationships between them.
Functional magnetic resonance imaging (fMRI), which measures changes in blood flow within the brain, made identifying the LOC relatively easy, Biederman said. It clearly indicated greater activity in that region of the brain when subjects were shown intact images of objects than when shown scrambled versions of those objects. That knowledge enabled the scientists to concentrate their studies on that area.
Research by Biederman and Tjan, who at the time was professor of psychology and co-director of the Dana and David Dornsife Cognitive Neuroimaging Center, showed that the activation of the LOC does not depend on whether an object is familiar. They tested this by rearranging the geons of familiar objects so that they appeared as novel items, similar to rearranging letters of a word to make a non-word.
"We found that the LOC is activated equally by abstract sculptures and familiar objects," Biederman said.
In addition to identifying objects, our brain also needs to make sense of all that we see. Often a single glance is all it takes; however, if faced with a random array of objects, we may have to look at each individually to gain an appreciation of the whole scene. For example, a quick glance at a kitchen is usually enough to immediately understand what we're looking at, but comprehending a messy collection of items piled up in a teenager's closet may require us to look at each object separately.
A recent experiment carried out by Tjan, Biederman and Eshed Margalit, who graduated from USC Dornsife in 2016 with a bachelor's degree in computational neuroscience and is now pursuing graduate studies in neuroscience at Stanford University, addressed this. The study showed that separating the geons of an object so they are no longer interacting (in other words, no longer making up the object but simply set apart from each other) causes less activity in the LOC than an intact object does.
If we go one step further and scramble the geons into a mass of random pixels, the LOC shows still less activity. In other words, the LOC is working to interpret both the shape of the parts and the relations between these parts.
The LOC's sensitivity to the relations between the parts composing an object also extends to the relations between the objects composing a scene. Thus, the LOC shows stronger activation with an image of a hand holding a cup than with an image of a hand beside a cup.
"This applies generally, not just to hands and cups but to any pair of objects," Biederman said. "One might have thought the opposite, that two things—a hand and a cup—would cause more activity in the brain than essentially one thing, a hand holding a cup. But we found that more activity occurs in the LOC if objects are shown as interacting, rather than side-by-side.
"The LOC is an extraordinary mechanism for giving us not only the shapes of parts, but also how they relate to each other, and it does the same for scenes, giving us the shapes of the objects making up the scenes as well as the relations between them," he added. "It is the area where objects become scenes."
A Pathway to Pleasure
Biederman's study of higher-level vision led him to explore the neural basis of the pleasure we derive from seeing and understanding, especially something new.
Visual signals travel a pathway from the retina at the back of the eye, through the optic nerve and along neural fibers and cables to the occipital cortex in the back of the brain. Activation of the LOC follows, and then regions at the back of the temporal lobe spark. This last area is where we achieve a rich interpretation of the visual input, be it a scene, object or face.
Interestingly, opioid receptors, which convey nerve signals linked to pleasure, are dispersed in a gradient along the entire visual pathway, with few receptors in the early stages building to more and more in the later stages.
"We found that being able to recognize a scene that we specifically have never seen before gives us more opioid release—and thus more pleasure—than something we can't recognize or that we've seen many times before," Biederman said.
This opioid fix explains the joy and appeal of new experiences. But why is novelty important to us? Biederman explains.
"When you have a new experience, initially many neurons are activated. But once the experience is over, the neurons that were most strongly activated inhibit the neurons that were only weakly or moderately activated by that experience. The next time you have the same experience, you get less opioid release. This explains why we seek out new experiences.
"Don't feel sorry for the inhibited neurons, though. They are now freed up to code different experiences. It's a reflection of the brain's extraordinary capacity to divvy up its own neural connections, leaving only a minimal number of neurons to code prior experiences and having lots of neurons in reserve to code new experiences."
Humor and Creativity
This desire for novelty is further borne out by Biederman's research into the links between vision and creativity. Using The New Yorker's popular weekly cartoon caption contest, he is exploring what happens in the brain when it attempts to solve humor challenges. He opted to study humor, he said, "because it provides a practical and universal way to explore creativity that can occur in time frames sufficiently short to be amenable to fMRI analyses.
"In contrast, visual art may be able to give us the new experience we crave, but it can be debatable whether a certain work of abstract art is creative," he said. On the other hand, there is no debate when humor is successful, as the end result—laughter—is pretty much universal.
A cartoon contains an incongruous element, something that doesn't quite fit.
"The caption to the cartoon, to be funny, cannot be obvious but has to link remote concepts that resolve the incongruity in the drawing," he said. "Because the concepts are remote, their linking will necessarily result in the activation of a great number of intervening neurons with a concomitant and sudden deluge of opioid activity, causing us to laugh. But once we've seen the cartoon and we've got the joke, the inhibition of the weakly activated cells by the strongly activated cells reduces the amount of opioid release and thus the pleasure is diminished."
Biederman says this desire for new but interpretable information makes us "infovores," eager consumers of information.
In earlier research, Biederman and Ori Amir '15, a former USC Dornsife Ph.D. student now at the University of California, Santa Barbara, studied preferences for viewing simple geons. When presented with a pair of dissimilar geons, say a cylinder on the left and a cone on the right, both 4-month-old infants and college students preferred looking at the geons with non-parallel sides or with curves. This correlated with similar studies in the lab that showed how curvy or nonparallel shapes produced higher activity in visual pathway neurons than straight or parallel shapes.
"That greater activity means we get more opioid release and thus more pleasure from looking at those shapes," Biederman said. "Our eye movements are not random but, when we are not engaged in a deliberate search, such as looking for our car in a parking lot, they are directed towards entities that will give us more opioid activity—a system that is established as early as four months."
Focus on Visual Crowding
Tjan, who died on Dec. 2, 2016, was an international expert on visual crowding. Postdoctoral and doctoral students in Tjan's laboratory are continuing his legacy of pioneering research, aimed in part at bringing hope to macular degeneration patients with impaired vision.
About 20 percent of us will find our vision degraded as the macula, a region near the center of the retina, degenerates in our later years. As patients lose their high-resolution central vision, many develop a preferred retinal locus (PRL). This means they have learned to compensate for their impaired central vision by looking slightly away from objects on which they wish to focus, thus using the part of the retina with the highest remaining resolution.
While a PRL is helpful, it comes with a major disadvantage: visual crowding. This occurs because cells in the periphery of the retina have larger receptive fields than those in the tightly packed center. Patients with macular degeneration who use a PRL to focus on, say, a single letter on a page often experience visual crowding when nearby letters activate the same receptive field that is being used to perceive that letter. This results in mixed-up shapes, making it difficult if not impossible to interpret the shapes of letters, objects and scenes.
Tjan successfully demonstrated how a training regimen could reduce visual crowding's deleterious effects on vision.
Tjan pioneered the study of PRLs in normally sighted subjects without macular degeneration so he could understand how the condition progresses. By deliberately occluding their central vision, he was able to train his test subjects to use a region of reasonable clarity or resolution away from the center of the retina. Although not as good as the original central vision, this area provides better focus than more peripheral regions.
Further, Tjan and his team used fMRI to show that training actually changes the way the brain works, improving visual processing in the primary visual cortex, the starting point for visual processing in the brain.
"There are just a few really great mysteries in the world," Biederman said. "There is cosmology and dark matter, and then there is higher-level vision and the brain. And we have come a long way in explaining how we make sense of what we see, this extraordinary achievement of the brain that had never been understood before."