Less reward, more aversion when learning tricky tasks
We celebrate our triumphs over adversity, but let's face it: We'd rather not experience difficulty at all. A new study ties that behavioral inclination to learning: When researchers added a bit of conflict to make a learning task more difficult, that additional conflict biased learning by reducing the influence of reward and increasing the influence of aversion to punishment.
This newly found relationship between conflict and reinforcement learning suggests that the circuits in the frontal cortex that calculate the degree of conflict, effort, and difficulty of actions are integrated with the dopamine-driven circuits that govern perceptions of reward and punishment in another part of the brain, the striatum. In two sets of experiments reported in Nature Communications, scientists at Brown University and the University of New Mexico gathered evidence for the link by several means, including EEG scans, genetic tests, manipulation with a low dose of a dopamine-related drug, and even tracking of eye blinks.
"The signals in the cortex that respond to conflict act to induce an aversive learning signal in your basic reinforcement learning systems," said Brown University cognitive scientist Michael Frank, co-author of the study led by former student James Cavanagh, now an assistant professor at the University of New Mexico.
A tricky task
The conflict in the experimental learning was merely a matter of having to use the left hand to indicate the selection of a stimulus on the right side of a screen, or vice versa. This simple case of spatial conflict is well established in cognitive psychology. In this study it slowed responses by only about 12 milliseconds but elicited reliable EEG brain signals typically associated with a conflict-induced "alarm bell."
Here's how the task worked: In a learning phase the 83 adult volunteers simply had to press the left button on a game pad when they saw a blue shape or the right button when they saw a yellow one. There were four shapes in all (call them A, B, C, and D) that could appear on either side of the screen. Each shape had a different probability of providing a one-point reward when learners pressed the correct button. A was always rewarded; D was seldom rewarded. B and C were each rewarded 50 percent of the time, but B never provided a point when it appeared on the side opposite from the button, and C provided a point only when it appeared on the side opposite from the button.
In this way, punishment (no points) for B became associated with the opposite-side conflict as did C's reward (one point).
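The reward schedule described above can be sketched in a few lines of Python. This is only an illustration of the design as reported here, not the researchers' task code. The function names, the 10 percent reward rate for D (the article says only "seldom"), and the assumption that conflict occurs on half of all trials are hypothetical choices made for the example.

```python
import random

# Assumption: the article says D was "seldom" rewarded; 0.1 is a placeholder.
D_REWARD_PROB = 0.1
# Assumption: a shape appears on the side opposite the button on half of trials.
CONFLICT_PROB = 0.5


def reward(shape, conflict, rng):
    """Return the points (0 or 1) earned for a correct response to `shape`."""
    if shape == "A":
        return 1                        # always rewarded
    if shape == "B":
        return 0 if conflict else 1     # no points whenever there is conflict
    if shape == "C":
        return 1 if conflict else 0     # points only when there is conflict
    if shape == "D":
        return 1 if rng.random() < D_REWARD_PROB else 0
    raise ValueError(f"unknown shape: {shape!r}")


def simulate(n_trials=10000, seed=0):
    """Estimate each shape's overall reward rate across simulated trials."""
    rng = random.Random(seed)
    totals = {s: [0, 0] for s in "ABCD"}    # shape -> [points, trial count]
    for _ in range(n_trials):
        shape = rng.choice("ABCD")
        conflict = rng.random() < CONFLICT_PROB
        totals[shape][0] += reward(shape, conflict, rng)
        totals[shape][1] += 1
    return {s: points / count for s, (points, count) in totals.items()}


if __name__ == "__main__":
    for shape, rate in simulate().items():
        print(shape, round(rate, 2))
```

Under these assumptions B and C both pay off about half the time overall. What differs is only whether the point, or the lack of one, coincides with conflict.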
After the conflict-infused learning phase, people then moved on to a second phase where they were shown pairs of these previously observed shapes and had to indicate their preferences in terms of which one they thought was more rewarding.
Everyone learned that A was rewarding and D was not, but learned perceptions of B and C were skewed in one of two ways for each participant. For those who learn better from reward, conflict acted to reduce experienced reward value, leading to a preference for B over C. For those who learn better from avoiding punishment, conflict acted to enhance experienced punishment value, leading to greater avoidance of B. In essence the latter effect is like "adding insult to injury," where conflict made gaining no points even more aversive.
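One way to see how the same conflict cost could push the two kinds of learners in opposite directions is a toy delta-rule learner with separate learning rates for good and bad surprises. The sketch below is a textbook-style illustration, not the model the authors fit. The conflict cost of 0.3, the learning rates, and all function names are hypothetical. It assumes conflict simply subtracts a fixed amount from whatever outcome it accompanies.

```python
# Assumption: conflict subtracts a fixed cost from the experienced outcome.
CONFLICT_COST = 0.3


def outcome(points, conflict):
    """Experienced value of a trial: points earned minus any conflict cost."""
    return points - (CONFLICT_COST if conflict else 0.0)


# B: point without conflict, no point with conflict.
B_OUTCOMES = [outcome(1, False), outcome(0, True)]
# C: point with conflict, no point without conflict.
C_OUTCOMES = [outcome(1, True), outcome(0, False)]


def learned_value(outcomes, alpha_gain, alpha_loss, n_sweeps=2000):
    """Iterate the expected delta-rule update over equally likely outcomes.

    Positive prediction errors are weighted by alpha_gain and negative
    ones by alpha_loss, so the two rates set how strongly rewards and
    punishments shape the learned value.
    """
    value = 0.0
    for _ in range(n_sweeps):
        step = 0.0
        for o in outcomes:
            error = o - value
            rate = alpha_gain if error > 0 else alpha_loss
            step += rate * error / len(outcomes)
        value += step
    return value


def preference_for_b(alpha_gain, alpha_loss):
    """Learned value of B minus learned value of C (positive favors B)."""
    return (learned_value(B_OUTCOMES, alpha_gain, alpha_loss)
            - learned_value(C_OUTCOMES, alpha_gain, alpha_loss))


if __name__ == "__main__":
    print("reward learner:    ", preference_for_b(0.3, 0.05))
    print("punishment learner:", preference_for_b(0.05, 0.3))
```

In this sketch a learner who weights gains more heavily ends up preferring B, because C's reward arrives discounted by conflict. A learner who weights losses more heavily ends up avoiding B, because B's missed point arrives with the added sting of conflict. A learner with equal rates shows no preference.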
Behavior in the brain
The researchers weren't just relying on behavioral observation to inform their study. The EEG sensors monitored the midcingulate cortex, which previous research identified as the site where the brain determines the costs of effort, difficulty, and conflict in action. The sensors measured the strength of theta and delta frequency brainwaves while people carried out the phases of the task.
"The degree to which conflict reduced reward-related theta/delta activity of C compared with B was related to preferences for B, and the degree to which conflict enhanced punishment-related theta activity of B compared with C was related to avoidance of B," the authors wrote. "These findings suggest that conflict acted to both diminish reward value and to boost punishment avoidance within cortical systems associated with interpreting the salience of feedback."
So how does the cortical conflict signal actually change learning about reward values? The researchers looked to the volunteers' genes, specifically one called DARPP-32, which governs how dopamine is processed in downstream areas of the brain. That's because research has shown that people with some variants of the gene are more sensitive to reward learning, while people with other variants are more sensitive to punishment avoidance learning, consistent with how this gene affects dopamine function in neurons sensitive to rewards and punishments in the striatum.
The genotyping confirmed that whether people became biased in favor of B or C had to do with their genetic predisposition to learn more from reward or from avoiding punishment.
In a second set of experiments with 30 volunteers, Cavanagh, Frank, and their co-authors actively manipulated dopamine function in this downstream area (i.e., the striatum). They gave subjects safe, low doses of the drug cabergoline, which temporarily reduces receptivity to dopamine. Prior work had shown that this subtle effect causes people to learn more from punishment avoidance than reward. Sure enough it did. Without the drug (on placebo), volunteers overall slightly favored B over C, but with the drug, that flipped to a significantly greater bias for C over B, consistent with learning from punishment avoidance.
They even observed that the degree to which this drug affected the conflict value learning was related to its effects on eye blink rate, which has been linked to dopamine activity.
Cavanagh said he hopes to apply the knowledge to better understand learning in people with obsessive-compulsive disorder and other anxiety disorders who have enhanced theta band signals of conflict.
"Does it make them learn more from 'punishments,' does it make them learn less from reward?" he said. "What does the consequence of this well-known alteration in anxiety have to do with the way they learn from the world?"