How the brain detects the rhythms of speech
Neuroscientists at UC San Francisco have discovered how the listening brain scans speech to break it down into syllables. The findings provide, for the first time, a neural basis for the fundamental atoms of language, along with insights into how we perceive the rhythmic poetry of speech.
For decades, speech neuroscientists have looked for evidence that neurons in auditory brain areas use fluctuations in speech volume to identify the beginnings and ends of syllables, like a lin-guis-tics pro-fes-sor di-a-gram-ming a sen-tence. Until now, these efforts had met with little success.
In the new study, published November 20, 2019, in Science Advances, UCSF scientists discovered that the brain instead responds to a marker of vocal stress in the middle of each syllable, more like a poet scanning the sonnets of Shakespeare (Shàll Í còmpáre thèe tó à súmmèr's dáy?). The researchers showed that this signal, found in an area of speech cortex called the middle superior temporal gyrus (mSTG), is driven specifically by the rising volume at the start of each vowel sound, which is a universal feature of human languages.
Notably, the authors say, this simple syllabic marker could also provide the brain with direct information about patterns of stress, timing, and rhythm that are so central to conveying meaning and emotional context in English and many other languages.
"What I find most exciting about this work is that it shows a simple neural coding principle for the sense of rhythm that is absolutely fundamental to how our brains process speech," said neuroscientist Yulia Oganian, Ph.D., who led the new research. "Could this explain why humans are so sensitive to the sequence of stressed and unstressed syllables that make up spoken poetry, or even oral storytelling?"
Oganian is a postdoctoral researcher in the lab of UCSF Health neurosurgeon Eddie Chang, MD, Ph.D., Bowes Biomedical Investigator at UCSF, member of the UCSF Weill Institute for Neurosciences, and a Howard Hughes Medical Institute (HHMI) Faculty Scholar, whose research laboratory studies the neural basis of human speech, movement, and emotion.
"What really excites me is that we now understand how a simple sound cue, the rapid increase in loudness that happens at the onset of vowels, serves as a critical landmark for speech because it tells a listener when a syllable occurs and whether it is stressed. This is a rather central discovery about how the brain extracts syllable units from speech," said Chang.
The study involved volunteers from the UCSF Epilepsy Center who had Post-it-note-sized arrays of electrodes placed on the surface of their brains for one to two weeks as part of standard preparation for neurosurgery. These brain recordings not only allow neurosurgeons like Chang to map out how to remove the brain tissue that causes patients' seizures without damaging important nearby brain regions, but also allow scientists in Chang's neuroscience research lab to ask questions about human brain function that are impossible to address any other way.
Oganian recruited 11 volunteers whose seizure-mapping electrodes happened to overlap with brain areas involved in speech processing and who were happy to participate in a research study during their downtime in the hospital. She played each participant speech recordings from a variety of speakers while recording patterns of brain activity in their auditory speech centers, then analyzed the data to identify neural patterns reflecting the syllabic structure of what they had heard.
The data quickly revealed that mSTG activity contained a discrete marker of individual syllables—contradicting the dominant model in the field that had proposed that the brain sets up a continuous metronome-like oscillator to extract syllable boundaries from fluctuations in speech volume. But exactly what aspects of speech were these discrete syllable markers in the neural data responding to?
To pin down which features of the audio recordings were driving the newfound syllable markers, Oganian asked four of the volunteers to listen to recorded speech that had been slowed down four-fold. These ultra-slow recordings showed that the syllable signals occurred consistently at the moment of rapidly rising loudness at the start of each vowel sound (e.g., as 'b' turns to 'a' in the syllable 'ba'), and not at the peak of each syllable, as other scientists had theorized.
The syllabic marker Oganian discovered in the mSTG also varied with the emphasis the speaker placed on a particular syllable. This suggested that this first stage of speech processing lets the brain simultaneously split speech into syllabic units and track the patterns of stress that are critical for meaning in English and many other languages (e.g., "computer console" vs. "console a friend", or the question "Did I do that?" spoken with the emphasis on different words).
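To make the acoustic cue concrete, here is a minimal Python sketch. It is not the study's analysis code. It assumes a loudness envelope has already been computed and sampled at a fixed frame rate, marks each local peak in how fast that envelope rises, and reports the size of each rise as a rough, hypothetical stand-in for stress. The function name `rise_landmarks`, the threshold `min_rise`, and the toy envelope are all illustrative assumptions.

```python
from typing import List, Tuple


def rise_landmarks(envelope: List[float], fs: float,
                   min_rise: float = 0.0) -> List[Tuple[float, float]]:
    """Return (time_s, rise_rate) for each local peak in the envelope's rate of rise.

    Hypothetical helper: `envelope` is assumed to be a precomputed loudness
    envelope sampled at `fs` frames per second.
    """
    # rate[j] is the change from envelope[j] to envelope[j + 1], in units per second;
    # falling or flat stretches are clipped to zero so only rises count.
    rate = [max(0.0, (envelope[j + 1] - envelope[j]) * fs)
            for j in range(len(envelope) - 1)]
    landmarks = []
    for j in range(1, len(rate) - 1):
        # A landmark is a local maximum of the rising slope above the threshold.
        if rate[j] > min_rise and rate[j] >= rate[j - 1] and rate[j] > rate[j + 1]:
            # Time-stamp the landmark at the end of the steepest step.
            landmarks.append(((j + 1) / fs, rate[j]))
    return landmarks


# Toy envelope (100 frames per second): two "syllables", the second rising more steeply.
toy_env = [0.0, 0.0, 0.2, 0.6, 0.7, 0.5, 0.2, 0.1,
           0.1, 0.5, 1.2, 1.3, 1.0, 0.4, 0.1, 0.1]
landmarks = rise_landmarks(toy_env, fs=100.0)
for t, r in landmarks:
    print(f"landmark at {t:.2f} s, rise rate {r:.1f}")
```

In the toy envelope, the second syllable's onset rises more steeply, so its landmark carries the larger value. This loosely mirrors, in a greatly simplified form, the observation that the neural marker varied with emphasis.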
The syllabic signal also provides a simple metronome for the brain to track the rhythm and speed of speech. "Some people speak fast; others speak slow. People change how quickly they speak when they are excited or sad. The brain needs to be able to adjust to that," Oganian said. "By marking whenever a new syllable is occurring, this signal acts as an internal pacemaker within the speech signal itself."
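As a toy illustration of the pacemaker idea, and not a model of what the brain actually computes, the following sketch converts a list of syllable landmark times into a speaking-rate estimate. The helper `syllable_rate` and the example timings are hypothetical. Using the median gap between landmarks is one simple design choice among many.

```python
from statistics import median
from typing import List


def syllable_rate(landmark_times_s: List[float]) -> float:
    """Estimate speaking rate (syllables per second) from syllable landmark times.

    Hypothetical helper: it uses the median gap between consecutive landmarks,
    so one long pause shifts the estimate far less than a plain mean would.
    """
    if len(landmark_times_s) < 2:
        raise ValueError("need at least two landmarks to estimate a rate")
    gaps = [b - a for a, b in zip(landmark_times_s, landmark_times_s[1:])]
    return 1.0 / median(gaps)


relaxed = [0.0, 0.2, 0.4, 0.6, 0.8]      # a landmark every 200 ms
excited = [0.0, 0.12, 0.24, 0.36, 0.48]  # a landmark every 120 ms
with_pause = [0.0, 0.2, 0.4, 1.4, 1.6]   # same pace as relaxed, plus one 1-second pause

for name, times in [("relaxed", relaxed), ("excited", excited), ("with_pause", with_pause)]:
    print(f"{name}: {syllable_rate(times):.1f} syllables/s")
```

Because the estimate comes from the landmarks themselves, faster speech yields a higher rate without any fixed external clock. This is the sense in which a syllable marker can act as a pacemaker carried by the speech signal itself.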
The researchers are continuing to study how the brain interprets these mSTG signals to process the rhythm and meaning of speech. They also hope to explore how the brain's interpretation of these signals varies in languages other than English that put more or less emphasis on the stress patterns of speech.