Listening to a single voice at a crowded cocktail party sometimes seems like picking a needle out of a haystack, but new research shows that people may be better at this than expected.
The results surprised the University of Washington, Seattle, research team, which tested how well people could pick out one sound from a dense collection of noises.
The researchers asked ten subjects to listen to multiple streams of letters. A stream consisted of a repeating letter, for example, Q-Q-Q-Q. If four streams were played, the listener heard four different repeating letters, say, D, C, Q and J. The letters came fast -- the time interval between each letter was just one-twelfth of a second.
In front of the listener was a computer screen. Before the start of each trial, the researchers put one of the four letters on the screen to prime the subject to focus on it. If he heard an oddball letter in that stream, such as R instead of Q, he was to press a button.
To make it easier on the listener, each letter stream carried a different pitch and came from a different location in the room. R was chosen as the oddball because it doesn't rhyme with any other letter.
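For readers who want to picture the setup, the trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' software: the function names, the number of letters per trial, the oddball probability and the choice to place oddballs only in the cued stream are all assumptions, and the differences in pitch and location between streams are not modeled. Only the one-twelfth-of-a-second interval and the R oddball come from the article.

```python
import random

# From the article: letters arrive one-twelfth of a second apart.
LETTER_INTERVAL_S = 1 / 12


def make_streams(letters, target, n_items=24, oddball="R",
                 oddball_prob=0.1, seed=0):
    """Build one repeating-letter stream per entry in `letters`.

    Assumptions (not from the article): n_items, oddball_prob, seed,
    and the rule that only the cued (target) stream contains oddballs.
    """
    if target not in letters:
        raise ValueError("target must be one of the stream letters")
    rng = random.Random(seed)
    streams = {}
    for letter in letters:
        if letter == target:
            # The cued stream occasionally swaps its letter for the oddball.
            streams[letter] = [
                oddball if rng.random() < oddball_prob else letter
                for _ in range(n_items)
            ]
        else:
            # Every other stream just repeats its own letter.
            streams[letter] = [letter] * n_items
    return streams


def oddball_times(stream, oddball="R"):
    """Return the onset time, in seconds, of each oddball in a stream."""
    return [i * LETTER_INTERVAL_S
            for i, item in enumerate(stream) if item == oddball]


if __name__ == "__main__":
    trial = make_streams(["D", "C", "Q", "J"], target="Q")
    print("".join(trial["Q"]))
    print(oddball_times(trial["Q"]))
```

In this sketch, a listener cued with Q would be expected to press the button at each of the times returned by `oddball_times`.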
"Unlike most experiments where you try to make it difficult for the listener to do the task, we tried to give every advantage we could," said Adrian K.C. Lee, a speech and hearing researcher at the university, who worked closely with Ross Maddox.
As expected, when the number of streams went up, the ability to discern the letter came down. But even with 12 streams the letter was identified correctly around 70 percent of the time.
"We expected that 12 streams would have broken the upper limits of the [subject's hearing] system," said Lee. "It is surprising that even with twelve things coming at you at the same time you can lock on to one with reasonably high accuracy."
The work was presented last month at the Acoustics 2012 Hong Kong conference.
Down the line, the researchers want to use these experiments to design a way for paralyzed patients to control a wheelchair or a computer using brain signals. Such devices, called brain-computer interfaces, have mostly relied on visual or motor stimuli. Typically, a subject might focus on a visual cue or imagine making a movement. Using a machine that detects brain signals, such as an electroencephalogram, researchers would attempt to characterize the brain responses connected with that task and translate them into commands. Focusing on an auditory signal also produces brain signals that can be characterized. However, the current study did not look at brain signals.
A very practical reason to look at auditory interfaces is that eye-gaze control -- on which visually controlled interfaces are based -- is often absent in people in the late stages of a neurodegenerative disease, said Martijn Schreuder, a researcher at the Berlin Institute of Technology.
Schreuder, who has worked on an interface where subjects spelled words by focusing on particular sounds, pointed out that auditory interfaces allow someone who is completely blind to communicate.
Schreuder said Lee's work provides hints on "whether or not it's good or bad to have different [audio] streams or whether it is good to have a quicker repetition or not." To his knowledge, this is the first time researchers have gone up to 12 streams. Previous research included only two streams.
The other part Schreuder found interesting was how quickly the listeners learned how to discriminate between letter streams.
"There is a difference between being able to spell one letter every two minutes or spelling three letters per minute, which is the range [brain-computer interfaces] go," Schreuder said. "So if one selection takes 20 seconds, it's worse than if it goes 10 seconds."
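The arithmetic behind Schreuder's comparison is simple enough to write down. The short Python sketch below converts the time one selection takes into a spelling rate, assuming (as his example implies, though the article does not state it outright) that each selection yields exactly one letter. The function name is made up for illustration.

```python
def letters_per_minute(seconds_per_selection):
    """Spelling rate, assuming one selection produces one letter."""
    if seconds_per_selection <= 0:
        raise ValueError("selection time must be positive")
    return 60.0 / seconds_per_selection


if __name__ == "__main__":
    # Selection times mentioned in the quote: two minutes, 20 s and 10 s.
    for seconds in (120, 20, 10):
        print(seconds, "s per selection:", letters_per_minute(seconds),
              "letters per minute")
```

Under that assumption, a 20-second selection corresponds to the three letters per minute Schreuder mentions, and halving the selection time doubles the rate.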
The University of Washington researchers are planning follow-up experiments to directly investigate how the brain responds to audio streams.