Researcher gives subjects their voice

by Angela Herring
Credit: Rupal Patel.

Stephen Hawking and a 9-year-old girl with a speech disorder most likely use the same synthetic voice. It's called Perfect Paul, and it's easy to understand, especially in acoustically chaotic environments like classrooms full of children. While newer, more natural-sounding voices are available, Perfect Paul remains the most widely used synthetic voice in the community of disordered speakers.

But Perfect Paul conveys none of the personality inherent in vocal identity, explains Rupal Patel, an associate professor of computer science and of speech-language pathology and audiology.

"What we're trying to do is improve the quality," she said, "but also increase the personalization of those voices, by not just making it a little kid's voice, but making it that little kid's voice."

Backed by a grant from the National Science Foundation, Patel and her research team are developing ways to create personalized synthetic voices that resemble users' vocal identities while remaining as understandable as the voices of the healthy speakers who donate their recordings.

In the first iteration of the project, which Patel calls VocaliD (pronounced "vocality," for Vocal Identity), her team computationally merged the acoustics of two recordings: a sustained vowel sound from a child with a speech disorder and a full sentence spoken by a healthy speaker of the same demographic.

The result is a clear synthetic voice with the personality of the end user.
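The article does not describe how VocaliD's merge is actually computed. As a purely hypothetical illustration, one property a sustained vowel captures well is the talker's pitch (fundamental frequency), which comes from the vocal folds and is part of what makes a voice recognizable. The Python sketch below estimates it with a basic autocorrelation search: it finds the shift at which the signal best lines up with itself, which is one period of the vibration. The function name `estimate_f0`, the 60 to 500 Hz search range, and the synthetic test tone are assumptions for illustration, not details of Patel's software.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate the pitch (Hz) of a steady voiced sound via its autocorrelation peak."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove any DC offset
    min_lag = max(1, int(sample_rate / f0_max))           # shortest period considered
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag  # one period in samples -> cycles per second

# Hypothetical test tone standing in for a recorded sustained vowel:
# a 250 Hz fundamental plus a weaker second harmonic, 0.2 s at 16 kHz.
rate = 16000
vowel = [math.sin(2 * math.pi * 250 * n / rate) + 0.5 * math.sin(2 * math.pi * 500 * n / rate)
         for n in range(int(0.2 * rate))]
print(estimate_f0(vowel, rate))  # → 250.0
```

Plain autocorrelation like this can lock onto a multiple of the true period on noisy or breathy recordings; practical pitch trackers add normalization and error checks that this sketch omits.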

These voices have already drawn enthusiastic responses from parents; one said, "If [my son] had been able to talk, this is what he would sound like." However, the early version of VocaliD relied on an approach that is difficult to scale and not easily reproducible. "We'd like to be able to allow users to create new voices as they mature in the same way a natural voice evolves," Patel said.

With the support of another grant from the National Science Foundation, her team is currently adding physiological information on top of the acoustics. "When you hear speech, it's a combination of your source and your filter," Patel said. The source, she explained, comes from the larynx, or voice box, whereas the filter is determined by the shape and length of the vocal tract.

Vocal characteristics such as pitch, breathiness, and loudness all emerge from the vocal folds in the larynx and give rise to vocal identity. Shaping that sound by changing the shape of our mouths and moving our tongues produces distinct vowel and consonant sounds, which, Patel said, are typically impaired in disordered speech.
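To make the source-filter idea concrete, here is a minimal Python sketch; it is a generic textbook formant synthesizer, not Patel's software. The "source" is a crude train of pulses at a chosen pitch, standing in for vocal-fold vibration, and the "filter" is a cascade of two-pole resonators, one per formant (a resonance of the vocal tract). The sample rate, the formant frequencies and bandwidths for an adult /a/ vowel, and all function names are illustrative assumptions rather than values from the study.

```python
import math

SAMPLE_RATE = 16000  # samples per second (an assumed, common rate for speech)

def impulse_train(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Source: one unit pulse per glottal cycle, a crude stand-in for vocal-fold vibration."""
    n = int(duration_s * sample_rate)
    period = sample_rate / f0_hz  # samples per cycle
    out = [0.0] * n
    t = 0.0
    while int(t) < n:
        out[int(t)] = 1.0
        t += period
    return out

def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Filter stage: a two-pole resonator that boosts energy around one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius sets the bandwidth
    theta = 2 * math.pi * freq_hz / sample_rate          # pole angle sets the center frequency
    b = 2 * r * math.cos(theta)
    c = -r * r
    a = 1 - b - c  # scale so the resonator has unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=SAMPLE_RATE):
    """Source-filter synthesis: run the pulse train through one resonator per formant."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to the range [-1, 1]

# Illustrative (assumed) formants for an adult /a/: (center Hz, bandwidth Hz).
ADULT_A = [(730, 90), (1090, 110), (2440, 170)]

lower_voice = synthesize_vowel(110, ADULT_A)   # same filter, lower-pitched source
higher_voice = synthesize_vowel(250, ADULT_A)  # same filter, higher-pitched source
```

Both calls share one filter, so they should be heard as the same vowel at two different pitches; keeping the pitch fixed and changing the formant values would do the reverse. That separation is what makes it possible, in principle, to pair one speaker's source with another speaker's filter.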

Using data from a set of sensors placed on participants' tongues and mouths, the researchers will determine the most efficient way to approximate the physical aspects of the disordered speaker's vocal tract. They can then incorporate this information into the voice-synthesis software to create voices that will grow and change as the users mature.

The academic community has long accepted the source-filter theory of speech, but according to Patel, more work is needed to understand it fully, especially as researchers develop more advanced speech technologies for security and other applications.

Patel's work also aims to inform basic research questions such as, "How much do both the source and filter contribute to the identity of a speaker's output?"

Patel's software is compatible across assistive technology platforms, including mainstream touch-pad devices, a feature she hopes will increase its adoption within the community. Patel speculates that assistive communication devices will eventually appeal to healthy people as a new way of learning, communicating, and interacting.

"The iPad revolution is helping to break down barriers and increasing the emphasis on user interface issues," said Patel, who has been working to improve assistive communication technologies for more than 16 years. "Lots of kids, both healthy and impaired, are using screens to interact now."
