Researcher gives subjects their voice

by Angela Herring
Credit: Rupal Patel.

Stephen Hawking and a 9-year-old girl with a speech disorder most likely use the same synthetic voice. It's called Perfect Paul, and it's easy to understand, especially in acoustically chaotic environments such as classrooms full of children. Although newer, more natural-sounding voices are available, Perfect Paul remains the most commonly used synthetic voice in the community of disordered speakers.

But Perfect Paul conveys none of the personality inherent in vocal identity, explained Rupal Patel, an associate professor of computer science and of speech-language pathology and audiology.

"What we're trying to do is improve the quality," she said, "but also increase the personalization of those voices, by not just making it a little kid's voice, but making it that little kid's voice."

Backed by a grant from the National Science Foundation, Patel and her research team are developing ways to create personalized synthetic voices that resemble users' vocal identities while remaining as understandable as the voices of the healthy donors whose speech they draw on.

In the first iteration of the project, which Patel calls VocaliD (pronounced "vocality," for Vocal Identity), her team computationally merged the acoustics of a sustained vowel sound produced by a child with a speech disorder with the acoustics of a full sentence spoken by a healthy speaker of the same demographic.

The result is a clear synthetic voice with the personality of the end user.

These voices have already drawn enthusiastic responses from parents; one said, "If [my son] had been able to talk, this is what he would sound like." However, the early version of VocaliD relied on an approach that is difficult to scale and not easily reproducible. "We'd like to be able to allow users to create new voices as they mature in the same way a natural voice evolves," Patel said.

With the support of another grant from the National Science Foundation, her team is now adding physiological information on top of the acoustics. "When you hear speech, it's a combination of your source and your filter," Patel said. The source, she explained, originates in the larynx, or voice box, whereas the filter is determined by the shape and length of the vocal tract.

Vocal characteristics such as pitch, breathiness, and loudness all emerge from the vocal folds in the larynx and give rise to vocal identity. Shaping that sound by changing the shape of our mouths and moving our tongues produces distinct vowel and consonant sounds, which, Patel said, are typically impaired in disordered speech.
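To make the source-filter idea concrete, here is a minimal Python sketch, not VocaliD's software: an impulse train stands in for the vocal-fold source, and a cascade of two-pole resonators of the kind used in classic formant synthesizers stands in for the vocal-tract filter. The function names, the formant frequencies and bandwidths in `VOWEL_A`, and the 16 kHz sample rate are all illustrative assumptions.

```python
import math


def glottal_source(f0, fs, duration):
    """Impulse train at pitch f0 (Hz): a crude stand-in for vocal-fold pulses."""
    n_samples = int(fs * duration)
    period = fs / f0  # samples between successive pulses
    source = [0.0] * n_samples
    k = 0
    while True:
        idx = int(round(k * period))
        if idx >= n_samples:
            break
        source[idx] = 1.0
        k += 1
    return source


def resonator(signal, freq, bw, fs):
    """Two-pole resonator modelling one formant; unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract_filter(signal, formants, fs):
    """Cascade one resonator per (frequency, bandwidth) pair."""
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


def synthesize(f0, formants, fs=16000, duration=0.5):
    """Source (pitch) and filter (formants) are chosen independently."""
    return vocal_tract_filter(glottal_source(f0, fs, duration), formants, fs)


def dft_magnitude(signal, freq, fs):
    """Magnitude of a single DFT bin at freq (Hz), for inspecting spectra."""
    w = 2.0 * math.pi * freq / fs
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return math.hypot(re, im)


# Hypothetical formant set, roughly an adult /a/ vowel: (centre Hz, bandwidth Hz).
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]

low_voice = synthesize(100, VOWEL_A)   # same vowel, lower pitch
high_voice = synthesize(200, VOWEL_A)  # same vowel, higher pitch
```

Because `synthesize` takes the pitch and the formants as separate arguments, pairing one speaker's pitch with another speaker's formants is just a matter of passing different values. Real systems model a far richer source (including breathiness and loudness) and a filter that changes over time.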

Using data from a set of sensors placed on participants' tongues and mouths, the researchers will determine the most efficient way to approximate the physical characteristics of the disordered speaker's vocal tract. They can then feed this information into the voice-synthesis software to create voices that grow and change as the users mature.
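Why would a physical model let a voice "grow"? A common textbook simplification, not the team's method, treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1)c / (4L) for tract length L and speed of sound c. The sketch below assumes that model; the function name, the speed-of-sound constant, and both tube lengths are illustrative assumptions.

```python
# Simplified uniform-tube model (assumption): a tube closed at one end resonates
# at odd multiples of a quarter wavelength.
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, humid air


def uniform_tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Return the first `count` resonances (Hz) of a uniform tube of length_cm."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


adult_like = uniform_tube_formants(17.5)  # [500.0, 1500.0, 2500.0]
shorter = uniform_tube_formants(14.0)     # [625.0, 1875.0, 3125.0]
```

Every resonance scales as 1/L, so a shorter tract, as in a young child, shifts all formants upward, and lengthening the modelled tract lowers them, which is one simple way a synthetic filter could track physical growth.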

The academic community has long accepted the source-filter theory of speech, but according to Patel, more work is needed to understand it fully, especially as researchers develop more advanced speech technologies for security and other applications.

Patel's work also aims to inform basic research questions such as, "How much do both the source and filter contribute to the identity of a speaker's output?"

Patel's software is compatible across assistive technology platforms, including mainstream touch-pad devices, a feature she hopes will increase its adoption within the community. Patel speculates that assistive communication devices will eventually appeal to healthy people as a new way of learning, communicating, and interacting.

"The iPad revolution is helping to break down barriers and increasing the emphasis on user interface issues," said Patel, who has been working to improve assistive communication technologies for more than 16 years. "Lots of kids, both healthy and impaired, are using screens to interact now."
