Putting a voice and face together in early infancy predicts later language development
Matching the sight and sound of speech—a face to a voice—in early infancy is an important foundation for later language development.
This ability, known as intersensory processing, is an essential pathway to learning new words. According to a recent study published in the journal Infancy, the degree of success at intersensory processing at only 6 months old can predict vocabulary and language outcomes at 18 months, 2 years and 3 years old.
"Adults are highly skilled at this, but infants must learn to relate what they see with what they hear. It's a tremendous job and they do it very early in their development," said lead author Elizabeth V. Edgar, who conducted the study as an FIU psychology doctoral student and is now a postdoctoral fellow at the Yale Child Study Center. "Our findings show that intersensory processing has its own independent contribution to language, over and above other established predictors, including parent language input and socioeconomic status."
Across three years, Edgar and a team at FIU psychology professor Lorraine E. Bahrick's Infant Development Lab tested intersensory processing speed and accuracy in 103 infants between the ages of 3 months and 3 years old, using the Intersensory Processing Efficiency Protocol (IPEP). This tool was created by Bahrick and co-investigator FIU Research Assistant Professor of Psychology James Torrence Todd and colleagues.
Designed to present distraction or simulate the "noisiness" of picking out a speaker from a crowd, the IPEP presents several short video trials. Each trial depicts six faces of women displayed in separate boxes on the screen at once. All the women appear to be speaking.
However, only one soundtrack plays on each trial, and it matches just one of the women speaking. With an eye tracker that follows pupil movement, the researchers could measure whether the babies made the match, as well as how long they watched the face that matched the voice.
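To make the two measures concrete, here is a minimal sketch of how one trial's eye-tracking record might be summarized. It is an illustration only, not the IPEP's actual scoring procedure: the `Fixation` record, the `summarize_trial` function, the proportion measure and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NUM_FACES = 6  # each trial shows six faces at once, per the article


@dataclass
class Fixation:
    """One stretch of looking (hypothetical record format)."""
    face_index: Optional[int]  # 0-5 for a face, None when gaze is off every face
    duration_ms: float


def summarize_trial(fixations: Sequence[Fixation], target_face: int) -> dict:
    """Summarize one trial against the face that matches the soundtrack.

    Assumed measures, for illustration only:
      made_match           - did the infant look at the matching face at all?
      target_ms            - total time spent on the matching face
      proportion_on_target - share of all face-looking time spent on that face
    """
    if not 0 <= target_face < NUM_FACES:
        raise ValueError("target_face must be between 0 and 5")
    on_faces = sum(f.duration_ms for f in fixations if f.face_index is not None)
    on_target = sum(f.duration_ms for f in fixations if f.face_index == target_face)
    return {
        "made_match": on_target > 0,
        "target_ms": on_target,
        "proportion_on_target": on_target / on_faces if on_faces > 0 else 0.0,
    }


# Made-up trial: the voice matches face 4.
trial = [
    Fixation(2, 400.0),
    Fixation(None, 150.0),  # looked away from all faces
    Fixation(4, 1200.0),
    Fixation(2, 250.0),
]
summary = summarize_trial(trial, target_face=4)
print(summary)
```

In this made-up trial the infant finds the matching face and spends 1,200 of the 1,850 milliseconds of face-looking time on it.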
Then, the data was compared with language outcomes at different stages of development, such as how many unique and total words children used. The results showed that infants who looked longer at the correct speaker at 6 months had better language outcomes at 18 months, 2 years and 3 years old.
The connection between intersensory processing and language becomes clearer when considering the nature of speech. It's a sound, of course. But it's also accompanied by lip movements, facial expressions and gestures. Speaking is both auditory and visual. Baby talk, in particular, is a true multisensory experience. A parent or caregiver gestures playfully, perhaps moving around a favorite toy while naming it. This sets the stage for learning which words correspond to which objects in the world, something that can only happen once a baby can be more selective with their attention, cutting through distractions to match a voice to a face or a sound to an object.
"Better selective attention to audiovisual speech in infancy may allow infants to take greater advantage of early word learning opportunities, such as object labeling, provided by caregivers during interactions," Bahrick said.
For parents and caregivers, Edgar pointed out, this research serves as a reminder that babies rely on coordinating what they see with what they hear to learn language.
"That means it is helpful to gesture toward what you're talking about or move an object around while saying its name. It's the object-sound synchrony that helps show that this word belongs with this thing," Edgar explained. "As we're seeing in our studies, this is very important in early development and lays the groundwork for more complex language skills later on."
More information: Elizabeth V. Edgar et al., Intersensory processing of faces and voices at 6 months predicts language outcomes at 18, 24, and 36 months of age, Infancy (2023). DOI: 10.1111/infa.12533