Applying information theory to linguistics suggests 'functional design' in cross-language variations

October 10, 2012 by Larry Hardesty

The majority of languages—roughly 85 percent of them—can be sorted into two categories: those, like English, in which the basic sentence form is subject-verb-object ("the girl kicks the ball"), and those, like Japanese, in which the basic sentence form is subject-object-verb ("the girl the ball kicks").

The reason for the difference has remained somewhat mysterious, but researchers from MIT's Department of Brain and Cognitive Sciences now believe that they can account for it using concepts borrowed from information theory, the discipline, invented almost singlehandedly by longtime MIT professor Claude Shannon, that led to the digital revolution in communications. The researchers will present their hypothesis in an upcoming issue of the journal Psychological Science.

Shannon was largely concerned with faithful communication in the presence of "noise": any external influence that can corrupt a message on its way from sender to receiver. Ted Gibson, a professor of cognitive sciences at MIT and corresponding author on the new paper, argues that human speech is an example of what Shannon called a "noisy channel."

"If I'm getting an idea across to you, there's noise in what I'm saying," Gibson says. "I may not say what I mean; I pick up the wrong word, or whatever. Even if I say something right, you may hear the wrong thing. And then there's ambient stuff in between on the signal, which can screw us up. It's a real problem." In their paper, the MIT researchers argue that languages develop the word order rules they do in order to minimize the risk of miscommunication across a noisy channel.

Gibson is joined on the paper by Rebecca Saxe, an associate professor of cognitive neuroscience; Steven Piantadosi, a postdoc at the University of Rochester who did his doctoral work with Gibson; Leon Bergen, a graduate student in Gibson's group; research affiliate Eunice Lim; and Kimberly Brink, who graduated from MIT in 2010.

Mixed signals

The researchers' hypothesis was born of an attempt to explain the peculiar results of an experiment reported in the Proceedings of the National Academy of Sciences in 2008; Brink reproduced the experiment as a class project for a course taught by Saxe. In the experiment, native English speakers were shown crude digital animations of simple events and asked to describe them using only gestures. Oddly, when presented with events in which a human acts on an inanimate object, such as a girl kicking a ball, volunteers usually attempted to convey the object of the sentence before trying to convey the verb—even though, in English, verbs generally precede objects. With events in which a human acts on another human, such as a girl kicking a boy, however, the volunteers would generally mime the verb before the object.

"It's not subtle at all," Gibson says. "It's about 70 percent each way, so it's a shift of about 40 percent."

The tendency even of speakers of a subject-verb-object (SVO) language like English to gesture subject-object-verb (SOV), Gibson says, may be an example of an innate human preference for linguistically recapitulating old information before introducing new information. The "old before new" theory (which, according to the University of Pennsylvania linguist Ellen Prince, is also known as the given-new, known-new, and presupposition-focus theory) has a rich history in the linguistic literature, dating back to at least the work of the German philosopher Hermann Paul, in 1880.

Imagine, for instance, the circumstances in which someone would actually say, in ordinary conversation, "the girl kicked the ball." Chances are, the speaker would already have introduced both the girl and the ball—say, in telling a story about a soccer game. The sole new piece of information would be the fact of the kick.

Assuming a natural preference for the SOV word order, then—at least in cases where the verb is the new piece of information—why would the volunteers in the PNAS experiments mime SVO when both the subject and the object were people? The MIT researchers' explanation is that the SVO ordering has a better chance of preserving information if the communications channel is noisy.

Suppose that the sentence is "the girl kicked the boy," and that one of the nouns in the sentence—either the subject or the object—will be lost in transmission. If the word order is SOV, then the listener will receive one of two messages: either "the girl kicked" or "the boy kicked." If the word order is SVO, however, the two possible messages on the receiving end are "the girl kicked" and "kicked the boy": More information will have made it through the noisy channel.
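
The reasoning in that example can be checked with a small sketch. The code below is only a toy illustration, not the researchers' model: it assumes exactly one noun is lost, that the verb always arrives, and that the listener sees nothing but the positions of the surviving words. All names in it (ORDERS, WORDS, received_messages, surviving_shapes, role_is_recoverable) are hypothetical and chosen for this illustration.

```python
# Toy model of the one-lost-noun scenario; all names here are illustrative.
ORDERS = {"SOV": ("S", "O", "V"), "SVO": ("S", "V", "O")}
WORDS = {"S": "girl", "O": "boy", "V": "kicked"}


def received_messages(order):
    """Map each lost noun role ("S" or "O") to the words that still arrive."""
    return {
        lost: tuple(WORDS[role] for role in ORDERS[order] if role != lost)
        for lost in ("S", "O")
    }


def surviving_shapes(order):
    """Like received_messages, but keep only noun ("N") vs. verb ("V") positions."""
    return {
        lost: tuple("V" if role == "V" else "N" for role in ORDERS[order] if role != lost)
        for lost in ("S", "O")
    }


def role_is_recoverable(order):
    """True if word position alone tells the listener which noun survived."""
    shapes = surviving_shapes(order)
    return shapes["S"] != shapes["O"]


for order in ORDERS:
    print(order, received_messages(order), role_is_recoverable(order))
```

Under these assumptions, both SOV outcomes have the same noun-then-verb shape, so position says nothing about whether the surviving noun was the kicker or the kicked. In SVO the two outcomes differ (noun-then-verb versus verb-then-noun), so the surviving noun's role can still be read off its position.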

Down to cases

That is the MIT researchers' explanation for the experimental findings reported in the 2008 PNAS paper. But how about the differences in word order across languages? A preliminary investigation, Gibson says, suggests that there is a very strong correlation between word order and the strength of a language's "case markings." Case marking means that words change depending on their syntactic function: In English, for instance, the pronoun "she" changes to "her" if the kicker becomes the kicked. But case marking is rare in English, and English is an SVO language. Japanese, a strongly case-marked language, is SOV. That is, in Japanese, there are other cues as to which noun is subject and which is object, so Japanese speakers can default to their natural preference for old before new.

Gibson adds that, in fact, some languages have case markings only for animate objects—an observation that accords particularly well with the MIT researchers' theory.

"It's an extremely valuable study," says Steven Pinker, the Johnstone Family Professor in the Department of Psychology at Harvard University. "The design of any language reflects a compromise between properties that make it more useful—clarity, expressiveness, ease of articulation—and properties that are standardized across a community of speakers so that everyone is using the same code. Most grammatical theorists have focused on the arbitrary nature of the community-wide grammar. Gibson has now shed light on how each of these grammars has evolved, in a few predictable ways, to maximize clarity in communicating who did what to whom. That is, much more can be said than just 'That's the way English is; that's the way Turkish is,' and so on. Gibson's study shows that there is a great deal of functional design in seemingly arbitrary patterns of variation across languages."

In order to make their information-theoretical model of word order more rigorous, Gibson says, he and his colleagues need to better characterize the "noise characteristics" of spoken conversation: what types of errors typically arise, and how frequent they are. That's the topic of ongoing experiments, in which the researchers gauge people's interpretations of sentences in which words have been deleted or inserted.

