Why context matters in the long and short of words: Researchers improve 75-year-old language theory

June 20, 2011 By Bobbie Mixon in Psychology & Psychiatry

In 1935, Harvard University linguist George Kingsley Zipf asserted that "the magnitude of words tends, on the whole, to stand in an inverse, not necessarily proportionate, relationship to the number of occurrences." In other words, short words are used more frequently than long ones. Now, cognitive scientists at the Massachusetts Institute of Technology have demonstrated a substantial improvement to Zipf's law. Credit: Adrian Apodaca, National Science Foundation

(Medical Xpress) -- Do you ever wonder about the stuff that makes up words? Why is a word a word, what goes into forming it, what's its history or why is it long or short? Scientists at the Massachusetts Institute of Technology do.

Steven Piantadosi, Harry Tily and Edward Gibson of MIT's Department of Brain and Cognitive Sciences study how humans think and communicate.

Recently, they put a well-established, 75-year-old language theory to the test and found it had room for improvement. At issue was something called Zipf's law, an empirical scientific principle that says word length is primarily determined by frequency of use.

In 1935, Harvard University linguist George Kingsley Zipf asserted that "the magnitude of words tends, on the whole, to stand in an inverse, not necessarily proportionate, relationship to the number of occurrences." In other words, short words are used more often than long ones.

"One widely known and apparently universal property of [human language] is that frequent words tend to be short," the researchers write in their report. They note that giving frequently used words short forms makes communication more efficient than it would be if those words were long.

Zipf attributed this pattern to pressure for communicative efficiency. It would be impractical, for example, to ask everyone at a Thanksgiving dinner whether they would like a bowl of soup if the word "of" were 15 letters long.

In the Brown University Standard Corpus of Present-Day American English, which contains about one million words of text, "of" is the second most commonly used word, behind only "the," which is also used more in writing than any other word in the English language. In fact, a list of the 100 most frequently used words contains words such as "be," "on," "have," "with," "who," and "some," all of them very short.
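Because Zipf claimed only an inverse, "not necessarily proportionate," relationship, a rank correlation is one natural way to check his law on a body of text. The Python sketch below is an illustration under simple assumptions, not the researchers' method: it treats whitespace-separated tokens as words, measures length in letters, and computes Spearman's rank correlation between each word type's frequency and its length using only the standard library. The helper names (`ranks`, `spearman`, `frequency_length_correlation`) and the toy token stream are hypothetical. A negative result means frequent words tend to be short, as Zipf's law predicts.

```python
import math
from collections import Counter


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def frequency_length_correlation(tokens):
    """Correlate each word type's count with its length in letters."""
    counts = Counter(tokens)
    words = list(counts)
    freqs = [counts[w] for w in words]
    lengths = [len(w) for w in words]
    return spearman(freqs, lengths)


if __name__ == "__main__":
    # Hypothetical toy token stream: short words repeat, long words appear once.
    tokens = "the of the a the of a the extraordinary the of a telescope the a of".split()
    print(round(frequency_length_correlation(tokens), 2))  # -0.63
```

On a real corpus the same function could be run over millions of tokens; the rank-based measure tests only whether length falls as frequency rises, which is all Zipf's wording commits to.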

But the cognitive scientists at MIT demonstrated a substantial improvement to Zipf's law. They showed that across 10 languages the predictability of what a person says is a more important determinant of word length than how often he or she says it.

Word length actually comes down to the amount of information a word contains

The goal of the research was to compare Zipf's word-frequency theory with Piantadosi and colleagues' word-predictability theory: the idea that word length is determined by the average amount of information a word conveys in context, which depends on how predictable the word is.

Using an Internet database, the researchers counted how often sequences of two, three or four words occur together, in order to estimate how predictable each word is in the contexts where it is typically written.

With these estimates, they could test whether context and predictability were better determinants of word length than frequency of use.

"For instance, in a context like ‘Monday night ____’ the word ‘football’ is very predictable and therefore conveys very little information," said Piantadosi, a cognitive scientist in the Ph.D. program at MIT and lead author of the study. "But, in a context like ‘I ate ____,’ the missing word is very unpredictable, but conveys a lot of information."

The hypothesis was that a word's average information content, estimated from sequences of two, three or four words, should partly determine its length, measured in letters or syllables, since that is how an optimal code would behave. In the example above, "football" and the two words preceding it form one such three-word sequence.
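The predictability measure described above can be sketched in a few lines. One way to formalize a word's average information content, consistent with this description, is the mean over all of its occurrences of its surprisal, -log2 P(word | preceding n-1 words), with the probabilities estimated from n-gram counts. The Python below is a hypothetical illustration, not the study's code: the function name `average_information` and the toy corpus are invented, and it uses plain maximum-likelihood estimates with no smoothing on a tiny token list, whereas the researchers drew on a large Internet database of word sequences. The toy corpus mirrors the article's "a ton of fun" versus "a ton of butter" contrast.

```python
import math
from collections import Counter, defaultdict


def average_information(tokens, n=3):
    """Mean surprisal, in bits, of each word type given its preceding n-1 words.

    P(word | context) is a maximum-likelihood estimate from n-gram counts in
    `tokens` (no smoothing, a simplifying assumption). The first n-1 tokens
    have no full context and are not scored.
    """
    k = n - 1  # context length in words
    ngram_counts = Counter()
    context_counts = Counter()
    for i in range(k, len(tokens)):
        context = tuple(tokens[i - k:i])
        ngram_counts[context + (tokens[i],)] += 1
        context_counts[context] += 1

    total_bits = defaultdict(float)
    occurrences = Counter()
    for i in range(k, len(tokens)):
        context = tuple(tokens[i - k:i])
        word = tokens[i]
        p = ngram_counts[context + (word,)] / context_counts[context]
        total_bits[word] += -math.log2(p)  # surprisal of this occurrence
        occurrences[word] += 1
    # Average over every scored occurrence of each word type.
    return {word: total_bits[word] / occurrences[word] for word in occurrences}


if __name__ == "__main__":
    # Hypothetical toy corpus: "a ton of" is followed by "fun" 3 times, "butter" once.
    tokens = "a ton of fun . a ton of fun . a ton of fun . a ton of butter .".split()
    info = average_information(tokens, n=4)  # context = the preceding three words
    print(round(info["fun"], 3), round(info["butter"], 3))  # 0.415 2.0
```

With four-word sequences, "fun" follows "a ton of" in three of that context's four occurrences, so it carries log2(4/3), about 0.415 bits, on average, while "butter" carries 2 bits. A frequency count alone could not separate these two cases in the same way, because it ignores the preceding words.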

"The only way these effects can get into the lexicon is if our linguistic systems, and the mechanisms of language change, are sensitive to communicative pressures," said Piantadosi.

The forms of words, meaning their letters, syllables and sounds, are coded for efficient communication given the word sequences people use, and a word's predictability within those sequences is a better predictor of its length than frequency alone, he said.

"This means word sequences provide efficient codes for the meanings they convey, relative to the statistical regularities in language," he said. "That's our claim."

Context matters for love, amour, Liebe, amor and kärlek

Love, amour, Liebe, amor and kärlek all mean the same thing in different languages, and all are about the same length, which is what should be expected if they are similarly predictable, or informative, in context. The MIT researchers stress that it is the words before and after a particular word, not simply how often the word is used, that determine how much information it conveys and therefore how long it tends to be.

True, the word for strong fondness is very short, but how frequently do people say it, what are the circumstances when they do and how predictable is the information conveyed when it's said? Saying "I love you" is quite different from saying "I love chicken." For a word like "love," context matters.

The research results held across all but one of the languages studied: Czech, Dutch, English, French, German, Italian, Portuguese, Romanian, Spanish and Swedish, with German being the outlier.

"I was surprised that we found effects in so many languages," said Piantadosi. "I would have thought that differences in morphology, or word structure, might have swamped our effects in many languages, but this doesn't appear to be the case."

Why the most frequently used words are short

The research findings also provide a better explanation of why the most often used words are short: they tend to be predictable, so on average many short words convey relatively little information. Many of the top 100 words are "function words," such as "with," "from" and "over," whose main purpose is to join other words together. By themselves, these words give the reader or listener very little information.

The researchers also found that short words rely on being paired with other familiar words to supply context and convey information. Words that follow well-known sequences of other words are often the most predictable and carry the least information: "a ton of fun," for example, is a familiar sequence in which the final word conveys very little. Words with little association to the words preceding them carry more information, as "butter" does in "a ton of butter."

A final word

The research suggests that people communicate using an approximately optimal code for meaning, said Piantadosi. "Lexicons are not arbitrary in the sense of being completely random. Instead, they are well-structured for communication, given the patterns of word sequences people typically use."

The problem with the traditional approach of looking only at word frequency is that it counts words in isolation and ignores the regular dependencies between them.

The research is published in the Proceedings of the National Academy of Sciences in an article titled "Word lengths are optimized for efficient communication." The National Science Foundation's Division of Behavioral and Cognitive Sciences funded the research.

More information: Word lengths are optimized for efficient communication: http://www.pnas.or … 108.abstract

Provided by National Science Foundation

Comments

Isaacsname
Jun 20, 2011

Can they make heads or tails out of ebonics ?
hush1
Jun 20, 2011

Forget Zipf.

"This means word sequences provide efficient codes for the meanings they convey, relative to the statistical regularities in language," he said. "That's our claim." - The Authors

This claim fails to explain music.
The origin of words originates with sound.
The origin of meaning(to sound)originates with association.
First sound, then meaning.

Maybe, just maybe, we will get around to assigning and associating sounds first, and other perceptions second to markings on a cave wall. Our walls are big and small now.
Libraries and electrical storage.

You are insecure. Defending theory, until the next best theory comes along.
RobertKarlStonjek
Jun 22, 2011

In fact, a list of the top 100 most frequently used words contains words such as "be," "on," "have," "with," "who," and "some," all very short words.

Could this explain the length of the word 'sex'?
hush1
Jun 22, 2011

lol Typical. English. How revealing.
I need 18 letters to say the same thing.
Geschlechtsverkehr.
alfie_null
Jun 23, 2011

The origin of words originates with sound.
The origin of meaning(to sound)originates with association.
First sound, then meaning.


imho, no.
e.g. lol, aka, fyi, even abbr.

bfn ;-)
hush1
Jun 23, 2011

alfie null:
"e.g. lol, aka, fyi, even abbr."
You don't 'sound'(speak) these short hands.
The long hand of these short hands is sound.
First long hand, then short hands.