Researchers discover widespread flaw in global childhood development data
It was a pattern that many researchers had noticed—children in developing countries born at certain times of the year are taller for their age than others. William Masters, a professor at the Friedman School, wanted to know more, specifically how stress in utero affects children later in life. What he and his colleagues found instead was a widespread flaw in the data that researchers have relied on for two decades to understand childhood malnutrition.
Many of the surveys used for collecting information on children's growth in developing countries, they found, have more error in children's birth months than their birth years. As a result, observed differences in height by season of birth may not be due to climate-related fluctuations in nutrition or infections, as previously assumed, but to the fact that children classified as born early in the calendar year are actually younger than reported, while those reported to be born late in the year are in fact older.
Correctly classifying children's age in months is important for the accuracy of a key child health indicator, the height-for-age score. The score, a standard measure comparing a child's height to that of healthy peers, is commonly used to indicate malnutrition and allocate global aid.
Masters and his colleagues then devised an algorithm to correct for the problem. "What we found suggests there may be significant measurement error associated with much of the literature regarding factors that influence childhood development," Masters said. "Correcting for measurement error can help us see country-specific patterns and more clearly identify areas for focusing our efforts to help children."
Tufts Now recently spoke with Masters about the discovery and the correction formula that he and his colleagues have devised. The findings were recently published in separate studies in Demography and the American Journal of Clinical Nutrition.
Tufts Now: How did you go about unraveling this phenomenon?
William Masters: We came across this finding accidentally while trying to learn more about a pattern that researchers have observed from survey data in the developing world, that children born at certain times of the year are taller for their age than others. This seasonality in attained height was thought to be caused by seasonal variations in climate, employment, and agriculture affecting the child in utero and early infancy.
My own and others' previous research on birth season differences focused on individual countries in Africa and South Asia. We wanted to identify how richer countries are able to protect infants from seasonality in the weather, so we took height-for-age scores and reported month of birth for nearly one million children from sixty-two different countries and plotted them on a graph. The United States Agency for International Development paid for these data, so they're in the public domain.
Was there an "aha" moment when you realized what was happening?
The puzzle jumped out at us right away. In the global data, with so many observations, we saw a straight upward line showing children born later in each calendar year to be taller. Despite different climates around the world, we saw children born in January to be the smallest, with slight increases in height for age among those born in February, March, and each successive month with the tallest children being those reported as born in December. The graph looked like a sawtooth, rising smoothly from month to month, followed by an abrupt drop in attained height from December to January.
What we saw in the global data could not be true seasonality. In previous studies with smaller sample sizes, the fluctuations by month of birth were less smooth and could be due to real fluctuations in harvests and the disease environment. This global sawtooth pattern was surely the result of measurement error, but could not be explained by any of the known problems in survey data such as heaping, in which people report age in round numbers or record births around memorable dates, or discontinuity caused by switching from measuring infants' length lying down to measuring older children's height when they're standing up.
The first aha moment was realizing that the sawtooth pattern was an artifact of measurement, not real seasonality. We then stared at the data for a long time, trying different explanations. In the end, the simplest theory worked best, reproducing the sawtooth when some of the surveyed children have randomly scrambled birth months within their birth year.
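The mechanism Masters describes can be illustrated with a small simulation. The sketch below is not the researchers' algorithm or data. It assumes births are spread evenly across the year with no real seasonality, that a healthy child's height grows linearly with age, and that a fixed share of children have their birth month replaced by a random month within the same birth year. The function name and every parameter value (growth rate, height spread, sample size) are illustrative assumptions. Only the one-in-ten share of scrambled months echoes the estimate quoted in this interview.

```python
import random
from statistics import mean


def simulate_sawtooth(n_children=120_000, share_scrambled=0.10,
                      growth_cm_per_month=0.7, height_sd_cm=3.5, seed=0):
    """Mean height-for-age z-score by REPORTED birth month (0 = Jan, 11 = Dec).

    All parameter values are illustrative assumptions, not survey estimates.
    """
    rng = random.Random(seed)
    z_by_month = [[] for _ in range(12)]
    for _ in range(n_children):
        # No true seasonality: every birth month is equally likely.
        true_month = rng.randrange(12)
        if rng.random() < share_scrambled:
            # Birth year is right, but the recorded month is effectively random.
            reported_month = rng.randrange(12)
        else:
            reported_month = true_month
        # Recording a later month than the truth makes the child look younger
        # than they really are: true age minus reported age, in months.
        age_error = reported_month - true_month
        # A healthy child differs from the reference height at their TRUE age
        # only by random variation.
        noise_cm = rng.gauss(0, height_sd_cm)
        # Score against the reference for the REPORTED age (linear growth
        # assumed): an understated age makes the child appear tall for age.
        z = (growth_cm_per_month * age_error + noise_cm) / height_sd_cm
        z_by_month[reported_month].append(z)
    return [mean(scores) for scores in z_by_month]


if __name__ == "__main__":
    labels = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
    for label, z in zip(labels, simulate_sawtooth()):
        print(f"{label}: {z:+.3f}")
```

Under these assumptions, average scores rise from children reported as born in January to those reported as born in December, even though no child's growth depends on season. Setting `share_scrambled` to zero makes the trend disappear.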
How is it possible that there was misreporting of age for so many children?
We take birthdays for granted because we use our date of birth so often, for identification and celebration. But about a third of the world's children have no birth certificate, and in many low-income families, people don't record or use the exact date for anything. In other kinds of survey data, interviewers record unknown information as missing, but since age in months is needed to calculate many things about child welfare, interviewers are taught to help parents remember—and encouraged to record an approximate month even if the parents are uncertain. Of the million children in our dataset, we estimate that about one in ten had a recorded birth month that was effectively random.
In your second study on this phenomenon, you and your colleagues came up with a mathematical formula to correct the error. What were the results of that effort?
We cannot find the true birthday, but we can eliminate the false seasonality created by randomness in how interviewers record month of birth. What we found is that season of birth still matters in many countries, especially in sub-Saharan Africa. Correcting for measurement error can help us see country-specific patterns and more clearly identify areas for focusing our efforts to help children.
What do your findings mean for all the existing research that might be based on age data that was inaccurate?
Most of the time, randomness in birthdays just adds noise. That helps explain why so many studies find weird results, including findings of no significant effect for interventions that really do help children. On average there's no systematic bias, except for studies that link season of birth to attained height. Several influential papers about that turn out to be wrong.
Using our algorithm to detect errors, we have already published some corrected estimates of seasonality in attained height and are continuing to work on the issue. We hope that the algorithm can be baked into future demographic surveys to identify which groups are most often misreporting. Improving birth records is important not just for research, but for a family's own decisions about their child, such as vaccination and school enrollment. Recording births might also be culturally important, to help people celebrate each child's life by recognizing their birthday.
What are the potential effects of these two studies in the developing world?
Fixing this flaw in survey data can help us see otherwise hidden impacts of children's birth circumstances on their health and well-being. We can target aid more accurately and measure the improvements in child welfare associated with our policies and programs. We can also help countries improve their birth records and data collection tools, so that individuals and groups can help themselves.
Our work on birthdates is also a great example of big data in action. The problem went unnoticed for decades, until we combined enough different surveys to see a phenomenon that could not otherwise be detected. That gives me confidence that we can continue improving children's lives—if we keep looking carefully at the data, and acting on what we see.