When it comes to replicating studies, context matters
Contextual factors, such as the race of participants in an experiment or the geography of where the experiment was run, can reduce the likelihood of replicating psychological studies, a team of New York University researchers has found. Their work, which appears in the journal Proceedings of the National Academy of Sciences (PNAS), analyzed papers examined by the Reproducibility Project in an effort to identify potential challenges to replicating scientific scholarship.
"The scientific community is continually evaluating how it can optimize its research process and should remain open to new practices to improve scholarship," observes Jay Van Bavel, an associate professor in NYU's Department of Psychology and the study's lead author. "These new findings suggest that we will need to improve both our methods and our theory if we want to improve reproducibility in science, and we propose a roadmap for enhancing scientific research: scientists should avoid making universal generalizations based on limited data, explicitly define contextual factors that may influence their results, and work closely with original researchers to enhance reproducibility."
Last year, the Reproducibility Project, a collaboration of psychology researchers, sought to replicate the findings of 100 previously published psychology studies. However, it was able to do so for only 39 percent of these studies, raising questions about the validity of the original scholarship. In March, a group of psychology researchers from Harvard University and the University of Virginia published a critique in Science, raising doubts about the Reproducibility Project's findings. They concluded that its analysis was statistically flawed and that several replication studies were poorly designed.
In the new PNAS paper, the NYU researchers took a different approach: they focused on the nature of the research topic in the original studies. They re-analyzed all 100 papers that the Reproducibility Project sought to replicate, including some co-authored by other NYU faculty.
Specifically, they assessed the extent to which the effects reported in the original studies were likely to be influenced by contextual factors such as time (e.g., pre- vs. post-Recession), culture (e.g., Eastern vs. Western culture), location (e.g., rural vs. urban setting), or population (e.g., a racially diverse population vs. a predominantly white population). In other words, they appraised the contextual sensitivity of the topics in the original 100 studies. The researchers who coded these ratings were blind to the results of the Reproducibility Project's replication attempts for all the papers they coded.
The researchers then examined the relationship between ratings of contextual sensitivity (i.e., how likely context was to affect the chances of replicating a given study) and the findings from the Reproducibility Project.
The results showed that context ratings predicted replication success even after statistically adjusting for methodological factors such as effect size and statistical power. Specifically, studies with higher contextual sensitivity ratings (where, for instance, altering the race or geographical location of study participants could alter the results) were less likely to be reproduced by the Reproducibility Project researchers.
In a second analysis, the NYU researchers examined which of the 100 replication studies had been endorsed by the original authors prior to the Reproducibility Project's data collection. They found that replication studies that were not endorsed by the original authors were far less likely to reproduce the results.
Van Bavel and his colleagues note that the challenges facing replication are not limited to psychology and stretch back hundreds of years. For example, Sir Isaac Newton alleged that his contemporaries were unable to replicate his research on the color spectrum of light because of bad prisms. After he was able to direct them to better prisms (ones produced in London rather than Italy), they were able to reproduce his results. In modern times, studies using mice or rats may be hampered by subtle environmental differences, such as food, bedding, and light, which can affect the biological and chemical processes that determine whether experimental treatments succeed or fail.