Critique of landmark study: Psychology may not face replicability crisis after all
A study published last year suggested psychological research was facing a replication crisis, but a new paper says that work was erroneous.
Shock waves reverberated through the field of psychology research last year at the suggestion that the field faced a "replicability crisis." But the research that triggered that quake is flawed, a team of psychologists asserted in a comment published Thursday in the journal Science.
The ability to repeat an experiment with the same results is a pillar of productive science. When the study that rocked the field was published in Science in late August, Nature News's Monya Baker wrote, "Don’t trust everything you read in the psychology literature. In fact, two thirds of it should probably be distrusted."
In what's called the Reproducibility Project, a large, international team of scientists had repeated 100 published experiments to see if they could get the same results. Only about 40 percent of the replicated experiments yielded the same results.
But now a different team of researchers is saying that there's simply no evidence of a replicability crisis in that study.
The replication paper "provides not a shred of evidence for a replication crisis," Daniel Gilbert, the first author of the new article in Science commenting on the paper from August, tells The Christian Science Monitor in a phone interview.
The initial study, conducted by the Open Science Collaboration, also openly shared all the resulting data sets. So Dr. Gilbert, a psychology professor at Harvard University, and three of his colleagues pored over that information in a quest to see if it held up.
And the reviewing team, none of whom had papers tested by the original study, found a few crucial errors that could account for the dismal results.
Their gripes start with how studies were selected for replication. As Gilbert explains, the 100 replicated studies came from just two disciplines of psychology, social and cognitive psychology, and were not randomly sampled. Instead, the team selected studies published in three prominent psychology journals, and the studies had to meet certain criteria, including the complexity of their methods.
"Just from the start, in my opinion," Gilbert says, "they never had a chance of estimating the reproducibility of psychology because they do not have the sample of studies that represents psychology." But, he says, that error could be set aside, as the study could still yield information about more focused aspects of the field.
But when it came down to replicating the studies, other errors were made. "You might naïvely think that the word replication, since it contains the word replica, means that these studies were done in exactly the same way as the original studies," Gilbert says. In fact, he points out, some of the studies were conducted using different methods or different sample populations.
"It doesn't stop there," Gilbert says. It turns out that the researchers made a mathematical error when calculating how many of the studies would be expected to fail to replicate simply by chance. Based on the erroneous calculation, the number of studies that failed to replicate far outnumbered those expected to fail by chance. But when that calculation was corrected, says Gilbert, the results could be explained by chance alone.
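The logic of that dispute can be sketched with a toy calculation. The numbers below are illustrative assumptions, not figures from either paper: the point is only that if each faithful replication of a true effect still has some chance of failing (because of limited statistical power and sampling error), a sizable number of failures among 100 replications can be consistent with chance.

```python
from math import sqrt

# Hypothetical illustration: how many replication "failures" would chance
# alone produce across 100 studies, given an assumed per-study failure rate?
n_studies = 100
p_fail_by_chance = 0.35  # assumed rate, driven by power and sampling error

expected_failures = n_studies * p_fail_by_chance

# Binomial standard error gives the spread of outcomes consistent with chance.
se = sqrt(n_studies * p_fail_by_chance * (1 - p_fail_by_chance))
low, high = expected_failures - 2 * se, expected_failures + 2 * se

print(f"Expected failures by chance alone: {expected_failures:.0f} of {n_studies}")
print(f"Roughly 95% of chance outcomes fall between {low:.0f} and {high:.0f}")
```

Under these assumed inputs, an observed failure count in the mid-40s would sit inside the range chance alone could produce; the real disagreement between the two teams is precisely over what the correct per-study chance rate is.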
"Any one of [these mistakes] would cast grave doubt on this article," Gilbert says. "Together, in my view, they utterly eviscerate the conclusion that psychology doesn't replicate."
The journal Science isn't leaving it at that, though. Published alongside Gilbert and his team's critique of the original paper is a reply from 44 members of the replication team.
Brian Nosek, executive director of the Center for Open Science, who led the original study, says that his team agrees with Gilbert's team in some ways.
Dr. Nosek tells the Monitor in a phone interview that his team wasn't trying to conclude why the original studies' results only matched the replicated results about 40 percent of the time. It could be that the original studies were wrong or the replications were wrong, either by chance or by inconsistent methods, he says.
Or perhaps there were conditions necessary to get the original result that the scientists didn't consider but could in fact further inform the results, he says.
"We don't have sufficient evidence to draw a conclusion of what combination of these contributed to the results that we observed," he says.
It could simply come down to how science works.
"No one study is definitive for anything, neither the replication nor the original," Nosek says. "Anyone that draws a definitive conclusion based on a single study is overstepping what science can provide," and that goes for the Reproducibility Project too. Each study was repeated only once, he says.
"What we offered is that initial piece of evidence that hopefully would, and has, gotten people's theoretical juices flowing, to spur that debate," Nosek says. And spur it has.
Gilbert agrees that one published scientific paper should not be taken as definitive. "Journals aren't gospel. Journals aren't the place where truth goes to be enshrined forever," he says. "Journals are organs of communication. They're the way that scientists tell each other, hey guys, I did an experiment. Look what I found."
When reproduction follows, that's "how science accumulates knowledge," Nosek says. "A scientific claim becomes credible by the ability to independently reproduce it."