Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, Co-editor of the Simply Statistics blog, and co-director of the Johns Hopkins Data Science Specialization.
My colleagues and I at the Johns Hopkins Bloomberg School of Public Health are strongly in favor of reproducible research, code disclosure, and open science, so people were surprised by a quote from my colleague Roger Peng at the close of a Nature piece describing Mozilla’s efforts to implement code review for scientific papers:
“One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code. We need to get more code out there, not improve how it looks.”
His comment generated a lot of discussion and analysis in the open science and reproducible research community, but what Roger actually said was more nuanced than the excerpt that ran in Nature:
“I’m not sure. It’s possible that all we’ll learn is that scientists are not professional software engineers. But we already knew that. It’s also possible that the community will learn some important tips. But one worry I have is that with reviews like this, scientists will be even more discouraged from publishing their code, not because they think it won’t work, but because it doesn’t look professional or coded ‘efficiently.’ But we need to get more code out there, not improve how it looks. Ultimately, we need to focus on science.”
People in the open science and open data communities are often surprised that sharing code would be anything but an obvious thing to do. To people who share code all the time, it is a no-brainer, and my bias is clearly in that camp as well. I require that my students’ analyses be reproducible, I discuss reproducible research when I teach, I make my own analyses reproducible, and I frequently state in reviews that papers are acceptable only after the code is made available.
What’s the Big Deal?
I recently had a paper come out in the journal Biostatistics that has been uh…a little controversial.
In this case, our paper was published with discussion. (For people outside of statistics, a discussant and a reviewer are different things. The paper first goes through peer review in the usual way. Then, once it is accepted for publication, it is sent out to discussants for comments.) A couple of discussants were very, very motivated to discredit our approach. Despite this, because we believe in open science, stating our assumptions, and being reproducible, we made all of the code we used and data we collected available for the discussants (and for everyone else). In an awesome win for open science, many of the discussants used and evaluated our code in their discussions.
One of the very motivated discussants identified an actual bug in the code. This bug caused the journal names to be scrambled in Figures 3 and 4. The bug (thank goodness!) did not substantively alter the methods, the results, or the conclusions of our paper. On top of that, having our code on GitHub meant we could carefully look it over, fix the bug, and push the changes to the repository (and update the paper) so the discussant could easily see the revised version.
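To make the idea of a "label scramble" concrete, here is a purely hypothetical sketch in Python (emphatically not our actual code or data) of how this kind of bug can arise: the statistic is computed correctly for every journal, but the plotting step reorders the values without carrying the names along, so the figure labels get scrambled while the underlying numbers stay correct.

```python
import numpy as np

# Hypothetical data: p-values collected per journal (illustration only).
p_values = {
    "Journal A": [0.01, 0.04],
    "Journal B": [0.20, 0.03],
    "Journal C": [0.50, 0.45],
}

# The statistic itself is computed correctly for every journal.
rates = {j: float(np.mean(np.array(p) < 0.05)) for j, p in p_values.items()}

# Buggy figure prep: the values are sorted but the labels are not,
# so each journal name ends up attached to the wrong bar.
buggy_heights = sorted(rates.values())
buggy_labels = list(rates.keys())

# Fixed figure prep: sort labels and values together so they stay paired.
fixed_labels, fixed_heights = zip(*sorted(rates.items(), key=lambda kv: kv[1]))
```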
We were happy that the discussant didn’t find any more substantial bugs (because we knew they were motivated to review our code for errors as carefully as possible). We were also happy to make the changes, admit our mistake, and move on.
An interesting thing happened, though. The motivated discussant wanted to discredit our approach, so they described in the supplement how they noticed the bug (totally fair game, it was a bug). But they also included their email exchange with the editor about the bug, along with this quote:
“As all seasoned methodologists know, minor coding errors causing total havoc is quite common (I have seen it happen in my own work). I think that it is ironic that a paper that claims to prove the reliability of the literature had completely messed up the two main figures that represent the core of all its data and its main results.”
A couple of points here: (1) the minor bug didn’t wreak havoc with our results, change any conclusions, or affect our statistics, and (2) the statement is clearly designed for the sole purpose of embarrassing us (the authors) and discrediting our work.
The problem here is that the code reviewer deeply wanted us to be wrong, and this incident highlights one reason for Roger’s concerns. I feel we acted in good faith here: we tried to be honest about our assumptions and open with our code, and we responded quickly and thoroughly to the report of a bug. But the discussant used the fact that we had a bug at all to try to discredit our whole analysis with sarcasm. This sort of thing could absolutely discourage a person from releasing code.
The discussant is absolutely right that most code will have minor bugs. Personally, I’m very grateful to the discussant for catching the bug before the work was published and I’m happy that we made the code available and corrected our mistake.
But the key risk here is people who demand reproducible code only so they can try to embarrass analysts and discredit science they don’t like.
If we want people to make code available, be willing to admit mistakes, and continuously update their code, then we need more than code review. We need a commitment from the community not to use reproducible research as a vehicle for embarrassing and discrediting each other. We need a policy that:
- Doesn’t discourage people from putting code up before papers are published for fear of embarrassment.
- Recognizes that minor bugs happen and doesn’t penalize people for acknowledging and fixing them.
- Prevents people from publishing when they have major bugs, but doesn’t humiliate them.
- Defines specific, positive ways that shared code can benefit the community (for example, through collaboration), rather than focusing only on the errors discovered when code is made available.
- Recognizes that most scientists are not professional software developers and focuses review on the scientific correctness/reproducibility of code, rather than technical software development skills.
As we encourage companies to share their data and the human subjects research they are conducting, it is important to develop a culture that maximizes the benefit of openness to all parties. Recently, scientists working with Facebook performed and published an experiment on the effect of altering the emotional content of people’s news feeds. This led to a huge outcry from the public and the scientific community. It is important, as we move forward, that we encourage companies to publish their research and be open with their data while maintaining the protections for human subjects that have been carefully implemented in the scientific community.
A version of this article originally appeared on Simply Statistics under the title: “How code review could discourage code disclosure? Reviewers with motivation.”