I was recently asked to define what Open Science means. It would have been relatively easy to fall back on a litany of “Open Source, Open Data, Open Access, Open Notebook”, but these are just shorthand for four fundamental goals:
- Transparency in experimental methodology, observation, and collection of data.
- Public availability and reusability of scientific data.
- Public accessibility and transparency of scientific communication.
- Using web-based tools to facilitate scientific collaboration.
The idea I’ve been most involved with is the first one, since granting access to source code is really equivalent to publishing your methodology when the kind of science you do involves numerical experiments. I’m an extremist on this point, because without access to the source for the programs we use, we rely on faith in the coding abilities of other people to carry out our numerical experiments. In some extreme cases (i.e. when simulation codes or parameter files are proprietary or are hidden by their owners), numerical experimentation isn’t even science. A “secret” experimental design doesn’t give skeptics the ability to repeat (and hopefully verify) your experiment, and the same is true with numerical experiments. Science has to be “verifiable in practice” as well as “verifiable in principle”.
In general, we’re moving towards an era of greater transparency in all of these topics (methodology, data, communication, and collaboration). The problems we face in gaining widespread support for Open Science are really about incentives and sustainability. How can we design or modify the scientific reward systems to make these four activities the natural state of affairs for scientists? Right now, there are some clear disincentives to participating in these activities. Scientists are people, and we’re motivated by most of the same things as normal people:
- Money, for ourselves, for our groups, and to support our science.
- Reputation, which is usually (but not necessarily) measured by citations, h-indices, download counts, placement of students, etc.
- Sufficient time, space, and resources to think and do our research (which is, in many ways, the most powerful motivator).
Right now, the incentive network that scientists work under seems to favor “closed” science. Scientific productivity is measured by the number of papers in traditional journals with high impact factors, and the importance of a scientists work is measured by citation count. Both of these measures help determine funding and promotions at most institutions, and doing open science is either neutral or damaging by these measures. Time spent cleaning up code for release, or setting up a microscopy image database, or writing a blog is time spent away from writing a proposal or paper. The “open” parts of doing science just aren’t part of the incentive structure.
Michael Faraday’s advice to his junior colleague to: “Work. Finish. Publish.” needs to be revised. It shouldn’t be enough to publish a paper anymore. If we want open science to flourish, we should raise our expectations to: “Work. Finish. Publish. Release.” That is, your research shouldn’t be considered complete until the data and meta-data is put up on the web for other people to use, until the code is documented and released, and until the comments start coming in to your blog post announcing the paper. If our general expectations of what it means to complete a project are raised to this level, the scientific community will start doing these activities as a matter of course.
If you meet a scientist who tells you that they did a fantastic experiment and have wonderful data, you naturally ask them to email you a reprint. Any working scientist would be perplexed if the response was: “Oh, I’m not going to be writing this work up for publication.” It would be absolute nonsense in the culture of science to not publish a report in a journal on the work you have done. And yet, no one seems surprised when scientists are too busy or too secretive to release their data to the community. We should be just as perplexed by this. Instead of complaining about the reward and incentive systems, we should be setting the standard higher: “What do you mean that you haven’t got around to putting your data on the web? You aren’t done yet!” Or: “How can I possibly review this paper if I can’t see the code they were using? There’s now way for me to tell if they did the calculation right.” We’re going to have to raise the expectations on completing a scientific project if we want to change the culture of science.
Pingback: ¿Qué es y cuál es la importancia de la ciencia abierta?
Pingback: What I learned in CMN2570: The importance of collective knowledge in today’s data-centric society (4875 characters) | nicolelblog
Pingback: Open Science: What the Fuss is About – UC3 Portal
Pingback: Corona: Seven Ways to Smash the Curve Now | Hacker Noon - Coiner Blog
Pingback: Corona: Seven Ways to Smash the Curve Now
A very good (and quick) introduction to open science and why it should be come the norm. I recommend this article to all those students who want to pursue research in any area of science and scholarship. Teachers of science may use this in their undergraduate classes so students would get an idea of the advantages of sharing.