If you ask a scientist what makes a good experiment, you’ll get very specific answers about reproducibility and controls and methods of teasing out causal relationships between variables and observables. If human observations are involved, you may get detailed descriptions of blind and double-blind experimental designs. In contrast, if you ask the very same scientists what makes a theory or explanation scientific, you’ll often get a vague statement about falsifiability. Scientists are usually very good at designing experiments to test theories. We invent theoretical entities and explanations all the time, but very rarely are they stated in ways that are falsifiable. It is also quite rare for anything in science to be stated in the form of a deductive argument. Experiments often aren’t done to falsify theories, but to provide the weight of repeated and varied observations in support of those same theories. Sometimes we’ll even use the words verify or confirm when talking about the results of an experiment. What’s going on? Is falsifiability the standard? Or something else?
The difference between falsifiability and verifiability in science deserves a bit of elaboration. It is not always obvious (even to scientists) what principles they are using to evaluate scientific theories, 1 so we’ll start a discussion of this difference by thinking about Popper’s asymmetry. 2 Consider a scientific theory (T) that predicts an observation (O). There are two ways we could approach adding the weight of experiment to a particular theory. We could attempt to falsify or verify the observation. Only one of these approaches (falsification) is deductively valid:
If T, then O
Not-OIf T, then O
Deductively Valid Deductively Invalid
Popper concluded that it is impossible to know that a theory is true based on observations (O); science can tell us only that the theory is false (or that it has yet to be refuted). He concluded that meaningful scientific statements are falsifiable.
A more realistic picture of scientific theories isn’t this simple. We often base our theories on a set of auxiliary assumptions which we take as postulates for our theories. For example, a theory for liquid dynamics might depend on the whole of classical mechanics being taken as a postulate, or a theory of viral genetics might depend on the Hardy-Weinberg equilibrium. In these cases, classical mechanics (or the Hardy-Wienberg equilibrium) are the auxiliary assumptions for our specific theories.
These auxiliary assumptions can help show that science is often not a deductively valid exercise. The Quine-Duhem thesis 3 recovers the symmetry between falsification and verification when we take into account the role of the auxiliary assumptions (AA) of the theory (T):
If (T and AA), then O
Not-OIf (T and AA), then O
Deductively Invalid Deductively Invalid
That is, if the predicted observation (O) turns out to be false, we can deduce only that something is wrong with the conjunction, (T and AA); we cannot determine from the premises that it is T rather than AA that is false. In order to recover the asymmetry, we would need our assumptions (AA) to be independently verifiable:
If (T and AA), then O
Not-OIf (T and AA), then O
Deductively Valid Deductively Invalid
Falsifying a theory requires that auxiliary assumption (AA) be demonstrably true. Auxiliary assumptions are often highly theoretical — remember, auxiliary assumptions might be statements like the entirety of classical mechanics is correct or the Hardy-Weinberg equilibrium is valid! It is important to note, that if we can’t verify AA, we will not be able to falsify T by using the valid argument above. Contrary to Popper, there really is no asymmetry between falsification and verification. If we cannot verify theoretical statements, then we cannot falsify them either.
Since verifying a theoretical statement is nearly impossible, and falsification often requires verification of assumptions, where does that leave scientific theories? What is required of a statement to make it scientific?
Carl Hempel came up with one of the more useful statements about the properties of scientific theories: 4 “The statements constituting a scientific explanation must be capable of empirical test.” And this statement about what exactly it means to be scientific brings us right back to things that scientists are very good at: experimentation and experimental design. If I propose a scientific explanation for a phenomenon, it should be possible to subject that theory to an empirical test or experiment. We should also have a reasonable expectation of universality of empirical tests. That is multiple independent (skeptical) scientists should be able to subject these theories to similar tests in different locations, on different equipment, and at different times and get similar answers. Reproducibility of scientific experiments is therefore going to be required for universality.
So to answer some of the questions we might have about reproducibility:
- Reproducible by whom? By independent (skeptical) scientists, working elsewhere, and on different equipment, not just by the original researcher.
- Reproducible to what degree? This would depend on how closely that independent scientist can reproduce the controllable variables, but we should have a reasonable expectation of similar results under similar conditions.
- Wouldn’t the expense of a particular apparatus make reproducibility very difficult? Good scientific experiments must be reproducible in both a conceptual and an operational sense. 5 If a scientist publishes the results of an experiment, there should be enough of the methodology published with the results that a similarly-equipped, independent, and skeptical scientist could reproduce the results of the experiment in their own lab.
Computational science and reproducibility
If theory and experiment are the two traditional legs of science, simulation is fast becoming the “third leg”. Modern science has come to rely on computer simulations, computational models, and computational analysis of very large data sets. These methods for doing science are all reproducible in principle. For very simple systems, and small data sets this is nearly the same as reproducible in practice. As systems become more complex and the data sets become large, calculations that are reproducible in principle are no longer reproducible in practice without public access to the code (or data). If a scientist makes a claim that a skeptic can only reproduce by spending three decades writing and debugging a complex computer program that exactly replicates the workings of a commercial code, the original claim is really only reproducible in principle. If we really want to allow skeptics to test our claims, we must allow them to see the workings of the computer code that was used. It is therefore imperative for skeptical scientific inquiry that software for simulating complex systems be available in source-code form and that real access to raw data be made available to skeptics.
Our position on open source and open data in science was arrived at when an increasing number of papers began crossing our desks for review that could not be subjected to reproducibility tests in any meaningful way. Paper A might have used a commercial package that comes with a license that forbids people at university X from viewing the code! 6 Paper 2 might use a code which requires parameter sets that are “trade secrets” and have never been published in the scientific literature. Our view is that it is not healthy for scientific papers to be supported by computations that cannot be reproduced except by a few employees at a commercial software developer. Should this kind of work even be considered Science? It may be research, and it may be important, but unless enough details of the experimental methodology are made available so that it can be subjected to true reproducibility tests by skeptics, it isn’t Science.
- This discussion closely follows a treatment of Popper’s asymmetry in: Sober, Elliot Philosophy of Biology (Boulder: Westview Press, 2000), pp. 50-51.
- Popper, Karl R. “The Logic of Scientific Discovery” 5th ed. (London: Hutchinson, 1959), pp. 40-41, 46.
- Gillies, Donald. “The Duhem Thesis and the Quine Thesis”, in Martin Curd and J.A. Cover ed. Philosophy of Science: The Central Issues, (New York: Norton, 1998), pp. 302-319.
- C. Hempel. Philosophy of Natural Science 49 (1966).
- Lett, James, Science, Reason and Anthropology, The Principles of Rational Inquiry (Oxford: Rowman & Littlefield, 1997), p. 47
- See, for example www.bannedbygaussian.org