LiveGraph: Real-Time Plotter and Exploratory Data Analysis Tool

Plots data live – while it’s being produced by any application.

  • Very simple point-and-click interface.
  • Fully automatic, intelligent graph layout.
  • Single-click graph transformations: linear, logarithm, unit interval, etc.
  • Support for time axes.
  • APIs for integration into 3rd-party software and for data logging.
  • Open standard data file format.
Find LiveGraph: Real-Time Plotter and Exploratory Data Analysis Tool at: http://www.live-graph.org
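Since LiveGraph plots from a plain text data file as it grows, the data-logging side is easy to mimic even without the API: write one row per sample and flush after each write. A minimal Python sketch of that pattern (the file name, separator, and header line here are assumptions for illustration; see the LiveGraph site for its exact file format):

    import math, time

    # Append one row of data per time step; flushing after each write lets a
    # live plotter that is tailing the file pick up rows as they appear.
    with open("signal.dat", "w") as f:
        f.write("Time;Signal\n")               # header row naming the series
        for step in range(1000):
            t = 0.01 * step
            f.write(f"{t};{math.sin(t)}\n")    # one sample per line
            f.flush()                          # make the row visible immediately
            time.sleep(0.01)                   # stand-in for real work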

Posted in 2D Plotting

PyBLAW

PyBLAW is a lightweight Python framework for solving one-dimensional systems of hyperbolic balance laws of the form q_t + f(q)_x = s(q).
Find PyBLAW at: http://memmett.github.com/PyBLAW/
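As a concrete instance of that equation class (this is not PyBLAW’s API; the flux, source, and first-order Lax-Friedrichs scheme below are illustrative choices), here is a self-contained Python sketch solving Burgers’ equation with a damping source:

    import numpy as np

    # Sketch: first-order Lax-Friedrichs finite-volume scheme for
    #   q_t + f(q)_x = s(q)
    # with Burgers' flux f(q) = q^2/2 and a linear damping source s(q) = -0.1 q.

    def f(q):
        return 0.5 * q**2              # flux function

    def s(q):
        return -0.1 * q                # source term

    def step(q, dx, dt):
        qm, qp = q[:-1], q[1:]         # left/right states at each interface
        # Lax-Friedrichs numerical flux at the cell interfaces
        flux = 0.5 * (f(qm) + f(qp)) - 0.5 * (dx / dt) * (qp - qm)
        qnew = q.copy()
        qnew[1:-1] -= (dt / dx) * (flux[1:] - flux[:-1])
        return qnew + dt * s(qnew)     # explicit (split) source update

    x = np.linspace(0.0, 1.0, 201)
    dx = x[1] - x[0]
    dt = 0.4 * dx                      # CFL-limited step (wave speeds <= 1 here)
    q = np.exp(-200.0 * (x - 0.3)**2)  # smooth initial hump
    for _ in range(400):
        q = step(q, dx, dt)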

Posted in Numerical Methods

PyWENO (Python WENO)

PyWENO is a Python implementation of one-dimensional Weighted Essentially Non-oscillatory (WENO) approximations over unstructured (non-uniform) grids.
Find PyWENO (Python WENO) at: http://memmett.github.com/PyWENO/
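To make “WENO approximation” concrete, here is a third-order WENO reconstruction sketch in Python, restricted to a uniform grid for brevity (handling the non-uniform case is exactly what PyWENO adds; this is not PyWENO’s API):

    import numpy as np

    def weno3_right_edge(q, eps=1e-6):
        """Third-order WENO reconstruction of cell averages q at the right
        cell edge x_{i+1/2}, assuming a uniform grid."""
        qm, q0, qp = q[:-2], q[1:-1], q[2:]   # q_{i-1}, q_i, q_{i+1}
        # candidate low-order reconstructions from the two 2-cell stencils
        p0 = 0.5 * (q0 + qp)                  # stencil {i, i+1}
        p1 = 0.5 * (3.0 * q0 - qm)            # stencil {i-1, i}
        # smoothness indicators: large where the stencil crosses a jump
        b0 = (qp - q0) ** 2
        b1 = (q0 - qm) ** 2
        # nonlinear weights, biased toward the ideal linear weights (2/3, 1/3)
        a0 = (2.0 / 3.0) / (eps + b0) ** 2
        a1 = (1.0 / 3.0) / (eps + b1) ** 2
        w0 = a0 / (a0 + a1)
        return w0 * p0 + (1.0 - w0) * p1      # ~3rd order on smooth data

    q = np.where(np.arange(20) < 10, 1.0, 0.0)  # step profile: near the jump,
    edges = weno3_right_edge(q)                 # weights pick the smooth stencil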

Posted in Numerical Methods

Packmol

One of the biggest issues you face when you first start doing molecular dynamics (MD) simulations is how to create an initial geometry that won’t blow up in the first few time steps. Repulsive forces are very steep if the atoms are too close to each other, and if you are trying to simulate a condensed phase (liquid, solid, or interfacial) system, it can be hard to know how to make a sensible initial structure.

Packmol is a cool program that appears to solve this problem. It creates an initial point for molecular dynamics simulations by packing molecules in defined regions of space. The packing guarantees that short-range repulsive interactions do not disrupt the simulations. The great variety of spatial constraints that can be applied to molecules, or to atoms within molecules, makes it easy to create ordered systems, such as lamellar, spherical, or tubular lipid layers. It works with PDB and XYZ files and appears to be available under the GPL. Very, very cool!
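Packmol is driven by a small input file. A minimal sketch of one (the file names and box dimensions are made up for illustration; the keywords follow Packmol’s documented input format):

    # Pack 500 waters into a 40 x 40 x 40 Angstrom box, keeping atoms of
    # different molecules at least 2.0 Angstroms apart.
    tolerance 2.0
    filetype pdb
    output water_box.pdb

    structure water.pdb
      number 500
      inside box 0. 0. 0. 40. 40. 40.
    end structure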

Posted in open science, Science, Software

Gwyddion – Open Source SPM analysis

We just discovered a very cool open source program for analyzing scanning probe microscopy (SPM) data files. There are a number of incompatible, proprietary file formats for surface microscopies (AFM, MFM, STM, SNOM/NSOM), and getting data out of a microscope for further processing (including baseline leveling, profile analysis, and statistical analysis) can be a difficult task. Gwyddion is a Gtk+-based package that runs on Linux, Mac OS X (with MacPorts), and Windows, and appears to do nearly everything that some expensive commercial packages (and some free closed-source packages) can do. Some of our colleagues were very happy to discover this piece of wizardry!

Posted in open science, Science, Software

Open Science on “Future Tense”

Yesterday’s “Future Tense” radio program on the Australian Broadcasting Corporation was just posted online. The topic was Open Science, and I managed to get interviewed for the show. The interview with Anthony Funnell was a great conversation, and he’s pulled out some of the better bits while making the Open Science movement sound only slightly utopian.

Posted in Uncategorized

If you’re going to do good science, release the computer code too

A very nice article by Darrel Ince on the Climategate email theft and the quality of academic scientific code has just been posted over at the Guardian. An excerpt:

Computer code is also at the heart of a scientific issue. One of the key features of science is deniability: if you erect a theory and someone produces evidence that it is wrong, then it falls. This is how science works: by openness, by publishing minute details of an experiment, some mathematical equations or a simulation; by doing this you embrace deniability. This does not seem to have happened in climate research. Many researchers have refused to release their computer programs — even though they are still in existence and not subject to commercial agreements. An example is Professor Mann’s initial refusal to give up the code that was used to construct the 1999 “hockey stick” model that demonstrated that human-made global warming is a unique artefact of the last few decades. (He did finally release it in 2005.)

Posted in open science, Science, Software

Kitware has a blog!

Geoff Hutchison just pointed us to the new blog over at Kitware (the makers of VTK). I’ve found VTK enormously helpful in the past (particularly the source to vtkMath.cxx) and I’m glad they’ve made the commitment to Open Source.

My favorite post so far: Why Open Source Will Rule Scientific Computing by Will Schroeder.

Posted in open science, Software

Being Scientific: Falsifiability, Verifiability, Empirical Tests, and Reproducibility

If you ask a scientist what makes a good experiment, you’ll get very specific answers about reproducibility and controls and methods of teasing out causal relationships between variables and observables. If human observations are involved, you may get detailed descriptions of blind and double-blind experimental designs. In contrast, if you ask the very same scientists what makes a theory or explanation scientific, you’ll often get a vague statement about falsifiability. Scientists are usually very good at designing experiments to test theories. We invent theoretical entities and explanations all the time, but very rarely are they stated in ways that are falsifiable. It is also quite rare for anything in science to be stated in the form of a deductive argument. Experiments often aren’t done to falsify theories, but to provide the weight of repeated and varied observations in support of those same theories. Sometimes we’ll even use the words verify or confirm when talking about the results of an experiment. What’s going on? Is falsifiability the standard? Or something else?

The difference between falsifiability and verifiability in science deserves a bit of elaboration. It is not always obvious (even to scientists) what principles they are using to evaluate scientific theories,[1] so we’ll start a discussion of this difference by thinking about Popper’s asymmetry.[2] Consider a scientific theory (T) that predicts an observation (O). There are two ways we could approach adding the weight of experiment to a particular theory. We could attempt to falsify or verify the observation. Only one of these approaches (falsification) is deductively valid:

Falsification (deductively valid):
  If T, then O
  Not-O
  ∴ Not-T

Verification (deductively invalid):
  If T, then O
  O
  ∴ T

Popper concluded that it is impossible to know that a theory is true based on observations (O); science can tell us only that the theory is false (or that it has yet to be refuted). For Popper, then, meaningful scientific statements are the falsifiable ones.
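The valid half of the table above is just modus tollens, and the invalid half is the classic fallacy of affirming the consequent. For the logically inclined, here is a minimal Lean sketch of the asymmetry (an illustration, not anything from Popper):

    -- Falsification is modus tollens, and it type-checks:
    theorem falsification {T O : Prop} (h : T → O) (notO : ¬O) : ¬T :=
      fun t => notO (h t)

    -- "Verification" would need (T → O) → O → T, i.e. affirming the
    -- consequent; no such proof exists (consider T := False, O := True).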

Scientific theories may not be this simple. We often base our theories on a set of auxiliary assumptions that we take as postulates. For example, a theory of liquid dynamics might depend on the whole of classical mechanics being taken as a postulate, or a theory of viral genetics might depend on the Hardy-Weinberg equilibrium. In these cases, classical mechanics (or the Hardy-Weinberg equilibrium) serves as the auxiliary assumption for our specific theory.

These auxiliary assumptions can help show that science is often not a deductively valid exercise. The Quine-Duhem thesis[3] recovers the symmetry between falsification and verification when we take into account the role of the auxiliary assumptions (AA) of the theory (T):

Falsification (deductively invalid):
  If (T and AA), then O
  Not-O
  ∴ Not-T

Verification (deductively invalid):
  If (T and AA), then O
  O
  ∴ T

That is, if the predicted observation (O) turns out to be false, we can deduce only that something is wrong with the conjunction, (T and AA); we cannot determine from the premises that it is T rather than AA that is false. In order to recover the asymmetry, we would need our assumptions (AA) to be independently verifiable:

Falsification (deductively valid):
  If (T and AA), then O
  AA
  Not-O
  ∴ Not-T

Verification (deductively invalid):
  If (T and AA), then O
  AA
  O
  ∴ T

Falsifying a theory requires that the auxiliary assumptions (AA) be demonstrably true. Auxiliary assumptions are often highly theoretical; remember, they might be statements like “the entirety of classical mechanics is correct” or “the Hardy-Weinberg equilibrium is valid”! It is important to note that if we can’t verify AA, we will not be able to falsify T by using the valid argument above. Contrary to Popper, there really is no asymmetry between falsification and verification. If we cannot verify theoretical statements, then we cannot falsify them either.
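Again as a minimal Lean sketch (illustrative): a failed prediction only refutes the conjunction, and independently verifying AA is exactly what restores the refutation of T alone.

    -- From a failed prediction we can only refute the conjunction:
    theorem quine_duhem {T AA O : Prop} (h : T ∧ AA → O) (notO : ¬O) :
        ¬(T ∧ AA) :=
      fun taa => notO (h taa)

    -- With AA independently verified, T alone is refuted:
    theorem falsify_T {T AA O : Prop} (h : T ∧ AA → O) (aa : AA)
        (notO : ¬O) : ¬T :=
      fun t => notO (h ⟨t, aa⟩)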

Since verifying a theoretical statement is nearly impossible, and falsification often requires verification of assumptions, where does that leave scientific theories? What is required of a statement to make it scientific?

Carl Hempel came up with one of the more useful statements about the properties of scientific theories:[4] “The statements constituting a scientific explanation must be capable of empirical test.” And this statement about what exactly it means to be scientific brings us right back to things that scientists are very good at: experimentation and experimental design. If I propose a scientific explanation for a phenomenon, it should be possible to subject that theory to an empirical test or experiment. We should also have a reasonable expectation of universality of empirical tests. That is, multiple independent (skeptical) scientists should be able to subject these theories to similar tests in different locations, on different equipment, and at different times, and get similar answers. Reproducibility of scientific experiments is therefore required for universality.

So to answer some of the questions we might have about reproducibility:

  • Reproducible by whom? By independent (skeptical) scientists, working elsewhere, and on different equipment, not just by the original researcher.
  • Reproducible to what degree? This would depend on how closely that independent scientist can reproduce the controllable variables, but we should have a reasonable expectation of similar results under similar conditions.
  • Wouldn’t the expense of a particular apparatus make reproducibility very difficult? Good scientific experiments must be reproducible in both a conceptual and an operational sense.[5] If a scientist publishes the results of an experiment, there should be enough of the methodology published with the results that a similarly equipped, independent, and skeptical scientist could reproduce the results of the experiment in their own lab.

Computational science and reproducibility

If theory and experiment are the two traditional legs of science, simulation is fast becoming the “third leg”. Modern science has come to rely on computer simulations, computational models, and computational analysis of very large data sets. These methods for doing science are all reproducible in principle. For very simple systems and small data sets, this is nearly the same as reproducible in practice; but as systems become more complex and the data sets become large, calculations that are reproducible in principle are no longer reproducible in practice without public access to the code (or data).

If a scientist makes a claim that a skeptic can only reproduce by spending three decades writing and debugging a complex computer program that exactly replicates the workings of a commercial code, the original claim is really only reproducible in principle. If we really want to allow skeptics to test our claims, we must allow them to see the workings of the computer code that was used. It is therefore imperative for skeptical scientific inquiry that software for simulating complex systems be available in source-code form and that real access to raw data be made available to skeptics.

Our position on open source and open data in science was arrived at when an increasing number of papers began crossing our desks for review that could not be subjected to reproducibility tests in any meaningful way. Paper A might have used a commercial package that comes with a license that forbids people at university X from viewing the code![6] Paper B might use a code that requires parameter sets which are “trade secrets” and have never been published in the scientific literature.

Our view is that it is not healthy for scientific papers to be supported by computations that cannot be reproduced except by a few employees at a commercial software developer. Should this kind of work even be considered Science? It may be research, and it may be important, but unless enough details of the experimental methodology are made available so that it can be subjected to true reproducibility tests by skeptics, it isn’t Science.

Posted in Open Data, open science, Science

jHepWork

jHepWork is a Java-based data analysis framework.
Find jHepWork at: http://jwork.org/jhepwork/

Posted in Computational