Greg Wilson has written a great article in American Scientist on the shocking absence of modern software development practices from groups that do science using computers. I know exactly what Greg is talking about. Some of the groups I’ve worked with have had 10 or 20 different versions of a code, each developed by a different graduate student or post-doc. There may be some pedagogical reasons for doing this; my own students usually start with a first-year project that involves re-implementing (and then modifying in some interesting way) the basic liquid simulation code for a small box of Argon. There’s a lot to be said for understanding the gory details of what a piece of scientific code is doing before moving on to a larger and much more complex piece of software.
After my students do that first-year project, they usually start working with the larger group code, and that code is managed with all of the modern tools we can get our hands on. We use CVS for source control, doxygen for automatic generation of class documentation, autoconf for building on multiple architectures, and a mix of IDEs and symbolic debuggers for working out the kinks. We use every buzzword-compliant feature of modern programming languages (including pointers and self-adjusting vectors in Fortran). These tools are used by an astonishingly small fraction of working scientists, however. Greg points to the number of working scientists still editing their code with vi or WordPad, and I'm continually amazed at how many see absolutely nothing wrong with implicitly typed variables.
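To make those last two gripes concrete, here's a minimal Fortran 90 sketch. It isn't a piece of our actual group code, and the variable names and argon-sized numbers are purely illustrative; the point is that implicit none is what rules out implicitly typed variables, and the allocatable array plus pointer are the "self-adjusting vectors" I was talking about:

    program tool_demo
      implicit none                                  ! every variable must be declared

      real, dimension(:), allocatable, target :: distances  ! array sized at run time
      real, dimension(:), pointer             :: shell      ! a view into that array
      integer :: nAtoms

      nAtoms = 108                        ! a small box of argon, say
      allocate(distances(nAtoms))         ! grab exactly as much storage as we need
      distances = 0.0

      shell => distances(1:12)            ! point at the first coordination shell
      shell = 3.4                         ! fill those entries through the pointer

      deallocate(distances)
    end program tool_demo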
Unfortunately, the situations in which we've needed excellent coding tools (e.g., debugging our parallel simulations, or passing data between different languages) are exactly the places where open source tools have been missing. There are wonderful symbolic debuggers in the commercial world, but very few open source projects that do what we need.
Greg hasn’t just written an article lamenting the inability of modern scientists to use the tools that are out there. He’s gone and done something about it. He’s created an online course to teach us too-smart-for-our-own-good lunkheads how to use the things that make software development easier.
Here’s an excerpt from the article:
I finally asked a friend who was pursuing a doctorate in particle physics why he insisted on doing everything the hard way. Why not use an integrated development environment with a symbolic debugger? Why not write unit tests? Why not use a version-control system? His answer was, “What’s a version-control system?”
A version-control system, I explained, is a piece of software that monitors changes to files—programs, Web pages, grant proposals and pretty much anything else. It works like the “undo” button on your favorite editor: At any point, you can go back to an older version of the file or see the differences between the way the file was then and the way it is now. You can also determine who else has edited the file or find conflicts between their changes and the ones you’ve just made. Version control is as fundamental to programming as accurate notes about lab procedures are to experimental science. It’s what lets you say, “This is how I produced these results,” rather than, “Um, I think we were using the new algorithm for that graph—I mean, the old new algorithm, not the new new algorithm.”
My friend was intelligent and intimately familiar with the problems of writing large programs—he had inherited more than 100,000 lines of computer code and had already added 20,000 more. Discovering that he didn’t even know what version control meant was like finding a chemist who didn’t realize she needed to clean her test tubes between experiments. It wasn’t a happy conversation for him either. Halfway through my explanation, he sighed and said, “Couldn’t you have told me this three years ago?”
Go read the whole article.
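And if, like Greg's friend, you've never touched a version-control system, the day-to-day routine with CVS (the one our group happens to use) really is just a handful of commands. A minimal sketch, with a made-up module and file name:

    cvs checkout simulation                # grab a working copy of the module
    cd simulation
    # ... edit force_field.F90 ...
    cvs diff force_field.F90               # what did I change since the last commit?
    cvs commit -m "Fix the cutoff radius"  # record the change, and why
    cvs log force_field.F90                # who changed this file, when, and why?
    cvs update -r 1.4 force_field.F90      # drop back to an older revision

That's the whole "undo button" Greg describes: every committed revision stays retrievable, along with who made it and the note explaining why.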
Update: Greg has a blog!