The OpenScience Project | Open source scientific software

Words of Wisdom from a Senior Colleague

Posted on May 12, 2006 by Dan Gezelter

I’m done with my general chemistry course for the semester. It has been both a frustrating and rewarding semester. A student’s success in freshman chemistry has more to do with the attitude they bring to the table than with innate ability. Students who develop good study skills and self-discipline can easily survive a general chemistry class, and my section was set up to provide a framework for developing these skills. We had weekly tutorials, we assigned and actually graded the homework (a rarity in large general chemistry sections), we had many extra office hours, and we had a group of excellent teaching assistants. We had all the bells and whistles we could think of to help the students make it through the class.

The amazing thing to me is how resistant some of the students were to the extra features of the “special” section. Everyone has had students like this: they sit in the back of the lecture hall, listen to their iPods during lectures, do the minimum number of problems, never speak up in tutorials, never come to office hours, complain about the difficulty of the tests, never pick up their tests to learn from their mistakes, and always always always blame the instructor and teaching staff for their grades at the end of the semester. During a discussion about these students, a senior colleague dropped this ruby:

Education is the only commodity that the consumer wants less of than the producer is willing to provide.

[tags]education, chemistry, pedagogy[/tags]

Posted in education | 3 Comments

Fast interpolation code?

Posted on April 10, 2006 by Dan Gezelter

Here’s a question for everyone who does numerical computation. Has anyone released an open-source 1-d interpolation algorithm with assembly code for the various kinds of processor SIMD extensions (e.g. SSE2, 3DNow, AltiVec)?

Most (if not all) scientific codes make repeated use of expensive numerical functions, and a quick glance through a few scientific codes (including my own) has convinced me that a fast interpolation scheme (particularly of sqrt()) would speed up most scientific codes by factors of 2 or more. A typical Molecular Dynamics code will calculate square roots and functions like the Lennard-Jones potential,

V(r) = 4ε [ (σ/r)^12 − (σ/r)^6 ]

or the Buckingham potential,

V(r) = A exp(−B r) − C / r^6

billions of times per timestep. Sqrt and exp (and the LJ and Buckingham potentials) are smooth, and we typically only need functions like this computed over a relatively short range of values. Cubic splines give you twice-differentiable approximations, and precomputing the functions you need on a 1-d grid is trivial.

There are plenty of open-source cubic spline codes out there (e.g. PSPLINE, SPLINE, the Fortran code from Carl de Boor’s A Practical Guide to Splines, and many more). None of these codes (at least none of the ones I’ve seen) take advantage of the SIMD extensions (SSE2, 3DNow, AltiVec) on modern processors. We hear great things about how fast these extensions are for Non-Uniform Rational B-Splines (NURBS), and for color interpolation in 2-d, but the demonstration assembly-language codes are pretty far beyond what most physicists, chemists, or biologists could easily splice into their programs.

What would really give scientific codes a boost would be a Fortran- or C-callable routine that let the user store an array of x and y values, and then branched to assembly language for the lookup of interpolated values.

Am I missing some code that anyone else knows about?
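For readers who want to prototype the table-lookup idea before any assembly is written, here is a minimal sketch in Python. It is not SIMD code and it is not taken from any of the libraries named above. The class name `UniformSplineTable`, the choice of natural end conditions, the reduced units (epsilon = sigma = 1), and the grid range are all assumptions made for illustration. It precomputes a function on a uniform 1-d grid, solves for the cubic spline second derivatives once, and then evaluates the spline with a constant-time index calculation.

```python
import math


class UniformSplineTable:
    """Natural cubic spline through samples of f on a uniform 1-d grid."""

    def __init__(self, f, x_min, x_max, n_intervals):
        if n_intervals < 2 or not x_max > x_min:
            raise ValueError("need x_max > x_min and at least 2 intervals")
        self.x_min, self.x_max, self.n = x_min, x_max, n_intervals
        self.h = (x_max - x_min) / n_intervals
        # Precompute the expensive function once on the grid.
        self.y = [f(x_min + i * self.h) for i in range(n_intervals + 1)]
        self.m = self._second_derivatives()

    def _second_derivatives(self):
        """Thomas algorithm for the spline second derivatives m[i].

        Natural end conditions (m[0] = m[n] = 0) are an assumption of this
        sketch. Interior equations on a uniform grid:
            m[i-1] + 4 m[i] + m[i+1] = 6 (y[i-1] - 2 y[i] + y[i+1]) / h^2
        """
        n, h, y = self.n, self.h, self.y
        m = [0.0] * (n + 1)
        rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) / (h * h)
               for i in range(1, n)]
        diag = [4.0] * (n - 1)
        # Forward elimination (sub- and super-diagonal entries are all 1).
        for k in range(1, n - 1):
            w = 1.0 / diag[k - 1]
            diag[k] -= w
            rhs[k] -= w * rhs[k - 1]
        # Back substitution; unknown k corresponds to m[k + 1].
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - m[k + 2]) / diag[k]
        return m

    def __call__(self, x):
        if not self.x_min <= x <= self.x_max:
            raise ValueError("x is outside the tabulated range")
        t = (x - self.x_min) / self.h
        i = min(int(t), self.n - 1)  # uniform grid: no search needed
        b = t - i                    # fractional distance from node i
        a = 1.0 - b
        return (a * self.y[i] + b * self.y[i + 1]
                + ((a * a * a - a) * self.m[i]
                   + (b * b * b - b) * self.m[i + 1]) * self.h * self.h / 6.0)


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential; reduced units by default (an assumption)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    table = UniformSplineTable(lennard_jones, 0.9, 3.0, 2000)
    for r in (1.0, 1.5, 2.0, 2.5):
        print(r, table(r), lennard_jones(r))
    sqrt_table = UniformSplineTable(math.sqrt, 1.0, 4.0, 300)
    print(sqrt_table(2.25), math.sqrt(2.25))
```

Because the grid is uniform, the interval index comes from one subtraction, one division, and a truncation rather than a search, which is the property that would make the lookup a candidate for vectorization. Whether a table like this actually beats a hardware square root depends on the processor and is not measured here. Natural end conditions also cost some accuracy in the first and last few intervals, so a production routine would probably pad the grid or use known end derivatives.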

[tags]splines, simd, open source, scientific computing[/tags]

Posted in Science, Software | 7 Comments

OpenBugs

Posted on April 1, 2006 by Dan Gezelter

Open source software for Bayesian analysis using MCMC: a continuation of the WinBUGS project.
Find OpenBugs at: http://www.mathstat.helsinki.fi/openbugs/

Posted in Statistics | Leave a comment

SunlightLB

Posted on March 29, 2006 by Dan Gezelter

SunlightLB is an open-source 3D lattice Boltzmann code. It implements a standard lattice Boltzmann algorithm for three-dimensional simulations, using a D3Q19 lattice with a twin relaxation time scheme. Objects, possibly moving, are included by a link bounce-back method. This enables SunlightLB to solve a variety of hydrodynamics problems such as the computation of flows through pore spaces, the computation of resistance matrices for colloidal hydrodynamics problems, and so on. Both zero and non-zero Reynolds number flows can be solved. In addition, passive scalar transport is implemented on top of the lattice Boltzmann scheme via a tagged-particle propagation algorithm, with a variety of boundary conditions. This allows simulation of a variety of reaction-advection-diffusion problems, such as a passive scalar adsorbing in a porous material in the presence of a flow (deep-bed filtration). SunlightLB is implemented as a library of C functions. Scripting language support is enabled by a SWIG interface file, which allows, for example, SunlightLB to be used as a Perl extension module. Examples of the use of SunlightLB are provided in C, Perl, and Python.
Find SunlightLB at: http://sunlightlb.sourceforge.net/

Posted in Fluid Dynamics | Leave a comment

Adium X

Posted on March 20, 2006 by Dan Gezelter

A brief break from the ongoing discussion to point out Adium X, which is an open source IM tool for Mac OS X.

The cool thing about it is that if you happen to have the OS X Equation Service installed, you can IM LaTeX-style equations, and they are automatically converted to equations on the fly. There’s a post about it at Cosmic Variance.

It looks like a great tool for long-distance collaborative scientific correspondence.

[tags]IM, equations[/tags]

Posted in Fun, Science, Software | Leave a comment

Making Money from OpenSource Science Software. III. Sell Services

Posted on March 14, 2006 by Dan Gezelter

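This is the third post in a series exploring how scientific software companies might be able to make money while still keeping their code open. The previous articles are:

How to make money from Open Source scientific software
Making Money from OpenSource Science Software. II. Sell Hardware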

There’s also a related thread going on at Notes from the Biomass.

The point of these articles is to find a way for producers of good scientific software to make money while still keeping their codes open for skeptical review. I think it is good scientific practice that codes be open to review, but I’m also sympathetic with the desire of the developers of this software to put food on the table.

One of the more common business models in the open source community that produces commodity software (e.g. operating systems) is to sell services. There are many examples of companies that provide source-level and binary downloads of one version of their product, while charging for the “enterprise” or supported version. The canonical example of this is Red Hat. Judging from much of the chatter on the various scientific mailing lists (e.g. ccl.net), enterprise-level support for some of the major computational chemistry packages is sorely lacking. There’s clearly a demand for scientific software support, but would a support-based system be a viable way to fund a company that opens up its code?

Again, the economics seem stacked against this model. First, scientific software targets a relatively small group of users, and at the same time, the development and support costs are often quite large. Problems with the software are often so complex they can’t be addressed by online FAQs or with banks of inexpensive support staff. So the support contracts would need to be expensive.

Second, academic users of scientific software have pools of relatively cheap but intelligent and highly skilled labor. Why would a researcher spend $10,000 on a support contract if the problem could be solved by throwing a graduate student at the open source version of the code for a few months?

The entire expense of the software development and support services would then fall on the few corporate clients of the producer of the software. Simply dividing the company’s costs by the number of expected buyers means the initial price of a license or support contract would be prohibitively high, and if the software were high quality, I would expect that some of the corporate clients would attempt to run with the source just like the academics would.

For this reason, I’m reasonably sure that “selling support” isn’t a viable model for the scientific software community to adopt.

However, there is another way to sell services. Suppose a given software company acted as a service bureau that performed a specific kind of calculation for industry and academic labs. That is, if a company were producing an open source ligand docking program, they could supply the service of screening a particular set of potential drug molecules against a specific binding domain on a known protein target. The cost to provide data for one molecule docking with a binding domain could then be set reasonably low so that academic researchers could become familiar with the service. Clients that had larger demands (e.g. pharmaceutical companies) could subscribe to this service for an annual fee.

Essentially, this model has the company that wrote a piece of open source scientific software acting as a single-purpose supercomputing center. Since the developers of the primary application would be on staff, the company would already have the scientific and computational expertise to run the service. All that would be required would be clients willing to submit their “jobs” to the service bureau.

Would this model work in all scientific fields? Probably not. The company would need industrial clients with relatively deep pockets and a need for their services. But in chemistry, at least, the trend has been for the fine chemical and pharmaceutical companies to scale down their in-house computational groups. Would they be willing to farm out their computational tasks to another company, particularly when the relevant information (like drug leads or even the targets of those drug molecules) is a trade secret? I don’t know yet, but I’m not aware of anyone who has tried this approach.

So my conclusions thus far are that “sell hardware” and “sell support” won’t work to fund open source science software, but we might find some scientific fields where computational service bureaus spring up around a piece of open source code.

Next up: Dual Licensing

Posted in Policy, Science, Software | Tagged money, open source, scientific software | 8 Comments

Making Money from OpenSource Science Software. II. Sell Hardware

Posted on March 13, 2006 by Dan Gezelter

This is the second post in a series in which I’m trying to figure out a general strategy for developers of open source scientific software to make a living without closing the source to their codes.

The canonical examples of the “sell hardware” strategy in the commodity software community are Apple and Sun Microsystems. Both companies make substantial fractions of their valuable code available under open source licenses, although in Apple’s case the “crown jewels” (the graphical portions of their operating system and the various iApps) are not part of the open source mix. Sun has a much more scientifically interesting open source approach; they make the code available for Solaris as well as GridEngine, OpenOffice, and the Java Development Kit.

It should not be a surprise to anyone that the majority of the hardware we run our code on these days is Apple laptops and desktops and Sun Opteron servers. Are we rewarding these vendors for making their code available? Not directly. A better explanation is that since both companies are invested in open source, a wide range of open source developers are making sure their codes work on these platforms. The tools we use daily (Linux, g++, xemacs, xmgr, gridengine, ddd, gdb) usually work out of the box on these platforms, so that’s what we end up buying.

So what’s the scientific version of this strategy? I’d like to think that companies that make the spectroscopic instrumentation would make their spectral analysis software available with source, but that doesn’t seem to be the case. Bruker’s “TopSpin” software isn’t freely available, even though they have included what looks like some of Jmol (which is under GPL), and Varian’s VnmrJ is also a closed-source product. A brief survey of some of the newer fields that rely on instrumentation (like the various surface-probe microscopies) shows either proprietary (and usually Windows-only) software from the vendors themselves or community-supported open source replacements like GXSM.

I’d argue that the scientific hardware vendors are actually much less likely to release their codes than commodity hardware vendors. Scientific instruments are expensive, and you typically need up-to-date software to drive the instruments. The vendors know that once you’ve bought their hardware, selling you the software is a guaranteed revenue stream, and purchasing decisions about scientific instrumentation are almost never made based on the quality or transparency of the bundled software.

Also, the scientific codes I’m most interested in professionally are in areas for which there is no scientific instrumentation. Specifically, codes that perform Quantum Chemistry, Molecular Dynamics, and Monte Carlo calculations don’t have an associated hardware vendor which would support their development.

So we can pretty much eliminate “Sell Hardware” as a viable strategy for making money from OpenSource scientific software. It may work in a few rare cases, but two things are opposing this avenue:

Many complicated and interesting scientific codes have no associated hardware to be sold.
Vendors of instrumentation will see little benefit to themselves from providing open source software because purchasing decisions don’t take software availability into account.

Tomorrow’s topic: Sell Services

Posted in Policy, Science, Software | Tagged money, open source, scientific software | 3 Comments

How to make money from Open Source scientific software

Posted on March 12, 2006 by Dan Gezelter

One of my graduate students just got offered a job at a computational chemistry software company. I have a lot of respect for this company; they make very good, very fast, and very capable products. They also hire real scientists and do a lot of work to make sure the calculations performed by their code actually mean something. Their advisory board contains some of the smartest people I have ever met.

However, their code is not open. And this, I think, is a real problem. Imagine a skeptical researcher who is sent a paper to review. Further imagine that this paper uses this company’s software, and the skeptical researcher doesn’t have the money for a license. He or she therefore can’t “look under the hood” to verify what’s going on if they have some questions about how the code is calculating something relevant to the paper. There are good reasons that this company doesn’t give away their code; they like to put food on the table, and they don’t trust the rest of the community to shell out the money for their programs if the code were available for free.

So, I’m left with a dilemma. I want this company to do well, to hire more of my students in the future, and to continue to produce high quality code. I also want the codes that we use in my field to be available for skeptical review. So today, I’m starting a set of posts in which I’ll try to hash out the following question: How can people make money from open source scientific software?

The question has been asked (and answered) many times before in non-scientific fields, and some of the answers that might work for a piece of commodity software (like a database) might not work so well for highly-specialized software. Over the next few days, I’ll lay out a set of common strategies from non-scientific fields to figure out if any of these strategies might work in the sciences.

A rough outline for the posts which will follow is:

Sell hardware
Sell services
Dual-license your software
Use the academic community
Differentiate between single run and high-throughput versions

I’m not naming the company involved. I’d like to figure out a general strategy for making money from open source scientific software that won’t be specific to a single field. I’m hoping some of the developers and principals stop by to make comments, however.