Re-purposing Open Source projects for Science

[Image: C60 adsorbed on Au(111)]

I’ve been having discussions with my colleague, Alex Kandel, about a software tool he’s been working on. He has 20,000 or so STM images that his group has taken over the past five years, and he is building a web tool that will let his group search the images based on criteria like tip-surface voltage bias, surface preparation, scan rate, who took the images, etc.

So far, he’s been using Zoph, a photo gallery with a nice search tool for the EXIF and IPTC data in JPEG and TIFF images. Zoph is a nice tool, but given the difficulty of writing EXIF headers into images with the common tools (ImageMagick, gd, netpbm), he’s had to hijack the image import functions and read the “extra” data from text files created by his microscopes.

EXIF and IPTC are great examples of metadata standards that store the metadata in the same file as the primary data. It makes a lot of sense to store the experimental parameters that generated an image with the image itself.

My favorite gallery software, Gallery2, lets you view but not search the EXIF data. Zoph’s search functionality is a lot more extensive, so it was the natural choice for Alex’s tool.

[Image: Octanethiol SAM]

There are many areas of science (scanning probe microscopy, optical microscopy, various forms of astronomy) which use image data as their primary source of quantitative information, and there are some really wonderful open source image gallery tools that have been written to organize home and professional photography. Just a few more pieces are necessary to make these powerful scientific tools as well:

  • A unix command-line tool or library that gives us the ability to insert and edit EXIF/IPTC data in arbitrary images.
  • Data import modules for Zoph or Gallery2 that use the EXIF/IPTC data to populate the database with details from the metadata stored in the image file.
  • Extensible search modules for Zoph or Gallery2 that make it trivial to search arbitrary field names in this data.

These would turn good amateur photography tools into powerful scientific image managers.

Update: Alex found Exiv2, which can read and write EXIF and IPTC data directly. The second image above is a sample STM image which has a few “interesting” EXIF fields:

% exiv2 -pt 09090400BT.jpg
Exif.Image.DocumentName Ascii 15 09090400BT.SM3
Exif.Image.XResolution SLong 1 400
Exif.Image.YResolution SLong 1 400
Exif.Image.PlanarConfiguration Short 1 0
Exif.Image.ExifTag Long 1 89
Exif.Photo.ExposureTime SLong 1 23 s
Exif.Photo.SpectralSensitivity Ascii 16 lowpass filter
Exif.Photo.DateTimeOriginal Ascii 21 2004:09:09 14:27:26

Exif.Photo.BrightnessValue SRational 1 759
Exif.Photo.ExposureBiasValue SRational 1 +249
Exif.Photo.Flash Short 1 Yes
Exif.Photo.UserComment Undefined 80 Octanethiol SAM first imaged on 9-8-04, left in pink thiol box overnight
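
A listing like this is already close to something a gallery import module could consume. Here is a minimal Python sketch, assuming the output has been captured as plain text with whitespace-separated key, type, count and value columns as shown above. The function names parse_exiv2_listing and matches are my own inventions for illustration, not part of Exiv2, Zoph or Gallery2.

```python
def parse_exiv2_listing(text):
    """Parse lines of the form 'Key Type Count Value' into a dict keyed by tag name."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip blank lines, the shell prompt line, and anything that is not an Exif tag.
        if len(parts) < 3 or not parts[0].startswith("Exif.") or not parts[2].isdigit():
            continue
        value = parts[3].strip() if len(parts) == 4 else ""
        fields[parts[0]] = {"type": parts[1], "count": int(parts[2]), "value": value}
    return fields


def matches(fields, key, needle):
    """Case-insensitive substring search on one tag (hypothetical search helper)."""
    entry = fields.get(key)
    return entry is not None and needle.lower() in entry["value"].lower()


# A few lines copied from the listing above.
SAMPLE = """\
% exiv2 -pt 09090400BT.jpg
Exif.Image.DocumentName Ascii 15 09090400BT.SM3
Exif.Photo.ExposureTime SLong 1 23 s
Exif.Photo.ExposureBiasValue SRational 1 +249
Exif.Photo.UserComment Undefined 80 Octanethiol SAM first imaged on 9-8-04, left in pink thiol box overnight
"""

fields = parse_exiv2_listing(SAMPLE)
print(fields["Exif.Photo.ExposureTime"]["value"])                  # prints: 23 s
print(matches(fields, "Exif.Photo.UserComment", "octanethiol"))    # prints: True
```

A real import module would write these fields into the gallery database rather than a dict, but the parsing step would look much the same.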

Although Exiv2 looks like it might be the key, Alex notes a few problems remaining with this approach:

  • He’s saving all values as ints or rationals because EXIF doesn’t seem to support floats.
  • He’s storing data in some infrequently used EXIF fields, and not all parsers will read it by default.

[tags]science, images, metadata[/tags]

Posted in Science, Software | 2 Comments

Dance Like A Monkey!

[Image: Dance Like A Monkey]

Holy Cow. The legendary punk rock group, The New York Dolls, is back! And they recently went back into the studio to make a new album called One Day It Will Please Us To Remember Even This. On this album is a song which is an amazing shot across the bow of Intelligent Design.

The video features the Flying Spaghetti Monster, Pat Robertson’s marriage to an ape, a phylogenetic tree (with the NY Dolls at the top, of course), Dick Cheney with a shotgun, Charles Darwin dancing with a tortoise, a statue of liberty buried in sand, and tons more I probably missed.

Click on the picture above to go to the Quicktime version at RoadRunner Records. Or Click Here for the grainy YouTube version.

The New York Dolls are back! And they’re poking fun at creationists!

[tags]punk, creationism, satire[/tags]

Posted in Fun | 2 Comments

catool

catool is a cross-platform GPL tool for the analysis of internal combustion engine pressure data. It calculates parameters such as IMEP, MFB, Pmax, and knock pressure, along with cycle statistics. Data can be imported in CSV or AVL IFile format and exported as CSV, MATLAB or AVL IFile.
Find catool at: http://www.xarin.com/

Posted in Engineering | 1 Comment

iBabel cheminformatics and molecule viewer

This is an AppleScript Studio application that provides a front-end for a variety of cheminformatics tools. To date these include file conversion (between a vast range of chemical file formats), SMARTS-based substructure searching, similarity searching, list manipulation, overlaying using OpenBabel, a 2D viewer using JChempaint, and a 3D molecule viewer using Jmol, binaries for which are now included in the iBabel application.
Find iBabel cheminformatics and molecule viewer at: http://www.macinchem.org/ibabel/ibabel3.php

Posted in Molecule Viewers and Editors | Leave a comment

So….

[Image: Speech]

So, imagine that there’s this odd verbal tic of most scientists in the US. They like to start sentences and paragraphs with the word “so” even if they aren’t drawing conclusions from discussions that went before. Back in January, The Celestial Monochord investigated this phenomenon, but it has been brought to light again by Uncertain Principles in a nascent discussion about common verbal tics of scientists.
The list so far:

  • So, …
  • … is left as an exercise to the reader.
  • In the limit of large / small …
  • For large / small values of …
  • orders of magnitude
  • canonical
  • cutoff
  • to a first-order approximation / to first order
  • trivial solution

I start sentences with “So…” all the time, and never realized I was doing it. So now I’m wondering, what other verbal tics do I have because of my choice of career? Here are some potential candidates:

  • … at a steady state …
  • The expectation value of …
  • … highly non-optimal …
  • … at equilibrium …
  • That’s a forbidden transition.

Any others? Other than a disconcerting tendency to wander away from a conversation in mid-sentence, my speech is mostly normal. At least I think it is. My graduate students may disagree with this assessment.

[tags] language is a virus, science, phraseology [/tags]

Posted in Fun | 4 Comments

g3data

g3data is used for extracting data from graphs. Publications often include graphs but leave out the actual data. g3data makes the extraction process much easier.
Find g3data at: http://www.frantz.fi/software/g3data.php

Posted in Tools | Leave a comment

Free-software licenses

Everyone should go read Brooks Moses on Free-software licenses: requirements vs. requests. His post has made me re-think the license we use for our group simulation code. I’ve never liked the GPL because it essentially guarantees that friends in the corporate world won’t be able to use our code in their products; the simplicity of the BSD-style license has always appealed to me. As many people who adopt the BSD-style license have done, I threw in this attribution clause:

Acknowledgement of the program authors must be made in any publication of scientific results based in part on use of the program. An acceptable form of acknowledgement is citation of the article in which the program was described (Matthew A. Meineke, Charles F. Vardeman II, Teng Lin, Christopher J. Fennell and J. Daniel Gezelter, “OOPSE: An Object-Oriented Parallel Simulation Engine for Molecular Dynamics,” J. Comput. Chem. 26, pp. 252-271 (2005))

I know how often people forget to attribute code to the original author. Brooks points out that this places a big barrier in the way of adopting small bits of code (subroutines, individual Fortran modules, etc.) into other packages. Pretty soon, users of big packages are citing hundreds of papers in fields that are far removed from their use of the code.

His suggestion is a “Requests” section of a license that would make the request for citation, and remove the forcefulness of the attribution clause. I like the idea. A lot.

Posted in Science, Software | 2 Comments

Justify your funds

Over at Seed Magazine, the corporate overlords of ScienceBlogs, the cool kids on the block have been asked this provocative question:

Since they’re funded by taxpayer dollars (through the NIH, NSF, and so on), should scientists have to justify their research agendas to the public, rather than just grant-making bodies?

Uncertain Principles does a good job handling the question, and brings up the ghost of William Proxmire’s “Golden Fleece Awards” as an example of what happens when someone without a big-picture view of the scientific enterprise thinks he is a better judge of science than our peer reviewers.

I agree with just about everything he says, but I’d add that science (particularly the kind of science funded by the NSF) is a creative and serendipitous enterprise. It isn’t possible to predict discoveries or give guarantees that specific projects will work out as proposed. And most importantly, we may not know immediately how important a particular discovery is. Michael Faraday was once asked of what use was his discovery of electromagnetic induction. His response, “Of what use is a child?” is instructive. Allowing the public to vote on funding priorities or individual grants (Simon Cowell presents: “American Scientist!”) would be a recipe for disaster.

Posted in Policy | 3 Comments

Where’s the Fun in Home Experiments?

[Image: Chemistry Set (photo from David Clugston)]

Wired magazine has an article called “Don’t Try This at Home” which starts by describing a recent CPSC raid on the house of the family that runs United Nuclear. We’ve mentioned United Nuclear before. They’re one of the few companies still around that is selling cool scientific supplies (i.e. chemicals) directly to consumers. Their list of chemicals contained fun oxidizers like potassium perchlorate and potassium nitrate.

Home-based experimentation is essential to raising the next generation of science nerds. And to make the best nerds in the world, the home experimentation needs to be a wee bit dangerous. My own interest in chemistry started when my AP chemistry teacher allowed us to raid the chemical stockroom of my high school the summer before the school closed down. Our haul: silver nitrate, sodium peroxide, potassium permanganate, glycerine, magnesium strips, iron oxide and aluminum powders, and enough glassware to construct a working distillation apparatus. Those of our readers in the know will recognize these ingredients as dangerous; we hauled away enough thermite to burn down the neighborhood, and the glycerine / potassium permanganate reaction is truly a wonder to behold.

The reason that this raid on United Nuclear has nerdy bloggers like BoingBoing and myself up in arms is that the chemistry kits that kids have access to today are too darned “safe” to breed real science nerds. Chemistry kits these days have a bit of acid-base chemistry, a red-cabbage indicator, and maybe a little bit on phase changes. Possibly they’ll include the ingredients needed to make Oobleck. Where’s the highly exothermic redox reaction? Where are the half-cells to build your own battery? Where’s the fire and smoke? Where’s the fun?

I’m sure some of these changes are due to the increasingly litigious nature of our society, and some are due to fears of terrorism. The Wired article points out another reason that home chemistry kits have become so boring. They pin the blame partially on chemophobia, the fear of anything containing the prefix chem. Have you ever noticed that something with the Bio- prefix is considered perfectly safe and beneficial, while the same thing with the Chem- prefix is to be avoided at all cost? To quote my friends Kim and Dave: BioRinse sounds like an organic shampoo; ChemRinse sounds like something that will remove all your hair and skin.

Whatever the reasons, the disappearance of meaningful and fun home experimentation will mean that even fewer of our youngsters will be interested in pursuing careers in the physical sciences. And that’s a shame.

[tags]chemistry, experimentation, explosions[/tags]

Posted in education, Fun, Policy, Science | 2 Comments

The CCP1GUI

The CCP1GUI project aims to develop a free, extensible Graphical User Interface to various computational chemistry codes developed by the worldwide academic community, with an emphasis on ab initio Quantum Chemistry codes.
Find The CCP1GUI at: http://www.cse.scitech.ac.uk/ccg/software/ccp1gui/

Posted in Molecule Viewers and Editors | Leave a comment