New Software: Data Mining

Some new software is in our Knowledge Discovery and Data Mining section. I can remember a time when “data mining” was a bit of an epithet in science (like “fishing expedition”), but now it has become an established way of finding links and connectivities in large data sets. Three new open source data mining programs appeared on our radar recently:

  • KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
  • RapidMiner (formerly YALE) – not much detail is known about this package
  • Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Posted in Science, Software | 6 Comments

Researching Open Science

I don’t know how I missed this before, but there’s a really interesting article from 2006 up at the Harvard Business School “Working Knowledge” site. It details some of Karim Lakhani’s results from a paper called ‘The Value of Openness in Scientific Problem Solving’. The paper itself is detailed research on different methods of scientific problem solving, and it is really worth a read for anyone in the Open Science movement. They went looking to see if “Broadcast Search” (i.e. telling the world what problem you are working on) is an effective means of problem solving. My favorite part of the paper:

Our most counter-intuitive finding was the positive and significant impact of the self-assessed distance between the problem and the solver’s field of expertise on the probability of creating a winning solution. This finding implies that the farther the solvers assessed the problem as being from their own field of expertise, the more likely they were to create a winning submission. We reason that the significance of this effect may be due to the ability of “outsiders” from relatively distant fields to see problems with fresh eyes and apply solutions that are novel to the problem domain but well known and understood by them.

Posted in open science, Science | 1 Comment

Cool Radiohead interactive video

So, I like Radiohead. A lot. Kid A has been in permanent rotation in my music collection for a couple of years now. But their new video for House of Cards is something else entirely. It was generated from 3D data of Thom Yorke’s face collected via a Geometric Informatics scanning system, which uses structured light to capture 3D images at close proximity. There’s an official video, but the best part is the completely interactive data viewer. Try it yourself!

Posted in Fun | 1 Comment

Automated out-of-plane finder?

The code I’ve been working on has some cool features. If you give it a list of atoms and bonds, it automatically figures out bend and dihedral interactions using simple graph concepts. That is, if the molecule has a bond between atoms i and j and another bond between atoms j and k, you can easily deduce that there’s a bend interaction between i, j, and k. A similar three-bond idea can be used to automatically determine dihedral interactions: find bonds i-j, j-k, and k-l, and you can deduce the torsion for i-j-k-l.
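
The graph idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the actual code described here: the function names (build_adjacency, find_bends, find_dihedrals) are made up, atoms are plain integer indices, and bonds are (i, j) pairs.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(bonds):
    """Map each atom index to the set of atoms it is bonded to."""
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def find_bends(bonds):
    """Return (i, j, k) triples where i-j and j-k are both bonds."""
    adj = build_adjacency(bonds)
    bends = []
    for j in sorted(adj):
        # Every pair of neighbors of the central atom j defines one bend.
        for i, k in combinations(sorted(adj[j]), 2):
            bends.append((i, j, k))
    return bends


def find_dihedrals(bonds):
    """Return (i, j, k, l) tuples where i-j, j-k, and k-l are all bonds."""
    adj = build_adjacency(bonds)
    dihedrals = []
    for j in sorted(adj):
        for k in sorted(adj[j]):
            if j >= k:
                continue  # visit each central bond j-k only once
            for i in sorted(adj[j]):
                if i == k:
                    continue
                for l in sorted(adj[k]):
                    # Skip the central atoms and three-membered rings (l == i).
                    if l == j or l == i:
                        continue
                    dihedrals.append((i, j, k, l))
    return dihedrals


# A four-atom chain 0-1-2-3 (the carbon skeleton of butane, say).
chain = [(0, 1), (1, 2), (2, 3)]
print(find_bends(chain))
print(find_dihedrals(chain))
```

For the four-atom chain this finds two bends and a single torsion. Nothing in it knows any chemistry; it only walks the bond graph.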

For out-of-plane bends (improper torsions) at sp2 sites, though, I don’t see a simple graph-theory way to find the interaction from connectivity alone. You actually need to know something about the chemical identity of the central atom. At least, I think this is the case. I’d love to be proven wrong, because keeping track of valences and bond counts is beyond the level of coding I wanted to include.
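
To make the problem concrete, here is a rough Python sketch of the closest thing to a pure-graph approach: flag every atom with exactly three bonded neighbors as a candidate center. Everything in it is an assumption for illustration. The function name, the optional elements mapping, and the default rule that only carbon counts as planar are all invented, and that rule is crude (it would miss a planar amide nitrogen, for instance).

```python
from collections import defaultdict


def find_out_of_plane_candidates(bonds, elements=None,
                                 planar_elements=frozenset({"C"})):
    """Return (center, (a, b, c)) for atoms with exactly three neighbors.

    elements is an optional, hypothetical mapping from atom index to element
    symbol. When given, centers whose element is not in planar_elements are
    dropped. That filter is an assumed stand-in for real chemical knowledge.
    """
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    candidates = []
    for center in sorted(adj):
        if len(adj[center]) != 3:
            continue  # connectivity test: only three-coordinate atoms qualify
        if elements is not None and elements[center] not in planar_elements:
            continue  # chemistry test: the part the bond graph cannot supply
        candidates.append((center, tuple(sorted(adj[center]))))
    return candidates


# Formaldehyde (planar) and ammonia (pyramidal) have the same bond graph.
bonds = [(0, 1), (0, 2), (0, 3)]
formaldehyde = {0: "C", 1: "O", 2: "H", 3: "H"}
ammonia = {0: "N", 1: "H", 2: "H", 3: "H"}

print(find_out_of_plane_candidates(bonds))
print(find_out_of_plane_candidates(bonds, formaldehyde))
print(find_out_of_plane_candidates(bonds, ammonia))
```

The connectivity test alone returns the same answer for both molecules, which is exactly the difficulty: telling them apart takes element or hybridization information that is not in the bond list.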

Posted in Science, Software | Leave a comment

FooCamp? BarCamp?

One of the more interesting aspects of the New Communication Channels workshop was something called the “SciBarCamp” that was organized by Jen Dodd. I’d never been at a meeting which used this format before, and I was a bit dubious when I first heard about it, but it worked well with the group that was at this meeting. Here’s how it functions:

  • After a morning of more traditional talks, everyone files into a large room. Each participant gets a sheet of paper on which they write their name and the name of a workshop that they are interested in leading.
  • Each of these sheets of paper gets tacked up to a board in the middle of the room, and people mill around looking at all of the proposed workshop titles. If you see a workshop that looks interesting, you vote for that workshop by bubbling in a circle on the sheet of paper.
  • The conference organizer can combine workshops if they look similar (in our case, a bunch of Wiki-related workshops were combined).
  • After about half an hour, the most popular workshops are selected and scheduled in particular rooms and time slots.
  • If your workshop was popular enough, you then have to lead it!
  • People can vote with their feet too; if a workshop is boring, you are encouraged to walk out and find one that isn’t (although in practice, few people actually did this).

Controversy was pretty much at a minimum because we were all converts to doing open science in one form or another (open source, open data, open access, open notebook). But we certainly got groups of people in each workshop who were guaranteed to be interested in the topic under discussion. After all, they’d voted for that workshop topic!

In order to make this work, you need a really good organizer to explain things up front. Scientists can be socially awkward and unwilling to try new formats, but this worked out well. I hope we start to see more of this kind of thing at smaller meetings.

Posted in Conferences, Science | Leave a comment

Cool finds at the NCCB2008 workshop

Some of the cooler online resources that have been discussed at the NCCB2008 workshop:

Posted in Policy, Science, Software | 1 Comment

New Communication Channels for Biology Workshop

I’m going to be giving a talk at the “New Communication Channels for Biology” Workshop run by the CalIT2 folks at UCSD. The workshop is Thursday and Friday, and there are going to be some interesting folks like Michael Nielsen, Hilary Spencer, Jean-Claude Bradley, Aaron Fulkerson, Michael Gribskov, and a bunch more. It should be pretty interesting!

Posted in Policy, Science | 3 Comments

Praat: doing Phonetics by Computer

Praat is an integrated software workbench for studying phonetics on real-life sound data.
Find Praat: doing Phonetics by Computer at: http://www.fon.hum.uva.nl/praat/

Posted in Linguistics | Leave a comment

OpenSim

OpenSim is an open-source software system that lets users develop models of musculoskeletal structures and create dynamic simulations of movement. The software provides a platform on which the biomechanics community can build a library of simulations that can be exchanged, tested, analyzed, and improved through multi-institutional collaboration.
Find OpenSim at: https://simtk.org/home/opensim

Posted in Engineering | Leave a comment

SimTK core

The SimTK Core project collects together all the hardware-independent binaries needed for the various SimTK Core subprojects. These include SimTKcommon, Simmath, Simmatrix, Simbody, CPodes, IPopt, and much more. See the individual projects for descriptions.
Find SimTK core at: https://simtk.org/home/simtkcore

Posted in Engineering | Leave a comment