Some new software is in our Knowledge Discovery and Data Mining section. I can remember a time when “data mining” was a bit of an epithet in science (like “fishing expedition”), but now it has become an established way of finding links and connectivities in large data sets. Three new open source data mining programs appeared on our radar recently:
- KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
- RapidMiner (formerly YALE) – not much detail is known about this package
- Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
The link to RapidMiner does not seem to work. You may want to try linking to
Funny formulation “not much is known about”… :^)
Is this an alias for “I don’t know much about”?
You may want to take a look at one of the above pages or at the Wikipedia entry about this software or take a look at the RapidMiner/YALE video on Rapid-I’s web page.
I fixed the RapidMiner link, but the person who submitted it didn’t give much detail, so I couldn’t really tell much about it. The code download seems pretty useful, and includes a lot of other open source (Java) packages like:
Correction…KNIME isn’t open source. If you look at the second paragraph of its license, it states this explicitly.
Actually, from the perspective of open science, I consider the KNIME license to count as open source. The acid test is that it allows skeptical inquiry into the workings of the code even for commercial and academic competitors.
You may freely argue the wisdom of their license in terms of allowing code to become more widely used. But open science only requires that all skeptical reviewers and observers be given unfettered access to the implementation of the algorithm, and I think that the KNIME license does this.
The semantics here are important; open source is a term invented in 1998 or so and is defined at http://www.opensource.org/docs/osd. I don’t think that it helps to muddy this definition (as KNIME are doing).
“The source code is available for free” is about the most you can say. And this means that you can examine the workings of the code, as you say.
Is that all we should expect from “open science”? That’s up to you of course, but here are some limitations:
(1) As a scientist, I might like to implement some algorithm in my own code. However, I cannot copy and paste from KNIME source code. To me, this makes such code a scientific dead end. (It’s like saying you cannot reference this paper or continue this line of investigation.)
(2) You believe that competitors can examine such code. IANAL, but I believe that if I admit to looking at proprietary source code and implement something similar in my own code, there could be legal issues.