Some new software has been added to our Knowledge Discovery and Data Mining section. I can remember a time when “data mining” was a bit of an epithet in science (like “fishing expedition”), but it has since become an established way of finding links and connections in large data sets. Three open source data mining programs appeared on our radar recently:
- KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
- RapidMiner (formerly YALE) – we do not yet have much detail on this package.
- Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.