Orange: An Open-Source Data Mining Package

weixin_30897233

Original link: http://www.cnblogs.com/weilaiyxj/archive/2006/01/06/312413.html

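It is odd that so many open-source data mining packages come from abroad, while not a single one comes from China. I once had the idea of building a data mining package of my own, but it has not happened yet. One person's capacity is simply too limited.

So let's start by learning from what others have built.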

        http://www.ailab.si/orange

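Orange is a component-based data mining package. It covers a range of techniques, including data preprocessing, modelling, and data mining. It is built on C++ components, which can be accessed directly, through Python scripts, or through GUI objects called Orange Widgets.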

Some Features of Orange

Orange is a component-based framework, which means you can use existing components and build your own. You can even prototype your own components in Python and use them in place of some standard C++-based Orange component. For instance, you may craft your own function for attribute quality estimation and use it within Orange's classification tree induction algorithm. Orange provides some elementary components, as well as more complex components built from elementary ones, and uses Python as a glue language. Some of the readily available features of Orange include:
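To make the idea of a custom attribute quality estimator concrete, here is a minimal sketch in plain Python 3. It does not use Orange's API: the function names, the dictionary-based data layout, and the toy rows are all assumptions made for illustration. It scores each discrete attribute by information gain, one common quality measure used in tree induction.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, class_key):
    """Entropy reduction obtained by splitting rows on one discrete attribute."""
    labels = [row[class_key] for row in rows]
    # Group the class labels by the value of the attribute being scored
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[class_key])
    # Weighted average entropy of the groups after the split
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, for illustration only
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

scores = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(scores, key=scores.get)
print(scores, best)
```

A tree learner would call such a function at every node to choose the attribute to split on, which is why swapping in a different estimator can change the shape of the resulting tree.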

Data input/output: Orange can read from and write to tab-delimited files and C4.5 files, and also supports some more exotic formats.
Preprocessing: feature subset selection, categorization, feature utility estimation for predictive tasks.
Predictive modelling: classification trees, naive Bayes, k-NN, majority classifier, support vector machines, logistic regression. Ensemble methods like boosting and bagging are also included.
Model validation: different data sampling and validation techniques (like cross-validation, random sampling, etc.), and various statistics for model validation (classification accuracy, AUC, sensitivity, specificity, ...) are included. Orange evaluation schemas support caching: validation results (class probabilities) are stored, and rerunning the validation will only validate new classifiers.
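The validation features listed above can be sketched in plain Python 3. This is not Orange's implementation: `cross_validate`, `majority_learner`, the `cache` dictionary, and the toy data are hypothetical names invented here. The sketch runs k-fold cross-validation, reports classification accuracy, and skips learners whose results are already cached. As a simplification it caches the final accuracy, whereas the text above says Orange stores class probabilities.

```python
import random
from collections import Counter


def majority_learner(train):
    """Return a classifier that always predicts the most common training class."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def cross_validate(learners, data, folds=5, seed=0, cache=None):
    """Return {learner name: classification accuracy}, reusing cached results."""
    cache = {} if cache is None else cache
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    for name, learner in learners.items():
        if name in cache:  # already validated in an earlier run, so skip it
            continue
        correct = 0
        for k in range(folds):
            test_idx = set(indices[k::folds])  # every folds-th shuffled index
            train = [data[i] for i in indices if i not in test_idx]
            classifier = learner(train)
            correct += sum(classifier(data[i][0]) == data[i][1] for i in test_idx)
        cache[name] = correct / len(data)
    return cache


# Hypothetical data: (features, class label) pairs, 8 "yes" and 2 "no"
data = [((i,), "yes") for i in range(8)] + [((i,), "no") for i in range(8, 10)]

cache = cross_validate({"majority": majority_learner}, data)


# A second run with the same cache only evaluates the newly added learner
def always_no(train):
    return lambda features: "no"


cache = cross_validate(
    {"majority": majority_learner, "always_no": always_no}, data, cache=cache
)
print(cache)
```

Keying the cache by learner name keeps the sketch short. A real system would also need to invalidate the cache when the data or the sampling changes.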

Visual Programming

Orange's visual programming interface is based on GUI components we call Orange Widgets, and a signalling framework that uses communication channels to connect widgets and tokens to pass the data from one widget to another. Although this sounds very scientific, working with widgets in the Orange Canvas is as simple as point-and-click. Currently, more than forty widgets are available, with more coming out every week.

Although many data mining suites now incorporate visual programming, Orange widgets are sort of special. Namely, the design principle in Orange is interaction: in many widgets, the objects interactively selected in one widget (e.g., data, attributes, ...) can be passed to the other one for further processing. Here's a snapshot that demonstrates this concept. It shows Orange Canvas, with widgets that read the famous Iris data set, construct a decision tree, and visualize it. But there's more: the data is shown in the scatterplot, which not only shows a complete data set (channel from File widget) but also marks the data instances which belong to a selected classification tree node (channel from Classification Tree Viewer, filled dots at scatterplot).
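The channel-and-token idea can be illustrated with a small sketch in plain Python 3. This is not Orange's signalling framework: the `Widget` class, its methods, and the channel names are hypothetical. It only shows how a scatterplot could receive the full data set on one channel and a selected subset on another.

```python
class Widget:
    """A hypothetical widget: receives tokens on named inputs, sends on named outputs."""

    def __init__(self, name):
        self.name = name
        self.links = {}     # output channel -> list of (target widget, input channel)
        self.received = {}  # input channel -> last token received

    def connect(self, out_channel, target, in_channel):
        """Link one of this widget's outputs to an input of another widget."""
        self.links.setdefault(out_channel, []).append((target, in_channel))

    def send(self, out_channel, token):
        """Pass a token to every widget connected to the given output channel."""
        for target, in_channel in self.links.get(out_channel, []):
            target.receive(in_channel, token)

    def receive(self, in_channel, token):
        self.received[in_channel] = token


file_widget = Widget("File")
tree_viewer = Widget("Classification Tree Viewer")
scatterplot = Widget("Scatterplot")

# Two channels lead into the scatterplot, as in the snapshot described above
file_widget.connect("data", scatterplot, "data")
tree_viewer.connect("selected", scatterplot, "subset")

rows = ["row0", "row1", "row2", "row3"]     # placeholder instances
file_widget.send("data", rows)              # the complete data set
tree_viewer.send("selected", rows[1:3])     # instances in the selected tree node

# The scatterplot can now mark which instances belong to the selected node
highlighted = [r in scatterplot.received["subset"] for r in scatterplot.received["data"]]
print(highlighted)
```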

Orange Scripting

Seamless integration with Python is probably the most important feature of Orange. Python is a great and very flexible scripting language. We have designed Orange to be fully accessible within Python and are trying to expose almost every essential Orange component within Python. We provide a number of example scripts on Orange's web pages, but just to give you a taste, here is one that reads the data file, builds a naive Bayesian classifier, and outputs the original and predicted class for the first five instances:

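    # Python 2 syntax (print statement), as used by the original Orange scripts
    import orange

    # Load the 'voting' data set and train a naive Bayesian classifier on it
    data = orange.ExampleTable('voting')
    classifier = orange.BayesLearner(data)

    # Print the original and the predicted class for the first five instances
    for i in range(5):
        print data[i].getclass(), 'classified as', classifier(data[i])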

Here is another example that imports two Orange modules (orngTest and orngStat), reads the data, and uses cross-validation to compare two classifiers using classification accuracy and Brier score:

    import orange, orngTest, orngStat

    data = orange.ExampleTable('voting')

    # Learners are created untrained; cross-validation trains them on each fold
    bayes = orange.BayesLearner()
    tree = orange.TreeLearner()
    results = orngTest.crossValidation([bayes, tree], data, folds=5)

    # Report both scores for the two learners
    print 'Classification accuracy: ', orngStat.CA(results)
    print 'Brier Score: ', orngStat.BrierScore(results)
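For readers unfamiliar with the two scores, here is a minimal sketch in plain Python 3 of what they measure. It does not call orngStat, and orngStat may normalize its Brier score differently. The function names, the class labels "a" and "b", and the predicted probabilities are made up for illustration. The Brier score here is one common multi-class form: the squared difference between predicted probabilities and the 0/1 indicator of the true class, summed over classes and averaged over instances.

```python
def classification_accuracy(actual, predicted_probs):
    """Fraction of instances whose most probable predicted class is the true class."""
    correct = sum(max(p, key=p.get) == a for a, p in zip(actual, predicted_probs))
    return correct / len(actual)


def brier_score(actual, predicted_probs):
    """Mean over instances of the squared error summed over classes (lower is better)."""
    total = 0.0
    for a, p in zip(actual, predicted_probs):
        total += sum((prob - (1.0 if cls == a else 0.0)) ** 2 for cls, prob in p.items())
    return total / len(actual)


# Hypothetical true classes and predicted class probabilities
actual = ["a", "b", "a", "b"]
predicted_probs = [
    {"a": 1.0, "b": 0.0},    # confident and right
    {"a": 0.25, "b": 0.75},  # right, with some doubt
    {"a": 0.75, "b": 0.25},  # right, with some doubt
    {"a": 0.75, "b": 0.25},  # wrong
]

ca = classification_accuracy(actual, predicted_probs)
brier = brier_score(actual, predicted_probs)
print(ca, brier)
```

Accuracy only checks whether the top class is right, while the Brier score also penalizes poorly calibrated probabilities, which is why it is useful to report both.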

Citing Orange

Orange is released under the GNU General Public License (GPL) and as such is free if you use it under these terms. We do, however, oblige users to cite the following white paper, together with any other work that accompanied Orange, any time you use Orange in your publications:

Demsar J, Zupan B, Leban G (2004) Orange: From Experimental Machine Learning to Interactive Data Mining, White Paper (www.ailab.si/orange), Faculty of Computer and Information Science, University of Ljubljana.

And if you become an Orange user, we won't mind getting a postcard from you. Please use the following address:

Orange
AI Lab
Faculty of Computer and Information Science
University of Ljubljana
Trzaska 25
SI-1000 Ljubljana
Slovenia

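After a quick try, I found that it works much like SPSS Clementine: both are built around this kind of data-flow model. Orange, however, is open source and component-based, so we can extend it with our own development. Once this busy stretch is over, I would like to study it properly and write a few modules that it does not have yet, such as FP-Tree, NN, and SVM.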
