Weka Development Notes

xiaokui9

Category: Machine Learning

Article link: https://blog.csdn.net/xiaokui9/article/details/89185464

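Reading Files and Database Data

See reference links 1 and 2.

The components you are most likely to use are:

- Instances: your data

- Filter: preprocessing of the data

- Classifier / Clusterer: built on the preprocessed data, for classification or clustering

- Evaluating: assessing how well a classifier or clusterer performs

- Attribute selection: removing irrelevant attributes from the data

The following code has been tested and works: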

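import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.experiment.InstanceQuery;

public class WekaDemo {

    public static Instances getInstances(String filePath) {
        try {
            // Fall back to the iris sample that ships with Weka 3.8 when no path is given.
            if (filePath == null) {
                filePath = "C:\\Weka-3-8\\data\\iris.arff";
            }

            // Direct ARFF reading (the approach for 3.4.x and 3.5.x).
            Instances data;
            try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
                data = new Instances(reader);
            }
            // Set the class attribute.
            // The class index is the index of the target attribute used for classification.
            // By convention the class is the last attribute of an ARFF file, hence numAttributes() - 1.
            // It must be set before calling Weka functions such as
            // weka.classifiers.Classifier.buildClassifier(data).
            data.setClassIndex(data.numAttributes() - 1);
            System.out.println("#################data:");
            System.out.println(data);

            // 3.5.5 and newer.
            // DataSource is not limited to ARFF: it also reads CSV and other formats
            // (essentially every format Weka can import through its converters).
            DataSource source = new DataSource(filePath);
            Instances data2 = source.getDataSet();
            // Set the class attribute only if the data format does not provide it;
            // the XRFF format, for example, stores the class attribute as well.
            if (data2.classIndex() == -1) {
                data2.setClassIndex(data2.numAttributes() - 1);
            }
            System.out.println("#################data2:");
            System.out.println(data2);

            // Read from a database.
            InstanceQuery query = new InstanceQuery();
            // The database connection settings are configured inside weka.jar.
            query.setUsername("root");
            query.setPassword("");
            query.setQuery("select * from url_features limit 0,10"); // table url_features
            // If your data is sparse, you can say so too:
            // query.setSparseData(true);
            Instances data3 = query.retrieveInstances();
            // Print the whole dataset.
            System.out.println(data3);
            // numInstances() returns the number of instances in the dataset.
            for (int i = 0; i < data3.numInstances(); i++) {
                // instance(i) returns the i-th instance.
                System.out.println(data3.instance(i));
            }
            return data2;
        } catch (Exception e) {
            // Note: a failed database query also lands here, so null is returned
            // even when the file was read successfully.
            e.printStackTrace();
            return null;
        }
    }
}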

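For details on the database configuration, see reference link 3.

Classifiers

See reference link 4.

To classify a dataset, the first step is to specify which column of the dataset is the class. If you forget this step (which in practice happens often), you get the error "Class index is negative (not set)!". Use the setClassIndex method of the Instances class to mark a column as the class. To use the last column, call numAttributes() on the Instances object to get the number of attributes and subtract 1.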

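            // Requires: import weka.classifiers.trees.J48;
            // getInstances returns data2, i.e. the dataset loaded through DataSource in the code above.
            Instances m_instances = getInstances(filePath);
            J48 classifier = new J48();

            // Other classifiers can be swapped in the same way:
            //NaiveBayes classifier2 = new NaiveBayes();
            //SMO classifier = new SMO();
            classifier.buildClassifier( m_instances );
            // Print the predictions for instances 0, 60 and 110 of the data.
            // classifyInstance returns a double: for a nominal class, the index of the predicted class value.
            System.out.println( classifier.classifyInstance( m_instances.instance( 0 ) ) );
            System.out.println( classifier.classifyInstance( m_instances.instance( 60 ) ) );
            System.out.println( classifier.classifyInstance( m_instances.instance( 110 ) ) );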
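Because classifyInstance returns a double rather than a label, the printed values are class indices, not class names. The following is a minimal sketch of turning such an index back into its label. It assumes weka.jar (3.7 or later) is on the classpath; the class name PredictionLabels, the helper labelOf, and the toy attribute names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.Instances;

public class PredictionLabels {

    // Map the double returned by classifyInstance to the nominal class label.
    // Assumes the class index of 'data' is set and the class attribute is nominal.
    public static String labelOf(Instances data, double prediction) {
        return data.classAttribute().value((int) prediction);
    }

    // Hypothetical empty dataset header: one numeric attribute and a
    // two-valued nominal class, used only to demonstrate labelOf.
    public static Instances toyHeader() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("length"));
        attrs.add(new Attribute("kind", Arrays.asList("small", "large")));
        Instances header = new Instances("toy", attrs, 0);
        header.setClassIndex(header.numAttributes() - 1);
        return header;
    }

    public static void main(String[] args) {
        // Index 1.0 corresponds to the second declared class value.
        System.out.println(labelOf(toyHeader(), 1.0));
    }
}
```

With the classifier snippet above, labelOf(m_instances, classifier.classifyInstance(m_instances.instance(0))) would give the predicted class name instead of its index.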

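Evaluating a Classifier

See reference link 5.

    // First create an Evaluation object. The Evaluation class has no no-argument constructor; an Instances object is normally passed to the constructor.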
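As a rough illustration of how an Evaluation object built from an Instances object is typically used, here is a minimal sketch that runs 10-fold cross-validation on a J48 tree. It is one common pattern, not necessarily the one this article goes on to use. It assumes weka.jar is on the classpath; the class name EvaluationSketch, the helper methods, and the generated toy data are hypothetical. In practice you would pass in the Instances returned by getInstances instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class EvaluationSketch {

    // Hypothetical toy data: 40 instances, class is "high" when x >= 20, else "low".
    public static Instances toyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("x"));
        attrs.add(new Attribute("level", Arrays.asList("low", "high")));
        Instances data = new Instances("toy", attrs, 40);
        for (int i = 0; i < 40; i++) {
            // Nominal values are stored as the index of the value (0 = low, 1 = high).
            data.add(new DenseInstance(1.0, new double[] {i, i >= 20 ? 1 : 0}));
        }
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    // Run 10-fold cross-validation of an untrained J48 on the given data.
    public static Evaluation crossValidate(Instances data) throws Exception {
        // Evaluation has no no-argument constructor; it is initialised from the data.
        Evaluation eval = new Evaluation(data);
        // A fixed seed keeps the fold assignment reproducible.
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        return eval;
    }

    public static void main(String[] args) throws Exception {
        Evaluation eval = crossValidate(toyData());
        // Summary includes accuracy, error rates and the number of instances.
        System.out.println(eval.toSummaryString());
    }
}
```

Cross-validation is used here because it needs no separate test set. When a held-out test set is available, evaluateModel on a trained classifier is the usual alternative.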
