Spark MLlib Quick Start, Part 7 (Decision Trees: Classification)

MLANDAI

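Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.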

Article link: https://blog.csdn.net/tbb_1984/article/details/84139396


(1) Data

1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0,3,13,16

2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0,8,32,40

3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0,5,27,32

Column meanings

instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
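Each sample row above has 17 comma-separated fields. The code in section (2) uses the fields at index 2 through 15 as features and the last field (index 16) as the label. The following is only a minimal sketch of that layout in plain Java, without Spark; the class name RowLayoutSketch and its two methods are hypothetical and exist purely for illustration.

```java
import java.util.Arrays;

// Hypothetical helper (not part of the original post): shows which fields
// of a row become features and which one becomes the label.
final class RowLayoutSketch {

    /** Features: the fields at index 2..15, i.e. everything except the first two and the last. */
    static double[] features(String line) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length - 3];
        for (int i = 0; i < v.length; i++) {
            v[i] = Double.parseDouble(parts[i + 2]);
        }
        return v;
    }

    /** Label: the last field of the row (index 16 in a 17-field row). */
    static double label(String line) {
        String[] parts = line.split(",");
        return Double.parseDouble(parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        // First sample row from section (1)
        String row = "1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0,3,13,16";
        System.out.println(Arrays.toString(features(row)));
        System.out.println(label(row));
    }
}
```

For the first sample row this yields 14 feature values and the label 16.0. Note that the last two features in each sample row (3 and 13 in the first row) add up to the label, so the features contain the answer.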

(2) Code

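import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.DecisionTree;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;

import scala.Tuple2;

public class HWDecisionTreeClass {

    // Fields at index 2..15 form the feature vector.
    // The field at index 16 (cnt) is the label.
    // Note: the last two features (casual, registered) sum to cnt, so they determine the label.
    private static class ParsePoint implements Function<String, LabeledPoint> {
        private static final Pattern COMMA = Pattern.compile(",");

        @Override
        public LabeledPoint call(String line) {
            String[] parts = COMMA.split(line);
            double[] v = new double[parts.length - 3];
            for (int i = 0; i < v.length; i++) {
                v[i] = Double.parseDouble(parts[i + 2]);
            }
            return new LabeledPoint(Double.parseDouble(parts[16]), Vectors.dense(v));
        }
    }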

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("JavaDecisionTreeClassificationExample").setMaster("local");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Load and parse the data. Every line is parsed as numbers,
        // so the file must not contain a header row.
        String datapath = "hour.txt";
        JavaRDD<String> lines = jsc.textFile(datapath);
        JavaRDD<LabeledPoint> traindata = lines.map(new ParsePoint());

        // Print the first three parsed points as a sanity check
        List<LabeledPoint> take = traindata.take(3);
        for (LabeledPoint labeledPoint : take) {
            System.out.println("----->" + labeledPoint.features());
            System.out.println("----->" + labeledPoint.label());
        }

        // Use 90% of the data for training and 10% for testing
        JavaRDD<LabeledPoint>[] splits = traindata.randomSplit(new double[] { 0.9, 0.1 });
        // Training data
        JavaRDD<LabeledPoint> trainingData = splits[0];
        // Test data
        JavaRDD<LabeledPoint> testData = splits[1];

        // Set the parameters. An empty categoricalFeaturesInfo means all features are treated as continuous.
        // numClasses must be larger than the largest label, because MLlib expects labels in {0, ..., numClasses - 1}.
        Integer numClasses = 1900;
        Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
        String impurity = "gini";
        Integer maxDepth = 20;
        Integer maxBins = 32;

        // Train a DecisionTree model for classification
        final DecisionTreeModel model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
                impurity, maxDepth, maxBins);

        // Predict on the test set and compare against the actual labels
        JavaPairRDD<Double, Double> predictionAndLabel =
                testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
                    @Override
                    public Tuple2<Double, Double> call(LabeledPoint p) {
                        return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
                    }
                });
        System.out.println(predictionAndLabel.take(10));

        // Test error: fraction of test points whose prediction differs from the label
        Double testErr = 1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
            @Override
            public Boolean call(Tuple2<Double, Double> pl) {
                return !pl._1().equals(pl._2());
            }
        }).count() / testData.count();
        System.out.println("Test Error: -------------------------------------------------------------------" + testErr);
        System.out.println("Learned classification tree model:\n-------------------------------------------"
                + model.toDebugString());
    }
}
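The code above sets the impurity parameter to "gini". As a rough illustration of what that criterion measures, the Gini impurity of a set of labels is 1 minus the sum of the squared class proportions: 0 when all labels are equal, and closer to 1 as the labels spread over more classes. The sketch below is a standalone plain-Java calculation, not MLlib's internal implementation; the class name GiniSketch and the method gini are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not MLlib code): Gini impurity of a set of labels.
final class GiniSketch {

    /** Returns 1 - sum over classes of p_k^2, where p_k is the share of label k. */
    static double gini(double[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        // Count how often each distinct label occurs
        Map<Double, Integer> counts = new HashMap<>();
        for (double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double sumOfSquares = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    public static void main(String[] args) {
        // The three labels from the sample rows in section (1): all different
        System.out.println(gini(new double[] { 16, 40, 32 }));
        // A pure node: every label is the same
        System.out.println(gini(new double[] { 16, 16, 16 }));
    }
}
```

A decision tree trained with this criterion prefers splits whose child nodes have lower impurity than the parent. With one class per distinct count value, as in this post, most nodes start out with a high impurity.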
