spark1.2.0 Source Code, MLlib --- Decision Trees-01

Yobadman

Column: spark source code | Tags: spark, big data source code

Article link: https://blog.csdn.net/Yobadman/article/details/43157273

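Decision trees come in two kinds, classification trees and regression trees, used for classification models and regression models respectively.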
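
One visible difference between the two kinds is what a leaf returns. The sketch below is a minimal illustration, not MLlib code (the object and method names are hypothetical): a classification leaf predicts the majority label of the training points that reach it, while a regression leaf predicts their mean label.

```scala
// Hypothetical helpers illustrating leaf predictions; not part of the MLlib API.
object LeafPrediction {
  // Classification leaf: the most frequent label among the training points in the leaf
  // (ties are broken arbitrarily).
  def majorityLabel(labels: Seq[Double]): Double =
    labels.groupBy(identity).maxBy(_._2.size)._1

  // Regression leaf: the mean label of the training points in the leaf.
  def meanLabel(labels: Seq[Double]): Double =
    labels.sum / labels.size

  def main(args: Array[String]): Unit = {
    println(majorityLabel(Seq(0.0, 1.0, 1.0, 0.0, 1.0))) // 1.0
    println(meanLabel(Seq(2.0, 4.0, 9.0)))               // 5.0
  }
}
```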

First, let's look at a usage example in Spark. The code is as follows:

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils

// Load and parse the data file.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
// Split the data into training and test sets (30% held out for testing)
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

// Train a DecisionTree model.
//  Empty categoricalFeaturesInfo indicates all features are continuous.
val numClasses = 2
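val categoricalFeaturesInfo = Map[Int, Int]()   // an empty map means all features are treated as continuous
val impurity = "gini"   // use the Gini index as the node impurity measure
val maxDepth = 5   // maximum depth of the tree
val maxBins = 32   // maximum number of bins per feature when searching for splits (mainly relevant to continuous features)

val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,  // classification model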
  impurity, maxDepth, maxBins)

// Evaluate model on test instances and compute test error
val labelAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testData.count()
println("Test Error = " + testErr)
println("Learned classification tree model:\n" + model.toDebugString)
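
The classifier above is configured with impurity = "gini" and maxBins = 32. The following single-machine sketch shows what those two parameters control for one continuous feature. The Gini impurity of a node is 1 minus the sum of p_k^2 over classes k. The gain of a split is the parent impurity minus the size-weighted impurities of the two children. At most maxBins - 1 candidate thresholds are examined. Choosing thresholds at equal-frequency positions of the sorted distinct values is a simplifying assumption: MLlib derives its thresholds from a sample of the data and aggregates per-bin statistics across the cluster, so its exact candidates may differ. All names here are hypothetical.

```scala
// Sketch only: helper names are hypothetical, not part of the MLlib API.
object BestSplitSketch {
  // Gini impurity from per-class counts: 1 - sum_k p_k^2
  def gini(counts: Array[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else 1.0 - counts.map { c => val p = c / total; p * p }.sum
  }

  // Per-class counts for labels 0.0, 1.0, ..., numClasses - 1
  def classCounts(labels: Seq[Double], numClasses: Int): Array[Double] = {
    val counts = Array.fill(numClasses)(0.0)
    labels.foreach(l => counts(l.toInt) += 1.0)
    counts
  }

  // At most maxBins - 1 candidate thresholds: every distinct value except the largest
  // when there are few values, otherwise values at equal-frequency positions.
  def candidateThresholds(values: Seq[Double], maxBins: Int): Seq[Double] = {
    val sorted = values.distinct.sorted
    if (sorted.size <= maxBins) sorted.init
    else {
      val step = sorted.size.toDouble / maxBins
      (1 until maxBins).map(i => sorted((i * step).toInt - 1)).distinct
    }
  }

  // Best split of the form "feature <= threshold" over (label, featureValue) pairs,
  // returned as (threshold, gain).
  def bestSplit(points: Seq[(Double, Double)], numClasses: Int, maxBins: Int): (Double, Double) = {
    val n = points.size.toDouble
    val parentImpurity = gini(classCounts(points.map(_._1), numClasses))
    candidateThresholds(points.map(_._2), maxBins).map { t =>
      val (left, right) = points.partition(_._2 <= t)
      val weightedChildren =
        (left.size / n) * gini(classCounts(left.map(_._1), numClasses)) +
        (right.size / n) * gini(classCounts(right.map(_._1), numClasses))
      (t, parentImpurity - weightedChildren)
    }.maxBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    // Feature values 1..8; label 0.0 for values <= 4, label 1.0 above
    val points = (1 to 8).map(x => (if (x <= 4) 0.0 else 1.0, x.toDouble))
    println(candidateThresholds(points.map(_._2), 4))         // Vector(2.0, 4.0, 6.0)
    println(bestSplit(points, numClasses = 2, maxBins = 32))  // (4.0,0.5)
  }
}
```

A smaller maxBins means fewer thresholds to evaluate, which is cheaper but may miss the best split point. In the example, maxBins = 4 still happens to include the ideal threshold 4.0.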

To build a regression model, the code is as follows:

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils

// Load and parse the data file.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
// Split the data into training and test sets (30% held out for testing)
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

// Train a DecisionTree model.
//  Empty categoricalFeaturesInfo indicates all features are continuous.
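val categoricalFeaturesInfo = Map[Int, Int]()
// The source listing is cut off here; the remaining lines are completed by analogy with the classification example.
val impurity = "variance"   // variance is the impurity measure used for regression trees
val maxDepth = 5
val maxBins = 32
val model = DecisionTree.trainRegressor(trainingData, categoricalFeaturesInfo,  // regression model
  impurity, maxDepth, maxBins)
// Evaluate model on test instances and compute the test mean squared error
import org.apache.spark.SparkContext._   // provides .mean() on RDD[Double] in Spark 1.2
val labelsAndPredictions = testData.map(point => (point.label, model.predict(point.features)))
val testMSE = labelsAndPredictions.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("Test Mean Squared Error = " + testMSE)
println("Learned regression tree model:\n" + model.toDebugString)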
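
For regression trees, MLlib's impurity setting is "variance". A node's impurity is the variance of the labels that reach it. The gain of a split is computed the same way as with Gini: parent impurity minus the size-weighted child impurities. A regression model is usually evaluated by mean squared error on the test set. The minimal Scala sketch below computes these three quantities independently of Spark; the names are hypothetical and the code is not MLlib's implementation.

```scala
// Sketch only: hypothetical names, independent of Spark.
object VarianceSketch {
  // Variance impurity of a node's labels: mean of squares minus square of the mean
  def variance(labels: Seq[Double]): Double =
    if (labels.isEmpty) 0.0
    else {
      val n = labels.size.toDouble
      val mean = labels.sum / n
      labels.map(y => y * y).sum / n - mean * mean
    }

  // Gain of splitting a node's labels into left and right children
  def gain(left: Seq[Double], right: Seq[Double]): Double = {
    val n = (left.size + right.size).toDouble
    variance(left ++ right) - (left.size / n) * variance(left) - (right.size / n) * variance(right)
  }

  // Mean squared error over (label, prediction) pairs
  def mse(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (y, p) => (y - p) * (y - p) }.sum / pairs.size

  def main(args: Array[String]): Unit = {
    println(variance(Seq(1.0, 3.0)))               // 1.0
    println(gain(Seq(1.0, 1.0), Seq(5.0, 5.0)))    // 4.0
    println(mse(Seq((1.0, 2.0), (3.0, 3.0))))      // 0.5
  }
}
```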
