Learning Spark's MLlib Component, Part 35: Classification with Random Forest (entropy)

KeepLearningBigData

Column: MLlib. Tags: spark, machine learning, mllib, random forest

Article link: https://blog.csdn.net/xubo245/article/details/51498698

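More code: https://github.com/xubo245/SparkLearning
Learning Spark's MLlib component: classification
1. Explanation
Random forest: RandomForest
The basic idea is to build multiple decision trees, each trained independently. When a new data point arrives, every tree makes its own prediction. For classification (discrete output), the forest returns the label predicted by the most trees; for regression (continuous output), it returns the average of the trees' predictions.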
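
The two aggregation rules can be sketched in plain Scala without Spark. This is only an illustration of the idea, not MLlib's internal code: the object name `ForestAggregation` and its methods are hypothetical, and ties in the vote are resolved arbitrarily in this sketch.

```scala
// A minimal sketch of how a forest could combine per-tree predictions.
// The object and method names are illustrative assumptions, not part of the MLlib API.
object ForestAggregation {

  // Classification: return the label predicted by the most trees.
  // If two labels receive the same number of votes, either may be returned.
  def majorityVote(treePredictions: Seq[Double]): Double = {
    require(treePredictions.nonEmpty, "need at least one tree prediction")
    treePredictions
      .groupBy(identity)                       // label -> all votes for that label
      .map { case (label, votes) => (label, votes.size) }
      .maxBy { case (_, count) => count }      // label with the highest vote count
      ._1
  }

  // Regression: return the mean of the individual tree outputs.
  def average(treePredictions: Seq[Double]): Double = {
    require(treePredictions.nonEmpty, "need at least one tree prediction")
    treePredictions.sum / treePredictions.size
  }

  def main(args: Array[String]): Unit = {
    println(majorityVote(Seq(1.0, 0.0, 1.0))) // two of three trees vote for 1.0
    println(average(Seq(2.0, 4.0, 6.0)))      // mean of the three outputs
  }
}
```

With three trees voting 1.0, 0.0 and 1.0, the vote yields 1.0; averaging 2.0, 4.0 and 6.0 yields 4.0.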

2. Code:

/**
  * @author xubo
  *         ref: Spark MLlib机器学习实战 (book, "Spark MLlib Machine Learning in Practice")
  *         more code:https://github.com/xubo245/SparkLearning
  *         more blog:http://blog.csdn.net/xubo245
  */
package org.apache.spark.mllib.learning.classification

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by xubo on 2016/5/23.
  *
  */
object DecisionTrees3GBT {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    val sc = new SparkContext(conf)

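    // Load and parse the data file (LibSVM format).
    val data = MLUtils.loadLibSVMFile(sc, "file/data/mllib/input/classification/dt.txt")
    val numClasses = 2 // number of classes
    val categoricalFeaturesInfo = Map[Int, Int]() // categorical features (index -> arity); empty means all features are continuous
    val numTrees = 3 // number of decision trees in the random forest
    val featureSubsetStrategy = "auto" // how many features to consider at each node split; "auto" lets MLlib choose
    val impurity = "entropy" // impurity measure used to compute information gain
    val maxDepth = 5 // maximum depth of each tree
    val maxBins = 3 // maximum number of bins used to discretize continuous features

    // Train the random forest model.
    val model = RandomForest.trainClassifier(data, numClasses, categoricalFeaturesInfo,
      numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)

    model.trees.foreach(println) // print a one-line summary of each tree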

    val labelAndPreds = data.take(2).map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    labelAndPreds.foreach(println) // print (label, prediction) for the first two points
    println(model.toDebugString) // print the full structure of every tree
    sc.stop()
  }
}
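
To make the `impurity = "entropy"` setting more concrete, here is a small plain-Scala sketch of entropy and information gain for a candidate split. It is an illustration under stated assumptions rather than MLlib's implementation: the object `EntropyImpurity` and its methods are hypothetical names, a base-2 logarithm is assumed, and the class counts in the usage note are made up for the example.

```scala
// A minimal sketch of the entropy impurity and the information gain of a split.
// Names are illustrative assumptions; this is not the MLlib source code.
object EntropyImpurity {

  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // Entropy of a node, given the number of samples of each class in that node.
  // Classes with zero samples contribute nothing to the sum.
  def entropy(classCounts: Seq[Long]): Double = {
    val total = classCounts.sum.toDouble
    if (total == 0.0) 0.0
    else classCounts.filter(_ > 0).map { count =>
      val p = count / total
      -p * log2(p)
    }.sum
  }

  // Information gain: parent entropy minus the size-weighted entropy of the children.
  def informationGain(parent: Seq[Long], left: Seq[Long], right: Seq[Long]): Double = {
    val n = parent.sum.toDouble
    entropy(parent) - (left.sum / n) * entropy(left) - (right.sum / n) * entropy(right)
  }
}
```

For example, a node holding 5 samples of each of two classes has entropy 1 bit, and a split that separates the classes perfectly (`Seq(5L, 0L)` and `Seq(0L, 5L)`) has an information gain of 1 bit. A tree builder would prefer the candidate split with the largest gain.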

3. Results:

DecisionTreeModel classifier of depth 2 with 5 nodes
DecisionTreeModel classifier of depth 1 with 3 nodes
DecisionTreeModel classifier of depth 0 with 1 nodes
(1.0,1.0)
(0.0,0.0)
TreeEnsembleModel classifier with 3 trees

  Tree 0:
    If (feature 2 <= 0.0)
     If (feature 0 <= 0.0)
      Predict: 0.0
     Else (feature 0 > 0.0)
      Predict: 1.0
    Else (feature 2 > 0.0)
     Predict: 0.0
  Tree 1:
    If (feature 2 <= 0.0)
     Predict: 1.0
    Else (feature 2 > 0.0)
     Predict: 0.0
  Tree 2:
    Predict: 1.0
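
One way to read the debug string above is to transcribe the three printed trees as plain Scala functions and let them vote. The sketch below does that by hand. The object name `PrintedForest` is hypothetical, the feature vectors used in the usage note are made-up inputs rather than rows from dt.txt, and ties cannot occur here because there are three trees and two classes.

```scala
// A hand transcription of the three trees printed by model.toDebugString.
// Feature indices are used exactly as they appear in the printed output.
object PrintedForest {
  type Tree = Array[Double] => Double

  // Tree 0: splits on feature 2, then on feature 0.
  val tree0: Tree = f =>
    if (f(2) <= 0.0) {
      if (f(0) <= 0.0) 0.0 else 1.0
    } else 0.0

  // Tree 1: a single split on feature 2.
  val tree1: Tree = f => if (f(2) <= 0.0) 1.0 else 0.0

  // Tree 2: a single leaf that always predicts 1.0.
  val tree2: Tree = _ => 1.0

  val trees: Seq[Tree] = Seq(tree0, tree1, tree2)

  // Classification by majority vote over the three trees.
  def predict(features: Array[Double]): Double = {
    val votes = trees.map(tree => tree(features))
    votes.groupBy(identity).maxBy { case (_, v) => v.size }._1
  }
}
```

For a hypothetical input `Array(1.0, 0.0, 0.0)`, all three trees predict 1.0, so the forest predicts 1.0. For `Array(0.0, 0.0, 1.0)`, Tree 0 and Tree 1 predict 0.0 and only Tree 2 predicts 1.0, so the forest predicts 0.0.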

References
[1] http://spark.apache.org/docs/1.5.2/mllib-guide.html
[2] http://spark.apache.org/docs/1.5.2/programming-guide.html
[3] https://github.com/xubo245/SparkLearning
