Visual Machine Learning in 20 Lectures - MATLAB Source Code Examples (5) - Random Forest Learning Algorithm

mozun2020

Last modified 2022-12-28 12:46:29


Category column: ML1: Visual Machine Learning in 20 Lectures - MATLAB Source Code Examples. Tags: computer vision, image processing, Matlab, random forest

First published 2022-04-11 00:30:00

Article link: https://blog.csdn.net/sinat_34897952/article/details/124083560


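This article introduces the basic principles of the random forest algorithm and its simulation in MATLAB. A MATLAB code example shows how to apply a random forest to classification. It also highlights the advantages of random forests on large and high-dimensional datasets, and it is part of a series of MATLAB source code examples for visual machine learning.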



1. Random Forest Learning Algorithm
2. Matlab Simulation
3. Simulation Results
4. Summary

1. Random Forest Learning Algorithm

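Random forest is a classification algorithm that belongs to the Bagging (bootstrap aggregating) family of ensemble learning methods. Unlike boosting, bagging does not concentrate on hard-to-classify samples, which can limit the model's performance. Before studying random forests, three concepts need to be understood: decision trees; ensemble learning (multiple classifier systems); and bootstrap sampling.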
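Bootstrap sampling can be illustrated with a few lines of base MATLAB. This is only a sketch, not the sampling code of the RF library used later: the variable names are hypothetical, and the sample count 4435 is simply the size of the satimage training set reported in the results. Each tree is trained on n indices drawn with replacement, so roughly 63% of the samples end up in the bag and the rest are "out of bag" for that tree.

```matlab
% Bootstrap sampling sketch (illustrative; variable names are hypothetical).
rng(4351);                         % fix the seed so the run is repeatable
n = 4435;                          % training-set size of the satimage example
inBag = randi(n, n, 1);            % draw n sample indices with replacement
oob = setdiff((1:n)', inBag);      % samples never drawn are "out of bag"
fracUnique = numel(unique(inBag)) / n;
fprintf('unique in-bag fraction: %.3f\n', fracUnique);
fprintf('out-of-bag fraction:    %.3f\n', numel(oob) / n);
```

The in-bag fraction should land near 1 - 1/e, about 0.632. The out-of-bag samples are what a random forest uses to estimate its error without a separate validation set.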

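A random forest is a classifier made up of multiple decision trees, and its output class is the mode of the classes output by the individual trees. It belongs to ensemble learning, a major branch of machine learning. Its advantages include: it produces highly accurate classifiers for many kinds of data; it can handle a large number of input variables; it can estimate the importance of variables while determining the class; it produces an internal unbiased estimate of the generalization error; and it can balance the error on class-imbalanced datasets.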
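The "mode of the individual trees" rule can be sketched as follows. The vote matrix is made-up illustration data (three hypothetical trees, four samples), not output of the RF library used in this article.

```matlab
% Majority-vote sketch: rows are samples, columns are trees.
% The votes are made-up illustration data.
votes = [1 1 3;
         2 2 2;
         4 6 4;
         5 5 6];
pred = mode(votes, 2);   % most frequent label in each row
disp(pred');             % one predicted class per sample
```

Here every row has a clear majority. When labels tie, `mode` returns the smallest tied value, so a real implementation may want an explicit tie-breaking rule.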

2. Matlab Simulation

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Function: demonstrates the random forest algorithm in a computer vision setting
%Environment: Win7, Matlab2018a
%Modi: C.S
%Date: 2022-4-5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% path = ['E:\works\book\7（机器学习20讲）\Code\5、Random Forest\'];
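% Use dataDir instead of "path" so the built-in path function is not shadowed.
dataDir = 'F:\Learning\线上\csdn\视觉机器学习20讲\5、Random Forest\';
data1 = textread([dataDir 'satimage.tra']);  % training data
data2 = textread([dataDir 'satimage.txt']);  % test data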
% path = 'C:\Users\Administrator\Documents\MATLAB\';
% data = textread(path + 'srbct.txt');
% In this data set, each row represents a sample, 
% and each column represents a kind of variable(feature, attribute).
% !! So we should transpose "x" and "xts" below.
[m1, n1] = size(data1);
[m2, n2] = size(data2);  
ntest = m2;  % Number of test samples
ntrain = m1; % Number of training samples
% The training and test sets are read from separate files above,
% so no random split is performed here.
x = (data1(1 : ntrain, 1 : n1 - 1));
x = x';
cl = (data1(1 : ntrain, n1));
xts = (data2(1 : ntest, 1 : n2 - 1));
xts = xts';
clts = (data2(1 : ntest, n2));
% The lines above extract x, cl, xts and clts from data1 and data2.
nclass = 6;
% The data set has 6 classes.
classwt = 0;
% Here we give all classes the same weight 1.
% It can also be written as "classwt = [1 1 1 1 1 1];".
cat0 = 0;
% Here we set it having no categorical variables.
runParam = [6 1 50 10 1 0];
% Here we set mtry = 6, ndsize = 1, jbt = 50, look = 10, lookcls = 1, mdim2nd = 0;
impOpt = [0 0 0];
% Here we set imp = 0, Interact = 0, impn = 0;
proCom = [0 0 0 0 0];
% Here we set nprox = 0, nrnn = 0, noutlier = 0, nscale = 0, nprot = 0;
missingVal = 0;
% Here we set missingVal = 0, that means we use the "Default Value" for missingVal.
% That is, code = -999.0, missingfill = 0;
saveForest = [0 0 0];
% Here we set isaverf = 0, isavepar = 0, isavefill = 0;
runForest = [0 0];
% Here we set irunrf = 0, ireadpar = 0;
outParam = [1,0,0,0,0,0,0,0,0,0];
% Here we set isumout = 1 to show a classification summary.
msm = 1 : 36;
% Here we use all 36 variables, we can also use msm = 0 to use all variables.
seed = 4351;


x = single(x);        %get train x
cl = int32(cl);           %get train label
xts = single(xts);      %get test x
clts = int32(clts);     %get test label
classwt = single(classwt);
cat0 = int32(cat0);
msm = int32(msm);
runParam = int32(runParam);
impOpt = int32(impOpt);
proCom = int32(proCom);
missingVal = single(missingVal);
saveForest = int32(saveForest);
runForest = int32(runForest);
outParam = int32(outParam);
seed = int32(seed);

[errtr, errts, prox, trees, predictts, varimp, scale] = ...
  RF(nclass, x, cl, xts, clts, classwt, cat0, msm, runParam, impOpt, ...
    proCom, missingVal, saveForest, runForest, outParam, seed, 'satimage');
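The first entry of runParam in the script, 6, is the number of variables tried at each split (mtry). With 36 features this matches the common rule of thumb for classification, mtry = floor(sqrt(p)). The sketch below only illustrates that rule and how a random feature subset could be drawn at one split; it is an assumption about the idea, not the internals of the RF library, and the variable names are hypothetical.

```matlab
% mtry rule-of-thumb sketch (illustrative; not the RF library's internals).
rng(4351);                     % repeatable draw
p = 36;                        % number of features in the satimage data
mtry = floor(sqrt(p));         % common default for classification
feat = randperm(p, mtry);      % mtry distinct feature indices for one split
fprintf('mtry = %d\n', mtry);
disp(sort(feat));              % candidate features considered at this split
```

Smaller mtry values make the trees less correlated with each other, while larger values make each individual tree stronger, so the setting is usually tuned against the out-of-bag error.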

3. Simulation Results

>> main

* Class counts - training data
Class:          1         2         3         4         5         6
Counts:      1072       479       961       415       470      1038

* Class counts - test data
Class:          1         2         3         4         5         6
counts:       461       224       397       211       237       470

* Out of bag error:
       jbt   overall         1         2         3         4         5         6
train:  10     14.09      3.36      5.01      7.80     47.95     21.91     18.11
test:   10     10.35      1.08      1.34      6.05     36.49     14.35     13.62
train:  20     10.64      2.71      3.76      4.27     42.41     14.47     13.49
test:   20      9.35      0.65      2.23      6.05     35.55     10.55     11.70
train:  30     10.30      2.99      3.13      4.37     43.86     13.40     11.85
test:   30      9.10      0.43      2.23      6.05     35.55     10.55     10.85
train:  40     10.26      3.08      3.13      3.95     42.41     13.62     12.43
test:   40      9.20      0.65      3.13      6.05     35.55     10.55     10.64
train:  50      9.88      3.17      3.13      3.85     42.65     13.40     10.79
test:   50      9.20      0.87      3.13      6.05     35.55     10.55     10.43

* Summary output:

  final error rate:  9.88%
  final error test:  9.20%

  Training set confusion matrix (OOB):
		true class
           1     2     3     4     5     6
     1  1038     1     3     7    25     0
     2     2   464     1     3     5     2
     3    21     0   924    87     0    22
     4     0     5    20   238     3    69
     5    10     5     1     4   407    19
     6     1     4    12    76    30   926

  Test set confusion matrix:
		true class
           1     2     3     4     5     6
     1   457     0     3     0     5     0
     2     0   217     1     1     3     0
     3     2     1   373    32     1    11
     4     0     1    13   136     1    30
     5     2     3     1     3   212     8
     6     0     2     6    39    15   421

* RF all done!!!
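The reported error rates can be checked directly from the confusion matrices. The sketch below copies the test-set confusion matrix from the output above (columns are the true class, as labeled; rows are taken to be the predicted class) and recomputes the overall and per-class error. The variable names are hypothetical.

```matlab
% Test-set confusion matrix copied from the output above
% (rows: predicted class, columns: true class).
C = [457   0   3   0   5   0;
       0 217   1   1   3   0;
       2   1 373  32   1  11;
       0   1  13 136   1  30;
       2   3   1   3 212   8;
       0   2   6  39  15 421];
nTotal = sum(C(:));                               % 2000 test samples
overallErr = 100 * (1 - trace(C) / nTotal);       % percent misclassified
perClassErr = 100 * (1 - diag(C)' ./ sum(C, 1));  % percent per true class
fprintf('overall test error: %.2f%%\n', overallErr);
fprintf('class %d error: %.2f%%\n', [1:6; perClassErr]);
```

This gives an overall test error of 9.20% and a class 4 error of 35.55%, matching the last "test" row of the log. Class 4 is confused mostly with classes 3 and 6, which accounts for most of the remaining error.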

4. Summary

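The advantages of the random forest method are:

(1) It performs well on many datasets and is often competitive with other algorithms;

(2) It is easy to parallelize, which is a big advantage on large datasets;

(3) It can handle high-dimensional data without prior feature selection.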

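Random forests usually have a place in machine learning courses. Readers interested in the algorithm should read Lecture 5 of "Machine Learning in 20 Lectures" in full. The source code is packaged in the shared resources. Note that it calls a precompiled library that only runs under 32-bit MATLAB, so I installed a 32-bit MATLAB specifically to get this example running. Feel free to use it.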
