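We've finally reached Random Forests. Random forests shouldn't be hard to understand, so I won't go into the algorithm itself; let's go straight to the code!
buildClassifier: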
public void buildClassifier(Instances data) throws Exception {
    // can classifier handle the data?
    getCapabilities().testWithFail(data);
    // remove instances with missing class
    data = new Instances(data);
    data.deleteWithMissingClass();
    m_bagger = new Bagging();
    RandomTree rTree = new RandomTree();
    // set up the random tree options
    m_KValue = m_numFeatures;
    if (m_KValue < 1)
        m_KValue = (int) Utils.log2(data.numAttributes()) + 1;
    rTree.setKValue(m_KValue);
    rTree.setMaxDepth(getMaxDepth());
    // set up the bagger and build the forest
    m_bagger.setClassifier(rTree);
    m_bagger.setSeed(m_randomSeed);
    m_bagger.setNumIterations(m_numTrees);
    m_bagger.setCalcOutOfBag(true);
    m_bagger.buildClassifier(data);
}
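The first three statements should be very familiar by now. The fourth, m_bagger = new Bagging(), creates a Bagging instance (in fact, the difference between random forests and bagging lies in the base learner).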
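For readers who want a reminder of what bagging does with the seed, the number of iterations, and the out-of-bag flag, here is a minimal sketch of bootstrap sampling. It is not Weka's Bagging code: the class BootstrapSketch and its two methods are hypothetical names made up for illustration, and the toy sizes are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper, NOT part of Weka: it only illustrates bootstrap sampling.
class BootstrapSketch {

    // Draw n row indices from 0..n-1 with replacement (one bootstrap sample).
    static int[] bootstrapIndices(int n, Random rng) {
        int[] picked = new int[n];
        for (int i = 0; i < n; i++) {
            picked[i] = rng.nextInt(n);
        }
        return picked;
    }

    // Rows that never appear in the bootstrap sample are "out of bag".
    static List<Integer> outOfBag(int n, int[] picked) {
        boolean[] inBag = new boolean[n];
        for (int idx : picked) {
            inBag[idx] = true;
        }
        List<Integer> oob = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!inBag[i]) {
                oob.add(i);
            }
        }
        return oob;
    }

    public static void main(String[] args) {
        int numRows = 10;            // assumed toy data set size
        int numTrees = 3;            // plays the role of m_numTrees
        Random rng = new Random(1);  // plays the role of m_randomSeed
        for (int t = 0; t < numTrees; t++) {
            int[] sample = bootstrapIndices(numRows, rng);
            System.out.println("tree " + t
                    + " in-bag: " + Arrays.toString(sample)
                    + " out-of-bag: " + outOfBag(numRows, sample));
        }
    }
}
```

Each tree trains on its own in-bag rows, and the out-of-bag rows can then be used to estimate the error without a separate test set, which is presumably why the forest turns on setCalcOutOfBag(true).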
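RandomTree is simply a random tree; we'll cover it later (if you already understand random forests, you can probably guess what kind of tree it is).
The remaining steps just set parameters. It really is the same as bagging, except that we add a few parameters and swap in a different base learner.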
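One parameter worth working through by hand is the default K. When m_numFeatures is below 1, the code falls back to (int) log2(numAttributes) + 1: the cast applies to the logarithm first, so the result is truncated and then incremented. Note that data.numAttributes() counts the class attribute too. The sketch below reproduces only that arithmetic; DefaultKSketch and resolveK are hypothetical names, and it assumes Utils.log2 is the plain base-2 logarithm.

```java
// Hypothetical helper, NOT Weka code: mirrors the default-K arithmetic shown above.
class DefaultKSketch {

    // Assumption: Utils.log2 is an ordinary base-2 logarithm.
    static double log2(double a) {
        return Math.log(a) / Math.log(2);
    }

    // numFeatures < 1 means "no value set, use the default".
    static int resolveK(int numFeatures, int numAttributes) {
        int k = numFeatures;
        if (k < 1) {
            // The cast truncates log2(...) before 1 is added.
            k = (int) log2(numAttributes) + 1;
        }
        return k;
    }

    public static void main(String[] args) {
        // 10 attributes: log2(10) is about 3.32, truncated to 3, plus 1.
        System.out.println(resolveK(0, 10)); // prints 4
        // An explicit value is used unchanged.
        System.out.println(resolveK(5, 10)); // prints 5
    }
}
```

So with 10 attributes (class included) and no explicit setting, each split would consider 4 randomly chosen attributes.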
Now let's look at the random forest's base learner: RandomTree.
buildClassifier:
public void buildClassifier(Instances data) throws Exception {
// Make sure K value is in range
// m_KValue: number of randomly chosen attributes considered at each split
if (m_KValue > data.numAttributes() - 1)
m_KValue = data.numAttribute