From OpenCV 2.3.1, samples/c/mushroom.cpp
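1. First, read the training samples from agaricus-lepiota.data.
The first field of each sample is the class label, e (edible) or p (poisonous); the remaining fields are features, so each sample can be treated as a feature vector.
cvSeqPush( seq, el_ptr ); appends each sample to the sequence seq, so every element of seq stores one sample (label plus feature vector).
The feature vectors and the labels are then copied into CvMat* data and CvMat* responses respectively.
A third matrix, CvMat* missing, is a mask of missing values: an entry is set wherever the stored feature value is less than 0.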
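The reading step above can be sketched without OpenCV. This is only an illustration, not the sample's actual code: the names MushroomSample and parseMushroomLine are hypothetical, and it assumes that every field is a single character separated by commas and that a missing feature is written as '?' and stored as -1, which is consistent with the "less than 0" test used to fill the missing mask.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one record of agaricus-lepiota.data.
struct MushroomSample {
    float label;                         // character code of 'e' or 'p'
    std::vector<float> features;         // character codes; -1 marks a missing value
    std::vector<unsigned char> missing;  // 1 where the feature value is < 0
};

// Parse a line such as "p,x,s,n,?" (assumed layout: one character per field).
MushroomSample parseMushroomLine(const std::string& line) {
    // Ignore a trailing newline or carriage return.
    std::size_t last = line.find_last_not_of("\r\n");
    std::size_t len = (last == std::string::npos) ? 0 : last + 1;

    // "c,c,c" always has odd length: fields at even indices, commas at odd ones.
    assert(len % 2 == 1);

    MushroomSample s;
    s.label = static_cast<float>(line[0]);
    for (std::size_t i = 2; i < len; i += 2) {
        char c = line[i];
        // Assumption: '?' denotes a missing value and is encoded as -1.
        float v = (c == '?') ? -1.f : static_cast<float>(c);
        s.features.push_back(v);
        s.missing.push_back(v < 0 ? 1 : 0);
    }
    return s;
}
```

Calling parseMushroomLine on each line of the file gives the same three pieces of information that the sample stores in data, responses and missing.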
2. Train on the samples
dtree = new CvDTree;
dtree->train( data, CV_ROW_SAMPLE, responses, 0, 0, var_type, missing,
CvDTreeParams( 8, // max depth
10, // min sample count: a node with fewer than 10 samples is not split further
0, // regression accuracy: N/A here (termination criterion used only by regression trees)
true, // compute surrogate splits, as we have missing data; they are also needed to estimate variable importance
15, // max number of categories (use sub-optimal algorithm for larger numbers): caps the number of categories to keep training fast, at the cost of sub-optimal splits; only relevant for classification with more than 2 classes
10, // the number of cross-validation folds; if cv_folds > 1, the tree is pruned with K-fold cross-validation, where K is equal to cv_folds
true, // use 1SE rule => smaller tree; pruning is harsher, which makes the tree more compact and more robust to noise in the training data, but slightly less accurate
true, // throw away the pruned tree branches
priors // the array of priors, i.e. the relative cost of misclassifying poisonous vs. edible mushrooms; the bigger p_weight, the more attention
// to the poisonous mushrooms
// (a mushroom will be judged to be poisonous with bigger chance)
));
3. Predict
double r = dtree->predict( &sample, &mask )->value; // predict() classifies a sample and returns a CvDTreeNode; its value field is the class label for classification, or the estimated function value for regression
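Because the labels were stored as character codes, the value returned for a classification tree is the code of 'e' or 'p'. The following is a minimal sketch, without OpenCV, of how such predictions could be decoded and compared with the stored responses to get a hit rate. The helpers labelName and hitRate are hypothetical names introduced here, and the comparison with FLT_EPSILON is an assumption about how a float label and a double prediction are matched.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Map a predicted value back to a readable class name.
// Assumption: labels are the character codes of 'e' and 'p'.
const char* labelName(double r) {
    if (r == 'p') return "poisonous";
    if (r == 'e') return "edible";
    return "unknown";
}

// Fraction of samples whose predicted label equals the stored response.
double hitRate(const std::vector<double>& predicted,
               const std::vector<float>& actual) {
    assert(predicted.size() == actual.size());
    if (predicted.empty()) return 0.0;

    std::size_t hits = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        // Labels are small integer codes, so a tiny tolerance is enough.
        if (std::fabs(predicted[i] - actual[i]) < FLT_EPSILON) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(predicted.size());
}
```

In the real program, predicted[i] would be filled with dtree->predict( &sample, &mask )->value for row i, and actual with the contents of responses.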
4. interactive_classification classifies a mushroom from feature values that the user enters by hand.
#include "opencv2/core/core_c.h"
#include "opencv2/ml/ml.hpp"
#include <stdio.h>
void help()
{
printf("\nThis program demonstrated the use of OpenCV's decision tree function for learning and predicting data\n"
"Usage :\n"
"./mushroom <path to ag