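This example starts with the LDA linear classification algorithm and ends by introducing decision tree classification. It is a good one, and beginners may find it a useful reference.
Many decision tree implementations found online come without examples: they are just a pile of code with no indication of how to pass the parameters. Use the decision tree functions in the toolbox directly, and if something is unclear, run help on it.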
Classification
Suppose you have a data set containing observations with measurements on different variables (called predictors) and their known class labels. If you obtain predictor values for new observations, could you determine to which classes those observations probably belong? This is the problem of classification. This demo illustrates how to perform some classification algorithms in MATLAB® using Statistics Toolbox™ by applying them to Fisher's iris data.
Fisher's Iris Data
Fisher's iris data consists of measurements on the sepal length, sepal width, petal length, and petal width for 150 iris specimens. There are 50 specimens from each of three species. Load the data and see how the sepal measurements differ between species. You can use the two columns containing sepal measurements.
load fisheriris
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
N = size(meas,1);
Suppose you measure a sepal and petal from an iris, and you need to determine its species on the basis of those measurements. One approach to solving this problem is known as discriminant analysis.
Linear and Quadratic Discriminant Analysis
The classify function can perform classification using different types of discriminant analysis. First classify the data using the default linear discriminant analysis (LDA).
ldaClass = classify(meas(:,1:2),meas(:,1:2),species);
The observations with known class labels are usually called the training data. Now compute the resubstitution error, which is the misclassification error (the proportion of misclassified observations) on the training set.
bad = ~strcmp(ldaClass,species);
ldaResubErr = sum(bad) / N
ldaResubErr =
0.2000
You can also compute the confusion matrix on the training set. A confusion matrix contains information about known class labels and predicted class labels. Generally speaking, the (i,j) element in the confusion matrix is the number of samples whose known class label is class i and whose predicted class is j. The diagonal elements represent correctly classified observations.
[ldaResubCM,grpOrder] = confusionmat(species,ldaClass)
ldaResubCM =
49 1 0
0 36 14
0 15 35
grpOrder =
'setosa'
'versicolor'
'virginica'
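The confusion matrix and the resubstitution error describe the same result, so one can be checked against the other. The following is a minimal sketch of that check, assuming the matrix values are copied by hand from the output above; the variable names cm, nTotal, nCorrect, nMisclassified, and errFromCM are illustrative and not part of the original demo.

```matlab
% Confusion matrix copied from the output above
% (rows: known class, columns: predicted class).
cm = [49  1  0;
       0 36 14;
       0 15 35];

nTotal = sum(cm(:));                  % total number of observations
nCorrect = trace(cm);                 % diagonal entries are correct predictions
nMisclassified = nTotal - nCorrect;   % sum of the off-diagonal entries
errFromCM = nMisclassified / nTotal;  % should match ldaResubErr
```

With these values the off-diagonal entries sum to 1 + 14 + 15 = 30, so errFromCM is 30/150 = 0.2, which agrees with ldaResubErr. All of the confusion is between versicolor and virginica, apart from one setosa sample.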
Of the 150 training observations, 20%, or 30 observations, are misclassified by the linear discriminant function. You can see which ones they are by drawing an X through each misclassified point.