MATLAB: Discriminant Analysis

Improve a Discriminant Analysis Classifier

Deal with Singular Data

Discriminant analysis needs data sufficient to fit Gaussian models with invertible covariance matrices. If your data is not sufficient to fit such a model uniquely, fitcdiscr fails. This section shows methods for handling failures.

Tip: To obtain a discriminant analysis classifier without failure, set the DiscrimType name-value pair to 'pseudoLinear' or 'pseudoQuadratic' in fitcdiscr. "Pseudo" discriminants never fail, because they use the pseudoinverse of the covariance matrix Σk (see pinv).
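As a minimal sketch of why the pseudo variants cannot fail: pinv returns a finite Moore-Penrose pseudoinverse even for an exactly singular matrix, where ordinary inversion warns and produces Inf entries.

```matlab
% A singular covariance matrix: the second variable has zero variance
Sigma = [2 0; 0 0];
% inv(Sigma)             % warns "Matrix is singular" and returns Inf entries
Spinv = pinv(Sigma)      % pseudoinverse: finite, usable in the discriminant
% Spinv = [0.5 0; 0 0]
```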

Example: Singular Covariance Matrix. When the covariance matrix of the fitted classifier is singular, fitcdiscr can fail:

load popcorn
X = popcorn(:,[1 2]);
X(:,3) = 0; % a zero-variance column
Y = popcorn(:,3);
ppcrn = fitcdiscr(X,Y);

Error using ClassificationDiscriminant (line 635)
Predictor x3 has zero variance. Either exclude this predictor or set 'discrimType' to
'pseudoLinear' or 'diagLinear'.

Error in classreg.learning.FitTemplate/fit (line 243)
    obj = this.MakeFitObject(X,Y,W,this.ModelParameters,fitArgs{:});

Error in fitcdiscr (line 296)
    this = fit(temp,X,Y);

To proceed with linear discriminant analysis, use a pseudoLinear or diagLinear discriminant type:

ppcrn = fitcdiscr(X,Y,...
    'discrimType','pseudoLinear');

meanpredict = predict(ppcrn,mean(X))

meanpredict =
    3.5000

Choose a Discriminant Type

There are six types of discriminant analysis classifiers: linear and quadratic, with diagonal and pseudo variants of each type.

Tip: To see if your covariance matrix is singular, set discrimType to 'linear' or 'quadratic'. If the matrix is singular, the fitcdiscr method fails for 'quadratic', and the Gamma property is nonzero for 'linear'.

To obtain a quadratic classifier even when your covariance matrix is singular, set DiscrimType to 'pseudoQuadratic' or 'diagQuadratic':

obj = fitcdiscr(X,Y,'DiscrimType','pseudoQuadratic') % or 'diagQuadratic'

Choose a classifier type by setting the discrimType name-value pair to one of:

'linear' (default) — Estimate one covariance matrix for all classes.

'quadratic' — Estimate one covariance matrix for each class.

'diagLinear' — Use the diagonal of the 'linear' covariance matrix, and use its pseudoinverse if necessary.

'diagQuadratic' — Use the diagonals of the 'quadratic' covariance matrices, and use their pseudoinverses if necessary.

'pseudoLinear' — Use the pseudoinverse of the 'linear' covariance matrix if necessary.

'pseudoQuadratic' — Use the pseudoinverses of the 'quadratic' covariance matrices if necessary.
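The six types can be compared on a single data set by fitting each in a loop. This is a sketch, assuming the fisheriris sample data that the later examples use:

```matlab
% Fit all six discriminant types and report their resubstitution errors
load fisheriris
types = {'linear','quadratic','diagLinear','diagQuadratic', ...
         'pseudoLinear','pseudoQuadratic'};
for k = 1:numel(types)
    mdl = fitcdiscr(meas,species,'DiscrimType',types{k});
    fprintf('%-16s resubstitution error: %.4f\n', types{k}, resubLoss(mdl));
end
```

Because the iris covariance matrices are nonsingular, all six fits succeed here; on singular data only the diagonal and pseudo variants would.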

fitcdiscr can fail for the 'linear' and 'quadratic' classifiers. When it fails, it returns an explanation, as shown in Deal with Singular Data.

fitcdiscr always succeeds with the diagonal and pseudo variants. For information about pseudoinverses, see pinv.

You can set the discriminant type using dot notation after constructing a classifier:

obj.DiscrimType = newDiscrimType

where newDiscrimType is one of the six type names. You can change between linear types or between quadratic types, but cannot change between a linear and a quadratic type.
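For example (a sketch using the fisheriris sample data), switching within the linear family succeeds, while switching families does not:

```matlab
load fisheriris
obj = fitcdiscr(meas,species);      % DiscrimType is 'linear' by default
obj.DiscrimType = 'diagLinear';     % allowed: both are linear types
obj.DiscrimType = 'pseudoLinear';   % allowed: still within the linear family
% obj.DiscrimType = 'quadratic';    % would error: cannot switch from a
%                                   % linear type to a quadratic type
```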

Examine the Resubstitution Error and Confusion Matrix

The resubstitution error is the difference between the response training data and the predictions the classifier makes of the response based on the input training data. If the resubstitution error is high, you cannot expect the predictions of the classifier to be good. However, having low resubstitution error does not guarantee good predictions for new data. Resubstitution error is often an overly optimistic estimate of the predictive error on new data.

The confusion matrix shows how many errors, and which types, arise in resubstitution. When there are K classes, the confusion matrix R is a K-by-K matrix with R(i,j) = the number of observations of class i that the classifier predicts to be of class j.
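A small sketch of this indexing, using hypothetical label vectors: rows of R correspond to true classes, columns to predicted classes.

```matlab
% Toy example of R(i,j): true classes down the rows, predictions across columns
trueY = [1 1 2 2 3]';   % true class labels
predY = [1 2 2 2 3]';   % classifier predictions
R = confusionmat(trueY,predY)
% R =
%      1     1     0     % one class-1 obs correct, one predicted as class 2
%      0     2     0     % both class-2 obs correct
%      0     0     1     % the class-3 obs correct
```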

Example: Resubstitution Error of a Discriminant Analysis Classifier. Examine the resubstitution error of the default discriminant analysis classifier for the Fisher iris data:

load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)

resuberror =
    0.0200

The resubstitution error is very low, meaning obj classifies nearly all the Fisher iris data correctly. The total number of misclassifications is:

resuberror * obj.NumObservations

ans =
    3.0000

To see the details of the three misclassifications, examine the confusion matrix:

R = confusionmat(obj.Y,resubPredict(obj))

R =
    50     0     0
     0    48     2
     0     1    49

obj.ClassNames

ans =
    'setosa'
    'versicolor'
    'virginica'

R(1,:) = [50 0 0] means obj classifies all 50 setosa irises correctly.

R(2,:) = [0 48 2] means obj classifies 48 versicolor irises correctly, and misclassifies two versicolor irises as virginica.

R(3,:) = [0 1 49] means obj classifies 49 virginica irises correctly, and misclassifies one virginica iris as versicolor.

Cross Validation

Typically, discriminant analysis classifiers are robust and do not exhibit overtraining when the number of predictors is much less than the number of observations. Nevertheless, it is good practice to cross validate your classifier to ensure its stability.

Cross Validating a Discriminant Analysis Classifier

This example shows how to perform five-fold cross validation of a quadratic discriminant analysis classifier.

Load the sample data.

load fisheriris

Create a quadratic discriminant analysis classifier for the data.

quadisc = fitcdiscr(meas,species,'DiscrimType','quadratic');

Find the resubstitution error of the classifier.

qerror = resubLoss(quadisc)

qerror =
    0.0200

The classifier does an excellent job. Nevertheless, resubstitution error can be an optimistic estimate of the error when classifying new data. So proceed to cross validation.

Create a cross-validation model.

cvmodel = crossval(quadisc,'kfold',5);

Find the cross-validation loss for the model, meaning the error of the out-of-fold observations.

cverror = kfoldLoss(cvmodel)

cverror =
    0.0200

The cross-validated loss is as low as the original resubstitution loss. Therefore, you can have confidence that the classifier is reasonably accurate.

Change Costs and Priors

Sometimes you want to avoid certain misclassification errors more than others. For example, it might be better to have oversensitive cancer detection instead of undersensitive cancer detection. Oversensitive detection gives more false positives (unnecessary testing or treatment). Undersensitive detection gives more false negatives (preventable illnesses or deaths). The consequences of underdetection can be high. Therefore, you might want to set costs to reflect the consequences.

Similarly, the training data Y can have a distribution of classes that does not represent their true frequency. If you have a better estimate of the true frequency, you can include this knowledge in the classification Prior property.

Example: Setting Custom Misclassification Costs. Consider the Fisher iris data. Suppose that the cost of classifying a versicolor iris as virginica is 10 times as large as making any other classification error. Create a classifier from the data, incorporate this cost, and then view the resulting classifier.

Load the Fisher iris data and create a default (linear) classifier as in Example: Resubstitution Error of a Discriminant Analysis Classifier:

load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)

resuberror =
    0.0200

R = confusionmat(obj.Y,resubPredict(obj))

R =
    50     0     0
     0    48     2
     0     1    49

obj.ClassNames

ans =
    'setosa'
    'versicolor'
    'virginica'

R(2,:) = [0 48 2] means obj classifies 48 versicolor irises correctly, and misclassifies two versicolor irises as virginica.

Change the cost matrix to make fewer mistakes in classifying versicolor irises as virginica:

obj.Cost(2,3) = 10;

R2 = confusionmat(obj.Y,resubPredict(obj))

R2 =
    50     0     0
     0    50     0
     0     7    43

obj now classifies all versicolor irises correctly, at the expense of increasing the number of misclassifications of virginica irises from 1 to 7.

Example: Setting Alternative Priors. Consider the Fisher iris data. There are 50 irises of each kind in the data. Suppose that, in a particular region, you have historical data showing that virginica irises are five times as prevalent as the other kinds. Create a classifier that incorporates this information.

Load the Fisher iris data and make a default (linear) classifier as in Example: Resubstitution Error of a Discriminant Analysis Classifier:

load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)

resuberror =
    0.0200

R = confusionmat(obj.Y,resubPredict(obj))

R =
    50     0     0
     0    48     2
     0     1    49

obj.ClassNames

ans =
    'setosa'
    'versicolor'
    'virginica'

R(3,:) = [0 1 49] means obj classifies 49 virginica irises correctly, and misclassifies one virginica iris as versicolor.

Change the prior to match your historical data, and examine the confusion matrix of the new classifier:

obj.Prior = [1 1 5];

R2 = confusionmat(obj.Y,resubPredict(obj))

R2 =
    50     0     0
     0    46     4
     0     0    50

The new classifier classifies all virginica irises correctly, at the expense of increasing the number of misclassifications of versicolor irises from 2 to 4.
