crossval - Loss estimate using cross-validation
Syntax
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
Description
vals = crossval(fun,X) performs 10-fold
cross-validation for the function fun, applied to the data
in X.
fun is a function
handle to a function with two inputs, the training subset of
X, XTRAIN, and the test subset of X,
XTEST, as follows:
testval = fun(XTRAIN,XTEST)
Each time it is called, fun should use XTRAIN
to fit a model, then return some criterion testval
computed on XTEST using that fitted model.
X can be a column vector or a matrix. Rows of
X correspond to observations; columns correspond to
variables or features. Each row of vals contains the
result of applying fun to one test set. If
testval is nonscalar, crossval converts it to a
row vector using linear indexing and stores it in one row of
vals.
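As a minimal sketch of this calling form (the criterion here is purely illustrative — fun simply returns the mean of each test fold and ignores the training subset):

```matlab
% Hypothetical criterion: the mean of each test fold (XTRAIN unused)
fun = @(XTRAIN,XTEST) mean(XTEST);
X = randn(100,1);          % 100 observations of one variable
vals = crossval(fun,X);    % 10-by-1 vector: one fold mean per row
```

Because the criterion is scalar, vals has one row per fold; a nonscalar criterion would be linearized into each row instead.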
vals = crossval(fun,X,Y,...) is used when data are
stored in separate variables X, Y, ... . All
variables (column vectors, matrices, or arrays) must have the same
number of rows. fun is called with the training subsets of
X, Y, ... , followed by the test subsets of
X, Y, ... , as follows:
testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)
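A sketch of the multi-variable form, with an illustrative criterion (the correlation between the test-fold columns of X and Y; the training subsets are accepted but unused):

```matlab
% Hypothetical criterion over paired variables X and Y
fun = @(XTRAIN,YTRAIN,XTEST,YTEST) corr(XTEST,YTEST);
X = randn(100,1);
Y = 2*X + randn(100,1);
vals = crossval(fun,X,Y);  % one correlation value per fold
```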
mse = crossval('mse',X,y,'Predfun',predfun) returns
mse, a scalar containing a 10-fold cross-validation
estimate of mean-squared error for the function predfun.
X can be a column vector, matrix, or array of predictors.
y is a column vector of response values. X and
y must have the same number of rows.
predfun is a function
handle called with the training subset of X, the
training subset of y, and the test subset of X as
follows:
yfit = predfun(XTRAIN,ytrain,XTEST)
Each time it is called, predfun should use
XTRAIN and ytrain to fit a regression model and
then return fitted values in a column vector yfit. Each
row of yfit contains the predicted values for the
corresponding row of XTEST. crossval computes the
squared errors between yfit and the corresponding response
test set, and returns the overall mean across all test sets.
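For instance, a deliberately simple baseline predictor (the model choice is illustrative — it predicts the training-set mean of y for every test row):

```matlab
% Baseline regression model: constant prediction at the training mean
predfun = @(XTRAIN,ytrain,XTEST) repmat(mean(ytrain),size(XTEST,1),1);
X = randn(100,2);
y = X*[1;2] + randn(100,1);
mse = crossval('mse',X,y,'Predfun',predfun)
```

A predictor that actually uses XTRAIN, such as the regress-based handle in Example 1, normally yields a much smaller estimate.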
mcr = crossval('mcr',X,y,'Predfun',predfun) returns
mcr, a scalar containing a 10-fold cross-validation
estimate of misclassification rate (the proportion of misclassified
samples) for the function predfun. The matrix X
contains predictor values and the vector y contains class
labels. predfun should use XTRAIN and
YTRAIN to fit a classification model and return
yfit as the predicted class labels for XTEST.
crossval computes the number of misclassifications between
yfit and the corresponding response test set, and returns
the overall misclassification rate across all test sets.
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun),
where criterion is 'mse' or
'mcr', returns a cross-validation estimate of mean-squared
error (for a regression model) or misclassification rate (for a
classification model) with predictor values in X1,
X2, ... and, respectively, response values or class labels
in y. X1, X2, ... and y must
have the same number of rows. predfun is a function
handle called with the training subsets of X1,
X2, ..., the training subset of y, and the test
subsets of X1, X2, ..., as follows:
yfit=predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...)
yfit should be a column vector containing the fitted
values.
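A sketch with two predictor matrices (names are illustrative; predfun concatenates the predictors, adds an intercept column, and fits a linear model with regress):

```matlab
X1 = randn(100,2);
X2 = randn(100,1);
y  = X1*[1;2] + 3*X2 + randn(100,1);
predfun = @(X1T,X2T,ytr,X1E,X2E) ...
    [ones(size(X1E,1),1),X1E,X2E] * ...
    regress(ytr,[ones(size(X1T,1),1),X1T,X2T]);
val = crossval('mse',X1,X2,y,'Predfun',predfun)
```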
vals =
crossval(...,'name',value)
specifies one or more optional parameter name/value pairs from the
following table. Specify name inside single
quotes.
Name
Value
holdout
A scalar p specifying the ratio or the number of observations
to hold out for holdout cross-validation. When 0
< p < 1, crossval randomly selects
approximately p*n observations for the test set. When
p is an integer, crossval randomly selects
p observations for the test set.
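For example, holding out roughly 20% of the rows as a single random test set (data and predictor are illustrative):

```matlab
X = randn(50,1);
y = 2*X + randn(50,1);
predfun = @(XT,yt,XE) XE*regress(yt,XT);
mseHoldout = crossval('mse',X,y,'Predfun',predfun,'holdout',0.2)
```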
kfold
A scalar specifying the number of folds k for
k-fold cross-validation.
leaveout
Specifies leave-one-out cross-validation. The value must be
1.
mcreps
A positive integer specifying the number of Monte-Carlo
repetitions for validation. If the first input of crossval
is 'mse' or 'mcr', crossval returns the
mean of mean-squared error or misclassification rate across all of
the Monte-Carlo repetitions. Otherwise, crossval
concatenates the values vals from all of the Monte-Carlo
repetitions along the first dimension.
partition
An object c of the cvpartition class, specifying the cross-validation type
and partition.
stratify
A column vector group specifying groups for
stratification. Both training and test sets have roughly the same
class proportions as in group. NaNs or empty
strings in group are treated as missing values, and the
corresponding rows of the data are ignored.
options
A structure that specifies whether to run in parallel, and
specifies the random stream or streams. Create the options
structure with statset.
Option fields:
UseParallel — Set to 'always' to compute in
parallel. Default is 'never'.
UseSubstreams — Set to 'always' to compute in
parallel in a reproducible fashion. Default is 'never'. To
compute reproducibly, set Streams to a type allowing
substreams: 'mlfg6331_64' or 'mrg32k3a'.
Streams — A RandStream
object or cell array consisting of one such object. If you do not
specify Streams, crossval uses the default
stream.
For more information on using parallel computing, see Parallel
Statistics.
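A sketch of the options parameter (running the folds in parallel assumes a Parallel Computing Toolbox worker pool is available; the criterion function is illustrative):

```matlab
% Run cross-validation in parallel via a statset options structure
opts = statset('UseParallel','always');
fun  = @(XTRAIN,XTEST) mean(XTEST);
X    = randn(100,1);
vals = crossval(fun,X,'options',opts);
```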
Only one of kfold, holdout, leaveout,
or partition can be specified, and partition
cannot be specified with stratify. If both
partition and mcreps are specified, the first
Monte-Carlo repetition uses the partition information in the
cvpartition object, and the repartition method is called to generate new
partitions for each of the remaining repetitions. If no
cross-validation type is specified, the default is 10-fold
cross-validation.
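Combining these parameters, a sketch of five Monte-Carlo repetitions of stratified 10-fold cross-validation on the Fisher iris data (predfun wraps classify, as in Example 2 below):

```matlab
load('fisheriris');
predfun = @(XT,yt,XE) classify(XE,XT,yt);
mcr = crossval('mcr',meas,species,'Predfun',predfun,...
    'stratify',species,'mcreps',5)
```

Because the first input is 'mcr', the result is the mean misclassification rate across the five repetitions.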
Note When using
cross-validation with classification algorithms, stratification is
preferred. Otherwise, some test sets may not include observations
from all classes.
Examples
Example 1
Compute mean-squared error for regression using 10-fold
cross-validation:
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)
cvMse =
0.1015
Example 2
Compute misclassification rate using stratified 10-fold
cross-validation:
load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN, ytrain,XTEST)(classify(XTEST,XTRAIN,...
ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
0.0200
Example 3
Compute the confusion matrix using stratified 10-fold
cross-validation:
load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)
cfMat =
50 0 0
0 48 2
0 1 49
cfMat is the summation of 10 confusion matrices from 10
test sets.
See Also