sequentialfs in MATLAB: a beginner asks how to use MATLAB's sequentialfs function

Accepted answer


云姐a455

2016.01.11


sequentialfs  Sequential feature selection.

INMODEL = sequentialfs(FUN,X,Y) selects a subset of features from X that best predict the data in Y, by sequentially selecting features until there is no improvement in prediction. X is a data matrix whose rows correspond to points (or observations) and whose columns correspond to features (or predictor variables). Y is a column vector of response values or class labels for each observation in X. X and Y must have the same number of rows. FUN is a function handle, created using @, that defines the criterion that sequentialfs uses to select features and to determine when to stop. sequentialfs returns INMODEL, a logical vector indicating which features are finally chosen.

Starting from an empty feature set, sequentialfs creates candidate feature subsets by adding in each of the features not yet selected. For each candidate feature subset, sequentialfs performs 10-fold cross-validation by repeatedly calling FUN with different training and test subsets of X and Y, as follows:

    CRITERION = FUN(XTRAIN,YTRAIN,XTEST,YTEST)

XTRAIN and YTRAIN contain the same subset of rows of X and Y, while XTEST and YTEST contain the complementary subset of rows. XTRAIN and XTEST contain the data taken from the columns of X that correspond to the current candidate feature set.

Each time it is called, FUN must return a scalar value CRITERION. Typically, FUN uses XTRAIN and YTRAIN to train or fit a model, then predicts values for XTEST using that model, and finally returns some measure of distance or loss of those predicted values from YTEST. In the cross-validation calculation for a given candidate feature set, sequentialfs sums the values returned by FUN across all test sets, and divides that sum by the total number of test observations. It then uses that mean value to evaluate each candidate feature subset. Two commonly used loss measures for FUN are the sum of squared errors for regression models (sequentialfs computes the mean squared error in this case), and the number of misclassified observations for classification models (sequentialfs computes the misclassification rate in this case).
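As a sketch of such a criterion function for a regression model (assuming a plain linear least-squares fit via MATLAB's regress; the name critfun is hypothetical):

```matlab
function dev = critfun(xtrain, ytrain, xtest, ytest)
% Fit a linear model on the training fold (with an intercept column).
b = regress(ytrain, [ones(size(xtrain,1),1) xtrain]);
% Predict the held-out fold with the fitted coefficients.
yfit = [ones(size(xtest,1),1) xtest] * b;
% Return the SUM of squared errors, not the mean: sequentialfs itself
% divides the accumulated sum by the total number of test observations.
dev = sum((ytest - yfit).^2);
end
```

Returning the sum rather than the mean matters, as the note below explains.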

Note: sequentialfs divides the sum of the values returned by FUN across all test sets by the total number of test observations, therefore FUN should not divide its output value by the number of test observations.

Given the mean CRITERION values for each candidate feature subset, sequentialfs chooses the one that minimizes the mean CRITERION value. This process continues until adding more features does not decrease the criterion.

INMODEL = sequentialfs(FUN,X,Y,Z,...) allows any number of input variables X, Y, Z, ... . sequentialfs chooses features (columns) only from X, but otherwise imposes no interpretation on X, Y, Z, ... . All data inputs, whether column vectors or matrices, must have the same number of rows. sequentialfs calls FUN with training and test subsets of X, Y, Z, ..., as follows:

    CRITERION = FUN(XTRAIN,YTRAIN,ZTRAIN,...,XTEST,YTEST,ZTEST,...)

sequentialfs creates XTRAIN, YTRAIN, ZTRAIN, ... and XTEST, YTEST, ZTEST, ... by selecting subsets of the rows of X, Y, Z, ... . FUN must return a scalar value CRITERION, but may compute that value in any way. Elements of the logical vector INMODEL correspond to columns of X, and indicate which features are finally chosen.
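For instance (a hypothetical sketch), a column vector of per-observation weights W could be passed alongside X and Y; sequentialfs partitions all three by rows, and the criterion weights the test-fold squared errors:

```matlab
% W is a column vector of weights with the same number of rows as X and Y.
critfun = @(xtr, ytr, wtr, xte, yte, wte) ...
    sum(wte .* (yte - [ones(size(xte,1),1) xte] * ...
                regress(ytr, [ones(size(xtr,1),1) xtr])).^2);
inmodel = sequentialfs(critfun, X, Y, W);
```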

[INMODEL,HISTORY] = sequentialfs(FUN,X,...) returns information on which feature is chosen in each step. HISTORY is a scalar structure with the following fields:

    Crit  A vector containing the criterion values computed at each step.
    In    A logical matrix in which row I indicates which features are included at step I.
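A small sketch of inspecting HISTORY (fun, X, and y stand for whatever criterion handle and data you are using):

```matlab
[inmodel, history] = sequentialfs(fun, X, y);
% Criterion value after each step of the search.
disp(history.Crit)
% The feature added at step 2 is the column that newly became true.
added = find(history.In(2,:) & ~history.In(1,:));
% Plot how the criterion improves as features are added.
plot(1:numel(history.Crit), history.Crit, '-o');
xlabel('Step'); ylabel('Criterion');
```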

[...] = sequentialfs(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies one or more of the following name/value pairs:

'CV'         The validation method used to compute the criterion for each candidate feature subset. Choices are:
                 a positive integer K - Use K-fold cross-validation (without stratification). K should be greater than one.
                 a CVPARTITION object - Perform cross-validation specified by the CVPARTITION object.
                 'resubstitution'     - Use resubstitution, i.e., the original data are passed to FUN as both the training and test data to compute the criterion.
                 'none'               - Call FUN as CRITERION = FUN(X,Y,Z,...), without separating test and training sets.
             The default value of 'CV' is 10, i.e., 10-fold cross-validation (without stratification).
             So-called "wrapper" methods use a function FUN that implements a learning algorithm. These methods usually apply cross-validation to select features. So-called "filter" methods use a function that measures the characteristics (such as correlation) of the data to select features.

'MCReps'     A positive integer indicating the number of Monte-Carlo repetitions for cross-validation. The default value is 1. 'MCReps' must be 1 if 'CV' is 'none' or 'resubstitution'.

'Direction'  The direction in which to perform the sequential search. The default is 'forward'. If 'Direction' is 'backward', sequentialfs begins with a feature set including all features and removes features sequentially until the criterion increases.
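A backward-elimination call might look like this (sketch; critfun stands for any criterion handle of the form shown earlier):

```matlab
% Start from the full feature set and drop features one at a time,
% while forcing column 1 to stay in the model.
inmodel = sequentialfs(@critfun, X, y, ...
    'Direction', 'backward', 'KeepIn', 1);
```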

'KeepIn'     A logical vector, or a vector of column numbers, specifying a set of features which must be included. The default is empty.

'KeepOut'    A logical vector, or a vector of column numbers, specifying a set of features which must be excluded. The default is empty.

'NFeatures'  The number of features at which sequentialfs should stop. INMODEL includes exactly this many features. The default value is empty, indicating that sequentialfs should stop when a local minimum of the criterion is found. A non-empty value for 'NFeatures' overrides 'MaxIter' and 'TolFun' in 'Options'.

'NullModel'  A logical value, indicating whether or not the null model (containing no features from X) should be included in the feature selection procedure and in the HISTORY output. The default is FALSE.

'Options'    Options structure for the iterative sequential search algorithm, as created by STATSET. sequentialfs uses the following fields:
                 'Display'    Level of display output. Choices are 'off' (the default), 'final', and 'iter'.
                 'MaxIter'    Maximum number of steps allowed. The default is Inf.
                 'TolFun'     Positive number giving the termination tolerance for the criterion. The default is 1e-6 if 'Direction' is 'forward', or 0 if 'Direction' is 'backward'.
                 'TolTypeFun' 'abs', to use 'TolFun' as an absolute tolerance, or 'rel', to use it as a relative tolerance. The default is 'rel'.
                 'UseParallel', 'UseSubstreams', 'Streams'
                              These fields specify whether to perform cross-validation computations in parallel, and how to use random numbers during cross-validation. For information on these fields see PARALLELSTATS. NOTE: If supplied, 'Streams' must be of length one.
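Several of these options can be combined in one call (sketch; critfun stands for any criterion handle of the form shown earlier):

```matlab
opts = statset('Display', 'iter', 'MaxIter', 8, 'TolFun', 1e-4);
inmodel = sequentialfs(@critfun, X, y, ...
    'CV', 5, ...          % 5-fold cross-validation
    'MCReps', 3, ...      % average over 3 Monte-Carlo repetitions
    'Options', opts);
```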

Examples:

    % Perform sequential feature selection for CLASSIFY on iris data with
    % noisy features and see which non-noise features are important
    load('fisheriris');
    X = randn(150,10);
    X(:,[1 3 5 7]) = meas;
    y = species;
    opt = statset('display','iter');
    % Generating a stratified partition is usually preferred to
    % evaluate classification algorithms.
    cvp = cvpartition(y,'k',10);
    [fs,history] = sequentialfs(@classf,X,y,'cv',cvp,'options',opt);

where CLASSF is a MATLAB function such as:

    function err = classf(xtrain,ytrain,xtest,ytest)
    yfit = classify(xtest,xtrain,ytrain,'quadratic');
    err = sum(~strcmp(ytest,yfit));
