sequentialfs in MATLAB: a beginner asks how to use MATLAB's sequentialfs function

Accepted answer


云姐a455

2016.01.11


sequentialfs  Sequential feature selection.

INMODEL = sequentialfs(FUN,X,Y) selects a subset of features from X that best predict the data in Y, by sequentially selecting features until there is no improvement in prediction. X is a data matrix whose rows correspond to points (or observations) and whose columns correspond to features (or predictor variables). Y is a column vector of response values or class labels for each observation in X. X and Y must have the same number of rows. FUN is a function handle, created using @, that defines the criterion that sequentialfs uses to select features and to determine when to stop. sequentialfs returns INMODEL, a logical vector indicating which features are finally chosen.

Starting from an empty feature set, sequentialfs creates candidate feature subsets by adding in each of the features not yet selected. For each candidate feature subset, sequentialfs performs 10-fold cross-validation by repeatedly calling FUN with different training and test subsets of X and Y, as follows:

    CRITERION = FUN(XTRAIN,YTRAIN,XTEST,YTEST)

XTRAIN and YTRAIN contain the same subset of rows of X and Y, while XTEST and YTEST contain the complementary subset of rows. XTRAIN and XTEST contain the data taken from the columns of X that correspond to the current candidate feature set.

Each time it is called, FUN must return a scalar value CRITERION. Typically, FUN uses XTRAIN and YTRAIN to train or fit a model, then predicts values for XTEST using that model, and finally returns some measure of distance or loss of those predicted values from YTEST. In the cross-validation calculation for a given candidate feature set, sequentialfs sums the values returned by FUN across all test sets, and divides that sum by the total number of test observations. It then uses that mean value to evaluate each candidate feature subset. Two commonly used loss measures for FUN are the sum of squared errors for regression models (sequentialfs computes the mean squared error in this case), and the number of misclassified observations for classification models (sequentialfs computes the misclassification rate in this case).
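As a sketch of such a criterion function for a regression model (assuming a plain linear least-squares fit via MATLAB's regress; the name critfun is hypothetical):

```matlab
function dev = critfun(xtrain, ytrain, xtest, ytest)
% Fit a linear model on the training fold (with an intercept column).
b = regress(ytrain, [ones(size(xtrain,1),1) xtrain]);
% Predict the held-out fold with the fitted coefficients.
yfit = [ones(size(xtest,1),1) xtest] * b;
% Return the SUM of squared errors, not the mean: sequentialfs itself
% divides the accumulated sum by the total number of test observations.
dev = sum((ytest - yfit).^2);
end
```

Returning the sum rather than the mean matters, as the note below explains.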

Note: sequentialfs divides the sum of the values returned by FUN across all test sets by the total number of test observations, therefore FUN should not divide its output value by the number of test observations.

Given the mean CRITERION values for each candidate feature subset, sequentialfs chooses the one that minimizes the mean CRITERION value. This process continues until adding more features does not decrease the criterion.

INMODEL = sequentialfs(FUN,X,Y,Z,...) allows any number of input variables X, Y, Z, ... . sequentialfs chooses features (columns) only from X, but otherwise imposes no interpretation on X, Y, Z, ... . All data inputs, whether column vectors or matrices, must have the same number of rows. sequentialfs calls FUN with training and test subsets of X, Y, Z, ..., as follows:

    CRITERION = FUN(XTRAIN,YTRAIN,ZTRAIN,...,XTEST,YTEST,ZTEST,...)

sequentialfs creates XTRAIN, YTRAIN, ZTRAIN, ... and XTEST, YTEST, ZTEST, ... by selecting subsets of the rows of X, Y, Z, ... . FUN must return a scalar value CRITERION, but may compute that value in any way. Elements of the logical vector INMODEL correspond to columns of X, and indicate which features are finally chosen.
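For instance (a hypothetical sketch), a column vector of per-observation weights W could be passed alongside X and Y; sequentialfs partitions all three by rows, and the criterion weights the test-fold squared errors:

```matlab
% W is a column vector of weights with the same number of rows as X and Y.
critfun = @(xtr, ytr, wtr, xte, yte, wte) ...
    sum(wte .* (yte - [ones(size(xte,1),1) xte] * ...
                regress(ytr, [ones(size(xtr,1),1) xtr])).^2);
inmodel = sequentialfs(critfun, X, Y, W);
```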

[INMODEL,HISTORY] = sequentialfs(FUN,X,...) returns information on which feature is chosen in each step. HISTORY is a scalar structure with the following fields:

    Crit  A vector containing the criterion values computed at each step.
    In    A logical matrix in which row I indicates which features are included at step I.
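A small sketch of inspecting HISTORY (fun, X, and y stand for whatever criterion handle and data you are using):

```matlab
[inmodel, history] = sequentialfs(fun, X, y);
% Criterion value after each step of the search.
disp(history.Crit)
% The feature added at step 2 is the column that newly became true.
added = find(history.In(2,:) & ~history.In(1,:));
% Plot how the criterion improves as features are added.
plot(1:numel(history.Crit), history.Crit, '-o');
xlabel('Step'); ylabel('Criterion');
```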

[...] = sequentialfs(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies one or more of the following name/value pairs:

'CV'         The validation method used to compute the criterion for each candidate feature subset. Choices are:
                 a positive integer K - Use K-fold cross-validation (without stratification). K should be greater than one.
                 a CVPARTITION object - Perform cross-validation specified by the CVPARTITION object.
                 'resubstitution'     - Use resubstitution, i.e., the original data are passed to FUN as both the training and test data to compute the criterion.
                 'none'               - Call FUN as CRITERION = FUN(X,Y,Z,...), without separating test and training sets.
             The default value of 'CV' is 10, i.e., 10-fold cross-validation (without stratification).
             So-called "wrapper" methods use a function FUN that implements a learning algorithm. These methods usually apply cross-validation to select features. So-called "filter" methods use a function that measures the characteristics (such as correlation) of the data to select features.

'MCReps'     A positive integer indicating the number of Monte-Carlo repetitions for cross-validation. The default value is 1. 'MCReps' must be 1 if 'CV' is 'none' or 'resubstitution'.

'Direction'  The direction in which to perform the sequential search. The default is 'forward'. If 'Direction' is 'backward', sequentialfs begins with a feature set including all features and removes features sequentially until the criterion increases.
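A backward-elimination call might look like this (sketch; critfun stands for any criterion handle of the form shown earlier):

```matlab
% Start from the full feature set and drop features one at a time,
% while forcing column 1 to stay in the model.
inmodel = sequentialfs(@critfun, X, y, ...
    'Direction', 'backward', 'KeepIn', 1);
```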

'KeepIn'     A logical vector, or a vector of column numbers, specifying a set of features which must be included. The default is empty.

'KeepOut'    A logical vector, or a vector of column numbers, specifying a set of features which must be excluded. The default is empty.

'NFeatures'  The number of features at which sequentialfs should stop. INMODEL includes exactly this many features. The default value is empty, indicating that sequentialfs should stop when a local minimum of the criterion is found. A non-empty value for 'NFeatures' overrides 'MaxIter' and 'TolFun' in 'Options'.

'NullModel'  A logical value, indicating whether or not the null model (containing no features from X) should be included in the feature selection procedure and in the HISTORY output. The default is FALSE.

'Options'    Options structure for the iterative sequential search algorithm, as created by STATSET. sequentialfs uses the following fields:
                 'Display'    Level of display output. Choices are 'off' (the default), 'final', and 'iter'.
                 'MaxIter'    Maximum number of steps allowed. The default is Inf.
                 'TolFun'     Positive number giving the termination tolerance for the criterion. The default is 1e-6 if 'Direction' is 'forward', or 0 if 'Direction' is 'backward'.
                 'TolTypeFun' 'abs', to use 'TolFun' as an absolute tolerance, or 'rel', to use it as a relative tolerance. The default is 'rel'.
                 'UseParallel', 'UseSubstreams', 'Streams'
                              These fields specify whether to perform cross-validation computations in parallel, and how to use random numbers during cross-validation. For information on these fields see PARALLELSTATS. NOTE: If supplied, 'Streams' must be of length one.
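Several of these options can be combined in one call (sketch; critfun stands for any criterion handle of the form shown earlier):

```matlab
opts = statset('Display', 'iter', 'MaxIter', 8, 'TolFun', 1e-4);
inmodel = sequentialfs(@critfun, X, y, ...
    'CV', 5, ...          % 5-fold cross-validation
    'MCReps', 3, ...      % average over 3 Monte-Carlo repetitions
    'Options', opts);
```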

Examples:

    % Perform sequential feature selection for CLASSIFY on iris data with
    % noisy features and see which non-noise features are important
    load('fisheriris');
    X = randn(150,10);
    X(:,[1 3 5 7]) = meas;
    y = species;
    opt = statset('display','iter');
    % Generating a stratified partition is usually preferred to
    % evaluate classification algorithms.
    cvp = cvpartition(y,'k',10);
    [fs,history] = sequentialfs(@classf,X,y,'cv',cvp,'options',opt);

where CLASSF is a MATLAB function such as:

    function err = classf(xtrain,ytrain,xtest,ytest)
    yfit = classify(xtest,xtrain,ytrain,'quadratic');
    err = sum(~strcmp(ytest,yfit));
