Neighborhood Component Analysis (NCA) Feature Selection

NCA Feature Selection for Classification

Consider a multi-class classification problem with a training set containing n observations:

$$S = \{(x_i, y_i),\ i = 1, 2, \ldots, n\},$$

where $x_i \in \mathbb{R}^p$ are the feature vectors, $y_i \in \{1, 2, \ldots, c\}$ are the class labels, and $c$ is the number of classes. The aim is to learn a classifier $f : \mathbb{R}^p \to \{1, 2, \ldots, c\}$ that accepts a feature vector and makes a prediction $f(x)$ for the true label $y$ of $x$.
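For concreteness, the short sketches in the remainder of this section use a small synthetic training set in MATLAB. The variables X, y, and w below are illustrative placeholders for the quantities defined above, not part of any fscnca interface.

% Toy training set: n = 6 observations, p = 3 features, c = 2 classes.
rng(0);                          % for reproducibility
X = [randn(3,3); randn(3,3)+2];  % 6-by-3 feature matrix
y = [1; 1; 1; 2; 2; 2];          % class labels in {1, 2}
w = [1.0 0.5 0.1];               % a candidate vector of feature weights (1-by-p)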

Consider a randomized classifier that:

1. Randomly picks a point, Ref(x), from S as the 'reference point' for x.

2. Labels x using the label of the reference point Ref(x).

This scheme is similar to a 1-NN classifier, where the reference point is chosen to be the nearest neighbor of the new point $x$. In NCA, the reference point is chosen randomly, and all points in $S$ have some probability of being selected as the reference point. The probability $P(\mathrm{Ref}(x) = x_j \mid S)$ that point $x_j$ is picked from $S$ as the reference point for $x$ is higher if $x_j$ is closer to $x$ as measured by the distance function $d_w$, where

$$d_w(x_i, x_j) = \sum_{r=1}^{p} w_r^2 \, |x_{ir} - x_{jr}|,$$

and $w_r$ are the feature weights. Assume that

$$P(\mathrm{Ref}(x) = x_j \mid S) \propto k(d_w(x, x_j)),$$

where $k$ is some kernel or similarity function that assumes large values when $d_w(x, x_j)$ is small. Suppose it is

$$k(z) = \exp\!\left(-\frac{z}{\sigma}\right),$$

as suggested in [1]. The reference point for $x$ is chosen from $S$, so the sum of $P(\mathrm{Ref}(x) = x_j \mid S)$ over all $j$ must equal 1. Therefore, it is possible to write

$$P(\mathrm{Ref}(x) = x_j \mid S) = \frac{k(d_w(x, x_j))}{\sum_{j=1}^{n} k(d_w(x, x_j))}.$$
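As a minimal numerical sketch, using the illustrative X and w defined earlier and taking $\sigma = 1$, the probability of each training point being picked as the reference point for a new point x can be computed as follows (the query point x is also a placeholder).

% Probability that each training point is picked as the reference point for x.
x = [0.5 0.5 0.5];                 % a new query point (illustrative)
sigma = 1;
d  = sum(w.^2 .* abs(X - x), 2);   % d_w(x, x_j) for every x_j in S (implicit expansion, R2016b+)
kv = exp(-d / sigma);              % kernel values k(d_w(x, x_j))
Pref = kv / sum(kv);               % P(Ref(x) = x_j | S); the entries sum to 1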

Now consider the leave-one-out application of this randomized classifier, that is, predicting the label of $x_i$ using the data in $S^{-i}$, the training set $S$ excluding the point $(x_i, y_i)$. The probability that point $x_j$ is picked as the reference point for $x_i$ is

$$p_{ij} = P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) = \frac{k(d_w(x_i, x_j))}{\sum_{j=1,\, j \ne i}^{n} k(d_w(x_i, x_j))}.$$
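The leave-one-out probabilities $p_{ij}$ can be collected in an n-by-n matrix. The sketch below again assumes the illustrative X and w, with $\sigma = 1$ as above.

% P(i,j) = p_ij, the probability that x_j is the reference point for x_i
% when observation i is left out.
n = size(X, 1);
P = zeros(n, n);
for i = 1:n
    d  = sum(w.^2 .* abs(X - X(i,:)), 2);  % d_w(x_i, x_j) for all j
    kv = exp(-d / sigma);
    kv(i) = 0;                             % exclude observation i itself
    P(i,:) = kv / sum(kv);
end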

The leave-one-out probability of correct classification for observation $i$ is the probability $p_i$ that the randomized classifier correctly classifies observation $i$ using $S^{-i}$:

$$p_i = \sum_{j=1,\, j \ne i}^{n} P(\mathrm{Ref}(x_i) = x_j \mid S^{-i})\, I(y_i = y_j) = \sum_{j=1,\, j \ne i}^{n} p_{ij}\, y_{ij},$$

where

$$y_{ij} = I(y_i = y_j) = \begin{cases} 1 & \text{if } y_i = y_j, \\ 0 & \text{otherwise.} \end{cases}$$

The average leave-one-out probability of correct classification using the randomized classifier can be written as

$$F(w) = \frac{1}{n} \sum_{i=1}^{n} p_i.$$
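Continuing the sketch above, $p_i$ and $F(w)$ follow directly from the matrix of $p_{ij}$ values and the labels y; the variable names remain illustrative.

% y_ij = 1 when observations i and j share a label (implicit expansion).
Y = double(y == y');
p = sum(P .* Y, 2);    % p_i: leave-one-out probability of correct classification
F = mean(p);           % F(w) = (1/n) * sum_i p_i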

The right-hand side of $F(w)$ depends on the weight vector $w$. The goal of neighborhood component analysis is to maximize $F(w)$ with respect to $w$. fscnca uses the regularized objective function introduced in [1]:

$$F(w) = \frac{1}{n}\sum_{i=1}^{n} p_i - \lambda \sum_{r=1}^{p} w_r^2 = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\left[\sum_{j=1,\, j \ne i}^{n} p_{ij}\, y_{ij} - \lambda \sum_{r=1}^{p} w_r^2\right]}_{F_i(w)} = \frac{1}{n}\sum_{i=1}^{n} F_i(w),$$

where $\lambda$ is the regularization parameter. The regularization term drives many of the weights in $w$ to 0.
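A sketch of the regularized objective, continuing from the quantities computed above; the value of lambda is illustrative (in practice it is a tuning parameter, typically chosen by cross-validation).

lambda = 0.1;                          % illustrative regularization parameter
Freg = mean(p) - lambda * sum(w.^2);   % regularized objective F(w)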

After setting the kernel parameter $\sigma$ in $p_{ij}$ to 1, finding the weight vector $w$ can be expressed as the following minimization problem for a given $\lambda$:

$$\hat{w} = \arg\min_{w} f(w) = \arg\min_{w} \frac{1}{n} \sum_{i=1}^{n} f_i(w),$$

where $f(w) = -F(w)$ and $f_i(w) = -F_i(w)$.

Note that

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} p_{ij} = 1,$$

and the argument of the minimum does not change if you add a constant to an objective function. Therefore, you can rewrite the objective function by adding the constant 1.

$$\begin{aligned}
\hat{w} &= \arg\min_{w} \{1 + f(w)\} \\
&= \arg\min_{w} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} p_{ij} - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} p_{ij}\, y_{ij} + \lambda \sum_{r=1}^{p} w_r^2 \right\} \\
&= \arg\min_{w} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} p_{ij}\, (1 - y_{ij}) + \lambda \sum_{r=1}^{p} w_r^2 \right\} \\
&= \arg\min_{w} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} p_{ij}\, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2 \right\},
\end{aligned}$$

where the loss function is defined as

$$l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \ne y_j, \\ 0 & \text{otherwise.} \end{cases}$$

The argument of the minimum is the weight vector that minimizes the classification error. You can specify a custom loss function using the LossFunction name-value pair argument in the call to fscnca.
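In practice, fscnca carries out this minimization directly. A minimal usage sketch follows; the data and the Lambda value are placeholders, and the selection threshold is arbitrary (see the fscnca documentation for the full set of options).

% Fit an NCA feature selection model for classification.
mdl = fscnca(X, y, 'Lambda', 0.5/size(X,1), ...
             'LossFunction', 'classiferror', 'Solver', 'lbfgs');
w_fitted = mdl.FeatureWeights;      % fitted feature weights
selected = find(w_fitted > 0.01);   % features with non-negligible weights (arbitrary threshold)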
