Machine Learning (3) Linear Discriminant Functions -- Least-squares Classification



Chenjing Ding
2018/02/28


notation | meaning
-------- | -------
M        | the number of mixture components
x_n      | the n-th input vector
N        | the number of training input vectors
K        | the number of classes
w        | a column vector of the weight matrix
W        | the weight matrix
X        | the input matrix

To be clear, all vectors in this passage are column vectors (so their transposes are row vectors), and capital letters denote matrices while lowercase letters denote vectors.

1. General Classification Problem

1.1 One-sample input case

Let's consider K linear discriminant functions:

$$y_k(x) = w_k^T x + w_{k0}, \qquad k = 1, \dots, K \tag{1.1.1}$$
Both $w_k$ and $x$ are vectors. If $W$ is the matrix

$$W = [w_1, w_2, \dots, w_K] =
\begin{bmatrix}
w_{10} & w_{20} & \dots & w_{K0} \\
w_{11} & w_{21} & \dots & w_{K1} \\
\vdots & \vdots & \ddots & \vdots \\
w_{1D} & w_{2D} & \dots & w_{KD}
\end{bmatrix} \tag{1.1.2}$$
then we obtain $Y(x)$, which is a column vector (here $x$ is understood to be augmented with a leading $1$, so that the bias row $w_{k0}$ of $W$ is absorbed into the product):

$$Y(x) = W^T x = [y_1(x)\ \ y_2(x)\ \ \dots\ \ y_K(x)]^T \tag{1.1.3}$$
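As a small illustration of (1.1.1)-(1.1.3), here is a minimal NumPy sketch; the variable names and toy sizes (D = 3, K = 2) are illustrative only and not from the original post.

```python
import numpy as np

D, K = 3, 2                          # toy feature dimension and number of classes
rng = np.random.default_rng(0)

# Column k of W is the augmented weight vector (w_k0, w_k1, ..., w_kD)^T, as in (1.1.2)
W = rng.normal(size=(D + 1, K))

x = rng.normal(size=D)               # one raw input vector
x_aug = np.concatenate(([1.0], x))   # prepend x_0 = 1 so that w_k^T x includes the bias w_k0

Y = W.T @ x_aug                      # Y(x) = W^T x, the K discriminant values of (1.1.3)
predicted_class = int(np.argmax(Y))  # assign x to the class with the largest discriminant
print(Y, predicted_class)
```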

1.2 Input as a matrix

For the entire data set, $X$ is a matrix whose rows are the (augmented) input vectors:

$$\hat{Y}(X) = XW$$

$$X = [x_1\ \ x_2\ \ \dots\ \ x_N]^T$$

$$T = [t_1\ \ t_2\ \dots\ t_N]^T, \qquad \hat{Y}(X) = [Y(x_1)\ \ Y(x_2)\ \ \dots\ \ Y(x_N)]^T \tag{1.2.1}$$

Here $t_1, t_2, \dots$ are the target column vectors (one per training sample), so $T$ and $\hat{Y}(X)$ are $N \times K$ matrices; $T$ is called the target matrix.
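A quick sketch of these shapes in NumPy, assuming 1-of-K (one-hot) target vectors $t_n$; the names and toy sizes are my own illustration.

```python
import numpy as np

N, D, K = 5, 3, 2                          # toy sizes
rng = np.random.default_rng(1)

X_raw = rng.normal(size=(N, D))
X = np.hstack([np.ones((N, 1)), X_raw])    # X = [x_1 ... x_N]^T with augmented rows, shape (N, D+1)

labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]                      # T = [t_1 ... t_N]^T, one target vector per row, shape (N, K)

W = rng.normal(size=(D + 1, K))            # an arbitrary weight matrix, just for illustration
Y_hat = X @ W                              # Y^(X) = XW; row n equals Y(x_n)^T, shape (N, K)
print(Y_hat.shape, T.shape)
```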

2. Closed-form solution

To find the closed-form solution for $W$, we minimize the sum-of-squares error directly:

$$E(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigl(y_k(x_n) - t_{nk}\bigr)^2 = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigl(w_k^T x_n - t_{nk}\bigr)^2$$
Let’s formulate the sum-of-squares error in matrix notation:
$$\sum_{ij} a_{ij}^2 = \mathrm{Tr}(A^T A), \qquad \frac{\partial\,\mathrm{Tr}(A)}{\partial A} = I \tag{2.1}$$

$$E(W) = \frac{1}{2}\,\mathrm{Tr}\!\left((XW - T)^T (XW - T)\right) \tag{2.2}$$

$$\frac{\partial E(W)}{\partial W} = \frac{1}{2}\,\frac{\partial\,\mathrm{Tr}\!\left((XW - T)^T (XW - T)\right)}{\partial W} \tag{2.3}$$

$$= X^T (XW - T) \tag{2.4}$$

using (2.1) together with the chain rule.
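For completeness, (2.4) can be verified by expanding the trace with two standard matrix-calculus identities:

$$\mathrm{Tr}\!\left((XW - T)^T (XW - T)\right) = \mathrm{Tr}\!\left(W^T X^T X W\right) - 2\,\mathrm{Tr}\!\left(W^T X^T T\right) + \mathrm{Tr}\!\left(T^T T\right),$$

and, with $\frac{\partial}{\partial W}\mathrm{Tr}(W^T A W) = (A + A^T)W$ and $\frac{\partial}{\partial W}\mathrm{Tr}(W^T B) = B$,

$$\frac{\partial E(W)}{\partial W} = \frac{1}{2}\left(2\,X^T X W - 2\,X^T T\right) = X^T (XW - T).$$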

Setting the gradient to zero, and assuming $X^T X$ is invertible (which holds whenever $X$ has full column rank):

$$\frac{\partial E(W)}{\partial W} = 0 \;\Rightarrow\; W = (X^T X)^{-1} X^T T$$

Thus the closed-form solution for $Y(x_n)$ is:

$$Y(x_n) = W^T x_n = \left((X^T X)^{-1} X^T T\right)^T x_n$$
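A minimal NumPy sketch of this closed-form fit, again assuming one-hot targets; the function names are my own, and the pseudo-inverse is used because it coincides with $(X^T X)^{-1} X^T$ when $X^T X$ is invertible and is numerically safer otherwise.

```python
import numpy as np

def fit_least_squares_classifier(X_raw, labels, K):
    """Closed-form least-squares fit: W = (X^T X)^{-1} X^T T (via the pseudo-inverse)."""
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])      # augment each input with x_0 = 1
    T = np.eye(K)[labels]                        # one-hot target matrix, shape (N, K)
    W = np.linalg.pinv(X) @ T                    # pinv(X) = (X^T X)^{-1} X^T when X^T X is invertible
    return W

def predict(W, X_raw):
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])
    return np.argmax(X @ W, axis=1)              # class with the largest discriminant y_k(x_n)

# toy usage: two well-separated 2D clusters
rng = np.random.default_rng(2)
X0 = rng.normal(loc=-2.0, size=(50, 2))          # class 0 cluster
X1 = rng.normal(loc=+2.0, size=(50, 2))          # class 1 cluster
X_raw = np.vstack([X0, X1])
labels = np.array([0] * 50 + [1] * 50)

W = fit_least_squares_classifier(X_raw, labels, K=2)
print((predict(W, X_raw) == labels).mean())      # training accuracy
```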

3. Problems

  1. Least-squares is very sensitive to outliers! (A toy demonstration is sketched after this list.)
  2. Least-squares corresponds to Maximum Likelihood under the assumption of a Gaussian conditional distribution. However, our binary target vectors have a distribution that is clearly non-Gaussian (a 0-1 distribution when K = 2)!

    These points will be discussed later.
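To make the first problem concrete, here is a small self-contained toy experiment (my own illustration, not from the original post): adding a few extreme but correctly labelled points to one class noticeably changes the least-squares weights, and hence the decision boundary.

```python
import numpy as np

rng = np.random.default_rng(3)

def lstsq_weights(X_raw, labels, K=2):
    """Least-squares weights via the pseudo-inverse, with an augmented design matrix."""
    X = np.hstack([np.ones((len(X_raw), 1)), X_raw])
    return np.linalg.pinv(X) @ np.eye(K)[labels]

# two well-separated clusters
X_raw = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
                   rng.normal(+2.0, 1.0, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)

# a handful of extreme yet correctly labelled class-1 points ("outliers")
X_out = np.vstack([X_raw, rng.normal(+10.0, 0.5, size=(5, 2))])
labels_out = np.concatenate([labels, np.ones(5, dtype=int)])

W_clean = lstsq_weights(X_raw, labels)
W_noisy = lstsq_weights(X_out, labels_out)
print(W_clean)   # the two weight matrices typically differ noticeably, even though
print(W_noisy)   # the extra points lie on the "correct" side of the boundary
```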
