Machine Learning (3): Least-squares Classification
Chenjing Ding
2018/02/28
notation | meaning
---|---
$M$ | the number of mixture components
$\mathbf{x}_n$ | the $n$-th input vector
$N$ | the number of training input vectors
$K$ | the number of classes
$\mathbf{w}$ | a column vector of the weight matrix
$W$ | the weight matrix
$X$ | the input matrix
To put it clearly: all vectors in this passage are column vectors (so their transposes are row vectors), and capital letters denote matrices while lowercase letters denote vectors.

1. General Classification Problem

1.1 One-sample input case

Let's consider $K$ linear discriminant models:
$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}, \quad k = 1, \dots, K \tag{1.1.1}$$
Both $\mathbf{w}_k$ and $\mathbf{x}$ are vectors; the bias $w_{k0}$ can be absorbed into $\mathbf{w}_k$ by prepending a constant 1 to $\mathbf{x}$. If $W$ is the matrix whose $k$-th column is the (augmented) weight vector $\mathbf{w}_k$,

$$W = [\mathbf{w}_1 \ \ \mathbf{w}_2 \ \ \dots \ \ \mathbf{w}_K] \tag{1.1.2}$$
then we obtain $Y(\mathbf{x})$, which is a column vector:

$$Y(\mathbf{x}) = W^T \mathbf{x} = [y_1(\mathbf{x}) \ \ y_2(\mathbf{x}) \ \ \dots \ \ y_K(\mathbf{x})]^T \tag{1.1.3}$$
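In code, equation (1.1.3) is a single matrix-vector product. Below is a minimal NumPy sketch; the dimensions and the random weights are invented purely for illustration:

```python
import numpy as np

K, D = 3, 4                           # number of classes, input dimension
rng = np.random.default_rng(0)

W = rng.normal(size=(D + 1, K))       # column k is the augmented weight vector w_k
x = rng.normal(size=D)                # a single input vector
x_tilde = np.concatenate(([1.0], x))  # prepend 1 so that w_k0 is absorbed into w_k

Y = W.T @ x_tilde                     # Y(x) = W^T x, a column vector of length K
print(Y)                              # one discriminant value y_k(x) per class
```

The predicted class is simply the index $k$ with the largest $y_k(\mathbf{x})$.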
1.2 Input as a matrix

For the entire data set, $X$ is a matrix that stacks all input vectors as rows:
$$\hat{Y}(X) = XW, \qquad X = [\mathbf{x}_1 \ \ \mathbf{x}_2 \ \ \dots \ \ \mathbf{x}_N]^T$$
$$T = [\mathbf{t}_1 \ \ \mathbf{t}_2 \ \ \dots \ \ \mathbf{t}_N]^T, \qquad \hat{Y}(X) = [Y(\mathbf{x}_1) \ \ Y(\mathbf{x}_2) \ \ \dots \ \ Y(\mathbf{x}_N)]^T \tag{1.2.1}$$
Here $\mathbf{t}_1, \mathbf{t}_2, \dots$ are column vectors (one-hot coded class targets), while $T$ and $\hat{Y}(X)$ are $N \times K$ matrices; $T$ is the target matrix.
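The following sketch shows how $X$, $T$, and $\hat{Y}(X)$ fit together dimension-wise; the data and weights are random placeholders (the weights are learned in section 2):

```python
import numpy as np

N, D, K = 6, 4, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))           # row n is x_n^T
X = np.hstack([np.ones((N, 1)), X])   # augment each row with a leading 1

labels = rng.integers(0, K, size=N)   # class index of each sample
T = np.eye(K)[labels]                 # one-hot target matrix, shape (N, K)

W = rng.normal(size=(D + 1, K))       # placeholder weights
Y_hat = X @ W                         # row n of Y_hat is Y(x_n)^T, shape (N, K)
```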
2. Closed-form solution
We now look for a closed-form solution for $W$ by directly minimizing the sum-of-squares error:
$$E(W) = \sum_{n=1}^{N} \sum_{k=1}^{K} \big(y_k(\mathbf{x}_n) - t_{nk}\big)^2 = \sum_{n=1}^{N} \sum_{k=1}^{K} \big(\mathbf{w}_k^T \mathbf{x}_n - t_{nk}\big)^2$$
Let's formulate the sum-of-squares error in matrix notation, using two standard identities (the factor $\frac{1}{2}$ below is conventional and does not change the minimizer):
$$\sum_{ij} a_{ij}^2 = \mathrm{Tr}(A^T A), \qquad \frac{\partial\, \mathrm{Tr}(A)}{\partial A} = I \tag{2.1}$$
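The first identity in (2.1) is easy to verify numerically; here is a quick NumPy check with an arbitrary matrix:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)  # any matrix will do
lhs = (A ** 2).sum()              # sum of squared entries
rhs = np.trace(A.T @ A)           # Tr(A^T A)
assert np.isclose(lhs, rhs)       # both equal 55.0 for this A
```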
$$E(W) = \frac{1}{2} \mathrm{Tr}\big((XW - T)^T (XW - T)\big) \tag{2.2}$$
$$\frac{\partial E(W)}{\partial W} = \frac{1}{2}\, \frac{\partial\, \mathrm{Tr}\big((XW-T)^T(XW-T)\big)}{\partial (XW-T)} \cdot \frac{\partial (XW-T)}{\partial W} \tag{2.3}$$
$$\frac{\partial E(W)}{\partial W} = X^T (XW - T) \qquad \text{using } (2.1) \tag{2.4}$$
Setting the derivative to zero requires the inverse of $X^T X$ to exist, which holds when $X$ has full column rank (it does not hold for every matrix $X$):
$$\frac{\partial E(W)}{\partial W} = 0 \;\Rightarrow\; W = (X^T X)^{-1} X^T T$$
Thus the closed-form solution for $Y(\mathbf{x}_n)$ is:
$$Y(\mathbf{x}_n) = W^T \mathbf{x}_n = \big((X^T X)^{-1} X^T T\big)^T \mathbf{x}_n$$
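Putting everything together, here is a minimal NumPy sketch of the whole method. The three Gaussian blobs are hypothetical toy data, and `np.linalg.pinv` is used instead of explicitly forming $(X^T X)^{-1} X^T$, which is numerically safer but otherwise equivalent when $X^T X$ is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 90, 2, 3

# Toy data: three Gaussian blobs, one per class.
means = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
labels = np.repeat(np.arange(K), N // K)
X = means[labels] + rng.normal(scale=0.7, size=(N, D))
X = np.hstack([np.ones((N, 1)), X])  # augment with a bias column
T = np.eye(K)[labels]                # one-hot target matrix

# Closed-form solution W = (X^T X)^{-1} X^T T via the pseudo-inverse.
W = np.linalg.pinv(X) @ T

# Classify each x_n by the largest discriminant y_k(x_n).
pred = np.argmax(X @ W, axis=1)
print("training accuracy:", (pred == labels).mean())
```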
3. Problems
- Least-squares is very sensitive to outliers (see the sketch after this list)!
- Least-squares corresponds to Maximum Likelihood under the assumption of a Gaussian conditional distribution. However, our binary target vectors clearly have a non-Gaussian distribution (a 0-1 distribution when $K = 2$)!
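A small sketch of the outlier problem on hypothetical 1-D data (the cluster positions and the outlier location are invented for illustration): adding a few extreme but correctly labeled points noticeably drags the decision boundary, because least squares also penalizes predictions that are "too correct".

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_boundary(x, labels):
    """Fit least squares with one-hot targets; return x where y_0(x) = y_1(x)."""
    X = np.column_stack([np.ones_like(x), x])
    T = np.eye(2)[labels]
    W = np.linalg.pinv(X) @ T  # closed-form solution from section 2
    d = W[:, 0] - W[:, 1]      # boundary: d[0] + d[1] * x = 0
    return -d[0] / d[1]

# Two 1-D clusters: class 0 near 0, class 1 near 2.
x = np.concatenate([rng.normal(0.0, 0.3, 30), rng.normal(2.0, 0.3, 30)])
labels = np.repeat([0, 1], 30)
print("boundary without outliers:", fit_boundary(x, labels))  # roughly 1.0

# Add a few extreme but correctly labeled class-1 points far to the right.
x_out = np.concatenate([x, np.full(5, 15.0)])
labels_out = np.concatenate([labels, np.ones(5, dtype=int)])
print("boundary with outliers:   ", fit_boundary(x_out, labels_out))
# The boundary is dragged toward the class-1 cluster even though the extra
# points lie on the correct side of it.
```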
These problems will be discussed later.