Zhankun Luo
PUID: 0031195279
Email: luo333@pnw.edu
Fall-2018-ECE-59500-009
Instructor: Toma Hentea
Homework 5
Function
plot_point.m
function plot_point(X, y)
% This function can handle up to 6 different classes.
[l, N] = size(X);  % N = no. of data vectors, l = dimensionality
if (l ~= 2)
    fprintf('NO PLOT CAN BE GENERATED\n')
    return
else
    pale = ['ro'; 'g+'; 'b.'; 'y.'; 'm.'; 'c.'];  % one marker style per class
    % Plot of the data vectors
    hold on
    for i = 1:N
        plot(X(1, i), X(2, i), pale(y(i), :))
    end
    hold off
end
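A minimal usage sketch (the data and labels below are hypothetical, only to show the expected 2 x N layout and the 1..6 label range):
X = [0 1 0 1; 0 0 1 1]; % 2 x N matrix, one data vector per column
y = [1 1 2 2];          % class label of each column, in 1..6
figure; plot_point(X, y);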
Calc_SwSbSm.m
function [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% Calculate S_w, S_b, S_m
% OUTPUT:
%   S_w: the within-class scatter matrix
%   S_b: the between-class scatter matrix
%   S_m: the mixture scatter matrix, S_m = S_w + S_b
c = max(y);        % number of classes
[l, N] = size(X);  % N: number of vectors, l: dimensions
mu = zeros(l, c);
S_w = zeros(l, l); S_b = zeros(l, l); mu_0 = zeros(l, 1);
P = zeros(1, c);
for i = 1:c
    index_class_i = find(y == i);
    Mu = sum(X(:, index_class_i), 2) / length(index_class_i);  % class mean
    mu(:, i) = Mu;
    mu_0 = mu_0 + sum(X(:, index_class_i), 2) / N;  % accumulate the overall mean
    P(i) = length(index_class_i) / N;  % a priori probability estimate
    X_relative = X(:, index_class_i) - repmat(Mu, 1, length(index_class_i));
    S_wi = zeros(l, l);
    for j = 1:length(index_class_i)
        S_wi = S_wi + X_relative(:, j) * X_relative(:, j)';
    end
    S_w = S_w + S_wi / N;  % S_wi / N = P(i) * (S_wi / n_i)
end
for i = 1:c
    S_b = S_b + P(i) * (mu(:, i) - mu_0) * (mu(:, i) - mu_0)';
end
S_m = S_w + S_b;
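A quick consistency check (hypothetical two-class data, not part of the homework): the S_m returned by the function should equal the scatter of X around the overall mean, computed directly.
X = [randn(2, 50), randn(2, 50) + 3]; % two made-up classes
y = [ones(1, 50), 2 * ones(1, 50)];
[ S_w, S_b, S_m ] = Calc_SwSbSm( X, y );
Xc = X - repmat(mean(X, 2), 1, size(X, 2)); % center on the overall mean mu_0
S_m_direct = (Xc * Xc') / size(X, 2);
disp(max(max(abs(S_m - S_m_direct))))       % ~0 up to round-off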
J3.m
function J3 = J3(S_w, S_m)
% J3 = J3(S_w, S_m)
% The J_3 separability criterion: trace(S_w^{-1} S_m)
J3 = trace(S_w \ S_m);
end
FDR.m
function [ FDR, w ] = FDR( X, y, D_y )
% [ FDR, w ] = FDR( X, y, D_y )
% Fisher's Discriminant Ratio
% INPUT:
%   X: points
%   y: y == i ==> belongs to class i
%   D_y: dimension of w, i.e. how many features Z_i = w_i' * X to keep
% OUTPUT:
%   FDR: trace((w' * S_w * w) \ (w' * S_b * w))
%   w: projection matrix; Z = w' * X attains the maximal FDR of X
[ S_w, S_b, S_m ] = Calc_SwSbSm( X, y );
[ Vector, Diag ] = eig( S_w \ S_b );
[ eigs_sorted, order ] = sort(diag(Diag), 'descend'); % eig gives no guaranteed order, so sort explicitly
w = Vector(:, order(1:D_y)); % the D_y eigenvectors with the highest eigenvalues
FDR = trace((w' * S_w * w) \ (w' * S_b * w));
end
Problem
Problem 5.1
Both classes, $\omega_1$ and $\omega_2$, are described by Gaussian distributions with the same covariance matrix $\Sigma = I$, where $I$ is the identity matrix, and mean values $\mu$ and $-\mu$, respectively, where:
$$\mu = [\mu_1, \ldots, \mu_l]^T = \left[1, \frac{1}{\sqrt{2}}, \ldots, \frac{1}{\sqrt{l}}\right]^T, \qquad b_l = \lVert \mu \rVert = \sqrt{\sum_{i=1}^{l} \mu_i^2}$$
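For concreteness, these definitions in MATLAB (the dimension l = 10 is an assumed example value):
l = 10;               % assumed example dimension
mu = 1 ./ sqrt(1:l)'; % mu_i = 1/sqrt(i)
b_l = norm(mu)        % = sqrt(sum_{i=1}^{l} 1/i)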
Define:
$$z \equiv x^T \mu, \quad \text{where} \quad x = [x_1, \ldots, x_l]^T$$
Proof of $E[z] = \pm\lVert \mu \rVert^2$
$$\text{For } x \in \omega_1: \quad E[z] = E[x^T\mu] = E[x^T]\mu = \mu^T\mu = \lVert \mu \rVert^2$$
$$\text{For } x \in \omega_2: \quad E[z] = E[x^T\mu] = E[x^T]\mu = -\mu^T\mu = -\lVert \mu \rVert^2$$
Proof of $\sigma_z^2 = \lVert \mu \rVert^2$
For $x \in \omega_1$:
$$\sigma_z^2 = E[(z - E[z])^2] = E\left[\left((x - \mu)^T \mu\right)^2\right] = E\left[\sum_{i=1}^{l} \sum_{j=1}^{l} \mu_i \mu_j (x_i - \mu_i)(x_j - \mu_j)\right]$$
Since $\Sigma = I$, the components $x_i - \mu_i$ are independent standard Gaussians, so every cross term ($i \neq j$) has zero expectation, while $E[(x_i - \mu_i)^2] = 1$:
$$\sigma_z^2 = \sum_{i=1}^{l} \mu_i^2 \cdot 1 = \lVert \mu \rVert^2 = b_l^2$$
For $x \in \omega_2$, the same computation with $(x + \mu)$ in place of $(x - \mu)$ gives
$$\sigma_z^2 = E\left[\left((x + \mu)^T \mu\right)^2\right] = \lVert \mu \rVert^2 = b_l^2$$
Hence $z$ is Gaussian with mean $b_l^2$ (under $\omega_1$) or $-b_l^2$ (under $\omega_2$) and variance $b_l^2$ in both cases.
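A quick numerical sanity check of these two moments (a sketch, not part of the original solution; the dimension and sample size are assumed values):
l = 4; n = 1e6;       % assumed dimension and sample size
mu = 1 ./ sqrt(1:l)';
z = mu' * (randn(l, n) + repmat(mu, 1, n)); % z = x'*mu with x ~ N(mu, I)
fprintf('mean(z) = %.4f (theory %.4f)\n', mean(z), norm(mu)^2);
fprintf('var(z)  = %.4f (theory %.4f)\n', var(z), norm(mu)^2);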
Proof of the probability of error $P_e$
Gram–Schmidt process
Set:
$$\alpha_1 = \mu, \quad \alpha_2 = [0, 1, 0, \ldots, 0]^T, \quad \alpha_3 = [0, 0, 1, 0, \ldots, 0]^T, \quad \ldots, \quad \alpha_l = [0, \ldots, 0, 1]^T$$
These vectors are linearly independent, since $\mu_1 = 1 \neq 0$.
Then:
$$\beta_1 = \alpha_1 = \mu$$
$$\beta_2 = \alpha_2 - \frac{\langle \alpha_2, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1$$
$$\vdots$$
$$\beta_i = \alpha_i - \frac{\langle \alpha_i, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1 - \cdots - \frac{\langle \alpha_i, \beta_{i-1} \rangle}{\langle \beta_{i-1}, \beta_{i-1} \rangle}\beta_{i-1} \quad (i = 2, \ldots, l)$$
Normalize:
$$e_1 = \frac{\beta_1}{\lVert \mu \rVert} = \frac{\mu}{b_l}, \qquad e_i = \frac{\beta_i}{\lVert \beta_i \rVert} \quad (i = 1, \ldots, l)$$
Preliminaries
Set:
$$P = [e_1, \ldots, e_l]^T$$
$$\langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
$$P P^T = [e_1, \ldots, e_l]^T [e_1, \ldots, e_l] = I_l$$
$P$ is an orthogonal matrix: $P^{-1} = P^T$, $P^T P = I_l$, $|\det(P)| = 1$.
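The construction above can be checked numerically. The following sketch (illustrative; the dimension l = 4 is an assumed value) builds e_1, ..., e_l by Gram-Schmidt and verifies that P is orthogonal and that P*mu = [b_l, 0, ..., 0]^T:
l = 4;                               % assumed example dimension
mu = 1 ./ sqrt(1:l)';
A = [mu, [zeros(1, l-1); eye(l-1)]]; % columns alpha_1, ..., alpha_l
E = zeros(l, l);
for i = 1:l
    b = A(:, i);
    for j = 1:i-1
        b = b - (E(:, j)' * A(:, i)) * E(:, j); % subtract projections on e_1..e_{i-1}
    end
    E(:, i) = b / norm(b);           % normalize to get e_i
end
P = E';                              % rows of P are e_i'
disp(max(max(abs(P * P' - eye(l))))) % ~0: P is orthogonal
disp(P * mu)                         % = [b_l; 0; ...; 0]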
Apply $P$ to $(x - \mu)$:
$$P(x - \mu) = [e_1, e_2, \ldots, e_l]^T (x - \mu) = [e_1^T x, \ldots, e_l^T x]^T - [b_l, 0, \ldots, 0]^T$$
because
$$e_i^T \mu = e_i^T (b_l e_1) = \begin{cases} 0 & \text{if } i \neq 1 \\ b_l & \text{if } i = 1 \end{cases}$$
Define $y = [y_1, \ldots, y_l]^T \equiv [e_1^T x, \ldots, e_l^T x]^T$; thus
$$P(x - \mu) = [y_1 - b_l, y_2, \ldots, y_l]^T$$
Calculate Probability
The probability density function of $x = (x_1, \ldots, x_l)$ is given by (with $\Sigma = I_l$):
Class $\omega_1$:
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{l}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T P^T P (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}[y_1 - b_l, y_2, \ldots, y_l][y_1 - b_l, y_2, \ldots, y_l]^T\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$
Class $\omega_2$:
$$p(x; -\mu, \Sigma) = \frac{1}{(2\pi)^{\frac{l}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x + \mu)^T \Sigma^{-1} (x + \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x + \mu)^T (x + \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$
Because:
$$dy = P\,dx, \qquad P^{-1} dy = P^T dy = dx$$
$$|\det(P^T)|\, dy_1 \cdots dy_l = dy_1 \cdots dy_l = dx_1 \cdots dx_l$$
Then:
For class $\omega_1$:
$$P(z = x^T \mu < 0 \mid \omega_1) = P(z = x^T \mu = x^T b_l e_1 = b_l y_1 < 0 \mid \omega_1) = P(y_1 < 0 \mid \omega_1)$$
$$= \iiint_{y_1 < 0} p(x; \mu, \Sigma)\, dx_1 \cdots dx_l$$
$$= \iiint_{y_1 < 0} \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{-\infty}^{0} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1 \prod_{i=2}^{l} \int_{-\infty}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} y_i^2\right) dy_i$$
$$= \int_{-\infty}^{0} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1$$
$$= \int_{-\infty}^{-b_l} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ = \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
(substituting $Z = y_1 - b_l$ in the last line, then using the symmetry of the standard Gaussian)
For class $\omega_2$:
$$P(z = x^T \mu > 0 \mid \omega_2) = P(y_1 > 0 \mid \omega_2)$$
$$= \iiint_{y_1 > 0} \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{0}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 + b_l)^2\right) dy_1$$
$$= \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
Calculate $P_e$
We have $P(\omega_1) = \frac{1}{2}$, $P(\omega_2) = \frac{1}{2}$:
$$P_e = P(\omega_1) P(z = x^T \mu < 0 \mid \omega_1) + P(\omega_2) P(z = x^T \mu > 0 \mid \omega_2)$$
$$= \frac{1}{2} \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ + \frac{1}{2} \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
$$= \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
where $b_l = \lVert \mu \rVert$.
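As a sanity check, the closed form can be compared with a Monte Carlo estimate (a sketch, not part of the original solution; the dimension and sample size are assumed values, and we use the identity $\int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-Z^2/2}\, dZ = \frac{1}{2}\mathrm{erfc}(b_l/\sqrt{2})$):
l = 5; n = 1e5;       % assumed dimension and samples per class
mu = 1 ./ sqrt(1:l)';
b_l = norm(mu);
X1 = randn(l, n) + repmat(mu, 1, n); % x ~ N(+mu, I), class omega_1
X2 = randn(l, n) - repmat(mu, 1, n); % x ~ N(-mu, I), class omega_2
z1 = mu' * X1; z2 = mu' * X2;        % z = x'*mu
Pe_mc = 0.5 * mean(z1 < 0) + 0.5 * mean(z2 > 0);
Pe_theory = 0.5 * erfc(b_l / sqrt(2));
fprintf('Monte Carlo: %.4f   theory: %.4f\n', Pe_mc, Pe_theory);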
Computer Experiment
Experiment 5.2
Meaning of $S_w$, $S_b$, $S_m$, $J_3$
- number of classes: $M$
- number of training samples in class $\omega_i$: $n_i$
- total number of feature vectors: $N$
- dimension of the feature vectors: $l$
Within-class scatter matrix:
$$S_w = \sum_{i=1}^{M} P_i S_{wi}, \quad \text{where} \quad S_{wi} = E[(x - \mu_i)(x - \mu_i)^T], \quad P_i = \frac{n_i}{N}$$
Between-class scatter matrix:
$$S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T, \quad \text{where} \quad \mu_0 = \sum_{i=1}^{M} P_i \mu_i$$
Mixture scatter matrix:
$$S_m = E[(x - \mu_0)(x - \mu_0)^T] = S_w + S_b$$
Criterion $J_3$:
$$J_3 = \mathrm{trace}(S_w^{-1} S_m)$$
code: experiment5_2.m
%% Computer Experiment 5.2
close('all'); clear; clc;
m = [-10 -10 10 10;
     -10  10 -10 10];          % class means, far apart
m_test = [-1 -1 1 1;
          -1  1 -1 1];         % class means, close together
s1 = 0.2 * [1 0; 0 1]; s2 = 3 * [1 0; 0 1];
P = [0.25 0.25 0.25 0.25]';    % a priori probabilities
N = 400;                       % 100 points per class, 4 classes
%% Generate the 400 vectors in X1, X1_test, X2, X2_test
randn('seed', 0); % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X1_test = mvnrnd(m_test(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
        X2_test = mvnrnd(m_test(:, i), s2, fix(P(i)*N))';
    else % P(i), not P(1): identical values here, but correct in general
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(i)*N))'];
        X1_test = [X1_test, mvnrnd(m_test(:, i), s1, fix(P(i)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(i)*N))'];
        X2_test = [X2_test, mvnrnd(m_test(:, i), s2, fix(P(i)*N))'];
    end
end
y1 = [ones(1, fix((P(1)*N))), 2 * ones(1, fix(P(2)*N)), 3 * ones(1, fix((P(3)*N))), 4 * ones(1, fix((P(4)*N)))];
y1_test = y1;
y2 = y1; y2_test = y2;
%% Plot all Situations for 4 Classes
figure; plot_point(X1, y1);
title('Classes are Far away, $${\Sigma = 0.2 I}$$','Interpreter','latex');
figure; plot_point(X1_test, y1_test);
title('Classes are Close, $${\Sigma = 0.2 I}$$','Interpreter','latex');
figure; plot_point(X2, y2);
title('Classes are Far away, $${\Sigma = 3 I}$$','Interpreter','latex');
figure; plot_point(X2_test, y2_test);
title('Classes are Close, $${\Sigma = 3 I}$$','Interpreter','latex');
%% Calculate S_w, S_b, S_m, J3 = trace(S_w \ S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1_test, y1_test )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2_test, y2_test )
J_3 = J3(S_w, S_m)
result
% m = [-10 -10 10 10;
% -10 10 -10 10];
% Sigma = 0.2 I
S_w =
0.2070 0.0046
0.0046 0.2145
S_b =
99.8278 -0.2653
-0.2653 100.0591
S_m =
100.0348 -0.2607
-0.2607 100.2736
J_3 = 951.3471
% m_test = [-1 -1 1 1;
% -1 1 -1 1];
% Sigma = 0.2 I
S_w =
0.2042 0.0035
0.0035 0.1999
S_b =
1.0225 -0.0537
-0.0537 0.9842
S_m =
1.2266 -0.0502
-0.0502 1.1841
J_3 = 11.9440
% m = [-10 -10 10 10;
% -10 10 -10 10];
% Sigma = 3 I
S_w =
3.0501 -0.0759
-0.0759 3.1290
S_b =
99.7179 -0.4153
-0.4153 100.9610
S_m =
102.7680 -0.4912
-0.4912 104.0900
J_3 = 66.9920
% m_test = [-1 -1 1 1;
% -1 1 -1 1];
% Sigma = 3 I
S_w =
2.8682 -0.0492
-0.0492 2.9545
S_b =
1.1316 0.0025
0.0025 1.1283
S_m =
3.9997 -0.0466
-0.0466 4.0828
J_3 = 2.7766
conclusion
- When the $\Sigma$ of all classes is the same:
  $$S_w \approx \Sigma$$
- When classes are far away $\Rightarrow$ $\mathrm{trace}(S_b)$ is big:
  $$\mathrm{trace}(S_b) = \frac{1}{N} \sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2, \quad \text{where} \quad \mu_0 = \frac{1}{N} \sum_{i=1}^{M} n_i \mu_i$$
- For $J_3 = \mathrm{trace}(S_w^{-1} S_m)$:
  - Relationship of $\sigma$, $\mathrm{trace}(S_b)$ and $J_3$: with $\Sigma = \sigma I$, we also have $S_w \approx \Sigma = \sigma I$, so
    $$J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx \mathrm{trace}\left(\frac{1}{\sigma} I\, S_m\right) = \frac{\mathrm{trace}(S_m)}{\sigma} = \frac{\mathrm{trace}(S_w) + \mathrm{trace}(S_b)}{\sigma} \approx \frac{\sigma l + \mathrm{trace}(S_b)}{\sigma} = l + \frac{\mathrm{trace}(S_b)}{\sigma}$$
    The approximation matches the measured values well (a numeric check follows this list):

    | | $m = 10[-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 0.2$ | $m = [-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 0.2$ | $m = 10[-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 3$ | $m = [-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 3$ |
    |---|---|---|---|---|
    | $J_3$ | 951.3471 | 11.9440 | 66.9920 | 2.7766 |
    | $l + \frac{\mathrm{trace}(S_b)}{\sigma}$ | 1001.4 | 12.0335 | 68.8930 | 2.7533 |
  - How $J_3$ changes:
    Classes are far away:
    $$\Rightarrow \lVert \mu_i - \mu_0 \rVert^2 \uparrow \quad \Rightarrow \quad \mathrm{trace}(S_b) = \frac{1}{N} \sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2 \uparrow \quad \Rightarrow \quad J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx l + \frac{\mathrm{trace}(S_b)}{\sigma} \uparrow$$
    Features within every class are close:
    $$\Rightarrow \sigma \downarrow \quad \Rightarrow \quad J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx l + \frac{\mathrm{trace}(S_b)}{\sigma} \uparrow$$
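A small sketch checking the table above; the trace(S_b) values are read off the printed S_b matrices in the results (here l = 2):
sigma = [0.2 0.2 3 3];
trace_Sb = [99.8278 + 100.0591, 1.0225 + 0.9842, 99.7179 + 100.9610, 1.1316 + 1.1283];
J3_measured = [951.3471 11.9440 66.9920 2.7766];
J3_approx = 2 + trace_Sb ./ sigma; % l + trace(S_b)/sigma with l = 2
disp([J3_measured; J3_approx])     % the two rows agree closely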
Experiment 5.4
FDR: Fisher’s discriminant ratio
Theory of Fisher’s Linear Discriminant
The columns of $w$ are the eigenvectors corresponding to the highest eigenvalues of $S_w^{-1} S_b$.
code: experiment5_4.m
%% Computer Experiment 5.4
close('all'); clear; clc;
m = [2 2.5;
     4 10];                    % class means
s1 = [1 0; 0 1]; s2 = 0.25 * [1 0; 0 1];
P = [0.5 0.5]';
N = 200;                       % 100 points per class, 2 classes
%% Generate the 200 vectors in X1, X2
randn('seed', 0); % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
    else % P(i), not P(1): identical values here, but correct in general
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(i)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(i)*N))'];
    end
end
y1 = [ones(1, fix(P(1)*N)), 2 * ones(1, fix(P(2)*N))];
y2 = y1;
%% Get FDR value and w (the direction projected on) for X1, X2
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 )
[ FDR_1, w_1 ] = FDR( X1, y1, 1 ) % set D_y: dimension of w = 1
d1 = S_w \ (m(:, 1) - m(:, 2))    % theoretical direction S_w^{-1}(m_1 - m_2)
d1 = d1 / sqrt(sum(d1.^2))        % normalized
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 );
[ FDR_2, w_2 ] = FDR( X2, y2, 1 )
d2 = S_w \ (m(:, 1) - m(:, 2))
d2 = d2 / sqrt(sum(d2.^2))
%% Plot all situations for the 2 classes
f1 = figure; plot_point(X1, y1);
hold on; h1 = ezplot(@(x, y) w_1(1)*x + w_1(2)*y);
hold on; H1 = ezplot(@(x, y) w_1(2)*x - w_1(1)*y, [-2 8 0 15]);
f2 = figure; plot_point(X2, y2);
hold on; h2 = ezplot(@(x, y) w_2(1)*x + w_2(2)*y);
hold on; H2 = ezplot(@(x, y) w_2(2)*x - w_2(1)*y, [-4 6 0 12]);
figure(f1); title('$${\Sigma = I}$$','Interpreter','latex');
set(h1,'Color','r'); set(H1,'Color','g');
legend([h1 H1], 'w^{T}x = 0', 'line to be projected on', 'Location', 'Best')
figure(f2); title('$${\Sigma = 0.25 I}$$','Interpreter','latex');
set(h2,'Color','r'); set(H2,'Color','g');
legend([h2 H2], 'w^{T}x = 0', 'line to be projected on', 'Location', 'NorthWest')
result
% Sigma = I
FDR_1 = 8.1336
w_1 =
-0.1118
-0.9937
S_w \ [m(:, 1)-m(:, 2)] = % for 2 classes
-0.6136
-5.5634
normalized S_w \ [m(:, 1)-m(:, 2)] =
-0.1096
-0.9940
% Sigma = 0.25 I
FDR_2 = 38.4050
w_2 =
0.0075
-1.0000
S_w \ [m(:, 1)-m(:, 2)] = % for 2 classes
-0.0019
-25.9308
normalized S_w \ [m(:, 1)-m(:, 2)] =
-0.0001
-1.0000
conclusion
- Theoretically, for 2 classes, when $J(w)$ is maximized:
  $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{and setting the gradient to zero gives} \quad (w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
  $$\Rightarrow \quad S_w^{-1} S_b\, w = J(w)\, w, \quad J(w) = k_1$$
  $$\text{Also} \quad S_b = (m_1 - m_2)(m_1 - m_2)^T, \quad \text{so} \quad S_b w = \left((m_1 - m_2)^T w\right)(m_1 - m_2), \quad (m_1 - m_2)^T w = k_2$$
  $$\Rightarrow \quad w = \frac{k_2}{k_1} S_w^{-1}(m_1 - m_2)$$
  So $w$ has the same direction as $S_w^{-1}(m_1 - m_2)$. In the experiment:
  When $\Sigma = I$, the covariance is large enough that $w$ and $S_w^{-1}(m_1 - m_2)$ come out in the same direction.
  When $\Sigma = 0.25 I$, the covariance is small, and there is a visible difference between the directions of $w$ and $S_w^{-1}(m_1 - m_2)$.

  | | $w$ | $S_w^{-1}(m_1 - m_2)$ | normalized $S_w^{-1}(m_1 - m_2)$ | $J(w)$ |
  |---|---|---|---|---|
  | $\Sigma = I$ | -0.1118; -0.9937 | -0.6136; -5.5634 | -0.1096; -0.9940 | 8.1336 |
  | $\Sigma = 0.25 I$ | 0.0075; -1.0000 | -0.0019; -25.9308 | -0.0001; -1.0000 | 38.4050 |
- Theoretically:
  $$\text{Having} \quad S_w \approx \Sigma = \sigma I, \quad \text{constraint:} \quad w^T w = 1, \quad \text{max eigenvalue of } S_b: \ \Lambda_{max}$$
  $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{gives} \quad (w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
  $$\Rightarrow \quad \text{the eigenvalue problem} \quad J(w)\, w = S_w^{-1} S_b\, w \approx \frac{S_b}{\sigma} w$$
  $$\text{For 2 classes:} \quad J(w) \approx \lambda_{max}\left(\frac{S_b}{\sigma}\right) = \frac{\Lambda_{max}}{\sigma}$$
  When the distance between the class centers is unchanged:
  $$\Rightarrow (\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix} \quad \Rightarrow \quad S_b = \frac{1}{N} \sum_{i=1}^{M} n_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix}$$
  $$\Rightarrow \Lambda_{max} = \text{const} \quad \Rightarrow \quad J(w)\, \sigma \approx \Lambda_{max} = \text{const}$$
  When $\sigma$ of the classes is big $\Rightarrow$ the FDR (Fisher's discriminant ratio) is small:
  $$\sigma \uparrow \quad \Rightarrow \quad J(w) \approx \frac{\Lambda_{max}}{\sigma} \downarrow$$
  When $\sigma$ of the classes is small $\Rightarrow$ the FDR is big:
  $$\sigma \downarrow \quad \Rightarrow \quad J(w) \approx \frac{\Lambda_{max}}{\sigma} \uparrow$$
  Actually (the sketch below shows how the $\Lambda_{max}$ column can be obtained):

  | | $\sigma$ | $J(w)$ | $\sigma J(w)$ | $\Lambda_{max}$ | $\frac{\sigma J(w)}{\Lambda_{max}} \times 100\%$ |
  |---|---|---|---|---|---|
  | $\Sigma = I$ | 1 | 8.1336 | 8.1336 | 8.7515 | 92.94% |
  | $\Sigma = 0.25 I$ | 0.25 | 38.4050 | 9.6013 | 8.9344 | 107.46% |
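How the $\Lambda_{max}$ column can be reproduced (a sketch reusing the variables from experiment5_4.m; the commented values are the ones in the table):
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 );
Lambda_max_1 = max(eig(S_b)) % ~ 8.7515 for Sigma = I
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 );
Lambda_max_2 = max(eig(S_b)) % ~ 8.9344 for Sigma = 0.25 I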