Zhankun Luo
PUID: 0031195279
Email: luo333@pnw.edu
Fall-2018-ECE-59500-009
Instructor: Toma Hentea
Homework 5
Function
plot_point.m
function plot_point(X, y)
% This function can handle up to 6 different classes.
[l, N] = size(X);  % N = no. of data vectors, l = dimensionality
if (l ~= 2)
    fprintf('NO PLOT CAN BE GENERATED\n')
    return
else
    pale = ['ro'; 'g+'; 'b.'; 'y.'; 'm.'; 'c.'];  % one marker style per class
    % Plot of the data vectors
    hold on
    for i = 1:N
        plot(X(1, i), X(2, i), pale(y(i), :))
    end
    hold off
end
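A minimal usage sketch (the data and labels below are hypothetical, only to show the expected 2 x N layout and the 1..6 label range):
X = [0 1 0 1; 0 0 1 1]; % 2 x N matrix, one data vector per column
y = [1 1 2 2];          % class label of each column, in 1..6
figure; plot_point(X, y);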
Calc_SwSbSm.m
function [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% Calculate S_w, S_b, S_m
% OUTPUT:
%   S_w: the within-class scatter matrix
%   S_b: the between-class scatter matrix
%   S_m: the mixture scatter matrix, S_m = S_w + S_b
c = max(y);        % number of classes
[l, N] = size(X);  % N: number of vectors, l: dimensions
mu = zeros(l, c);
S_w = zeros(l, l); S_b = zeros(l, l); mu_0 = zeros(l, 1);
P = zeros(1, c);
for i = 1:c
    index_class_i = find(y == i);
    Mu = sum(X(:, index_class_i), 2) / length(index_class_i);  % class mean
    mu(:, i) = Mu;
    mu_0 = mu_0 + sum(X(:, index_class_i), 2) / N;  % accumulate the overall mean
    P(i) = length(index_class_i) / N;  % a priori probability estimate
    X_relative = X(:, index_class_i) - repmat(Mu, 1, length(index_class_i));
    S_wi = zeros(l, l);
    for j = 1:length(index_class_i)
        S_wi = S_wi + X_relative(:, j) * X_relative(:, j)';
    end
    S_w = S_w + S_wi / N;  % S_wi / N = P(i) * (S_wi / n_i)
end
for i = 1:c
    S_b = S_b + P(i) * (mu(:, i) - mu_0) * (mu(:, i) - mu_0)';
end
S_m = S_w + S_b;
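A quick consistency check (hypothetical two-class data, not part of the homework): the S_m returned by the function should equal the scatter of X around the overall mean, computed directly.
X = [randn(2, 50), randn(2, 50) + 3]; % two made-up classes
y = [ones(1, 50), 2 * ones(1, 50)];
[ S_w, S_b, S_m ] = Calc_SwSbSm( X, y );
Xc = X - repmat(mean(X, 2), 1, size(X, 2)); % center on the overall mean mu_0
S_m_direct = (Xc * Xc') / size(X, 2);
disp(max(max(abs(S_m - S_m_direct))))       % ~0 up to round-off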
J3.m
function J3 = J3(S_w, S_m)
% J3 = J3(S_w, S_m)
% The J_3 separability criterion: trace(S_w^{-1} S_m)
J3 = trace(S_w \ S_m);
end
FDR.m
function [ FDR, w ] = FDR( X, y, D_y )
% [ FDR, w ] = FDR( X, y, D_y )
% Fisher's Discriminant Ratio
% INPUT:
%   X: points
%   y: y == i ==> belongs to class i
%   D_y: dimension of w, i.e. how many features Z_i = w_i' * X to keep
% OUTPUT:
%   FDR: trace((w' * S_w * w) \ (w' * S_b * w))
%   w: projection matrix; Z = w' * X attains the maximal FDR of X
[ S_w, S_b, S_m ] = Calc_SwSbSm( X, y );
[ Vector, Diag ] = eig( S_w \ S_b );
[ eigs_sorted, order ] = sort(diag(Diag), 'descend'); % eig gives no guaranteed order, so sort explicitly
w = Vector(:, order(1:D_y)); % the D_y eigenvectors with the highest eigenvalues
FDR = trace((w' * S_w * w) \ (w' * S_b * w));
end
Problem
Problem 5.1
Both classes, $\omega_1$ and $\omega_2$, are described by Gaussian distributions with the same covariance matrix $\Sigma = I$, where $I$ is the identity matrix, and mean values $\mu$ and $-\mu$, respectively, where:
$$\mu = [\mu_1, \ldots, \mu_l]^T = \left[1, \frac{1}{\sqrt{2}}, \ldots, \frac{1}{\sqrt{l}}\right]^T, \qquad b_l = \lVert \mu \rVert = \sqrt{\sum_{i=1}^{l} \mu_i^2}$$
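For concreteness, these definitions in MATLAB (the dimension l = 10 is an assumed example value):
l = 10;               % assumed example dimension
mu = 1 ./ sqrt(1:l)'; % mu_i = 1/sqrt(i)
b_l = norm(mu)        % = sqrt(sum_{i=1}^{l} 1/i)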
Define:
$$z \equiv x^T \mu, \quad \text{where} \quad x = [x_1, \ldots, x_l]^T$$
Proof of $E[z] = \pm\lVert \mu \rVert^2$
$$\text{For } x \in \omega_1: \quad E[z] = E[x^T\mu] = E[x^T]\mu = \mu^T\mu = \lVert \mu \rVert^2$$
$$\text{For } x \in \omega_2: \quad E[z] = E[x^T\mu] = E[x^T]\mu = -\mu^T\mu = -\lVert \mu \rVert^2$$
Proof of $\sigma_z^2 = \lVert \mu \rVert^2$
For $x \in \omega_1$:
$$\sigma_z^2 = E[(z - E[z])^2] = E\left[\left((x - \mu)^T \mu\right)^2\right] = E\left[\sum_{i=1}^{l} \sum_{j=1}^{l} \mu_i \mu_j (x_i - \mu_i)(x_j - \mu_j)\right]$$
Since $\Sigma = I$, the components $x_i - \mu_i$ are independent standard Gaussians, so every cross term ($i \neq j$) has zero expectation, while $E[(x_i - \mu_i)^2] = 1$:
$$\sigma_z^2 = \sum_{i=1}^{l} \mu_i^2 \cdot 1 = \lVert \mu \rVert^2 = b_l^2$$
For $x \in \omega_2$, the same computation with $(x + \mu)$ in place of $(x - \mu)$ gives
$$\sigma_z^2 = E\left[\left((x + \mu)^T \mu\right)^2\right] = \lVert \mu \rVert^2 = b_l^2$$
Hence $z$ is Gaussian with mean $b_l^2$ (under $\omega_1$) or $-b_l^2$ (under $\omega_2$) and variance $b_l^2$ in both cases.
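A quick numerical sanity check of these two moments (a sketch, not part of the original solution; the dimension and sample size are assumed values):
l = 4; n = 1e6;       % assumed dimension and sample size
mu = 1 ./ sqrt(1:l)';
z = mu' * (randn(l, n) + repmat(mu, 1, n)); % z = x'*mu with x ~ N(mu, I)
fprintf('mean(z) = %.4f (theory %.4f)\n', mean(z), norm(mu)^2);
fprintf('var(z)  = %.4f (theory %.4f)\n', var(z), norm(mu)^2);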
Proof of the probability of error $P_e$
Gram–Schmidt process
Set:
$$\alpha_1 = \mu, \quad \alpha_2 = [0, 1, 0, \ldots, 0]^T, \quad \alpha_3 = [0, 0, 1, 0, \ldots, 0]^T, \quad \ldots, \quad \alpha_l = [0, \ldots, 0, 1]^T$$
These vectors are linearly independent, since $\mu_1 = 1 \neq 0$.
Then:
$$\beta_1 = \alpha_1 = \mu$$
$$\beta_2 = \alpha_2 - \frac{\langle \alpha_2, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1$$
$$\vdots$$
$$\beta_i = \alpha_i - \frac{\langle \alpha_i, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1 - \cdots - \frac{\langle \alpha_i, \beta_{i-1} \rangle}{\langle \beta_{i-1}, \beta_{i-1} \rangle}\beta_{i-1} \quad (i = 2, \ldots, l)$$
Normalize:
$$e_1 = \frac{\beta_1}{\lVert \mu \rVert} = \frac{\mu}{b_l}, \qquad e_i = \frac{\beta_i}{\lVert \beta_i \rVert} \quad (i = 1, \ldots, l)$$
Preliminaries
Set:
$$P = [e_1, \ldots, e_l]^T$$
$$\langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
$$P P^T = [e_1, \ldots, e_l]^T [e_1, \ldots, e_l] = I_l$$
$P$ is an orthogonal matrix: $P^{-1} = P^T$, $P^T P = I_l$, $|\det(P)| = 1$.
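The construction above can be checked numerically. The following sketch (illustrative; the dimension l = 4 is an assumed value) builds e_1, ..., e_l by Gram-Schmidt and verifies that P is orthogonal and that P*mu = [b_l, 0, ..., 0]^T:
l = 4;                               % assumed example dimension
mu = 1 ./ sqrt(1:l)';
A = [mu, [zeros(1, l-1); eye(l-1)]]; % columns alpha_1, ..., alpha_l
E = zeros(l, l);
for i = 1:l
    b = A(:, i);
    for j = 1:i-1
        b = b - (E(:, j)' * A(:, i)) * E(:, j); % subtract projections on e_1..e_{i-1}
    end
    E(:, i) = b / norm(b);           % normalize to get e_i
end
P = E';                              % rows of P are e_i'
disp(max(max(abs(P * P' - eye(l))))) % ~0: P is orthogonal
disp(P * mu)                         % = [b_l; 0; ...; 0]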
Apply $P$ to $(x - \mu)$:
$$P(x - \mu) = [e_1, e_2, \ldots, e_l]^T (x - \mu) = [e_1^T x, \ldots, e_l^T x]^T - [b_l, 0, \ldots, 0]^T$$
because
$$e_i^T \mu = e_i^T (b_l e_1) = \begin{cases} 0 & \text{if } i \neq 1 \\ b_l & \text{if } i = 1 \end{cases}$$
Define $y = [y_1, \ldots, y_l]^T \equiv [e_1^T x, \ldots, e_l^T x]^T$; thus
$$P(x - \mu) = [y_1 - b_l, y_2, \ldots, y_l]^T$$
Calculate Probability
The probability density function of $x = (x_1, \ldots, x_l)$ is given by (with $\Sigma = I_l$):
Class $\omega_1$:
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{l}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^T P^T P (x - \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}[y_1 - b_l, y_2, \ldots, y_l][y_1 - b_l, y_2, \ldots, y_l]^T\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$
Class $\omega_2$:
$$p(x; -\mu, \Sigma) = \frac{1}{(2\pi)^{\frac{l}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x + \mu)^T \Sigma^{-1} (x + \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}(x + \mu)^T (x + \mu)\right)$$
$$= \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$
Because:
$$dy = P\,dx, \qquad P^{-1} dy = P^T dy = dx$$
$$|\det(P^T)|\, dy_1 \cdots dy_l = dy_1 \cdots dy_l = dx_1 \cdots dx_l$$
Then:
For class $\omega_1$:
$$P(z = x^T \mu < 0 \mid \omega_1) = P(z = x^T \mu = x^T b_l e_1 = b_l y_1 < 0 \mid \omega_1) = P(y_1 < 0 \mid \omega_1)$$
$$= \iiint_{y_1 < 0} p(x; \mu, \Sigma)\, dx_1 \cdots dx_l$$
$$= \iiint_{y_1 < 0} \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{-\infty}^{0} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1 \prod_{i=2}^{l} \int_{-\infty}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} y_i^2\right) dy_i$$
$$= \int_{-\infty}^{0} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1$$
$$= \int_{-\infty}^{-b_l} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ = \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
(substituting $Z = y_1 - b_l$ in the last line, then using the symmetry of the standard Gaussian)
For class $\omega_2$:
$$P(z = x^T \mu > 0 \mid \omega_2) = P(y_1 > 0 \mid \omega_2)$$
$$= \iiint_{y_1 > 0} \frac{1}{(2\pi)^{\frac{l}{2}}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{0}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(y_1 + b_l)^2\right) dy_1$$
$$= \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
Calculate $P_e$
We have $P(\omega_1) = \frac{1}{2}$, $P(\omega_2) = \frac{1}{2}$:
$$P_e = P(\omega_1) P(z = x^T \mu < 0 \mid \omega_1) + P(\omega_2) P(z = x^T \mu > 0 \mid \omega_2)$$
$$= \frac{1}{2} \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ + \frac{1}{2} \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
$$= \int_{b_l}^{+\infty} \frac{1}{(2\pi)^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Z^2\right) dZ$$
where $b_l = \lVert \mu \rVert$.
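As a sanity check, the closed form can be compared with a Monte Carlo estimate (a sketch, not part of the original solution; the dimension and sample size are assumed values, and we use the identity $\int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-Z^2/2}\, dZ = \frac{1}{2}\mathrm{erfc}(b_l/\sqrt{2})$):
l = 5; n = 1e5;       % assumed dimension and samples per class
mu = 1 ./ sqrt(1:l)';
b_l = norm(mu);
X1 = randn(l, n) + repmat(mu, 1, n); % x ~ N(+mu, I), class omega_1
X2 = randn(l, n) - repmat(mu, 1, n); % x ~ N(-mu, I), class omega_2
z1 = mu' * X1; z2 = mu' * X2;        % z = x'*mu
Pe_mc = 0.5 * mean(z1 < 0) + 0.5 * mean(z2 > 0);
Pe_theory = 0.5 * erfc(b_l / sqrt(2));
fprintf('Monte Carlo: %.4f   theory: %.4f\n', Pe_mc, Pe_theory);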
Computer Experiment
Experiment 5.2
Meaning of $S_w$, $S_b$, $S_m$, $J_3$
- number of classes: $M$
- number of training samples in class $\omega_i$: $n_i$
- total number of feature vectors: $N$
- dimension of the feature vectors: $l$
Within-class scatter matrix:
$$S_w = \sum_{i=1}^{M} P_i S_{wi}, \quad \text{where} \quad S_{wi} = E[(x - \mu_i)(x - \mu_i)^T], \quad P_i = \frac{n_i}{N}$$
Between-class scatter matrix:
$$S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T, \quad \text{where} \quad \mu_0 = \sum_{i=1}^{M} P_i \mu_i$$
Mixture scatter matrix:
$$S_m = E[(x - \mu_0)(x - \mu_0)^T] = S_w + S_b$$
Criterion $J_3$:
$$J_3 = \mathrm{trace}(S_w^{-1} S_m)$$
code: experiment5_2.m
%% Computer Experiment 5.2
close('all'); clear; clc;
m = [-10 -10 10 10;
     -10  10 -10 10];          % class means, far apart
m_test = [-1 -1 1 1;
          -1  1 -1 1];         % class means, close together
s1 = 0.2 * [1 0; 0 1]; s2 = 3 * [1 0; 0 1];
P = [0.25 0.25 0.25 0.25]';    % a priori probabilities
N = 400;                       % 100 points per class, 4 classes
%% Generate the 400 vectors in X1, X1_test, X2, X2_test
randn('seed', 0); % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X1_test = mvnrnd(m_test(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
        X2_test = mvnrnd(m_test(:, i), s2, fix(P(i)*N))';
    else % P(i), not P(1): identical values here, but correct in general
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(i)*N))'];
        X1_test = [X1_test, mvnrnd(m_test(:, i), s1, fix(P(i)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(i)*N))'];
        X2_test = [X2_test, mvnrnd(m_test(:, i), s2, fix(P(i)*N))'];
    end
end
y1 = [ones(1, fix((P(1)*N))), 2 * ones(1, fix(P(2)*N)), 3 * ones(1, fix((P(3)*N))), 4 * ones(1, fix((P(4)*N)))];
y1_test = y1;
y2 = y1; y2_test = y2;
%% Plot all Situations for 4 Classes
figure; plot_point(X1, y1);
title('Classes are Far away, $${\Sigma = 0.2 I}$$','Interpreter','latex');
figure; plot_point(X1_test, y1_test);
title('Classes are Close, $${\Sigma = 0.2 I}$$','Interpreter','latex');
figure; plot_point(X2, y2);
title('Classes are Far away, $${\Sigma = 3 I}$$','Interpreter','latex');
figure; plot_point(X2_test, y2_test);
title('Classes are Close, $${\Sigma = 3 I}$$','Interpreter','latex');
%% Calculate S_w, S_b, S_m, J3 = trace(S_w \ S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1_test, y1_test )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 )
J_3 = J3(S_w, S_m)
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2_test, y2_test )
J_3 = J3(S_w, S_m)
result
% m = [-10 -10 10 10;
% -10 10 -10 10];
% Sigma = 0.2 I
S_w =
0.2070 0.0046
0.0046 0.2145
S_b =
99.8278 -0.2653
-0.2653 100.0591
S_m =
100.0348 -0.2607
-0.2607 100.2736
J_3 = 951.3471
% m_test = [-1 -1 1 1;
% -1 1 -1 1];
% Sigma = 0.2 I
S_w =
0.2042 0.0035
0.0035 0.1999
S_b =
1.0225 -0.0537
-0.0537 0.9842
S_m =
1.2266 -0.0502
-0.0502 1.1841
J_3 = 11.9440
% m = [-10 -10 10 10;
% -10 10 -10 10];
% Sigma = 3 I
S_w =
3.0501 -0.0759
-0.0759 3.1290
S_b =
99.7179 -0.4153
-0.4153 100.9610
S_m =
102.7680 -0.4912
-0.4912 104.0900
J_3 = 66.9920
% m_test = [-1 -1 1 1;
% -1 1 -1 1];
% Sigma = 3 I
S_w =
2.8682 -0.0492
-0.0492 2.9545
S_b =
1.1316 0.0025
0.0025 1.1283
S_m =
3.9997 -0.0466
-0.0466 4.0828
J_3 = 2.7766
conclusion
- When the $\Sigma$ of all classes is the same:
  $$S_w \approx \Sigma$$
- When classes are far away $\Rightarrow$ $\mathrm{trace}(S_b)$ is big:
  $$\mathrm{trace}(S_b) = \frac{1}{N} \sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2, \quad \text{where} \quad \mu_0 = \frac{1}{N} \sum_{i=1}^{M} n_i \mu_i$$
- For $J_3 = \mathrm{trace}(S_w^{-1} S_m)$:
  - Relationship of $\sigma$, $\mathrm{trace}(S_b)$ and $J_3$: with $\Sigma = \sigma I$, we also have $S_w \approx \Sigma = \sigma I$, so
    $$J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx \mathrm{trace}\left(\frac{1}{\sigma} I\, S_m\right) = \frac{\mathrm{trace}(S_m)}{\sigma} = \frac{\mathrm{trace}(S_w) + \mathrm{trace}(S_b)}{\sigma} \approx \frac{\sigma l + \mathrm{trace}(S_b)}{\sigma} = l + \frac{\mathrm{trace}(S_b)}{\sigma}$$
    The approximation matches the measured values well (a numeric check follows this list):

    | | $m = 10[-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 0.2$ | $m = [-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 0.2$ | $m = 10[-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 3$ | $m = [-1\ -1\ 1\ 1; -1\ 1\ -1\ 1]$, $\sigma = 3$ |
    |---|---|---|---|---|
    | $J_3$ | 951.3471 | 11.9440 | 66.9920 | 2.7766 |
    | $l + \frac{\mathrm{trace}(S_b)}{\sigma}$ | 1001.4 | 12.0335 | 68.8930 | 2.7533 |
  - How $J_3$ changes:
    Classes are far away:
    $$\Rightarrow \lVert \mu_i - \mu_0 \rVert^2 \uparrow \quad \Rightarrow \quad \mathrm{trace}(S_b) = \frac{1}{N} \sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2 \uparrow \quad \Rightarrow \quad J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx l + \frac{\mathrm{trace}(S_b)}{\sigma} \uparrow$$
    Features within every class are close:
    $$\Rightarrow \sigma \downarrow \quad \Rightarrow \quad J_3 = \mathrm{trace}(S_w^{-1} S_m) \approx l + \frac{\mathrm{trace}(S_b)}{\sigma} \uparrow$$
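A small sketch checking the table above; the trace(S_b) values are read off the printed S_b matrices in the results (here l = 2):
sigma = [0.2 0.2 3 3];
trace_Sb = [99.8278 + 100.0591, 1.0225 + 0.9842, 99.7179 + 100.9610, 1.1316 + 1.1283];
J3_measured = [951.3471 11.9440 66.9920 2.7766];
J3_approx = 2 + trace_Sb ./ sigma; % l + trace(S_b)/sigma with l = 2
disp([J3_measured; J3_approx])     % the two rows agree closely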
Experiment 5.4
FDR: Fisher’s discriminant ratio
Theory of Fisher’s Linear Discriminant
The columns of $w$ are the eigenvectors corresponding to the highest eigenvalues of $S_w^{-1} S_b$.
code: experiment5_4.m
%% Computer Experiment 5.4
close('all'); clear; clc;
m = [2 2.5;
     4 10];                    % class means
s1 = [1 0; 0 1]; s2 = 0.25 * [1 0; 0 1];
P = [0.5 0.5]';
N = 200;                       % 100 points per class, 2 classes
%% Generate the 200 vectors in X1, X2
randn('seed', 0); % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
    else % P(i), not P(1): identical values here, but correct in general
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(i)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(i)*N))'];
    end
end
y1 = [ones(1, fix(P(1)*N)), 2 * ones(1, fix(P(2)*N))];
y2 = y1;
%% Get FDR value and w (the direction projected on) for X1, X2
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 )
[ FDR_1, w_1 ] = FDR( X1, y1, 1 ) % set D_y: dimension of w = 1
d1 = S_w \ (m(:, 1) - m(:, 2))    % theoretical direction S_w^{-1}(m_1 - m_2)
d1 = d1 / sqrt(sum(d1.^2))        % normalized
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 );
[ FDR_2, w_2 ] = FDR( X2, y2, 1 )
d2 = S_w \ (m(:, 1) - m(:, 2))
d2 = d2 / sqrt(sum(d2.^2))
%% Plot all situations for the 2 classes
f1 = figure; plot_point(X1, y1);
hold on; h1 = ezplot(@(x, y) w_1(1)*x + w_1(2)*y);
hold on; H1 = ezplot(@(x, y) w_1(2)*x - w_1(1)*y, [-2 8 0 15]);
f2 = figure; plot_point(X2, y2);
hold on; h2 = ezplot(@(x, y) w_2(1)*x + w_2(2)*y);
hold on; H2 = ezplot(@(x, y) w_2(2)*x - w_2(1)*y, [-4 6 0 12]);
figure(f1); title('$${\Sigma = I}$$','Interpreter','latex');
set(h1,'Color','r'); set(H1,'Color','g');
legend([h1 H1], 'w^{T}x = 0', 'line to be projected on', 'Location', 'Best')
figure(f2); title('$${\Sigma = 0.25 I}$$','Interpreter','latex');
set(h2,'Color','r'); set(H2,'Color','g');
legend([h2 H2], 'w^{T}x = 0', 'line to be projected on', 'Location', 'NorthWest')
result
% Sigma = I
FDR_1 = 8.1336
w_1 =
-0.1118
-0.9937
S_w \ [m(:, 1)-m(:, 2)] = % for 2 classes
-0.6136
-5.5634
normalized S_w \ [m(:, 1)-m(:, 2)] =
-0.1096
-0.9940
% Sigma = 0.25 I
FDR_2 = 38.4050
w_2 =
0.0075
-1.0000
S_w \ [m(:, 1)-m(:, 2)] = % for 2 classes
-0.0019
-25.9308
normalized S_w \ [m(:, 1)-m(:, 2)] =
-0.0001
-1.0000
conclusion
- Theoretically, for 2 classes, when $J(w)$ is maximized:
  $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{and setting the gradient to zero gives} \quad (w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
  $$\Rightarrow \quad S_w^{-1} S_b\, w = J(w)\, w, \quad J(w) = k_1$$
  $$\text{Also} \quad S_b = (m_1 - m_2)(m_1 - m_2)^T, \quad \text{so} \quad S_b w = \left((m_1 - m_2)^T w\right)(m_1 - m_2), \quad (m_1 - m_2)^T w = k_2$$
  $$\Rightarrow \quad w = \frac{k_2}{k_1} S_w^{-1}(m_1 - m_2)$$
  So $w$ has the same direction as $S_w^{-1}(m_1 - m_2)$. In the experiment:
  When $\Sigma = I$, the covariance is large enough that $w$ and $S_w^{-1}(m_1 - m_2)$ come out in the same direction.
  When $\Sigma = 0.25 I$, the covariance is small, and there is a visible difference between the directions of $w$ and $S_w^{-1}(m_1 - m_2)$.

  | | $w$ | $S_w^{-1}(m_1 - m_2)$ | normalized $S_w^{-1}(m_1 - m_2)$ | $J(w)$ |
  |---|---|---|---|---|
  | $\Sigma = I$ | -0.1118; -0.9937 | -0.6136; -5.5634 | -0.1096; -0.9940 | 8.1336 |
  | $\Sigma = 0.25 I$ | 0.0075; -1.0000 | -0.0019; -25.9308 | -0.0001; -1.0000 | 38.4050 |
- Theoretically:
  $$\text{Having} \quad S_w \approx \Sigma = \sigma I, \quad \text{constraint:} \quad w^T w = 1, \quad \text{max eigenvalue of } S_b: \ \Lambda_{max}$$
  $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{gives} \quad (w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
  $$\Rightarrow \quad \text{the eigenvalue problem} \quad J(w)\, w = S_w^{-1} S_b\, w \approx \frac{S_b}{\sigma} w$$
  $$\text{For 2 classes:} \quad J(w) \approx \lambda_{max}\left(\frac{S_b}{\sigma}\right) = \frac{\Lambda_{max}}{\sigma}$$
  When the distance between the class centers is unchanged:
  $$\Rightarrow (\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix} \quad \Rightarrow \quad S_b = \frac{1}{N} \sum_{i=1}^{M} n_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix}$$
  $$\Rightarrow \Lambda_{max} = \text{const} \quad \Rightarrow \quad J(w)\, \sigma \approx \Lambda_{max} = \text{const}$$
  When $\sigma$ of the classes is big $\Rightarrow$ the FDR (Fisher's discriminant ratio) is small:
  $$\sigma \uparrow \quad \Rightarrow \quad J(w) \approx \frac{\Lambda_{max}}{\sigma} \downarrow$$
  When $\sigma$ of the classes is small $\Rightarrow$ the FDR is big:
  $$\sigma \downarrow \quad \Rightarrow \quad J(w) \approx \frac{\Lambda_{max}}{\sigma} \uparrow$$
  Actually (the sketch below shows how the $\Lambda_{max}$ column can be obtained):

  | | $\sigma$ | $J(w)$ | $\sigma J(w)$ | $\Lambda_{max}$ | $\frac{\sigma J(w)}{\Lambda_{max}} \times 100\%$ |
  |---|---|---|---|---|---|
  | $\Sigma = I$ | 1 | 8.1336 | 8.1336 | 8.7515 | 92.94% |
  | $\Sigma = 0.25 I$ | 0.25 | 38.4050 | 9.6013 | 8.9344 | 107.46% |
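How the $\Lambda_{max}$ column can be reproduced (a sketch reusing the variables from experiment5_4.m; the commented values are the ones in the table):
[ S_w, S_b, S_m ] = Calc_SwSbSm( X1, y1 );
Lambda_max_1 = max(eig(S_b)) % ~ 8.7515 for Sigma = I
[ S_w, S_b, S_m ] = Calc_SwSbSm( X2, y2 );
Lambda_max_2 = max(eig(S_b)) % ~ 8.9344 for Sigma = 0.25 I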