Zhankun Luo
PUID: 0031195279
Email: luo333@pnw.edu
Fall-2018-ECE-59500-009
Instructor: Toma Hentea
Homework 3
Function
In addition to the functions below, plot_data.m was used to plot the generated data points.
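plot_data.m itself is not listed in this report. Below is a minimal sketch of what it might look like (an assumption, not the actual file), assuming X is the 2xN data matrix, y holds the class labels 1 or 2, and m holds the class means as columns.
function plot_data(X, y, m)
% Hypothetical sketch of plot_data.m (not the original file):
% plot the two classes with different markers and overlay the class means.
% X: 2xN data matrix, y: 1xN labels (1 or 2), m: 2x2 matrix of class means (columns)
hold on;
plot(X(1, y == 1), X(2, y == 1), 'r.'); % class 1 points
plot(X(1, y == 2), X(2, y == 2), 'bx'); % class 2 points
plot(m(1, :), m(2, :), 'k+', 'MarkerSize', 12, 'LineWidth', 2); % class means
xlabel('x_1'); ylabel('x_2'); grid on;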
perceptron.m
With learning rate = rho / iter (for the convergence of the algorithm).
Cost function: $J(w) = \sum\limits_{x \in Y} \delta_x w^T x$,
where $Y$ is the subset of the vectors wrongly classified by $w$, and
$\delta_x = \begin{cases} -1 & x \in Y \cap \omega_1 \\ +1 & x \in Y \cap \omega_2 \end{cases}$
Update rule: $w(t+1) = w(t) - \rho_t \dfrac{\partial J(w)}{\partial w} = w(t) - \rho_t \sum\limits_{x \in Y} \delta_x x$,
where the learning rate is $\rho_t = \dfrac{\rho}{t}$ (learning rate = rho / iter).
function [w_best, iter_best, mis_clas_min] = perceptron(X, y, w_ini, rho)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
% [w_best, iter_best, mis_clas_min] = perceptron(X, y, w_ini, rho)
% NOTE: the learning rate = rho / iter
% INPUT ARGUMENTS:
% X: lxN dimensional matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the label
% of the class where the i-th data vector belongs (+1 or -1).
% w_ini: l-dimensional vector which is the initial estimate of the
% parameter vector that corresponds to the separating hyperplane.
% rho: the learning rate = rho / iter
% OUTPUT ARGUMENTS:
% w_best: the best estimate of the parameter vector.
% iter_best:the number of iterations required for the convergence of the
% algorithm.
% mis_clas_min: number of misclassified data vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[l,N] = size(X);
max_iter = 1000; % Maximum allowable number of iterations
w = w_ini; % Initialization of the parameter vector
iter = 1; % Iteration counter
mis_clas = N; % Number of misclassified vectors
while(mis_clas > 0) && (iter < max_iter)
    mis_clas = 0;
    for i = 1:N
        if((X(:,i)'*w)*y(i) < 0)
            mis_clas = mis_clas + 1;
            w = w + (rho / iter)*y(i)*X(:,i); % Update w with learning rate rho/iter
        end
    end
    if (iter == 1) || (mis_clas_min > mis_clas) % keep the best w so far
        iter_best = iter; w_best = w; mis_clas_min = mis_clas;
    end
    iter = iter + 1;
end
SSE.m
Cost function: $J(w) = \frac{1}{2} \| X^T w - y \|^2$, where $X: l \times N$, $w: l \times 1$, $y: N \times 1$.
At the minimum of $J(w)$: $\dfrac{\partial J(w)}{\partial w} = X(X^T w - y) = 0$.
Thus, $w = (X X^T)^{-1} X y$.
function [w, cost_func, mis_clas] = SSE(X, y)
% FUNCTION
% [w, cost_func, mis_clas] = SSE(X, y)
% INPUT ARGUMENTS:
% X: lxN matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the
% label of the class where the i-th data vector belongs (+1 or
% -1).
% OUTPUT ARGUMENTS:
% w: the final estimate of the parameter vector.
% cost_func: value of the cost function = 0.5 * sum((y - w'*X).^2)
% mis_clas: number of misclassified data vectors.
w = (X*X') \ (X*y');
[l,N] = size(X);
cost_func = 0.5 * (y - w'*X) * (y - w'*X)'; % calculate cost function
mis_clas = 0; % calculate number of misclassified vectors
for i = 1:N
    if((X(:,i)' * w) * y(i) < 0)
        mis_clas = mis_clas + 1;
    end
end
LMS.m
With learning rate = rho / iter (for the convergence of the algorithm).
Cost function: $J(w) = \frac{1}{2} \| X^T w - y \|^2$, where $X: l \times N$, $w: l \times 1$, $y: N \times 1$.
Update rule: $w(t+1) = w(t) - \rho_t \dfrac{\partial J(w)}{\partial w} = w(t) - \rho_t X(X^T w - y)$,
with $X(X^T w - y) = \sum\limits_{i=1}^{N} X(:,i)\,[X(:,i)^T w - y(i)]$,
where the learning rate is $\rho_t = \dfrac{\rho}{t}$ (learning rate = rho / iter).
function [w_best, iter_best, cost_func, mis_clas_min] = LMS(X, y, w_ini, rho)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
% [w_best, iter_best, cost_func, mis_clas_min] = LMS(X, y, w_ini, rho)
% NOTE: the learning rate = rho / iter
% INPUT ARGUMENTS:
% X: lxN matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the
% label of the class where the i-th data vector belongs (+1 or
% -1).
% w_ini: l-dimensional vector, which is the initial estimate of the
% parameter vector that corresponds to the separating hyperplane.
% rho: the learning rate = rho / iter
% OUTPUT ARGUMENTS:
% w_best: the best estimate of the parameter vector.
% iter_best: the number of iterations required for the convergence of the
% algorithm.
% cost_func: value of the cost function = 0.5 * sum((y - w'*X).^2)
% mis_clas_min: number of misclassified data vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[l,N] = size(X);
max_iter = 1000; % Maximum allowable number of iterations
w = w_ini; % Initialization of the parameter vector
iter = 1; % Iteration counter
while (iter < max_iter)
    mis_clas = 0;
    gradi = zeros(l,1); % Computation of the "gradient" term
    for i = 1:N
        if((X(:,i)'*w)*y(i) < 0)
            mis_clas = mis_clas + 1;
        end
        gradi = gradi + (((X(:,i)'*w) - y(i)) * X(:,i));
    end
    cost = 0.5 * (y - w'*X) * (y - w'*X)'; % cost function = 0.5 * sum((y - w'*X).^2)
    if(iter == 1) || (mis_clas_min > mis_clas) || ((mis_clas_min == mis_clas) && (cost < cost_func)) % keep the best w so far
        iter_best = iter; w_best = w; mis_clas_min = mis_clas;
        cost_func = cost;
    end
    iter = iter + 1;
    w = w - (rho / iter) * gradi; % Updating the parameter vector
end
Problem
Problem 3.5
Use the perceptron algorithm for 100 vectors (50 vectors each class)
With learning rate = 0.01 / iter (for the convergence of the algorithm)
%% Problem 3.5
close('all'); clear; clc;
m = [1 0; 1 0];
s = 0.2 * [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 100; % 50 points each class
%% Use the classifiers designed to classify the generated 100 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
%% Run the perceptron algorithm for X with learning rate = 0.01 / iter
rho = 0.01; % Learning rate = 0.01 / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))])
Result
Line obtained with the perceptron algorithm: $0.0048\,x + 0.0045\,y - 0.0062 = 0$
w =
0.0048
0.0045
-0.0062
iter = 30
mis_clas = 4
error_rate = 0.0400
Problem 3.6
Use the LMS algorithm for 200 vectors (100 vectors each class)
With learning rate = 0.01 / iter (for the convergence of the algorithm)
%% Problem 3.6
close('all'); clear; clc;
m = [1 0; 1 0];
s = 0.2 * [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 200; % 100 points each class
%% Use the classifiers designed to classify the generated 200 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
%% Run the LMS algorithm for X with learning rate = 0.01 / iter
rho = 0.01; % Learning rate = 0.01 / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))])
Result
Line obtained with the LMS algorithm: $0.6394\,x + 0.6127\,y - 0.6032 = 0$
w =
0.6394
0.6127
-0.6032
iter = 48
cost_func = 30.6572
mis_clas = 12
error_rate = 0.0600
Computer Experiment
Experiment 3.1
Using the perceptron, SSE, and LMS algorithms for 400 vectors (200 vectors each class)
class 1: $\mu_1 = [-5, 0]^T$
class 2: $\mu_2 = [5, 0]^T$
With learning rate = 0.002 / iter
%% Experiment 3.1
close('all'); clear; clc;
m = [-5 5; 0 0];
% m = [-2 2; 0 0]; % for computer experiment 3.2
s = [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 400; % 200 points each class, 2 class
%% Plot the generated 400 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
%% Preprocess data
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
rho = 0.002; % Learning rate = rho / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
%% Run the Perceptron algorithm with learning rate = 0.002 / iter
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h1 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the Sum of Error Squares classifier
[w, cost_func, mis_clas] = SSE(X, y)
error_rate = mis_clas / N
hold on; h2 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the LMS algorithm with learning rate = 0.002 / iter
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h3 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Legend of generated lines
set(h1,'Color','r'); set(h2,'Color','g'); set(h3,'Color','b');
legend([h1 h2 h3],'Perceptron','SSE','LMS');
Result
Three different initial values are used for the parameter vector:
- $-x - y + 1 = 0$
- $-x + 1 = 0$
- $-y + 1 = 0$
w_ini = [-1 -1 1]', i.e. $-x - y + 1 = 0$
% the Perceptron algorithm
w =
-1
-1
1
iter = 1
mis_clas = 0
error_rate = 0
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0053
0.0088
iter = 999
cost_func = 7.7319
mis_clas = 0
error_rate = 0
w_ini = [-1 0 1]', i.e. $-x + 1 = 0$
% the Perceptron algorithm
w =
-1
0
1
iter = 1
mis_clas = 0
error_rate = 0
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0088
0.0090
iter = 999
cost_func = 7.7298
mis_clas = 0
error_rate = 0
w_ini = [0 -1 1]', i.e. $-y + 1 = 0$
% the Perceptron algorithm
w =
-0.6331
-0.7880
0.9208
iter = 66
mis_clas = 1
error_rate = 0.0025
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0053
0.0088
iter = 999
cost_func = 7.7318
mis_clas = 0
error_rate = 0
Conclusion
- Because SSE and LMS minimize the same cost function $J(w) = \frac{1}{2}\|X^T w - y\|^2$, the $w$ obtained by SSE and LMS are almost identical, both reaching $J(w)_{min}$ (a numerical check is sketched below this list).
- The $w$ obtained by SSE and LMS does not depend on the initial value $w_{ini}$, while the $w$ obtained by the perceptron algorithm depends strongly on $w_{ini}$.
- The separating lines obtained by SSE and LMS always lie in the middle between the two classes.
- When the two classes are far enough apart, the perceptron, SSE, and LMS classifiers all separate the two classes successfully.
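As a quick numerical check of the first point, the sketch below (an optional add-on, not part of the assignment, assuming SSE.m and LMS.m above are on the MATLAB path) regenerates the Experiment 3.1 data and compares the two solutions directly; the names Xchk, ychk, w_sse, and w_lms are introduced here only for illustration.
%% Sanity check: SSE (closed form) vs. LMS (iterative) on the same data
randn('seed',0); % reproducible
Xchk = [mvnrnd([-5 0], eye(2), 200)', mvnrnd([5 0], eye(2), 200)'];
ychk = [ones(1, 200), -ones(1, 200)];
Xchk = [Xchk; ones(1, 400)]; % augment with the bias term
[w_sse, cost_sse, err_sse] = SSE(Xchk, ychk); % closed-form minimizer
[w_lms, it_lms, cost_lms, err_lms] = LMS(Xchk, ychk, [-1 -1 1]', 0.002); % iterative minimizer
disp(norm(w_sse - w_lms)) % small value => both reach (nearly) the same minimum of J(w)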
Experiment 3.2
Using the perceptron, SSE, and LMS algorithms for 400 vectors (200 vectors each class)
class 1: $\mu_1 = [-2, 0]^T$
class 2: $\mu_2 = [2, 0]^T$
With learning rate = 0.002 / iter
%% Experiment 3.2
close('all'); clear; clc;
m = [-2 2; 0 0]; % for computer experiment 3.2
s = [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 400; % 200 points each class, 2 class
%% Plot the generated 400 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
%% Preprocess data
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
rho = 0.002; % Learning rate = rho / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
%% Run the Perceptron algorithm with learning rate = 0.002 / iter
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h1 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the Sum of Error Squares classifier
[w, cost_func, mis_clas] = SSE(X, y)
error_rate = mis_clas / N
hold on; h2 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the LMS algorithm with learning rate = 0.002 / iter
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h3 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Legend of generated lines
set(h1,'Color','r'); set(h2,'Color','g'); set(h3,'Color','b');
legend([h1 h2 h3],'Perceptron','SSE','LMS');
Result
Three different initial values are used for the parameter vector:
- $-x - y + 1 = 0$
- $-x + 1 = 0$
- $-y + 1 = 0$
w_ini = [-1 -1 1]', i.e. $-x - y + 1 = 0$
% the Perceptron algorithm
w =
-1.2874
-0.4774
0.6781
iter = 969
mis_clas = 17
error_rate = 0.0425
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4032
0.0204
0.0140
iter = 999
cost_func = 40.6015
mis_clas = 4
error_rate = 0.0100
w_ini = [-1 0 1]', i.e. $-x + 1 = 0$
% the Perceptron algorithm
w =
-1.1239
0.0447
0.6948
iter = 677
mis_clas = 13
error_rate = 0.0325
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4033
0.0239
0.0441
iter = 75
cost_func = 40.8294
mis_clas = 3
error_rate = 0.0075
w_ini = [0 -1 1]', i.e. $-y + 1 = 0$
% the Perceptron algorithm
w =
-0.8715
-0.2862
0.4784
iter = 671
mis_clas = 18
error_rate = 0.0450
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4032
0.0205
0.0140
iter = 999
cost_func = 40.6014
mis_clas = 4
error_rate = 0.0100
Conclusion
- Because SSE and LMS minimize the same cost function $J(w) = \frac{1}{2}\|X^T w - y\|^2$, the $w$ obtained by SSE and LMS are almost identical, both reaching $J(w)_{min}$.
- The $w$ obtained by SSE and LMS does not depend on the initial value $w_{ini}$, while the $w$ obtained by the perceptron algorithm depends strongly on $w_{ini}$.
- The separating lines obtained by SSE and LMS always lie in the middle between the two classes.
- When the two classes are too close, the error rates always satisfy: Perceptron > SSE $\approx$ LMS (a held-out check is sketched below this list).
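To make the error-rate comparison less dependent on the training points themselves, the sketch below (an optional add-on, not part of the assignment) estimates the error of a trained parameter vector w on freshly generated test data. Here w is whichever parameter vector (perceptron, SSE, or LMS) was just computed in Experiment 3.2, and the seed value 100 and the names Xt, yt, and test_error are assumptions introduced only for illustration.
%% Hypothetical held-out check: error rate of a trained w on new data
randn('seed',100); % a different seed from the one used for training
Xt = [mvnrnd([-2 0], eye(2), 200)', mvnrnd([2 0], eye(2), 200)'];
yt = [ones(1, 200), -ones(1, 200)];
Xt = [Xt; ones(1, 400)]; % augment with the bias term
test_error = sum((w' * Xt) .* yt < 0) / 400 % fraction of misclassified test points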