Zhankun Luo
PUID: 0031195279
Email: luo333@pnw.edu
Fall-2018-ECE-59500-009
Instructor: Toma Hentea
Homework 3
Function
In addition to the functions below, plot_data.m was used to plot the generated data points.
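plot_data.m itself is not listed in this report. Below is a minimal sketch of what it might look like (an assumption, not the actual file), assuming X is the 2xN data matrix, y holds the class labels 1 or 2, and m holds the class means as columns.
function plot_data(X, y, m)
% Hypothetical sketch of plot_data.m (not the original file):
% plot the two classes with different markers and overlay the class means.
% X: 2xN data matrix, y: 1xN labels (1 or 2), m: 2x2 matrix of class means (columns)
hold on;
plot(X(1, y == 1), X(2, y == 1), 'r.'); % class 1 points
plot(X(1, y == 2), X(2, y == 2), 'bx'); % class 2 points
plot(m(1, :), m(2, :), 'k+', 'MarkerSize', 12, 'LineWidth', 2); % class means
xlabel('x_1'); ylabel('x_2'); grid on;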
perceptron.m
With learning rate = rho / iter (for the convergence of the algorithm).
Cost function: $J(w) = \sum\limits_{x \in Y} \delta_x w^T x$,
where $Y$ is the subset of the vectors wrongly classified by $w$, and
$\delta_x = \begin{cases} -1 & x \in Y \cap \omega_1 \\ +1 & x \in Y \cap \omega_2 \end{cases}$
Update rule: $w(t+1) = w(t) - \rho_t \dfrac{\partial J(w)}{\partial w} = w(t) - \rho_t \sum\limits_{x \in Y} \delta_x x$,
where the learning rate is $\rho_t = \dfrac{\rho}{t}$ (learning rate = rho / iter).
function [w_best, iter_best, mis_clas_min] = perceptron(X, y, w_ini, rho)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
% [w_best, iter_best, mis_clas_min] = perceptron(X, y, w_ini, rho)
% NOTE: the learning rate = rho / iter
% INPUT ARGUMENTS:
% X: lxN dimensional matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the label
% of the class where the i-th data vector belongs (+1 or -1).
% w_ini: l-dimensional vector which is the initial estimate of the
% parameter vector that corresponds to the separating hyperplane.
% rho: the learning rate = rho / iter
% OUTPUT ARGUMENTS:
% w_best: the best estimate of the parameter vector.
% iter_best:the number of iterations required for the convergence of the
% algorithm.
% mis_clas_min: number of misclassified data vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[l,N] = size(X);
max_iter = 1000; % Maximum allowable number of iterations
w = w_ini; % Initialization of the parameter vector
iter = 1; % Iteration counter
mis_clas = N; % Number of misclassified vectors
while(mis_clas > 0) && (iter < max_iter)
    mis_clas = 0;
    for i = 1:N
        if((X(:,i)'*w)*y(i) < 0)
            mis_clas = mis_clas + 1;
            w = w + (rho / iter)*y(i)*X(:,i); % Update w with learning rate rho/iter
        end
    end
    if (iter == 1) || (mis_clas_min > mis_clas) % keep the best w so far
        iter_best = iter; w_best = w; mis_clas_min = mis_clas;
    end
    iter = iter + 1;
end
SSE.m
Cost function: $J(w) = \frac{1}{2} \| X^T w - y \|^2$, where $X: l \times N$, $w: l \times 1$, $y: N \times 1$.
At the minimum of $J(w)$: $\dfrac{\partial J(w)}{\partial w} = X(X^T w - y) = 0$.
Thus, $w = (X X^T)^{-1} X y$.
function [w, cost_func, mis_clas] = SSE(X, y)
% FUNCTION
% [w, cost_func, mis_clas] = SSE(X, y)
% INPUT ARGUMENTS:
% X: lxN matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the
% label of the class where the i-th data vector belongs (+1 or
% -1).
% OUTPUT ARGUMENTS:
% w: the final estimate of the parameter vector.
% cost_func: value of the cost function = 0.5 * sum((y - w'*X).^2)
% mis_clas: number of misclassified data vectors.
w = (X*X') \ (X*y');
[l,N] = size(X);
cost_func = 0.5 * (y - w'*X) * (y - w'*X)'; % calculate cost function
mis_clas = 0; % calculate number of misclassified vectors
for i = 1:N
    if((X(:,i)' * w) * y(i) < 0)
        mis_clas = mis_clas + 1;
    end
end
LMS.m
With learning rate = rho / iter (for the convergence of the algorithm).
Cost function: $J(w) = \frac{1}{2} \| X^T w - y \|^2$, where $X: l \times N$, $w: l \times 1$, $y: N \times 1$.
Update rule: $w(t+1) = w(t) - \rho_t \dfrac{\partial J(w)}{\partial w} = w(t) - \rho_t X(X^T w - y)$,
with $X(X^T w - y) = \sum\limits_{i=1}^{N} X(:,i)\,[X(:,i)^T w - y(i)]$,
where the learning rate is $\rho_t = \dfrac{\rho}{t}$ (learning rate = rho / iter).
function [w_best, iter_best, cost_func, mis_clas_min] = LMS(X, y, w_ini, rho)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
% [w_best, iter_best, cost_func, mis_clas_min] = LMS(X, y, w_ini, rho)
% NOTE: the learning rate = rho / iter
% INPUT ARGUMENTS:
% X: lxN matrix whose columns are the data vectors to
% be classified.
% y: N-dimensional vector whose i-th component contains the
% label of the class where the i-th data vector belongs (+1 or
% -1).
% w_ini: l-dimensional vector, which is the initial estimate of the
% parameter vector that corresponds to the separating hyperplane.
% rho: the learning rate = rho / iter
% OUTPUT ARGUMENTS:
% w_best: the best estimate of the parameter vector.
% iter_best: the number of iterations required for the convergence of the
% algorithm.
% cost_func: value of the cost function = 0.5 * sum((y - w'*X).^2)
% mis_clas_min: number of misclassified data vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[l,N] = size(X);
max_iter = 1000; % Maximum allowable number of iterations
w = w_ini; % Initialization of the parameter vector
iter = 1; % Iteration counter
while (iter < max_iter)
    mis_clas = 0;
    gradi = zeros(l,1); % Computation of the "gradient" term
    for i = 1:N
        if((X(:,i)'*w)*y(i) < 0)
            mis_clas = mis_clas + 1;
        end
        gradi = gradi + (((X(:,i)'*w) - y(i)) * X(:,i));
    end
    cost = 0.5 * (y - w'*X) * (y - w'*X)'; % cost function = 0.5 * sum((y - w'*X).^2)
    if(iter == 1) || (mis_clas_min > mis_clas) || ((mis_clas_min == mis_clas) && (cost < cost_func)) % keep the best w so far
        iter_best = iter; w_best = w; mis_clas_min = mis_clas;
        cost_func = cost;
    end
    iter = iter + 1;
    w = w - (rho / iter) * gradi; % Updating the parameter vector
end
Problem
Problem 3.5
Use the perceptron algorithm for 100 vectors (50 vectors each class)
With learning rate = 0.01 / iter (for the convergence of the algorithm)
%% Problem 3.5
close('all'); clear; clc;
m = [1 0; 1 0];
s = 0.2 * [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 100; % 50 points each class
%% Use the classifiers designed to classify the generated 100 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
%% Run the perceptron algorithm for X with learning rate = 0.01 / iter
rho = 0.01; % Learning rate = 0.01 / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))])
Result
Line obtained with the perceptron algorithm: $0.0048\,x + 0.0045\,y - 0.0062 = 0$
w =
0.0048
0.0045
-0.0062
iter = 30
mis_clas = 4
error_rate = 0.0400
Problem 3.6
Use the LMS algorithm for 200 vectors (100 vectors each class)
With learning rate = 0.01 / iter (for the convergence of the algorithm)
%% Problem 3.6
close('all'); clear; clc;
m = [1 0; 1 0];
s = 0.2 * [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 200; % 100 points each class
%% Use the classifiers designed to classify the generated 200 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
%% Run the LMS algorithm for X with learning rate = 0.01 / iter
rho = 0.01; % Learning rate = 0.01 / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))])
Result
Line obtained with the LMS algorithm: $0.6394\,x + 0.6127\,y - 0.6032 = 0$
w =
0.6394
0.6127
-0.6032
iter = 48
cost_func = 30.6572
mis_clas = 12
error_rate = 0.0600
Computer Experiment
Experiment 3.1
Using the perceptron, SSE, and LMS algorithms for 400 vectors (200 vectors each class)
class 1: $\mu_1 = [-5, 0]^T$
class 2: $\mu_2 = [5, 0]^T$
With learning rate = 0.002 / iter
%% Experiment 3.1
close('all'); clear; clc;
m = [-5 5; 0 0];
% m = [-2 2; 0 0]; % for computer experiment 3.2
s = [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 400; % 200 points each class, 2 class
%% Plot the generated 400 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
%% Preprocess data
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
rho = 0.002; % Learning rate = rho / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
%% Run the Perceptron algorithm with learning rate = 0.002 / iter
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h1 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the Sum of Error Squares classifier
[w, cost_func, mis_clas] = SSE(X, y)
error_rate = mis_clas / N
hold on; h2 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the LMS algorithm with learning rate = 0.002 / iter
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h3 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Legend of generated lines
set(h1,'Color','r'); set(h2,'Color','g'); set(h3,'Color','b');
legend([h1 h2 h3],'Perceptron','SSE','LMS');
Result
Three different initial values are used for the parameter vector:
- $-x - y + 1 = 0$
- $-x + 1 = 0$
- $-y + 1 = 0$
w_ini = [-1 -1 1]', i.e. $-x - y + 1 = 0$
% the Perceptron algorithm
w =
-1
-1
1
iter = 1
mis_clas = 0
error_rate = 0
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0053
0.0088
iter = 999
cost_func = 7.7319
mis_clas = 0
error_rate = 0
w_ini = [-1 0 1]', i.e. $-x + 1 = 0$
% the Perceptron algorithm
w =
-1
0
1
iter = 1
mis_clas = 0
error_rate = 0
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0088
0.0090
iter = 999
cost_func = 7.7298
mis_clas = 0
error_rate = 0
w_ini = [0 -1 1]', i.e. $-y + 1 = 0$
% the Perceptron algorithm
w =
-0.6331
-0.7880
0.9208
iter = 66
mis_clas = 1
error_rate = 0.0025
% the Sum of Error Squares classifier
w =
-0.1932
0.0087
0.0047
cost_func = 7.7261
mis_clas = 0
error_rate = 0
% the LMS algorithm
w =
-0.1932
0.0053
0.0088
iter = 999
cost_func = 7.7318
mis_clas = 0
error_rate = 0
Conclusion
- Because SSE and LMS minimize the same cost function $J(w) = \frac{1}{2}\|X^T w - y\|^2$, the $w$ obtained by SSE and LMS are almost identical, both reaching $J(w)_{min}$ (a numerical check is sketched below this list).
- The $w$ obtained by SSE and LMS does not depend on the initial value $w_{ini}$, while the $w$ obtained by the perceptron algorithm depends strongly on $w_{ini}$.
- The separating lines obtained by SSE and LMS always lie in the middle between the two classes.
- When the two classes are far enough apart, the perceptron, SSE, and LMS classifiers all separate the two classes successfully.
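As a quick numerical check of the first point, the sketch below (an optional add-on, not part of the assignment, assuming SSE.m and LMS.m above are on the MATLAB path) regenerates the Experiment 3.1 data and compares the two solutions directly; the names Xchk, ychk, w_sse, and w_lms are introduced here only for illustration.
%% Sanity check: SSE (closed form) vs. LMS (iterative) on the same data
randn('seed',0); % reproducible
Xchk = [mvnrnd([-5 0], eye(2), 200)', mvnrnd([5 0], eye(2), 200)'];
ychk = [ones(1, 200), -ones(1, 200)];
Xchk = [Xchk; ones(1, 400)]; % augment with the bias term
[w_sse, cost_sse, err_sse] = SSE(Xchk, ychk); % closed-form minimizer
[w_lms, it_lms, cost_lms, err_lms] = LMS(Xchk, ychk, [-1 -1 1]', 0.002); % iterative minimizer
disp(norm(w_sse - w_lms)) % small value => both reach (nearly) the same minimum of J(w)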
Experiment 3.2
Using the perceptron, SSE, and LMS algorithms for 400 vectors (200 vectors each class)
class 1: $\mu_1 = [-2, 0]^T$
class 2: $\mu_2 = [2, 0]^T$
With learning rate = 0.002 / iter
%% Experiment 3.2
close('all'); clear; clc;
m = [-2 2; 0 0]; % for computer experiment 3.2
s = [1 0;0 1]; S(:, :, 1) = s; S(:, :, 2) = s;
P = [0.5 0.5]';
N = 400; % 200 points each class, 2 class
%% Plot the generated 400 vectors
randn('seed',0); % reproducible
X1 = mvnrnd(m(:, 1), S(:, :, 1), fix(P(1)*N))'; X2 = mvnrnd(m(:, 2), S(:, :, 2), fix(P(2)*N))'; X = [X1 X2];
y = [ones(1, fix(P(1)*N)), 2*ones(1, fix(P(2)*N))];
figure(1); plot_data(X, y, m);
%% Preprocess data
y(y == 2) = -1; % mark points: class 1:y(i)=1; class 2:y(i)=-1
rho = 0.002; % Learning rate = rho / iter
w_ini = [-1 -1 1]'; % - x - y + 1 = 0
X = [X; ones(1, N)];
%% Run the Perceptron algorithm with learning rate = 0.002 / iter
[w, iter, mis_clas] = perceptron(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h1 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the Sum of Error Squares classifier
[w, cost_func, mis_clas] = SSE(X, y)
error_rate = mis_clas / N
hold on; h2 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Run the LMS algorithm with learning rate = 0.002 / iter
[w, iter, cost_func, mis_clas] = LMS(X, y, w_ini, rho)
error_rate = mis_clas / N
hold on; h3 = ezplot(@(x, y) w(1)*x + w(2)*y + w(3), [min(X(1,:)), max(X(1,:)), min(X(2,:)), max(X(2,:))]);
%% Legend of generated lines
set(h1,'Color','r'); set(h2,'Color','g'); set(h3,'Color','b');
legend([h1 h2 h3],'Perceptron','SSE','LMS');
Result
Three different initial values are used for the parameter vector:
- $-x - y + 1 = 0$
- $-x + 1 = 0$
- $-y + 1 = 0$
w_ini = [-1 -1 1]', i.e. $-x - y + 1 = 0$
% the Perceptron algorithm
w =
-1.2874
-0.4774
0.6781
iter = 969
mis_clas = 17
error_rate = 0.0425
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4032
0.0204
0.0140
iter = 999
cost_func = 40.6015
mis_clas = 4
error_rate = 0.0100
w_ini = [-1 0 1]', i.e. $-x + 1 = 0$
% the Perceptron algorithm
w =
-1.1239
0.0447
0.6948
iter = 677
mis_clas = 13
error_rate = 0.0325
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4033
0.0239
0.0441
iter = 75
cost_func = 40.8294
mis_clas = 3
error_rate = 0.0075
w_ini = [0 -1 1]', i.e. $-y + 1 = 0$
% the Perceptron algorithm
w =
-0.8715
-0.2862
0.4784
iter = 671
mis_clas = 18
error_rate = 0.0450
% the Sum of Error Squares classifier
w =
-0.4031
0.0238
0.0099
cost_func = 40.5957
mis_clas = 4
error_rate = 0.0100
% the LMS algorithm
w =
-0.4032
0.0205
0.0140
iter = 999
cost_func = 40.6014
mis_clas = 4
error_rate = 0.0100
Conclusion
- Because SSE and LMS minimize the same cost function $J(w) = \frac{1}{2}\|X^T w - y\|^2$, the $w$ obtained by SSE and LMS are almost identical, both reaching $J(w)_{min}$.
- The $w$ obtained by SSE and LMS does not depend on the initial value $w_{ini}$, while the $w$ obtained by the perceptron algorithm depends strongly on $w_{ini}$.
- The separating lines obtained by SSE and LMS always lie in the middle between the two classes.
- When the two classes are too close, the error rates always satisfy: Perceptron > SSE $\approx$ LMS (a held-out check is sketched below this list).
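To make the error-rate comparison less dependent on the training points themselves, the sketch below (an optional add-on, not part of the assignment) estimates the error of a trained parameter vector w on freshly generated test data. Here w is whichever parameter vector (perceptron, SSE, or LMS) was just computed in Experiment 3.2, and the seed value 100 and the names Xt, yt, and test_error are assumptions introduced only for illustration.
%% Hypothetical held-out check: error rate of a trained w on new data
randn('seed',100); % a different seed from the one used for training
Xt = [mvnrnd([-2 0], eye(2), 200)', mvnrnd([2 0], eye(2), 200)'];
yt = [ones(1, 200), -ones(1, 200)];
Xt = [Xt; ones(1, 400)]; % augment with the bias term
test_error = sum((w' * Xt) .* yt < 0) / 400 % fraction of misclassified test points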