[Problem 1] Four samples from two classes in a two-dimensional space are given: the first class contains $(1,4)^T$ and $(2,3)^T$, and the second class contains $(4,1)^T$ and $(3,2)^T$, where the superscript $T$ denotes transpose. Use the normalized augmented sample representation, assume the initial weight vector is $\mathbf{a}=(0,1,0)^T$ (the third component of $\mathbf{a}$ corresponds to the homogeneous coordinate), and fix the gradient step size at $\eta_k = 1$. Apply the batch perceptron criterion to find the weight vector $\mathbf{a}$ of the linear discriminant function $g(\mathbf{y})=\mathbf{a}^T \mathbf{y}$. (Note: "normalized augmented sample representation" means the samples in homogeneous coordinates are sign-normalized.)
[Solution] After normalization, the first-class samples become $x_1=[1,4,1]$ and $x_2=[2,3,1]$, and the second-class samples become $x_3=[-4,-1,-1]$ and $x_4=[-3,-2,-1]$. With the initial weight vector $[0,1,0]$, the discriminant values of all samples are:
$$\begin{gathered} x_1: [0,1,0][1,4,1]^T=4>0 \\ x_2: [0,1,0][2,3,1]^T=3>0 \\ x_3: [0,1,0][-4,-1,-1]^T=-1<0 \\ x_4: [0,1,0][-3,-2,-1]^T=-2<0 \end{gathered}$$
Since all products should be positive, $x_3$ and $x_4$ are misclassified. With step size 1, the weight vector is updated to:
$$[0,1,0]+[-4,-1,-1]+[-3,-2,-1]=[-7,-2,-2]$$
Check whether all samples are now correctly classified:
$$\begin{gathered} x_1: [-7,-2,-2][1,4,1]^T=-17<0 \\ x_2: [-7,-2,-2][2,3,1]^T=-22<0 \\ x_3: [-7,-2,-2][-4,-1,-1]^T=32>0 \\ x_4: [-7,-2,-2][-3,-2,-1]^T=27>0 \end{gathered}$$
Since all products should be positive, $x_1$ and $x_2$ are misclassified. With step size 1, the weight vector is updated to:
$$[-7,-2,-2]+[1,4,1]+[2,3,1]=[-4,5,0]$$
Check the classification once more:
$$\begin{gathered} x_1: [-4,5,0][1,4,1]^T=16>0 \\ x_2: [-4,5,0][2,3,1]^T=7>0 \\ x_3: [-4,5,0][-4,-1,-1]^T=11>0 \\ x_4: [-4,5,0][-3,-2,-1]^T=2>0 \end{gathered}$$
All samples are now correctly classified, so the final weight vector is $[-4,5,0]$.
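The hand computation above can be verified with a short script. This is a sketch of my own (not part of the original solution) that repeats the batch update with step size 1 until no sample is misclassified:

```python
import numpy as np

# Normalized augmented samples (second class negated)
Y = np.array([[1, 4, 1], [2, 3, 1], [-4, -1, -1], [-3, -2, -1]], dtype=float)
a = np.array([0.0, 1.0, 0.0])  # initial weight vector

for _ in range(100):
    mis = Y[Y @ a <= 0]        # currently misclassified samples
    if len(mis) == 0:
        break                  # all discriminant values positive: done
    a = a + mis.sum(axis=0)    # batch update with step size 1

print(a)  # [-4.  5.  0.]
```

The loop performs exactly the two updates shown above and stops at $[-4, 5, 0]$.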
[Problem 2] For the multi-class case, consider the one-vs-all technique, i.e., construct $c$ linear discriminant functions $g_i(\mathbf{x})=\mathbf{w}_i^T \mathbf{x}+w_{i0},\ i=1,2,\ldots,c$. The decision rule is: assign $\mathbf{x}$ to class $\omega_i$ if $g_i(\mathbf{x})>g_j(\mathbf{x})$ for all $j \neq i$. Given three pattern classifiers in two-dimensional space with discriminant functions
$$\begin{aligned} g_1(\mathbf{x})&=-x_1+x_2 \\ g_2(\mathbf{x})&=x_1+x_2-1 \\ g_3(\mathbf{x})&=-x_2 \end{aligned}$$
draw the decision boundaries and explain why no ambiguous (indeterminate) region exists in this case.
[Solution] By the decision rule, the region assigned to $\omega_1$ must satisfy $g_1(\mathbf{x})>g_2(\mathbf{x})$ and $g_1(\mathbf{x})>g_3(\mathbf{x})$, so the boundaries of the $\omega_1$ region are
$$\begin{gathered} g_1(\mathbf{x})-g_2(\mathbf{x})=-2x_1+1=0 \\ g_1(\mathbf{x})-g_3(\mathbf{x})=-x_1+2x_2=0 \end{gathered}$$
The remaining boundary, between $\omega_2$ and $\omega_3$, is
$$g_2(\mathbf{x})-g_3(\mathbf{x})=x_1+2x_2-1=0$$
All three boundary lines pass through the single point $(0.5, 0.25)$, so the three decision regions tile the whole plane and no ambiguous region exists. The decision regions can be plotted as follows:
clc;
close all;
clear;
% Boundary g1 = g2: x1 = 0.5, drawn for x2 >= 0.25 (where g1, g2 > g3)
plot(0.5*ones(1,100), linspace(0.25,5,100));
hold on;
% Boundary g1 = g3: x2 = x1/2, drawn for x1 <= 0.5 (where g1, g3 > g2)
x1 = linspace(-5,0.5,100);
plot(x1, 1/2*x1);
% Boundary g2 = g3: x2 = (1-x1)/2, drawn for x1 >= 0.5 (where g2, g3 > g1)
x2 = linspace(0.5,5,100);
plot(x2, (-x2+1)/2);
% Label the three decision regions
t = text(-3,3,'{\omega_1}');
t.FontSize = 24;
t1 = text(3,3,'{\omega_2}');
t1.FontSize = 24;
t2 = text(0.5,-2,'{\omega_3}');
t2.FontSize = 24;
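As a sanity check (my own addition, not part of the original solution), the three discriminants can be evaluated at the claimed intersection point; all three tie there, which confirms the boundaries meet in a single point:

```python
# Discriminant functions from the problem statement
g1 = lambda x1, x2: -x1 + x2
g2 = lambda x1, x2: x1 + x2 - 1
g3 = lambda x1, x2: -x2

# At (0.5, 0.25) all three discriminants take the same value
vals = [g(0.5, 0.25) for g in (g1, g2, g3)]
print(vals)  # [-0.25, -0.25, -0.25]
```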
[Programming problems] (my own rough attempts below; corrections welcome)
- Write a program to implement the "batch perceptron" algorithm.
(a). Starting with $\mathbf{a}=\mathbf{0}$, apply your program to the training data from $\omega_1$ and $\omega_2$. Note the number of iterations required for convergence.
(b). Apply your program to the training data from ω 3 \omega_3 ω3 and ω 2 \omega_2 ω2. Again, note that the number of iterations required for convergence.
clc;
close all;
clear;
% (a) omega_1 (positive) vs omega_2 (negated): normalized augmented samples
trainset1 = [0.1, 1.1, 1; 6.8, 7.1, 1; -3.5, -4.1, 1;
2.0, 2.7, 1; 4.1, 2.8, 1; 3.1, 5.0, 1; -0.8, -1.3, 1;
0.9, 1.2, 1; 5.0, 6.4, 1; 3.9, 4.0, 1; -7.1, -4.2, -1;
1.4, 4.3, -1; -4.5, -0.0, -1; -6.3, -1.6, -1; -4.2, -1.9, -1; -1.4, 3.2, -1;
-2.4, 4.0, -1; -2.5, 6.1, -1; -8.4, -3.7, -1; -4.1, 2.2, -1];
% (b) omega_3 (positive) vs omega_2 (negated)
trainset2 = [-7.1, -4.2, -1;
1.4, 4.3, -1; -4.5, -0.0, -1; -6.3, -1.6, -1; -4.2, -1.9, -1; -1.4, 3.2, -1;
-2.4, 4.0, -1; -2.5, 6.1, -1; -8.4, -3.7, -1; -4.1, 2.2, -1; -3.0, -2.9, 1; 0.5, 8.7, 1; 2.9, 2.1, 1;
-0.1, 5.2, 1; -4.0, 2.2, 1; -1.3, 3.7, 1; -3.4, 6.2, 1; -4.1, 3.4, 1;
-5.1, 1.6, 1; 1.9, 5.1, 1];
omega_1 = [0,0,0];       % weight vector for task (a)
omega_2 = [0,0,0];       % weight vector for task (b)
learning_rate = 0.01;
iteration1 = 0;
while iteration1 <= 1000
    iteration1 = iteration1 + 1;
    if sum(omega_1*trainset1' > 0) == 20   % all 20 samples on the positive side
        fprintf('(a) converged after %d iterations\n', iteration1);
        omega_1
        break
    else
        % batch update: sum over all misclassified samples
        % (dim argument 1 keeps the result a row even for a single sample)
        omega_1 = omega_1 + sum(trainset1(omega_1*trainset1' <= 0,:), 1)*learning_rate;
    end
    if iteration1 == 1000
        disp('(a) reached the maximum of 1000 iterations');
        omega_1
    end
end
iteration2 = 0;
while iteration2 <= 1000
    iteration2 = iteration2 + 1;
    if sum(omega_2*trainset2' > 0) == 20
        fprintf('(b) converged after %d iterations\n', iteration2);
        omega_2
        break
    else
        omega_2 = omega_2 + sum(trainset2(omega_2*trainset2' <= 0,:), 1)*learning_rate;
    end
    if iteration2 == 1000
        disp('(b) reached the maximum of 1000 iterations');
        omega_2
    end
end
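As a cross-check, here is the same batch perceptron in Python (the function names, the learning rate of 0.01 and the iteration cap are my own choices). Both sample pairs are linearly separable, so the algorithm should terminate with every normalized sample on the positive side:

```python
import numpy as np

def batch_perceptron(Y, lr=0.01, max_iter=100000):
    """Batch perceptron on normalized augmented samples (one per row)."""
    a = np.zeros(Y.shape[1])
    for k in range(max_iter):
        mis = Y[Y @ a <= 0]            # currently misclassified samples
        if len(mis) == 0:
            return a, k                # converged after k updates
        a = a + lr * mis.sum(axis=0)   # batch gradient step
    return a, max_iter

def normalized(pos, neg):
    # Augment with 1; negate the second class's samples
    return np.array([[x, y, 1.0] for x, y in pos] +
                    [[-x, -y, -1.0] for x, y in neg])

w1 = [(0.1,1.1),(6.8,7.1),(-3.5,-4.1),(2.0,2.7),(4.1,2.8),
      (3.1,5.0),(-0.8,-1.3),(0.9,1.2),(5.0,6.4),(3.9,4.0)]
w2 = [(7.1,4.2),(-1.4,-4.3),(4.5,0.0),(6.3,1.6),(4.2,1.9),
      (1.4,-3.2),(2.4,-4.0),(2.5,-6.1),(8.4,3.7),(4.1,-2.2)]
w3 = [(-3.0,-2.9),(0.5,8.7),(2.9,2.1),(-0.1,5.2),(-4.0,2.2),
      (-1.3,3.7),(-3.4,6.2),(-4.1,3.4),(-5.1,1.6),(1.9,5.1)]

Y12 = normalized(w1, w2)   # task (a): omega_1 vs omega_2
Y32 = normalized(w3, w2)   # task (b): omega_3 vs omega_2
a12, k12 = batch_perceptron(Y12)
a32, k32 = batch_perceptron(Y32)
print(k12, k32)            # iteration counts until convergence
```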
- Implement the Ho-Kashyap algorithm and apply it to the training data from $\omega_1$ and $\omega_3$. Repeat for the training data from $\omega_2$ and $\omega_4$. Point out the training errors, and give some analysis.
clc;
close all;
clear;
a = [0,0,0]';              % initial weight vector
b = ones(20,1)*0.01;       % initial margin vector
bmin = ones(20,1)*0.001;   % convergence threshold on |e|
% omega_1 (positive) vs omega_3 (negated), normalized augmented samples
Y1 = [0.1 1.1 1;
6.8 7.1 1;
-3.5 -4.1 1;
2.0 2.7 1;
4.1 2.8 1;
3.1 5.0 1;
-0.8 -1.3 1;
0.9 1.2 1;
5.0 6.4 1;
3.9 4.0 1;
3.0 2.9 -1;
-0.5 -8.7 -1;
-2.9 -2.1 -1;
0.1 -5.2 -1;
4.0 -2.2 -1;
1.3 -3.7 -1;
3.4 -6.2 -1;
4.1 -3.4 -1;
5.1 -1.6 -1;
-1.9 -5.1 -1];
% omega_2 (positive) vs omega_4 (negated)
Y2 = [7.1 4.2 1;
-1.4 -4.3 1;
4.5 0.0 1;
6.3 1.6 1;
4.2 1.9 1;
1.4 -3.2 1;
2.4 -4.0 1;
2.5 -6.1 1;
8.4 3.7 1;
4.1 -2.2 1;
2.0 8.4 -1;
8.9 -0.2 -1;
4.2 7.7 -1;
8.5 3.2 -1;
6.7 4.0 -1;
0.5 9.2 -1;
5.3 6.7 -1;
8.7 6.4 -1;
7.1 9.7 -1;
8.0 6.3 -1];
kmax = 100000;             % maximum number of iterations per run
learning_rate = 0.01;
iterations = 0;            % cumulative counter: neither it nor a and b are
                           % reset below, so run 2 warm-starts from run 1
%======================%
while 1
    e = Y1*a - b;                     % error vector
    e_plus = 1/2*(e + abs(e));        % positive part of e
    b = b + 2*learning_rate*e_plus;   % margins can only increase
    a = (Y1'*Y1)\(Y1'*b);             % least-squares solution a = Y^+ b
    iterations = iterations + 1;
    if all(abs(e) <= bmin)
        a
        b
        iterations
        break
    end
    if iterations == kmax
        disp('No solution found!')
        sprintf('Reached the maximum of %d iterations', kmax)
        disp('========================')
        break
    end
end
%======================%
while 1
    e = Y2*a - b;
    e_plus = 1/2*(e + abs(e));
    b = b + 2*learning_rate*e_plus;
    a = (Y2'*Y2)\(Y2'*b);
    iterations = iterations + 1;
    if all(abs(e) <= bmin)
        a
        b
        iterations
        break
    end
    if iterations >= 2*kmax   % counter is cumulative over both runs
        disp('No solution found!')
        sprintf('Reached the maximum of %d iterations', kmax)
        break
    end
end
Output:
No solution found!
ans =
'Reached the maximum of 100000 iterations'
========================
a =
0.0063
0.0050
0.0398
b =
0.1056
0.0105
0.0682
0.0875
0.0758
0.0326
0.0349
0.0250
0.1113
0.0546
0.0149
0.0156
0.0252
0.0298
0.0224
0.0100
0.0272
0.0471
0.0535
0.0422
iterations =
122298
[Analysis]
Classes $\omega_1$ and $\omega_3$ are not linearly separable, so the first Ho-Kashyap run cannot converge and stops at the iteration limit. Classes $\omega_2$ and $\omega_4$ are linearly separable, and the second run converges; since the counter is cumulative over both runs, the reported 122298 iterations correspond to 22298 iterations of the second run.
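This analysis can be checked numerically. The sketch below (my own naming and structure; the termination test is separation $Y\mathbf{a}>0$ rather than the $|e| \le b_{min}$ criterion used above) runs Ho-Kashyap updates and reports whether a separating vector is ever found. For a non-separable pair no such vector exists, so the check can never succeed:

```python
import numpy as np

def normalized(pos, neg):
    # Augment with 1; negate the second class's samples
    return np.array([[x, y, 1.0] for x, y in pos] +
                    [[-x, -y, -1.0] for x, y in neg])

w1 = [(0.1,1.1),(6.8,7.1),(-3.5,-4.1),(2.0,2.7),(4.1,2.8),
      (3.1,5.0),(-0.8,-1.3),(0.9,1.2),(5.0,6.4),(3.9,4.0)]
w2 = [(7.1,4.2),(-1.4,-4.3),(4.5,0.0),(6.3,1.6),(4.2,1.9),
      (1.4,-3.2),(2.4,-4.0),(2.5,-6.1),(8.4,3.7),(4.1,-2.2)]
w3 = [(-3.0,-2.9),(0.5,8.7),(2.9,2.1),(-0.1,5.2),(-4.0,2.2),
      (-1.3,3.7),(-3.4,6.2),(-4.1,3.4),(-5.1,1.6),(1.9,5.1)]
w4 = [(-2.0,-8.4),(-8.9,0.2),(-4.2,-7.7),(-8.5,-3.2),(-6.7,-4.0),
      (-0.5,-9.2),(-5.3,-6.7),(-8.7,-6.4),(-7.1,-9.7),(-8.0,-6.3)]

def ho_kashyap(Y, lr=0.01, kmax=100000):
    """Return (a, separated) after Ho-Kashyap updates on samples Y."""
    a = np.zeros(Y.shape[1])
    b = np.full(Y.shape[0], 0.01)
    Yp = np.linalg.pinv(Y)                 # pseudoinverse, computed once
    for _ in range(kmax):
        e = Y @ a - b                      # error vector
        b = b + 2 * lr * np.maximum(e, 0)  # margins can only grow
        a = Yp @ b                         # least-squares weight vector
        if (Y @ a > 0).all():              # a separating vector was found
            return a, True
    return a, False

a13, sep13 = ho_kashyap(normalized(w1, w3))
a24, sep24 = ho_kashyap(normalized(w2, w4))
print(sep13, sep24)  # False True
```

This matches the output above: no solution for $\omega_1$ vs $\omega_3$, a separating vector for $\omega_2$ vs $\omega_4$.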
Code to plot the distribution of the four classes referenced above:
clc;
close all;
clear;
omega1_x = [0.1,6.8,-3.5,2.0,4.1,3.1,-0.8,0.9,5.0,3.9];
omega1_y = [1.1,7.1,-4.1,2.7,2.8,5.0,-1.3,1.2,6.4,4.0];
omega2_x = [7.1,-1.4,4.5,6.3,4.2,1.4,2.4,2.5,8.4,4.1];
omega2_y = [4.2,-4.3,0.0,1.6,1.9,-3.2,-4.0,-6.1,3.7,-2.2];
omega3_x = [-3.0,0.5,2.9,-0.1,-4.0,-1.3,-3.4,-4.1,-5.1,1.9];
omega3_y = [-2.9,8.7,2.1,5.2,2.2,3.7,6.2,3.4,1.6,5.1];
omega4_x = [-2.0,-8.9,-4.2,-8.5,-6.7,-0.5,-5.3,-8.7,-7.1,-8.0];
omega4_y = [-8.4,0.2,-7.7,-3.2,-4.0,-9.2,-6.7,-6.4,-9.7,-6.3];
figure();
scatter(omega1_x,omega1_y,'filled');
hold on
scatter(omega2_x,omega2_y,'filled');
hold on
scatter(omega3_x,omega3_y,'filled');
hold on
scatter(omega4_x,omega4_y,'filled');
hold off
legend('\omega_1','\omega_2','\omega_3','\omega_4');
- Write a program implementing the multi-class extension of MSE. Use the first 8 samples of each class to build the classifier and the last two samples of each class for testing. Give the main computation steps and report your accuracy.
clc;
close all;
clear;
% One-hot target matrix: 8 training samples per class, 4 classes (columns = samples)
Y = [1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;
0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;
0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;
0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1]';
% Augmented training samples: the first 8 of each class (columns = samples)
X_hat = [0.1 1.1 1;6.8 7.1 1;-3.5 -4.1 1;2.0 2.7 1;4.1 2.8 1;3.1 5.0 1;-0.8 -1.3 1;0.9 1.2 1;
7.1 4.2 1;-1.4 -4.3 1;4.5 0.0 1;6.3 1.6 1;4.2 1.9 1;1.4 -3.2 1;2.4 -4.0 1;2.5 -6.1 1;-3.0 -2.9 1;
0.5 8.7 1;2.9 2.1 1;-0.1 5.2 1;-4.0 2.2 1;-1.3 3.7 1;-3.4 6.2 1;-4.1 3.4 1;-2.0 -8.4 1;-8.9 0.2 1;
-4.2 -7.7 1;-8.5 -3.2 1;-6.7 -4.0 1;-0.5 -9.2 1;-5.3 -6.7 1;-8.7 -6.4 1]';
% MSE solution: W = (X X^T)^{-1} X Y^T
W_hat = (X_hat*X_hat')\(X_hat*Y');
% Augmented test samples: the last 2 of each class
X_test = [5.0 6.4 1;3.9 4.0 1;8.4 3.7 1;4.1 -2.2 1;
-5.1 1.6 1;1.9 5.1 1;-7.1 -9.7 1; -8.0 -6.3 1]';
% Predicted class = index of the largest discriminant value per test sample
[~, pred] = max(W_hat'*X_test);
pred
The predicted labels match the true classes (1,1,2,2,3,3,4,4) for all 8 test samples, i.e., 100% accuracy.
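For reference, here is the same MSE construction in Python (the variable names and helper are my own). It solves the one-hot least-squares problem and prints the resulting test accuracy rather than asserting the reported figure:

```python
import numpy as np

# All 10 samples per class; the first 8 train, the last 2 test
data = [
    [(0.1,1.1),(6.8,7.1),(-3.5,-4.1),(2.0,2.7),(4.1,2.8),
     (3.1,5.0),(-0.8,-1.3),(0.9,1.2),(5.0,6.4),(3.9,4.0)],
    [(7.1,4.2),(-1.4,-4.3),(4.5,0.0),(6.3,1.6),(4.2,1.9),
     (1.4,-3.2),(2.4,-4.0),(2.5,-6.1),(8.4,3.7),(4.1,-2.2)],
    [(-3.0,-2.9),(0.5,8.7),(2.9,2.1),(-0.1,5.2),(-4.0,2.2),
     (-1.3,3.7),(-3.4,6.2),(-4.1,3.4),(-5.1,1.6),(1.9,5.1)],
    [(-2.0,-8.4),(-8.9,0.2),(-4.2,-7.7),(-8.5,-3.2),(-6.7,-4.0),
     (-0.5,-9.2),(-5.3,-6.7),(-8.7,-6.4),(-7.1,-9.7),(-8.0,-6.3)],
]
aug = lambda pts: np.array([[x, y, 1.0] for x, y in pts])  # augment with 1
X_train = np.vstack([aug(c[:8]) for c in data])     # 32 x 3
Y_train = np.repeat(np.eye(4), 8, axis=0)           # one-hot targets, 32 x 4
X_test = np.vstack([aug(c[8:]) for c in data])      # 8 x 3
y_test = np.repeat(np.arange(4), 2)                 # true labels 0..3

# MSE weights: least-squares solution of X_train W = Y_train (3 x 4)
W = np.linalg.lstsq(X_train, Y_train, rcond=None)[0]
pred = (X_test @ W).argmax(axis=1)                  # class with largest output
print((pred == y_test).mean())
```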