Pattern Recognition Assignment 6 Lab Report
Handwritten Digit Recognition with a Bayes Classifier
Principle
- Divide each handwritten digit sample into b×b blocks (e.g. b=5), giving b² blocks in total.
- Count the nonzero pixels in each block and divide by the total number of pixels in the block. With a threshold T=0.05 (adjustable), the corresponding feature value is 1 if the block's pixel occupancy exceeds T, and 0 otherwise.
- First compute the prior probability of digit $i$: $P(\omega_i)=\frac{N_i}{N}$, where $N_i$ is the number of samples of digit $i$ and $N$ is the total number of samples.
- Then compute $P_j(\omega_i)=\frac{\sum_{k=0,X\in\omega_i}^{N_i}x_{kj}+1}{N_i+2}$, the estimated probability that the $j$-th component of $X$ is 1 given that sample $X$ belongs to class $\omega_i$, and from it the class-conditional probability $P(X|\omega_i)=\prod_{j=0}^{24}P(x_j=\alpha|X\in\omega_i)$, where $\alpha=0$ or $1$.
- Apply Bayes' rule to obtain the posterior probability $P(\omega_i|X)=\frac{P(\omega_i)P(X|\omega_i)}{\sum_{k=0}^{9}P(\omega_k)P(X|\omega_k)},\ i=0,1,\cdots,9.$
- The class $0$-$9$ with the largest posterior probability is the predicted digit.
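The report's code is in MATLAB; a compact NumPy sketch of the same pipeline may help clarify the steps above. The function names and the log-domain trick are my own; the block size b, threshold T, and the +1/+2 smoothing follow the formulas:

```python
import numpy as np

def extract_features(img, b, T):
    """Split an n x n image into b x b blocks and threshold each
    block's nonzero-pixel occupancy at T, giving a b^2-long 0/1 vector."""
    n = img.shape[0]
    a = n // b                                       # pixels per block side
    blocks = img.reshape(b, a, b, a).swapaxes(1, 2).reshape(b * b, a * a)
    occupancy = (blocks != 0).mean(axis=1)           # pixel occupancy per block
    return (occupancy > T).astype(int)

def train_bayes(features, labels, num_classes=10):
    """Priors P(w_i) and smoothed estimates P_j(w_i) per the formulas above."""
    priors = np.array([(labels == i).mean() for i in range(num_classes)])
    cond = np.array([(features[labels == i].sum(axis=0) + 1) /
                     ((labels == i).sum() + 2) for i in range(num_classes)])
    return priors, cond

def classify(x, priors, cond):
    """Posterior up to the shared denominator; logs for numerical stability."""
    logp = np.log(priors) + (x * np.log(cond) +
                             (1 - x) * np.log(1 - cond)).sum(axis=1)
    return int(np.argmax(logp))
```

Because of the smoothing, no conditional probability is exactly 0 or 1, so the logarithms are always defined.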
Data Collection
Downloaded from the web; source:
https://blog.csdn.net/qq_25005311/article/details/97910815
There are 4000 handwritten digit images of size 28×28, with 400 images for each of the digits 0-9. Some of the images are shown below:
Algorithm Flow
The mind map of the algorithm is:
Results and Analysis
With different total sample counts and different settings of the block count b and the threshold T, the digit recognition accuracies (in percent) are given in the following two tables:
Table 1
Samples \ params | b=7,T=0.06 | b=7,T=1/16 | b=7,T=2/16 | b=7,T=3/16 | b=7,T=4/16 | b=7,T=5/16 |
---|---|---|---|---|---|---|
100 | 86% | 87% | 86% | 82% | 85% | 83% |
500 | 72.4% | 73.4% | 77.6% | 76.4% | 74.8% | 73.8% |
1000 | 70% | 71.7% | 74% | 75.1% | 74.2% | 75.2% |
2000 | 69.1% | 71.65% | 73.25% | 74.1% | 72.25% | 72.6% |
3000 | 69.53% | 71.37% | 73.33% | 74.33% | 72.97% | 73.77% |
4000 | 70.03% | 71.92% | 73.72% | 74.42% | 72.72% | 73.12% |
Table 2
Samples \ params | b=7,T=6/16 | b=7,T=7/16 | b=7,T=8/16 | b=14,T=1/4 | b=14,T=2/4 | b=28,T=0 |
---|---|---|---|---|---|---|
100 | 85% | 84% | 81% | 95% | 92% | 97% |
500 | 76.6% | 77% | 70% | 85.2% | 83.6% | 85.6% |
1000 | 76.4% | 75.8% | 72.5% | 83.8% | 83.1% | 84.7% |
2000 | 73.75% | 73.6% | 69.95% | 81.35% | 80.65% | 82.45% |
3000 | 74.33% | 73.7% | 69.83% | 81.97% | 81.03% | 82.73% |
4000 | 73.52% | 72.9% | 69.67% | 81.7% | 80.65% | 82.27% |
From the tables, the recognition rate is highest when b=28 and T=0, i.e. when the image is not divided into blocks and the threshold is zero, staying at roughly 82% or above. The figure below shows the recognition matrix for 100 samples; three digits are misclassified:
The misclassified images are:
Even digits that are easy to read by eye are still misclassified, which shows that the Bayes classifier's accuracy is limited. The figure below shows the recognition matrix for 200 samples:
Summary: the pros and cons of the Bayes classifier are:
- Pros: the Bayes algorithm is efficient and easy to implement.
- Cons: classification accuracy is not necessarily high; with 4000 samples it only reaches about 82%. Room for improvement: raise the accuracy.
Code
clear;clc;
tune_b = 28; % tunable parameter 1
tune_T = 0; % tunable parameter 2
A = []; N1 = 9; % each digit has N1+1 samples; tunable parameter 3
N = 10*(N1+1); % total number of samples
for i=0:9
    B = [];
    for j=0:N1
        x = im2double(imread(strcat('D:\kp_matlab\hw6-number_recog\01\images4000\',...
            num2str(i),'_', num2str(j),'.bmp')));
        b = tune_b; % split into b*b blocks; affects the accuracy
        a = size(x,1)/b; % rows per block
        a1 = a*ones(1,b);
        x = mat2cell(x,a1,a1); % split the 28*28 matrix into b*b blocks
        x = reshape(x,b^2,1); % reshape the cell array into a column
        C = [];
        for k = 1:b^2
            c = (cell2mat(x(k))~=0);
            count = sum(c(:));
            t = count/a^2;
            T = tune_T; % adjust the threshold to improve accuracy
            if t > T
                t = 1;
            else
                t = 0;
            end
            C = [C;t]; % build the feature column vector
        end
        B = [B,C]; % concatenate the feature vectors of the samples
    end
    A = cat(3,A,B); % equal sample counts per digit; 3-D feature array for digits 0-9
end
P1 = (N1+1)/N; % prior probability
for i = 1:10
    for j=1:b^2
        s = sum(A(j,:,i))+1;
        P(i,j) = s/((N1+1)+2); % estimate of P(x_j = 1 | X in w_i)
    end
end
p = ones(10,1);
for h=1:10
    for k=1:(N1+1)
        for i=1:10
            for j=1:b^2
                if A(j,k,h)==1
                    p(i)=p(i)*P(i,j);
                else
                    p(i)=p(i)*(1-P(i,j));
                end
            end
        end
        p = P1*p; % Bayes posterior; the shared denominator is omitted
        [maxval,index] = max(p); % the class 0-9 with the largest posterior is the predicted digit
        I(h,k) = index; % row h is the digit 0-9, column k the sample
        p = ones(10,1);
    end
end
I = I-1;
disp(I); % display the recognition matrix
for i=1:10
    z(i) = sum((I(i,:)~=i-1));
end
disp(sum(z(:))); % total number of misclassified samples
disp((N-sum(z(:)))/N); % accuracy
Handwritten Digit Recognition with Logistic Regression
Principle
Data: an mnistData.mat file downloaded from the web is split into a training set Train_X of size $50000\times784$ with labels Train_label of size $50000\times1$, and a test set of size $10000\times784$ with labels of size $10000\times1$, where 50000 is the number of training samples and $784=28\times28$ is the size of each handwritten digit image. Label values run from 1 to 10, with label 10 standing for digit 0, i.e. digit 0 is relabeled as 10. Train_X is written as
$$X=\left[\begin{array}{c} -\left(x^{(1)}\right)^{T}- \\ -\left(x^{(2)}\right)^{T}- \\ \vdots \\ -\left(x^{(m)}\right)^{T}- \end{array}\right]$$
Because there are 10 classes, 10 independent logistic regression classifiers must be trained. For efficiency, the code must be well vectorized. The logistic regression cost function is
$$J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)-\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$
where $h_{\theta}\left(x^{(i)}\right)=g(\theta^{T}x^{(i)})$ and $g$ is the sigmoid function. It is easy to see that
$$X \theta=\left[\begin{array}{c} -\left(x^{(1)}\right)^{T} \theta- \\ -\left(x^{(2)}\right)^{T} \theta- \\ \vdots \\ -\left(x^{(m)}\right)^{T} \theta- \end{array}\right]=\left[\begin{array}{c} -\theta^{T}\left(x^{(1)}\right)- \\ -\theta^{T}\left(x^{(2)}\right)- \\ \vdots \\ -\theta^{T}\left(x^{(m)}\right)- \end{array}\right]$$
where $\theta=\left[ \begin{matrix} \theta_0& \theta_1& \cdots& \theta_n\\ \end{matrix} \right]^T$ is the parameter vector to be estimated, with $\theta_0$ the bias.
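In code, the identity above means the hypotheses for all samples can be computed with a single matrix-vector product; a minimal NumPy sketch (the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # h_theta(x^{(i)}) = g(theta^T x^{(i)}) for every sample at once:
    # the i-th entry of X @ theta is (x^{(i)})^T theta = theta^T x^{(i)}
    return sigmoid(X @ theta)
```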
Gradient descent is used to find the parameters $\theta$ that minimize the cost function, with gradient
$$\frac{\partial J}{\partial \theta_{j}}=\frac{1}{m} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right)$$
which vectorizes to:
$$\begin{aligned}\left[\begin{array}{c} \frac{\partial J}{\partial \theta_{0}} \\ \frac{\partial J}{\partial \theta_{1}} \\ \frac{\partial J}{\partial \theta_{2}} \\ \vdots \\ \frac{\partial J}{\partial \theta_{n}} \end{array}\right] &=\frac{1}{m}\left[\begin{array}{c} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}\right) \\ \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{1}^{(i)}\right) \\ \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{2}^{(i)}\right) \\ \vdots \\ \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{n}^{(i)}\right) \end{array}\right]\\ &=\frac{1}{m} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x^{(i)}\right) \\ &=\frac{1}{m} X^{T}\left(h_{\theta}(x)-y\right) \end{aligned}$$
where
$$h_{\theta}(x)-y=\left[ \begin{array}{c} h_{\theta}(x^{(1)})-y^{(1)}\\ h_{\theta}(x^{(2)})-y^{(2)}\\ \vdots\\ h_{\theta}(x^{(m)})-y^{(m)}\\ \end{array} \right]$$
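The vectorized form $\frac{1}{m}X^{T}(h_{\theta}(x)-y)$ can be checked numerically against the componentwise sum; a NumPy sketch with random toy data (names and dimensions are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loop(theta, X, y):
    # componentwise: dJ/dtheta_j = (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij
    m, n = X.shape
    g = np.zeros(n)
    for j in range(n):
        for i in range(m):
            g[j] += (sigmoid(X[i] @ theta) - y[i]) * X[i, j]
    return g / m

def grad_vec(theta, X, y):
    # vectorized: (1/m) * X^T (h_theta(X) - y)
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=5)
assert np.allclose(grad_loop(theta, X, y), grad_vec(theta, X, y))
```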
The regularized cost function is:
$$J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)-\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$$
The regularized gradient is:
$$\begin{array}{ll} \frac{\partial J(\theta)}{\partial \theta_{0}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} & \text { for } j=0 \\ \frac{\partial J(\theta)}{\partial \theta_{j}}=\left(\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right)+\frac{\lambda}{m} \theta_{j} & \text { for } j \geq 1 \end{array}$$
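A Python analogue of the regularized cost and gradient, with the bias $\theta_0$ excluded from the penalty as in the formulas above (the function name is my own and mirrors the report's lrCostFunction):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_grad(theta, X, y, lam):
    """Regularized logistic cost J(theta) and its gradient.
    theta[0] (the bias) is excluded from the penalty term."""
    m = len(y)
    h = sigmoid(X @ theta)
    theta1 = np.r_[0.0, theta[1:]]      # zero out the bias for regularization
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + lam / (2 * m) * (theta1 @ theta1)
    grad = X.T @ (h - y) / m + lam / m * theta1
    return J, grad
```

At $\theta=0$ every hypothesis is 0.5, so the unregularized cost is $\log 2$ regardless of the data, which gives a quick sanity check.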
The gradient descent algorithm is then:

Repeat until convergence {

$$\theta_{0}:=\theta_{0}-a \frac{1}{m} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}\right)$$

$$\theta_{j}:=\theta_{j}-a\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right] \quad \text{for } j=1,2, \ldots, n$$

}
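The update loop above, sketched in Python (the learning rate alpha and the iteration count are illustrative; the report itself delegates this minimization to fmincg):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lam, alpha=0.5, num_iters=200):
    """Regularized logistic regression by batch gradient descent,
    leaving the bias theta_0 unpenalized."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        theta1 = np.r_[0.0, theta[1:]]   # bias excluded from the penalty
        theta -= alpha * (X.T @ (h - y) / m + lam / m * theta1)
    return theta
```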
One-vs-all classification is implemented by training multiple regularized logistic regression classifiers by iterative minimization, one for each of the 10 classes in the dataset; the code returns the parameters $\theta$ of all classifiers in a single matrix. Here the function fmincg is used to iterate toward the parameters $\theta$ that minimize the cost function.
After training the one-vs-all classifiers, they can be used to predict the digit contained in a given (test-set) image. For each input, the trained logistic regression classifiers compute the "probability" that it belongs to each class. The one-vs-all prediction function selects the class whose classifier outputs the largest probability and returns that class label (1, 2, ..., 10) as the prediction for the input.
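The prediction step described here is just an argmax over the per-class probabilities; a NumPy sketch (the function name mirrors the report's predictOneVsAll; `all_theta` holds one parameter row per class):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(all_theta, X):
    """all_theta: (K, n+1) matrix, one trained classifier per row.
    Returns 1-based labels, matching the report's 1..10 convention."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend the bias column
    H = sigmoid(Xb @ all_theta.T)          # (m, K) class "probabilities"
    return H.argmax(axis=1) + 1            # 1-based; label 10 means digit 0
```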
Results and Analysis
Display of the digits:
Iteration log for one of the classifiers (digit 9):
Iteration 1 | Cost: 3.489434e-01
Iteration 2 | Cost: 2.070781e-01
Iteration 3 | Cost: 1.228187e-01
Iteration 4 | Cost: 1.081875e-01
Iteration 5 | Cost: 8.960722e-02
Iteration 6 | Cost: 7.287214e-02
Iteration 7 | Cost: 5.928005e-02
Iteration 8 | Cost: 4.909163e-02
Iteration 9 | Cost: 4.381959e-02
Iteration 10 | Cost: 4.279940e-02
Iteration 11 | Cost: 3.903445e-02
Iteration 12 | Cost: 3.780443e-02
Iteration 13 | Cost: 3.620455e-02
Iteration 14 | Cost: 3.605800e-02
Iteration 15 | Cost: 3.468461e-02
Iteration 16 | Cost: 3.443997e-02
Iteration 17 | Cost: 3.348350e-02
Iteration 18 | Cost: 3.327099e-02
Iteration 19 | Cost: 3.300861e-02
Iteration 20 | Cost: 3.296667e-02
Iteration 21 | Cost: 3.282463e-02
Iteration 22 | Cost: 3.277023e-02
Iteration 23 | Cost: 3.226063e-02
Iteration 24 | Cost: 3.206913e-02
Iteration 25 | Cost: 3.132217e-02
Iteration 26 | Cost: 3.129234e-02
Iteration 27 | Cost: 3.107378e-02
Iteration 28 | Cost: 3.100813e-02
Iteration 29 | Cost: 3.077306e-02
Iteration 30 | Cost: 3.073778e-02
Iteration 31 | Cost: 3.058509e-02
Iteration 32 | Cost: 3.054714e-02
Iteration 33 | Cost: 3.042337e-02
Iteration 34 | Cost: 3.039406e-02
Iteration 35 | Cost: 3.010051e-02
Iteration 36 | Cost: 3.002950e-02
Iteration 37 | Cost: 2.972486e-02
Iteration 38 | Cost: 2.969565e-02
Iteration 39 | Cost: 2.940304e-02
Iteration 40 | Cost: 2.931633e-02
Iteration 41 | Cost: 2.896889e-02
Iteration 42 | Cost: 2.891379e-02
Iteration 43 | Cost: 2.866397e-02
Iteration 44 | Cost: 2.854081e-02
Iteration 45 | Cost: 2.853175e-02
Iteration 46 | Cost: 2.834961e-02
Iteration 47 | Cost: 2.818457e-02
Iteration 48 | Cost: 2.815316e-02
Iteration 49 | Cost: 2.802013e-02
Iteration 50 | Cost: 2.799512e-02
The accuracy is 90.92%.
The accuracy is about 8.65 percentage points higher than the Bayes method, but the run time is about 20 s slower. Room for improvement: further raise the accuracy and shorten the run time. Run-time comparison:
Logistic | Bayes |
---|---|
92.10s | 72.20s |
Code
Main script
%% Initialization
clear ; close all; clc
tic;
%% Set parameters; split the data into training and test sets
num_labels = 10; % 10 labels, 1 to 10; label 10 stands for digit 0
load('mnistData.mat')
X = Data(:,1:784);
y = Data(:,785);
X = double(X);
y = double(y);
y(y==0)=10;
% split into 50000 training samples and 10000 test samples
for i = 1:10
    Train_X((i-1)*5000+(1:5000),:) = X((i-1)*6000+(1:5000),:);
    Train_label((i-1)*5000+(1:5000),:) = y((i-1)*6000+(1:5000),:);
    Test_X((i-1)*1000+(1:1000),:) = X((i-1)*6000+(5001:6000),:);
    Test_label((i-1)*1000+(1:1000),:) = y((i-1)*6000+(5001:6000),:);
end
%% =========== Part 1: visualize some handwritten digits =============
X = Train_X;
y = Train_label;
m = size(X, 1); % total number of samples
% display 100 images
rand_indices = randperm(m);
sel = X(rand_indices(1:100), :);
displayData(sel); % call displayData
%% ============ Part 2: vectorized logistic regression, fit the parameters ============
lambda = 0.1;
[all_theta] = oneVsAll(X, y, num_labels, lambda); % call oneVsAll
%% ================ Part 3: evaluate on the test set ================
pred = predictOneVsAll(all_theta, Test_X);
fprintf('\nTest Set Accuracy: %f\n', mean(double(pred == Test_label)) * 100); % accuracy
toc;
Display images
function [h, display_array] = displayData(X, example_width)
% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2))); % round to the nearest integer
end
% Grayscale image
colormap(gray); % map the current figure through the gray colormap
% Compute rows and columns
[m,n] = size(X);
example_height = (n / example_width);
% Compute number of items to display
display_rows = floor(sqrt(m)); % round down to the nearest integer
display_cols = ceil(m / display_rows); % round up to the nearest integer
% Between images padding
pad = 1;
% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
    pad + display_cols * (example_width + pad)); % -1 marks the borders between images
% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m
            break;
        end
        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :))); % for normalization
        X1 = reshape(X(curr_ex, :), example_height, example_width);
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
            pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
            X1' / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m
        break;
    end
end
% Display the image
h = imagesc(display_array, [-1 1]); % color range [-1, 1]
% Hide the axes
axis image off
drawnow;
end
Logistic regression cost function
function [J, grad] = lrCostFunction(theta, X, y, lambda)
% Initialize some useful values
m = length(y); % number of training samples
J = 0;
grad = zeros(size(theta));
theta1 = [0;theta(2:end)]; % zero out theta(1) so the bias is not regularized
J1 = -y.*log(sigmoid(X * theta))-(1-y).*(log(1-sigmoid(X * theta))); % logistic regression cost
J = 1/m * sum(J1(:)) + lambda/(2*m) * theta1' * theta; % regularized cost (theta1'*theta equals theta1'*theta1 since theta1(1)=0)
grad = (X'*(sigmoid(X * theta)-y))/m + lambda/m * theta1; % regularized gradient
grad = grad(:);
end
One-vs-all training
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
m = size(X, 1);
n = size(X, 2);
all_theta = zeros(num_labels, n + 1);
% add a column of ones
X = [ones(m, 1) X]; % keeps the dimensions consistent for the matrix product
initial_theta = zeros(n + 1, 1); % initialize theta
options = optimset('GradObj', 'on', 'MaxIter', 50);
% fmincg minimizes the custom cost function lrCostFunction, with at most 50 iterations,
% and returns the parameters theta that minimize the cost function
for c = 1:num_labels
    all_theta(c,:) = fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
        initial_theta, options);
end
end
Prediction (test set)
function p = predictOneVsAll(all_theta, X)
m = size(X, 1);
num_labels = size(all_theta, 1);
p = zeros(size(X, 1), 1); % initialize
X = [ones(m, 1) X]; % add a column of ones
H = sigmoid(X * all_theta');
[~,p] = max(H,[],2); % index of each row's maximum, i.e. the predicted digit
end
Handwritten Digit Recognition with a Neural Network
To be added when time permits.
Handwritten Digit Recognition with SVM
To be added when time permits.