【人工智能课程实验】 - 利用贝叶斯分类器实现手写数字 的识别

读入数据与预处理

因为老师给的文件无法直接读取,故从官网导入数据:

官网链接:http://www.cs.nyu.edu/~roweis/data.html 

导入数据之后要对MATLAB文件进行读入:

data=sio.loadmat(trainfile)

对文件type一下:

type(data)
Out[118]: dict

 

将data中的train部分,完全复制到tr中:

for i in range(10) :
    trstr.append('train'+str(i))
for i in range():
    print(trstr[i])

tr = dict.fromkeys(trstr)
for i in range(10):
    tr[trstr[i]]=data[trstr[i]]

将其中一个小图像赋值给tmp,进行如下练习操作:

输出一下第一张“0”的图像:

tmp = tr[trstr[0]][0]
tmp = tmp.reshape(28,28)
im = Image.fromarray(tmp)
plt.imshow(im)
plt.show()

'''
plt.figure("Image") # 图像窗口名称
plt.imshow(tmp)
plt.axis('on') # 关掉坐标轴为 off
plt.title('image') # 图像题目
plt.show()
'''

图像输出如下: 

 运行如下代码:

tmp = tmp.reshape(14,2*28)
im = Image.fromarray(tmp)
plt.imshow(im)
plt.show()

则输出图像如下:(可以思考一下原因,为什么会出现了两个零,而不是被拉宽了的一个零) 

答:模拟一下输出像素点的过程不难发现,相当于左右两侧的图像几乎是一样的像素点,所以输出的图像应该是大致相同的。

 

进行01二值化:

tmp = tr[trstr[0]][0].copy()
tmp = tmp.reshape(28*28)
for i in range(tmp.size) :
    if tmp[i] > 10 :
        tmp[i] = 1
    else :
        tmp[i] = 0

tmp = tmp.reshape(28,28)



im = Image.fromarray(np.uint8(tmp))
plt.imshow(im)

(ps:注意输出图像的时候,传入的参数需要是unsigned int类型的,不然有可能输出的图像是一种颜色的。)

定义一个数组out来进行降维(将28*28的图像降维到7*7)

out=np.zeros((7,7))
tmp=tmp.reshape(28,28)

for i in range(7) :
    for j in range(7) :
        out[i][j] = np.sum(tmp[i*4:i*4+4,j*4:j*4+4])
print(out.size)
print(out.shape)
for i in range(7):
    for j in range(7):
        if(out[i][j] > 5) :
            out[i][j]=1
        else :
            out[i][j]=0
im = Image.fromarray(np.uint8(out))
plt.imshow(im)
plt.show()

输出的图像如下: 

对单幅图像的操作练习到此结束了。接下来是对原训练集的二值化和降维。

将训练集字典dict进行二值化和降维:

先建立字典: 

tr = dict.fromkeys(trstr)
jwtr = dict.fromkeys(trstr)
for i in range(10):
    #处理测试集
    tr[trstr[i]]=data[trstr[i]].copy()
    jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0],7,7))

 进行二值化:

for i in range(10):# 枚举所有数字
    print(i)
    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(tr[trstr[i]][j][k] > 0):
                tr[trstr[i]][j][k] = 1

 将字典的值(不是键值key哈,是指值value,这里的值是个目前是二维数组)更改为三维数组,即将784分为28*28

for i in range(10):# 枚举所有数字
    tr[trstr[i]] = tr[trstr[i]].reshape(tr[trstr[i]].shape[0],28,28)
#    错误写法!
#    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行                
#        tr[trstr[i]][j] = tr[trstr[i]][j].reshape(28,28)

注意这里不能像注释的这样写!因为他是整个的数组,需要保持形状一致,所以不能只更改第一维取某一个值的时候的第二维。也就是你可以对字典的某一个键,修改对应的value,但是不能对数组的某一维的某一个值,去修改其他维度,要改就整个数组都改。

↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,得到了训练集的所有数字的所有行的28*28的矩阵。↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

下面处理得到对应的降维矩阵:

for i in range(10):# 枚举所有数字
    for j in range(tr[trstr[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jwtr[trstr[i]][j][k][kk] > 5):
                    jwtr[trstr[i]][j][k][kk]=1
                else:
                    jwtr[trstr[i]][j][k][kk]=0
        
    jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0],49))

 处理得到先验概率:

P = np.zeros(10,dtype = float)
Nsum = 0
for i in range(10):
    Nsum += tr[trstr[i]].shape[0]

for i in range(10):
    P[i] = tr[trstr[i]].shape[0]/Nsum
PP = np.zeros((49,10),dtype = float)
for i in range(49):
    for j in range(10):
        PP[i][j] = (sum(jwtr[trstr[j]][0:jwtr[trstr[j]].shape[0],i:i+1])+1)/(jwtr[trstr[j]].shape[0]+2)

↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,PP[i][j]代表第i个特征,组成的数字为j的概率 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

也就是我们训练出了一个二维矩阵,接下来要用这个矩阵来预测验证集了。

首先处理一下验证集:

test_str = [] # 验证集
for i in range(10) :
    test_str.append('test'+str(i))
test_image = dict.fromkeys(test_str)
jw_test_image = dict.fromkeys(test_str)
for i in range(10):
    #处理验证集
    test_image[test_str[i]] = data[test_str[i]].copy()
    jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0],7,7))

接下来的01二值化和降维的过程和训练集一致:

for i in range(10):# 枚举所有数字
    print(i)
    for j in range(test_image[test_str[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(test_image[test_str[i]][j][k] > 0):
                test_image[test_str[i]][j][k] = 1
for i in range(10):# 枚举所有数字
    test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0],28,28)

for i in range(10):# 枚举所有数字
    for j in range(test_image[test_str[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jw_test_image[test_str[i]][j][k][kk] > 5):
                    jw_test_image[test_str[i]][j][k][kk]=1
                else:
                    jw_test_image[test_str[i]][j][k][kk]=0
    jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0],49))

 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,我们将原始验证集处理成降维二值化矩阵jw_test_image字典 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

备份一份给操作数组

opr_test_image = dict.fromkeys(test_str)
for i in range(10):
    opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

  ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,我们接下来的操作矩阵就是opr_test_image字典 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

利用贝叶斯公式求后验概率,并进行预测,概率最大的值对应的数字就是通过PP矩阵预测得到的数字。

注意两点:

①根据贝叶斯公式,发现分母均为同样的正值,由于我们这里只需要关注值之间的大小关系,所以不需要计算分母,只需要比较分子即可。

②因为引用的是numpy包,所以每一个元素都是float类型,所以不能直接相乘,精度不够。但是由于我们这里只需要关注值之间的大小关系,所以可以取个log变成加法运算,保证了大小关系。

Phou = np.ones(10,dtype =float) # 后验概率初始数组
ans = np.zeros(10,dtype =float)
for dig in range(10):
    for col in range(opr_test_image[test_str[dig]].shape[0]):
        tmp = opr_test_image[test_str[dig]][col] #得到49个参数
        Phou = np.zeros(10,dtype =float) # 后验概率初始数组    
        for i in range(49):
            for j in range(10):
                if(tmp[i] != 0): # 若为1
                    Phou[j] = Phou[j]+np.log(PP[i][j])
                else :
                    Phou[j] = Phou[j]+np.log((1-PP[i][j]))
        for j in range(10): # 枚举每一个数字
            Phou[j] = Phou[j] * P[j]
        if(dig == np.argmax(Phou)):
            ans[dig] = ans[dig]+1
#    print(ans[dig])
    ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
    print("数字%d: "%(dig))
    print(ans[dig])

输出结果:

数字0: 
0.8306122448979592
数字1: 
0.9092511013215859
数字2: 
0.685077519379845
数字3: 
0.6514851485148515
数字4: 
0.6924643584521385
数字5: 
0.7365470852017937
数字6: 
0.7494780793319415
数字7: 
0.7334630350194552
数字8: 
0.6098562628336756
数字9: 
0.7205153617443013

np.mean(ans)
Out[205]: 0.7318750196697547

 

完整代码:

trainfile = "C:\\Users\\...\\mnist_all"

import numpy as np
import pandas as pd
import scipy.io as sio
from matplotlib import pyplot as plt
from PIL import Image

# df = pd.DataFrame(pd.read_csv(train_data,header=1)


'''
data=sio.loadmat(trainfile)
trstr = []
jwtr = []
test_str = [] # 验证集
for i in range(10) :
    trstr.append('train'+str(i))
for i in range(10) :
    test_str.append('test'+str(i))

tr = dict.fromkeys(trstr)
jwtr = dict.fromkeys(trstr)

test_image = dict.fromkeys(test_str)
jw_test_image = dict.fromkeys(test_str)
for i in range(10):
    #处理测试集
    tr[trstr[i]]=data[trstr[i]].copy()
    jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0],7,7))
    #处理验证集
    test_image[test_str[i]] = data[test_str[i]].copy()
    jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0],7,7))

'''


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(tr[trstr[i]][j][k] > 0):
                tr[trstr[i]][j][k] = 1


for i in range(10):# 枚举所有数字
    tr[trstr[i]] = tr[trstr[i]].reshape(tr[trstr[i]].shape[0],28,28)
#    错误写法!
#    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行                
#        tr[trstr[i]][j] = tr[trstr[i]][j].reshape(28,28)

'''
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,得到了训练集的所有数字的所有行的28*28的矩阵。↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
'''

for i in range(10):# 枚举所有数字
    for j in range(tr[trstr[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jwtr[trstr[i]][j][k][kk] > 5):
                    jwtr[trstr[i]][j][k][kk]=1
                else:
                    jwtr[trstr[i]][j][k][kk]=0
        
    jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0],49))


P = np.zeros(10,dtype = float)
Nsum = 0
for i in range(10):
    Nsum += tr[trstr[i]].shape[0]

for i in range(10):
    P[i] = tr[trstr[i]].shape[0]/Nsum
PP = np.zeros((49,10),dtype = float)
for i in range(49):
    for j in range(10):
        PP[i][j] = (sum(jwtr[trstr[j]][0:jwtr[trstr[j]].shape[0],i:i+1])+1)/(jwtr[trstr[j]].shape[0]+2)

'''
至此,PP[i][j]代表第i个特征,组成的数字为j的概率
'''

# 处理验证集       


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(test_image[test_str[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(test_image[test_str[i]][j][k] > 0):
                test_image[test_str[i]][j][k] = 1
for i in range(10):# 枚举所有数字
    test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0],28,28)

for i in range(10):# 枚举所有数字
    for j in range(test_image[test_str[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jw_test_image[test_str[i]][j][k][kk] > 5):
                    jw_test_image[test_str[i]][j][k][kk]=1
                else:
                    jw_test_image[test_str[i]][j][k][kk]=0
    jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0],49))

#得到二值化降维矩阵   jw_test_image

'''
接下来,将降维矩阵赋值给操作数组
'''

opr_test_image = dict.fromkeys(test_str)
for i in range(10):
    opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

#得到操作数组  opr_test_image

Phou = np.ones(10,dtype =float) # 后验概率初始数组
ans = np.zeros(10,dtype =float)
for dig in range(10):
    for col in range(opr_test_image[test_str[dig]].shape[0]):
        tmp = opr_test_image[test_str[dig]][col] #得到49个参数
        Phou = np.zeros(10,dtype =float) # 后验概率初始数组    
        for i in range(49):
            for j in range(10):
                if(tmp[i] != 0): # 若为1
                    Phou[j] = Phou[j]+np.log(PP[i][j])
                else :
                    Phou[j] = Phou[j]+np.log((1-PP[i][j]))
        for j in range(10): # 枚举每一个数字
            Phou[j] = Phou[j] * P[j]
        if(dig == np.argmax(Phou)):
            ans[dig] = ans[dig]+1
#    print(ans[dig])
    ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
    print("数字%d: "%(dig))
    print(ans[dig])
     
    
'''
下面是对测试集的一个图像的处理样例:

'''

'''
tmp = tr[trstr[0]][0].copy()
tmp = tmp.reshape(28,28)
im = Image.fromarray(tmp)
plt.imshow(im)

plt.figure("Image") # 图像窗口名称
plt.imshow(tmp)
plt.axis('on') # 关掉坐标轴为 off
plt.title('image') # 图像题目
plt.show()
''' 

'''
tmp = tr[trstr[0]][0].copy()
tmp = tmp.reshape(28*28)
for i in range(tmp.size) :
    if tmp[i] > 10 :
        tmp[i] = 1
    else :
        tmp[i] = 0

tmp = tmp.reshape(28,28)



im = Image.fromarray(np.uint8(tmp))
plt.imshow(im)
'''

'''
out=np.zeros((7,7))
tmp=tmp.reshape(28,28)

for i in range(7) :
    for j in range(7) :
        out[i][j] = np.sum(tmp[i*4:i*4+4,j*4:j*4+4])
print(out.size)
print(out.shape)
for i in range(7):
    for j in range(7):
        if(out[i][j] > 5) :
            out[i][j]=1
        else :
            out[i][j]=0
im = Image.fromarray(np.uint8(out))
plt.imshow(im)
plt.show()
'''

update:(20191201)

发现对于降维到7*7的矩阵,可以做到平均73%的准确率。那么思考降维到14*14的矩阵,保留的特征会更多一些,那么准确率会不会更高一些呢?于是继续写了下面的代码(其实是在上面这个代码上进行了增加,并没有改变原来的东西,也就是说下面这个代码和上面这个代码有很多重复的地方,在注释中也有标注)

实验结果:(为了防止放到下面看不到,这里就放到上面一起写了)

平均预测率为75%左右,也就是说虽然参数变多了,但是预测率的提升并不明显。

#  -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""


trainfile = "D:\\mystudy\\大三上学期作业\\人工智能\\数字识别相关\\mnist_all"
import numpy as np
import pandas as pd
import scipy.io as sio
from matplotlib import pyplot as plt
from PIL import Image
from pylab import *

# df = pd.DataFrame(pd.read_csv(train_data,header=1)


'''
data=sio.loadmat(trainfile)
trstr = []
jwtr = []
test_str = [] # 验证集
for i in range(10) :
    trstr.append('train'+str(i))
for i in range(10) :
    test_str.append('test'+str(i))

tr = dict.fromkeys(trstr)
jwtr = dict.fromkeys(trstr)

test_image = dict.fromkeys(test_str)
jw_test_image = dict.fromkeys(test_str)
for i in range(10):
    #处理测试集
    tr[trstr[i]]=data[trstr[i]].copy()
    jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0],7,7))
    #处理验证集
    test_image[test_str[i]] = data[test_str[i]].copy()
    jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0],7,7))

'''


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(tr[trstr[i]][j][k] > 0):
                tr[trstr[i]][j][k] = 1


for i in range(10):# 枚举所有数字
    tr[trstr[i]] = tr[trstr[i]].reshape(tr[trstr[i]].shape[0],28,28)
#    错误写法!
#    for j in range(tr[trstr[i]].shape[0]): # 枚举所有行                
#        tr[trstr[i]][j] = tr[trstr[i]][j].reshape(28,28)

'''
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑至此,得到了训练集的所有数字的所有行的28*28的矩阵。↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
'''

for i in range(10):# 枚举所有数字
    for j in range(tr[trstr[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jwtr[trstr[i]][j][k][kk] > 5):
                    jwtr[trstr[i]][j][k][kk]=1
                else:
                    jwtr[trstr[i]][j][k][kk]=0
        
    jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0],49))


P = np.zeros(10,dtype = float)
Nsum = 0
for i in range(10):
    Nsum += tr[trstr[i]].shape[0]

for i in range(10):
    P[i] = tr[trstr[i]].shape[0]/Nsum
PP = np.zeros((49,10),dtype = float)
for i in range(49):
    for j in range(10):
        PP[i][j] = (sum(jwtr[trstr[j]][0:jwtr[trstr[j]].shape[0],i:i+1])+1)/(jwtr[trstr[j]].shape[0]+2)

'''
至此,PP[i][j]代表第i个特征,组成的数字为j的概率
'''

# 处理验证集       


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(test_image[test_str[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(test_image[test_str[i]][j][k] > 0):
                test_image[test_str[i]][j][k] = 1
for i in range(10):# 枚举所有数字
    test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0],28,28)

for i in range(10):# 枚举所有数字
    for j in range(test_image[test_str[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(7):
            for kk in range(7):
                jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*4:k*4+4,kk*4:kk*4+4])
                if(jw_test_image[test_str[i]][j][k][kk] > 5):
                    jw_test_image[test_str[i]][j][k][kk]=1
                else:
                    jw_test_image[test_str[i]][j][k][kk]=0
    jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0],49))

#得到二值化降维矩阵   jw_test_image

'''
接下来,将降维矩阵赋值给操作数组
'''

opr_test_image = dict.fromkeys(test_str)
for i in range(10):
    opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

#得到操作数组  opr_test_image

Phou = np.ones(10,dtype =float) # 后验概率初始数组
ans = np.zeros(10,dtype =float)
for dig in range(10):
    for col in range(opr_test_image[test_str[dig]].shape[0]):
        tmp = opr_test_image[test_str[dig]][col] #得到49个参数
        Phou = np.zeros(10,dtype =float) # 后验概率初始数组    
        for i in range(49):
            for j in range(10):
                if(tmp[i] != 0): # 若为1
                    Phou[j] = Phou[j]+np.log(PP[i][j])
                else :
                    Phou[j] = Phou[j]+np.log((1-PP[i][j]))
        for j in range(10): # 枚举每一个数字
            Phou[j] = Phou[j] * P[j]
        if(dig == np.argmax(Phou)):
            ans[dig] = ans[dig]+1
#    print(ans[dig])
    ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
    print("数字%d: "%(dig))
    print(ans[dig])
    
print(np.mean(ans))
    
    
    
    
    
    
    
'''
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓降维到14*14的答案↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
'''    
    
    
for i in range(10):
#    test_image[test_str[i]] = data[test_str[i]].copy()
    #处理测试集
    jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0],14,14))
    #处理验证集
    jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0],14,14))


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(tr[trstr[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(14):
            for kk in range(14):
                jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*2:k*2+2,kk*2:kk*2+2])
                if(jwtr[trstr[i]][j][k][kk] > 2):
                    jwtr[trstr[i]][j][k][kk]=1
                else:
                    jwtr[trstr[i]][j][k][kk]=0
        
    jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0],14*14))


P = np.zeros(10,dtype = float)
Nsum = 0
for i in range(10):
    Nsum += tr[trstr[i]].shape[0]

for i in range(10):
    P[i] = tr[trstr[i]].shape[0]/Nsum
PP = np.zeros((14*14,10),dtype = float)
for i in range(14*14):
    for j in range(10):
        PP[i][j] = (sum(jwtr[trstr[j]][0:jwtr[trstr[j]].shape[0],i:i+1])+1)/(jwtr[trstr[j]].shape[0]+2)

'''
至此,PP[i][j]代表第i个特征,组成的数字为j的概率
'''

# 处理验证集       


for i in range(10):# 枚举所有数字
    print(i)
    for j in range(test_image[test_str[i]].shape[0]): # 枚举所有行
        for k in range(28*28):
            if(test_image[test_str[i]][j][k] > 0):
                test_image[test_str[i]][j][k] = 1
for i in range(10):# 枚举所有数字
    test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0],28,28)

for i in range(10):# 枚举所有数字
    for j in range(test_image[test_str[i]].shape[0]):# 枚举所有行(此时每一行是28*28的二值化矩阵)
        for k in range(14):
            for kk in range(14):
                jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*2:k*2+2,kk*2:kk*2+2])
                if(jw_test_image[test_str[i]][j][k][kk] > 2):#注意这里也要修改!!!
                    jw_test_image[test_str[i]][j][k][kk]=1
                else:
                    jw_test_image[test_str[i]][j][k][kk]=0
for i in range(10): 
    jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0],14*14))

#得到二值化降维矩阵   jw_test_image

'''
接下来,将降维矩阵赋值给操作数组
'''

opr_test_image = dict.fromkeys(test_str)
for i in range(10):
    opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

#得到操作数组  opr_test_image

Phou = np.zeros(10,dtype =float) # 后验概率初始数组
ans = np.zeros(10,dtype =float)
for dig in range(10):
    for col in range(opr_test_image[test_str[dig]].shape[0]):
        tmp = opr_test_image[test_str[dig]][col] #得到14*14个参数
        Phou = np.zeros(10,dtype =float) # 后验概率初始数组    
        for i in range(14*14):
            for j in range(10):
                if(tmp[i] != 0): # 若为1
                    Phou[j] = Phou[j]+np.log(PP[i][j])
                else :
                    Phou[j] = Phou[j]+np.log((1-PP[i][j]))
        for j in range(10): # 枚举每一个数字
            Phou[j] = Phou[j] * P[j]
        if(dig == np.argmax(Phou)):
            ans[dig] = ans[dig]+1
#    print(ans[dig])
    ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
    print("数字%d: "%(dig))
    print(ans[dig])

print(np.mean(ans))    
    
    
for dig in range(10):
    im = Image.fromarray(np.uint8(opr_test_image[test_str[dig]][0].reshape(14,14)))
    plt.imshow(im)
    plt.show()
    
    
    
    
'''
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑降维到14*14的答案↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
'''   
  
'''
输出准确率图像:
'''  
def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x()+rect.get_width()/2.- 0.2, 1.03*height, '%.2f' % (height))


name_list = ['0', '1', '2', '3', '4', '5', '6', '7','8','9',]
num_list = [0.8306122448979592,0.9092511013215859,0.685077519379845,0.6514851485148515,0.6924643584521385,0.7365470852017937,0.7494780793319415,0.7334630350194552,0.6098562628336756,0.7205153617443013]
autolabel(plt.bar(range(len(num_list)), num_list, color='rgb', tick_label=name_list))
plt.show()




'''
下面是对测试集的一个图像的处理样例:

'''

'''
tmp = tr[trstr[0]][0].copy()
tmp = tmp.reshape(28,28)
im = Image.fromarray(tmp)
plt.imshow(im)

plt.figure("Image") # 图像窗口名称
plt.imshow(tmp)
plt.axis('on') # 关掉坐标轴为 off
plt.title('image') # 图像题目
plt.show()
''' 

'''
tmp = tr[trstr[0]][0].copy()
tmp = tmp.reshape(28*28)
for i in range(tmp.size) :
    if tmp[i] > 10 :
        tmp[i] = 1
    else :
        tmp[i] = 0

tmp = tmp.reshape(28,28)



im = Image.fromarray(np.uint8(tmp))
plt.imshow(im)
'''

'''
out=np.zeros((7,7))
tmp=tmp.reshape(28,28)

for i in range(7) :
    for j in range(7) :
        out[i][j] = np.sum(tmp[i*4:i*4+4,j*4:j*4+4])
print(out.size)
print(out.shape)
for i in range(7):
    for j in range(7):
        if(out[i][j] > 5) :
            out[i][j]=1
        else :
            out[i][j]=0
im = Image.fromarray(np.uint8(out))
plt.imshow(im)
plt.show()
'''
        
'''
dd = pd.date_range(end = '20191115',periods = 6,)
print(dd)
df1=pd.DataFrame({"id":[1001,1002,1003,1004,1005,1006,1007,1008], 
"gender":['male','female','male','female','male','female','male','female'],
"pay":['Y','N','Y','Y','N','Y','N','Y',],
"m-point":[10,12,20,40,40,40,30,20]})
print(df1)
'''

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值