Image Processing: Digit Recognition with KNN and CNN

OpenCV / TensorFlow: an introduction to AI-based image processing

Dataset: http://yann.lecun.com/exdb/mnist/

File contents
train-images-idx3-ubyte.gz — training set images: 55,000 training images plus 5,000 validation images
train-labels-idx1-ubyte.gz — digit labels for the training set images
t10k-images-idx3-ubyte.gz — test set images: 10,000 images
t10k-labels-idx1-ubyte.gz — digit labels for the test set images
  • Download the four files into a folder named MNIST_data and place that folder in the same directory as the code (a quick check is sketched below).
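
A quick way to confirm the files are in the right place, as a sketch (it only checks the file names, not their contents):

# Sketch: check that the four MNIST archives are present in MNIST_data
import os

expected = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]
missing = [f for f in expected if not os.path.isfile(os.path.join("MNIST_data", f))]
print("missing files:", missing or "none")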

1. KNN Digit Recognition

1.1 Load Data

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
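
Note that tensorflow.examples.tutorials.mnist has been removed from newer TensorFlow releases. If it is unavailable in your installation, roughly equivalent arrays can be built from the Keras MNIST loader; this is only a sketch, and the variable names (trainImages, trainLabels, ...) are mine, not part of the original code. The Keras split is 60,000/10,000 with no separate validation set, so the counts differ slightly from the 55,000/5,000/10,000 quoted above.

# Sketch: load MNIST through Keras and convert to flattened, one-hot encoded arrays
import numpy as np
from tensorflow.keras.datasets import mnist as keras_mnist

(x_train, y_train), (x_test, y_test) = keras_mnist.load_data()
trainImages = x_train.reshape(-1, 784).astype(np.float32) / 255.0  # (60000, 784)
testImages = x_test.reshape(-1, 784).astype(np.float32) / 255.0    # (10000, 784)
trainLabels = np.eye(10, dtype=np.float32)[y_train]                # one-hot, (60000, 10)
testLabels = np.eye(10, dtype=np.float32)[y_test]                  # one-hot, (10000, 10)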

1.2 KNN Setup: distances between 5 test images and 500 training images (5 × 500 = 2500 distances)

# parameters
trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 4
# sample indices: 1) range 0..trainNum; 2) sample size trainSize; 3) replace=False (no repeats)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]  # training images; trainData = (500, 784), 500 images, each 28*28 = 784 pixels
trainlabel = mnist.train.labels[trainIndex]  # training labels; trainlabel = (500, 10)
testData = mnist.test.images[testIndex]  # testData = (5, 784)
testLabel = mnist.test.labels[testIndex]  # testLabel = (5, 10)
print("trainData=", trainData.shape)
print("trainlabel=", trainlabel.shape)
print("testData=", testData.shape)
print("testLabel=", testLabel.shape)

1.3 KNN Graph: subtract the 5 test images from the 500 training images and find the 4 nearest training images for each test image

# tf input placeholders
trainDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)  # shape is the tensor dimension
trainLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)
testDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)
testLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)

# knn distance: 5*784 ——> 5*1*784
# 5 test samples, 500 training samples, each of dimension 784 (3D) -> 2500 distances over 784 pixels
f1 = tf.expand_dims(testDataInput, 1)  # insert an extra dimension for broadcasting
f2 = tf.subtract(trainDataInput, f1)  # element-wise difference over the 784 pixels
f3 = tf.reduce_sum(tf.abs(f2), axis=2)  # sum the 784 absolute differences (L1 distance)
f4 = tf.negative(f3)  # negate so that the smallest distances become the largest values
f5, f6 = tf.nn.top_k(f4, k=4)  # pick the 4 largest values of f4, i.e. the 4 nearest neighbours
f7 = tf.gather(trainLabelInput, f6)  # look up the training labels by index
f8 = tf.reduce_sum(f7, axis=1)  # vote: sum the one-hot labels of the 4 neighbours
f9 = tf.argmax(f8, axis=1)  # take the index of the largest vote as the prediction
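
The same pipeline can be written in plain NumPy, which may make the role of each fN tensor easier to see; this is just a sketch that reuses the trainData, trainlabel, testData and k variables from section 1.2:

# Sketch: NumPy equivalent of the f1–f9 graph above
diff = np.abs(testData[:, np.newaxis, :] - trainData)  # (5, 500, 784), like f2
dist = diff.sum(axis=2)                                # (5, 500) L1 distances, like f3
nearest = np.argsort(dist, axis=1)[:, :k]              # indices of the k nearest images, like f6
votes = trainlabel[nearest].sum(axis=1)                # (5, 10) label votes, like f8
knn_pred = np.argmax(votes, axis=1)                    # predicted digits, like f9
print("NumPy knn prediction =", knn_pred)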

1.4 From the k nearest images, parse the label content

with tf.Session() as sess:
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    print("p1 = ", p1.shape)  # p1 = (5, 1, 784)
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p2 = ", p2.shape)  # p2 = (5, 500, 784)
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p3 = ", p3.shape)  # p3 = (5, 500)
    print("p3[0, 0] = ", p3[0, 0])  # e.g. p3[0, 0] = 116.76471
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p4 = ", p4.shape)  # p4 = (5, 500)
    print("p4[0, 0] = ", p4[0, 0])  # e.g. p4[0, 0] = -116.76471
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p5 = ", p5.shape)  # p5 = (5, 4): for each of the 5 test images, the 4 nearest training images
    print("p6 = ", p6.shape)  # p6 = (5, 4)
    print("p5[0, 0] = ", p5[0, 0])  # value depends on the random sample
    print("p6[0, 0] = ", p6[0, 0])  # index of the nearest training image
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p7 = ", p7.shape)  # p7 = (5, 4, 10)
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p8 = ", p8)
    print("p8.shape = ", p8.shape)  # p8.shape = (5, 10)
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p9 = ", p9)  # e.g. p9 = [3 3 2 8 2], the index of the largest value in each row of p8
    print("p9.shape = ", p9.shape)  # p9.shape = (5,)
    p10 = np.argmax(testLabel[0:5], axis=1)  # the true digit of each test label
    print("p10 = ", p10)  # comparing p9 with p10 gives the recognition accuracy

1.5 Compute the Recognition Accuracy

# count how many of the 5 test images were classified correctly
j = 0
for i in range(0, 5):
    if p10[i] == p9[i]:
        j = j + 1
print("recognition accuracy (%) =", j * 100 / 5)

1.6 Complete Source Code

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# parameters
trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 4
# sample indices: 1) range 0..trainNum; 2) sample size trainSize; 3) replace=False (no repeats)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]  # training images; trainData = (500, 784), 500 images, each 28*28 = 784 pixels
trainlabel = mnist.train.labels[trainIndex]  # training labels; trainlabel = (500, 10)
testData = mnist.test.images[testIndex]  # testData = (5, 784)
testLabel = mnist.test.labels[testIndex]  # testLabel = (5, 10)
print("trainData=", trainData.shape)
print("trainlabel=", trainlabel.shape)
print("testData=", testData.shape)
print("testLabel=", testLabel.shape)

# tf input placeholders
trainDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)  # shape is the tensor dimension
trainLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)
testDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)
testLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)

# knn distance: 5*784 ——> 5*1*784
# 5 test samples, 500 training samples, each of dimension 784 (3D) -> 2500 distances over 784 pixels
f1 = tf.expand_dims(testDataInput, 1)  # insert an extra dimension for broadcasting
f2 = tf.subtract(trainDataInput, f1)  # element-wise difference over the 784 pixels
f3 = tf.reduce_sum(tf.abs(f2), axis=2)  # sum the 784 absolute differences (L1 distance)
f4 = tf.negative(f3)  # negate so that the smallest distances become the largest values
f5, f6 = tf.nn.top_k(f4, k=4)  # pick the 4 largest values of f4, i.e. the 4 nearest neighbours
f7 = tf.gather(trainLabelInput, f6)  # look up the training labels by index
f8 = tf.reduce_sum(f7, axis=1)  # vote: sum the one-hot labels of the 4 neighbours
f9 = tf.argmax(f8, axis=1)  # take the index of the largest vote as the prediction

with tf.Session() as sess:
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    print("p1 = ", p1.shape)  # p1 = (5, 1, 784)
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p2 = ", p2.shape)  # p2 = (5, 500, 784)
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p3 = ", p3.shape)  # p3 = (5, 500)
    print("p3[0, 0] = ", p3[0, 0])  # e.g. p3[0, 0] = 116.76471
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p4 = ", p4.shape)  # p4 = (5, 500)
    print("p4[0, 0] = ", p4[0, 0])  # e.g. p4[0, 0] = -116.76471
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p5 = ", p5.shape)  # p5 = (5, 4): for each of the 5 test images, the 4 nearest training images
    print("p6 = ", p6.shape)  # p6 = (5, 4)
    print("p5[0, 0] = ", p5[0, 0])  # value depends on the random sample
    print("p6[0, 0] = ", p6[0, 0])  # index of the nearest training image
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p7 = ", p7.shape)  # p7 = (5, 4, 10)
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p8 = ", p8)
    print("p8.shape = ", p8.shape)  # p8.shape = (5, 10)
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p9 = ", p9)  # e.g. p9 = [3 3 2 8 2], the index of the largest value in each row of p8
    print("p9.shape = ", p9.shape)  # p9.shape = (5,)
    p10 = np.argmax(testLabel[0:5], axis=1)  # the true digit of each test label
    print("p10 = ", p10)  # comparing p9 with p10 gives the recognition accuracy
# count how many of the 5 test images were classified correctly
j = 0
for i in range(0, 5):
    if p10[i] == p9[i]:
        j = j + 1
print("recognition accuracy (%) =", j * 100 / 5)

Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
trainData= (500, 784)
trainlabel= (500, 10)
testData= (5, 784)
testLabel= (5, 10)
p1 =  (5, 1, 784)
p2 =  (5, 500, 784)
p3 =  (5, 500)
p3[0, 0] =  194.5373
p4 =  (5, 500)
p4[0, 0] =  -194.5373
p5 =  (5, 4)
p6 =  (5, 4)
p5[0, 0] =  -64.77253
p6[0, 0] =  484
p7 =  (5, 4, 10)
p8 =  [[4. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 4. 0. 0. 0. 0. 0.]
 [0. 0. 0. 4. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 3. 0.]
 [0. 0. 0. 0. 0. 0. 0. 4. 0. 0.]]
p8.shape =  (5, 10)
p9 =  [0 4 3 8 7]
p9.shape =  (5,)
p10 =  [0 4 3 8 7]
recognition accuracy (%) = 100.0

2. Handwritten Digit Recognition with a CNN

2.1 Imports

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

2.2 Load Data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

2.3 Define the tf.placeholder Inputs

imageInput = tf.placeholder(tf.float32, [None, 784]) # 28*28=784
LabelInput = tf.placeholder(tf.float32, [None, 10])

2.4 Reshape the Input Data

# [None, 784] ——> M*28*28*1 : 2D ——> 4D, 28*28 = width*height, 1 channel
imageInputReshape = tf.reshape(imageInput, [-1, 28, 28, 1])
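
A quick sanity check of the static shape (sketch):

print(imageInputReshape.get_shape())  # batch dimension unknown; spatial shape 28*28 with 1 channel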

2.5 Convolution Kernel

# convolution kernel w0: 5*5, in channels: 1, out channels: 32
w0 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b0 = tf.Variable(tf.constant(0.1, shape=[32]))

2.6 Activation + Convolution (and Pooling)

# imageInputReshape: M*28*28*1, w0: 5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape, w0, strides=[1, 1, 1, 1], padding='SAME') + b0)
# M*28*28*32
# max pooling ——> greatly reduces the amount of data: M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
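
With 'SAME' padding the spatial output size is ceil(input / stride), so the 5*5 convolution with stride 1 keeps 28*28, and the 4*4 max pool with stride 4 gives ceil(28 / 4) = 7. A quick check (sketch):

print(layer1.get_shape())       # spatial size 28*28, 32 channels
print(layer1_pool.get_shape())  # spatial size 7*7, 32 channels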

2.7 Fully Connected Layers: Activation + Matrix Multiply, then Softmax

# layer2 output: activation + matrix multiply; output layer: softmax
w1 = tf.Variable(tf.truncated_normal([7*7*32, 1024], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_reshape = tf.reshape(layer1_pool, [-1, 7*7*32])  # M*7*7*32 ——> N*(7*7*32), 4D ——> 2D
# [N, 7*7*32] x [7*7*32, 1024] = [N, 1024]
h1 = tf.nn.relu(tf.matmul(h_reshape, w1) + b1)
# 7.1 softmax output layer
w2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[10]))  # [N, 1024] x [1024, 10] = [N, 10]
pred = tf.nn.softmax(tf.matmul(h1, w2) + b2)
loss0 = LabelInput * tf.log(pred)
loss1 = 0
# 7.2 cross-entropy loss, summed over the first 100 rows of the batch and averaged
for m in range(0, 100):
    for n in range(0, 10):
        loss1 = loss1 - loss0[m, n]
loss = loss1 / 100
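
Note that the double loop above only accumulates the first 100 rows of each batch, while training later draws batches of 500. For reference, the standard vectorized cross-entropy over the whole batch would look like the sketch below; it is not a drop-in replacement for the article's loss, and the small epsilon is my addition to guard against log(0):

# Sketch: vectorized cross-entropy averaged over every row of the batch
loss_vectorized = tf.reduce_mean(-tf.reduce_sum(LabelInput * tf.log(pred + 1e-10), axis=1))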

2.8 Training Op

train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

2.9 Run

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        ## evaluate the predictions on the test set
        pred_test = sess.run(pred, feed_dict={imageInput: mnist.test.images, LabelInput: labels})
        acc = tf.equal(tf.argmax(pred_test, 1), tf.argmax(mnist.test.labels, 1))
        acc_float = tf.reduce_mean(tf.cast(acc, tf.float32))
        acc_result = sess.run(acc_float, feed_dict={imageInput: mnist.test.images, LabelInput: mnist.test.labels})
        print(acc_result)
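
One detail worth noting: the accuracy ops (tf.equal, tf.reduce_mean) are created inside the loop, so the graph grows a little on every iteration, and the test predictions are converted back into graph constants each time. A sketch of the same loop with the accuracy ops built once from pred and LabelInput (the names acc_op and acc_float_op are mine):

# Sketch: build the accuracy ops once and evaluate them directly on the test set
acc_op = tf.equal(tf.argmax(pred, 1), tf.argmax(LabelInput, 1))
acc_float_op = tf.reduce_mean(tf.cast(acc_op, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        acc_result = sess.run(acc_float_op,
                              feed_dict={imageInput: mnist.test.images,
                                         LabelInput: mnist.test.labels})
        print(acc_result)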

2.10 Complete Source Code

# CNN: convolutional neural network
# 1. imports
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
# 2. load data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# 3. input placeholders
imageInput = tf.placeholder(tf.float32, [None, 784])  # 28*28 = 784
LabelInput = tf.placeholder(tf.float32, [None, 10])
# 4. data reshape
# [None, 784] ——> M*28*28*1 : 2D ——> 4D, 28*28 = width*height, 1 channel
imageInputReshape = tf.reshape(imageInput, [-1, 28, 28, 1])
# 5. convolution kernel w0: 5*5, in channels: 1, out channels: 32
w0 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b0 = tf.Variable(tf.constant(0.1, shape=[32]))

# 6. layer1: activation + convolution
# imageInputReshape: M*28*28*1, w0: 5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape, w0, strides=[1, 1, 1, 1], padding='SAME') + b0)
# M*28*28*32
# max pooling ——> greatly reduces the amount of data: M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

# 7. layer2 output: activation + matrix multiply; output layer: softmax
w1 = tf.Variable(tf.truncated_normal([7*7*32, 1024], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_reshape = tf.reshape(layer1_pool, [-1, 7*7*32])  # M*7*7*32 ——> N*(7*7*32), 4D ——> 2D
# [N, 7*7*32] x [7*7*32, 1024] = [N, 1024]
h1 = tf.nn.relu(tf.matmul(h_reshape, w1) + b1)
# 7.1 softmax output layer
w2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[10]))  # [N, 1024] x [1024, 10] = [N, 10]
pred = tf.nn.softmax(tf.matmul(h1, w2) + b2)
loss0 = LabelInput * tf.log(pred)
loss1 = 0
# 7.2 cross-entropy loss, summed over the first 100 rows of the batch and averaged
for m in range(0, 100):
    for n in range(0, 10):
        loss1 = loss1 - loss0[m, n]
loss = loss1 / 100

# 8. train
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# 9. run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        ## evaluate the predictions on the test set
        pred_test = sess.run(pred, feed_dict={imageInput: mnist.test.images, LabelInput: labels})
        acc = tf.equal(tf.argmax(pred_test, 1), tf.argmax(mnist.test.labels, 1))
        acc_float = tf.reduce_mean(tf.cast(acc, tf.float32))
        acc_result = sess.run(acc_float, feed_dict={imageInput: mnist.test.images, LabelInput: mnist.test.labels})
        print(acc_result)

Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
0.1581
0.1714
0.1771
0.1951
0.2065
0.2363
0.2596
0.267
0.3245
0.3308
0.3531
0.4143
0.44
0.4393
0.3842
0.4771
0.4509
0.4632
0.499
0.462
0.4652
0.5596
0.575
0.5983
0.5877
0.608
0.6139
......