Image Processing: Digit Recognition with KNN and CNN

OpenCV / TensorFlow: an introduction to AI-based image processing

Dataset: http://yann.lecun.com/exdb/mnist/

File contents
train-images-idx3-ubyte.gz — training set images: 55,000 training images plus 5,000 validation images
train-labels-idx1-ubyte.gz — digit labels for the training set images
t10k-images-idx3-ubyte.gz — test set images: 10,000 images
t10k-labels-idx1-ubyte.gz — digit labels for the test set images
  • Download the four files into a folder named MNIST_data and place that folder in the same directory as the code (a quick check is sketched below).
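
A quick way to confirm the files are in the right place, as a sketch (it only checks the file names, not their contents):

# Sketch: check that the four MNIST archives are present in MNIST_data
import os

expected = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]
missing = [f for f in expected if not os.path.isfile(os.path.join("MNIST_data", f))]
print("missing files:", missing or "none")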

1. KNN Digit Recognition

1.1 Load Data

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
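
Note that tensorflow.examples.tutorials.mnist has been removed from newer TensorFlow releases. If it is unavailable in your installation, roughly equivalent arrays can be built from the Keras MNIST loader; this is only a sketch, and the variable names (trainImages, trainLabels, ...) are mine, not part of the original code. The Keras split is 60,000/10,000 with no separate validation set, so the counts differ slightly from the 55,000/5,000/10,000 quoted above.

# Sketch: load MNIST through Keras and convert to flattened, one-hot encoded arrays
import numpy as np
from tensorflow.keras.datasets import mnist as keras_mnist

(x_train, y_train), (x_test, y_test) = keras_mnist.load_data()
trainImages = x_train.reshape(-1, 784).astype(np.float32) / 255.0  # (60000, 784)
testImages = x_test.reshape(-1, 784).astype(np.float32) / 255.0    # (10000, 784)
trainLabels = np.eye(10, dtype=np.float32)[y_train]                # one-hot, (60000, 10)
testLabels = np.eye(10, dtype=np.float32)[y_test]                  # one-hot, (10000, 10)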

1.2 KNN Setup: distances between 5 test images and 500 training images (5 × 500 = 2500 distances)

# parameters
trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 4
# sample indices: 1) range 0..trainNum; 2) sample size trainSize; 3) replace=False (no repeats)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]  # training images; trainData = (500, 784), 500 images, each 28*28 = 784 pixels
trainlabel = mnist.train.labels[trainIndex]  # training labels; trainlabel = (500, 10)
testData = mnist.test.images[testIndex]  # testData = (5, 784)
testLabel = mnist.test.labels[testIndex]  # testLabel = (5, 10)
print("trainData=", trainData.shape)
print("trainlabel=", trainlabel.shape)
print("testData=", testData.shape)
print("testLabel=", testLabel.shape)

1.3 KNN Graph: subtract the 5 test images from the 500 training images and find the 4 nearest training images for each test image

# tf input placeholders
trainDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)  # shape is the tensor dimension
trainLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)
testDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)
testLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)

# knn distance: 5*784 ——> 5*1*784
# 5 test samples, 500 training samples, each of dimension 784 (3D) -> 2500 distances over 784 pixels
f1 = tf.expand_dims(testDataInput, 1)  # insert an extra dimension for broadcasting
f2 = tf.subtract(trainDataInput, f1)  # element-wise difference over the 784 pixels
f3 = tf.reduce_sum(tf.abs(f2), axis=2)  # sum the 784 absolute differences (L1 distance)
f4 = tf.negative(f3)  # negate so that the smallest distances become the largest values
f5, f6 = tf.nn.top_k(f4, k=4)  # pick the 4 largest values of f4, i.e. the 4 nearest neighbours
f7 = tf.gather(trainLabelInput, f6)  # look up the training labels by index
f8 = tf.reduce_sum(f7, axis=1)  # vote: sum the one-hot labels of the 4 neighbours
f9 = tf.argmax(f8, axis=1)  # take the index of the largest vote as the prediction
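
The same pipeline can be written in plain NumPy, which may make the role of each fN tensor easier to see; this is just a sketch that reuses the trainData, trainlabel, testData and k variables from section 1.2:

# Sketch: NumPy equivalent of the f1–f9 graph above
diff = np.abs(testData[:, np.newaxis, :] - trainData)  # (5, 500, 784), like f2
dist = diff.sum(axis=2)                                # (5, 500) L1 distances, like f3
nearest = np.argsort(dist, axis=1)[:, :k]              # indices of the k nearest images, like f6
votes = trainlabel[nearest].sum(axis=1)                # (5, 10) label votes, like f8
knn_pred = np.argmax(votes, axis=1)                    # predicted digits, like f9
print("NumPy knn prediction =", knn_pred)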

1.4 From the k nearest images, parse the label content

with tf.Session() as sess:
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    print("p1 = ", p1.shape)  # p1 = (5, 1, 784)
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p2 = ", p2.shape)  # p2 = (5, 500, 784)
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p3 = ", p3.shape)  # p3 = (5, 500)
    print("p3[0, 0] = ", p3[0, 0])  # e.g. p3[0, 0] = 116.76471
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p4 = ", p4.shape)  # p4 = (5, 500)
    print("p4[0, 0] = ", p4[0, 0])  # e.g. p4[0, 0] = -116.76471
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p5 = ", p5.shape)  # p5 = (5, 4): for each of the 5 test images, the 4 nearest training images
    print("p6 = ", p6.shape)  # p6 = (5, 4)
    print("p5[0, 0] = ", p5[0, 0])  # value depends on the random sample
    print("p6[0, 0] = ", p6[0, 0])  # index of the nearest training image
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p7 = ", p7.shape)  # p7 = (5, 4, 10)
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p8 = ", p8)
    print("p8.shape = ", p8.shape)  # p8.shape = (5, 10)
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p9 = ", p9)  # e.g. p9 = [3 3 2 8 2], the index of the largest value in each row of p8
    print("p9.shape = ", p9.shape)  # p9.shape = (5,)
    p10 = np.argmax(testLabel[0:5], axis=1)  # the true digit of each test label
    print("p10 = ", p10)  # comparing p9 with p10 gives the recognition accuracy

1.5 Compute the Recognition Accuracy

# count how many of the 5 test images were classified correctly
j = 0
for i in range(0, 5):
    if p10[i] == p9[i]:
        j = j + 1
print("recognition accuracy (%) =", j * 100 / 5)

1.6 Complete Source Code

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# parameters
trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 4
# sample indices: 1) range 0..trainNum; 2) sample size trainSize; 3) replace=False (no repeats)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]  # training images; trainData = (500, 784), 500 images, each 28*28 = 784 pixels
trainlabel = mnist.train.labels[trainIndex]  # training labels; trainlabel = (500, 10)
testData = mnist.test.images[testIndex]  # testData = (5, 784)
testLabel = mnist.test.labels[testIndex]  # testLabel = (5, 10)
print("trainData=", trainData.shape)
print("trainlabel=", trainlabel.shape)
print("testData=", testData.shape)
print("testLabel=", testLabel.shape)

# tf input placeholders
trainDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)  # shape is the tensor dimension
trainLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)
testDataInput = tf.placeholder(shape=[None, 784], dtype=tf.float32)
testLabelInput = tf.placeholder(shape=[None, 10], dtype=tf.float32)

# knn distance: 5*784 ——> 5*1*784
# 5 test samples, 500 training samples, each of dimension 784 (3D) -> 2500 distances over 784 pixels
f1 = tf.expand_dims(testDataInput, 1)  # insert an extra dimension for broadcasting
f2 = tf.subtract(trainDataInput, f1)  # element-wise difference over the 784 pixels
f3 = tf.reduce_sum(tf.abs(f2), axis=2)  # sum the 784 absolute differences (L1 distance)
f4 = tf.negative(f3)  # negate so that the smallest distances become the largest values
f5, f6 = tf.nn.top_k(f4, k=4)  # pick the 4 largest values of f4, i.e. the 4 nearest neighbours
f7 = tf.gather(trainLabelInput, f6)  # look up the training labels by index
f8 = tf.reduce_sum(f7, axis=1)  # vote: sum the one-hot labels of the 4 neighbours
f9 = tf.argmax(f8, axis=1)  # take the index of the largest vote as the prediction

with tf.Session() as sess:
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    print("p1 = ", p1.shape)  # p1 = (5, 1, 784)
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p2 = ", p2.shape)  # p2 = (5, 500, 784)
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p3 = ", p3.shape)  # p3 = (5, 500)
    print("p3[0, 0] = ", p3[0, 0])  # e.g. p3[0, 0] = 116.76471
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p4 = ", p4.shape)  # p4 = (5, 500)
    print("p4[0, 0] = ", p4[0, 0])  # e.g. p4[0, 0] = -116.76471
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    print("p5 = ", p5.shape)  # p5 = (5, 4): for each of the 5 test images, the 4 nearest training images
    print("p6 = ", p6.shape)  # p6 = (5, 4)
    print("p5[0, 0] = ", p5[0, 0])  # value depends on the random sample
    print("p6[0, 0] = ", p6[0, 0])  # index of the nearest training image
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p7 = ", p7.shape)  # p7 = (5, 4, 10)
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p8 = ", p8)
    print("p8.shape = ", p8.shape)  # p8.shape = (5, 10)
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainlabel})
    print("p9 = ", p9)  # e.g. p9 = [3 3 2 8 2], the index of the largest value in each row of p8
    print("p9.shape = ", p9.shape)  # p9.shape = (5,)
    p10 = np.argmax(testLabel[0:5], axis=1)  # the true digit of each test label
    print("p10 = ", p10)  # comparing p9 with p10 gives the recognition accuracy
# count how many of the 5 test images were classified correctly
j = 0
for i in range(0, 5):
    if p10[i] == p9[i]:
        j = j + 1
print("recognition accuracy (%) =", j * 100 / 5)

Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
trainData= (500, 784)
trainlabel= (500, 10)
testData= (5, 784)
testLabel= (5, 10)
p1 =  (5, 1, 784)
p2 =  (5, 500, 784)
p3 =  (5, 500)
p3[0, 0] =  194.5373
p4 =  (5, 500)
p4[0, 0] =  -194.5373
p5 =  (5, 4)
p6 =  (5, 4)
p5[0, 0] =  -64.77253
p6[0, 0] =  484
p7 =  (5, 4, 10)
p8 =  [[4. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 4. 0. 0. 0. 0. 0.]
 [0. 0. 0. 4. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 3. 0.]
 [0. 0. 0. 0. 0. 0. 0. 4. 0. 0.]]
p8.shape =  (5, 10)
p9 =  [0 4 3 8 7]
p9.shape =  (5,)
p10 =  [0 4 3 8 7]
recognition accuracy (%) = 100.0

2. Handwritten Digit Recognition with a CNN

2.1 Imports

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

2.2 Load Data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

2.3 Define the tf.placeholder Inputs

imageInput = tf.placeholder(tf.float32, [None, 784]) # 28*28=784
LabelInput = tf.placeholder(tf.float32, [None, 10])

2.4 Reshape the Input Data

# [None, 784] ——> M*28*28*1 : 2D ——> 4D, 28*28 = width*height, 1 channel
imageInputReshape = tf.reshape(imageInput, [-1, 28, 28, 1])
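
A quick sanity check of the static shape (sketch):

print(imageInputReshape.get_shape())  # batch dimension unknown; spatial shape 28*28 with 1 channel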

2.5 Convolution Kernel

# convolution kernel w0: 5*5, in channels: 1, out channels: 32
w0 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b0 = tf.Variable(tf.constant(0.1, shape=[32]))

2.6 Activation + Convolution (and Pooling)

# imageInputReshape: M*28*28*1, w0: 5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape, w0, strides=[1, 1, 1, 1], padding='SAME') + b0)
# M*28*28*32
# max pooling ——> greatly reduces the amount of data: M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
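
With 'SAME' padding the spatial output size is ceil(input / stride), so the 5*5 convolution with stride 1 keeps 28*28, and the 4*4 max pool with stride 4 gives ceil(28 / 4) = 7. A quick check (sketch):

print(layer1.get_shape())       # spatial size 28*28, 32 channels
print(layer1_pool.get_shape())  # spatial size 7*7, 32 channels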

2.7 Fully Connected Layers: Activation + Matrix Multiply, then Softmax

# layer2 output: activation + matrix multiply; output layer: softmax
w1 = tf.Variable(tf.truncated_normal([7*7*32, 1024], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_reshape = tf.reshape(layer1_pool, [-1, 7*7*32])  # M*7*7*32 ——> N*(7*7*32), 4D ——> 2D
# [N, 7*7*32] x [7*7*32, 1024] = [N, 1024]
h1 = tf.nn.relu(tf.matmul(h_reshape, w1) + b1)
# 7.1 softmax output layer
w2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[10]))  # [N, 1024] x [1024, 10] = [N, 10]
pred = tf.nn.softmax(tf.matmul(h1, w2) + b2)
loss0 = LabelInput * tf.log(pred)
loss1 = 0
# 7.2 cross-entropy loss, summed over the first 100 rows of the batch and averaged
for m in range(0, 100):
    for n in range(0, 10):
        loss1 = loss1 - loss0[m, n]
loss = loss1 / 100
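
Note that the double loop above only accumulates the first 100 rows of each batch, while training later draws batches of 500. For reference, the standard vectorized cross-entropy over the whole batch would look like the sketch below; it is not a drop-in replacement for the article's loss, and the small epsilon is my addition to guard against log(0):

# Sketch: vectorized cross-entropy averaged over every row of the batch
loss_vectorized = tf.reduce_mean(-tf.reduce_sum(LabelInput * tf.log(pred + 1e-10), axis=1))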

2.8 Training Op

train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

2.9 Run

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        ## evaluate the predictions on the test set
        pred_test = sess.run(pred, feed_dict={imageInput: mnist.test.images, LabelInput: labels})
        acc = tf.equal(tf.argmax(pred_test, 1), tf.argmax(mnist.test.labels, 1))
        acc_float = tf.reduce_mean(tf.cast(acc, tf.float32))
        acc_result = sess.run(acc_float, feed_dict={imageInput: mnist.test.images, LabelInput: mnist.test.labels})
        print(acc_result)
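
One detail worth noting: the accuracy ops (tf.equal, tf.reduce_mean) are created inside the loop, so the graph grows a little on every iteration, and the test predictions are converted back into graph constants each time. A sketch of the same loop with the accuracy ops built once from pred and LabelInput (the names acc_op and acc_float_op are mine):

# Sketch: build the accuracy ops once and evaluate them directly on the test set
acc_op = tf.equal(tf.argmax(pred, 1), tf.argmax(LabelInput, 1))
acc_float_op = tf.reduce_mean(tf.cast(acc_op, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        acc_result = sess.run(acc_float_op,
                              feed_dict={imageInput: mnist.test.images,
                                         LabelInput: mnist.test.labels})
        print(acc_result)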

2.10 Complete Source Code

# CNN: convolutional neural network
# 1. imports
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run in TensorFlow 1.x compatibility mode
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
# 2. load data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# 3. input placeholders
imageInput = tf.placeholder(tf.float32, [None, 784])  # 28*28 = 784
LabelInput = tf.placeholder(tf.float32, [None, 10])
# 4. data reshape
# [None, 784] ——> M*28*28*1 : 2D ——> 4D, 28*28 = width*height, 1 channel
imageInputReshape = tf.reshape(imageInput, [-1, 28, 28, 1])
# 5. convolution kernel w0: 5*5, in channels: 1, out channels: 32
w0 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b0 = tf.Variable(tf.constant(0.1, shape=[32]))

# 6. layer1: activation + convolution
# imageInputReshape: M*28*28*1, w0: 5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape, w0, strides=[1, 1, 1, 1], padding='SAME') + b0)
# M*28*28*32
# max pooling ——> greatly reduces the amount of data: M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

# 7. layer2 output: activation + matrix multiply; output layer: softmax
w1 = tf.Variable(tf.truncated_normal([7*7*32, 1024], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_reshape = tf.reshape(layer1_pool, [-1, 7*7*32])  # M*7*7*32 ——> N*(7*7*32), 4D ——> 2D
# [N, 7*7*32] x [7*7*32, 1024] = [N, 1024]
h1 = tf.nn.relu(tf.matmul(h_reshape, w1) + b1)
# 7.1 softmax output layer
w2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[10]))  # [N, 1024] x [1024, 10] = [N, 10]
pred = tf.nn.softmax(tf.matmul(h1, w2) + b2)
loss0 = LabelInput * tf.log(pred)
loss1 = 0
# 7.2 cross-entropy loss, summed over the first 100 rows of the batch and averaged
for m in range(0, 100):
    for n in range(0, 10):
        loss1 = loss1 - loss0[m, n]
loss = loss1 / 100

# 8. train
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# 9. run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images, labels = mnist.train.next_batch(500)
        sess.run(train, feed_dict={imageInput: images, LabelInput: labels})
        ## evaluate the predictions on the test set
        pred_test = sess.run(pred, feed_dict={imageInput: mnist.test.images, LabelInput: labels})
        acc = tf.equal(tf.argmax(pred_test, 1), tf.argmax(mnist.test.labels, 1))
        acc_float = tf.reduce_mean(tf.cast(acc, tf.float32))
        acc_result = sess.run(acc_float, feed_dict={imageInput: mnist.test.images, LabelInput: mnist.test.labels})
        print(acc_result)

Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
0.1581
0.1714
0.1771
0.1951
0.2065
0.2363
0.2596
0.267
0.3245
0.3308
0.3531
0.4143
0.44
0.4393
0.3842
0.4771
0.4509
0.4632
0.499
0.462
0.4652
0.5596
0.575
0.5983
0.5877
0.608
0.6139
......