Machine Learning | Digit Recognition Based on KNN

Hello, everyone! Last time we talked about Linear Regression; this time I will introduce the K-Nearest Neighbors (KNN) algorithm in Machine Learning.
1. Background
The simplest, most naive classifier records every training example together with its category; a test object is then classified when its attributes exactly match those of a training object. But two problems arise: not every test object has an exact match in the training set, and a test object may match several training objects at once, which would assign it to multiple classes. KNN was developed to address these problems.
KNN classifies by measuring the distance between feature values. The idea is that if most of the k samples most similar to a new sample in feature space (i.e. its nearest neighbors) belong to a certain category, the new sample also belongs to that category, where k is usually an integer no greater than 20. In KNN, the selected neighbors are all correctly classified objects; the method assigns a category based only on the one or few nearest samples.
2. Process
To summarize the idea of the KNN algorithm: given training data with known labels, take a test sample, compare its features with the corresponding features of every sample in the training set, find the K training samples most similar to it, and assign the test sample to the category that appears most frequently among those K samples. The algorithm is described as:

  1. calculate the distance between test data and each training data;
  2. sort according to the increasing relation of distance;
  3. select K points with the smallest distance;
  4. count the frequency of each category among those K points;
  5. return the most frequent category among those K points as the predicted classification of the test data.
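The five steps above can be sketched in plain NumPy, independent of TensorFlow. This is a minimal illustration with hypothetical toy data, using the L1 distance (the same metric the TensorFlow code below uses):

```python
import numpy as np

def knn_predict(train_data, train_labels, test_point, k=3):
    # 1. distance between the test point and every training point (L1 metric)
    dists = np.sum(np.abs(train_data - test_point), axis=1)
    # 2-3. sort by increasing distance and take the k nearest
    nearest = np.argsort(dists)[:k]
    # 4. count how often each class appears among the k neighbors
    votes = np.bincount(train_labels[nearest])
    # 5. return the most frequent class
    return np.argmax(votes)

# toy two-class example: class 0 near the origin, class 1 near (10, 10)
train_data = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [9, 10], [10, 9]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_data, train_labels, np.array([0.5, 0.5])))  # 0
print(knn_predict(train_data, train_labels, np.array([9.5, 9.5])))  # 1
```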
3. Apply in Python
I will use the tensorflow.examples.tutorials API to build this program. Because my TensorFlow version is 2.0.0, there are many version-compatibility issues, handled through tf.compat.v1.
1. First step:
Import the libraries that will be used to build the network.
import tensorflow as tf
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data

# run TF1-style graph/session code under TensorFlow 2.x
tf.compat.v1.disable_eager_execution()

2. Second step:
I use the MNIST input_data helper to load the dataset and split it into images and labels; these two parts serve as the training and test sets.

trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 5
# data: load MNIST with one-hot labels
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]    # train images
trainLabel = mnist.train.labels[trainIndex]   # train labels
testData = mnist.test.images[testIndex]       # test images
testLabel = mnist.test.labels[testIndex]      # test labels
#print(trainIndex)
#print(testIndex)
print('trainData.shape=', trainData.shape)    # 500*784: 500 images, 784 = 28*28 pixels each
print('trainLabel.shape=', trainLabel.shape)  # 500*10
print('testData.shape=', testData.shape)      # 5*784
print('testLabel.shape=', testLabel.shape)    # 5*10
print('testLabel=', testLabel)                # e.g. 4: testData[0], 3: testData[1], 6: testData[2]
# train images & labels; test images & labels

This step prepares the data. We now have two parts, labels and images. The key step that follows is computing the distance between each test image and every training image: the training images with the smallest distances are assumed to belong to the same class as the test image.
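The sampling step above can be illustrated on its own. This standalone sketch mimics how trainIndex is drawn (the seed is added here only to make the draw reproducible):

```python
import numpy as np

np.random.seed(0)  # for a reproducible draw
train_num, train_size = 55000, 500

# draw 500 distinct indices out of 55000, as trainIndex does above
train_index = np.random.choice(train_num, train_size, replace=False)

print(train_index.shape)      # (500,)
print(len(set(train_index)))  # 500 -- replace=False guarantees no duplicates
```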

3. Third step:
Define placeholders to hold the training and test data:

trainDataInput=tf.compat.v1.placeholder(shape=[None,784],dtype=tf.float32)
trainLabelInput = tf.compat.v1.placeholder(shape=[None,10],dtype=tf.float32)
testDataInput = tf.compat.v1.placeholder(shape=[None,784],dtype=tf.float32)
testLabelInput = tf.compat.v1.placeholder(shape=[None,10],dtype=tf.float32)

4. Fourth step:
Compute the nearest distances:

f1 = tf.expand_dims(testDataInput, 1)  # expand dims: (5,784) -> (5,1,784) for broadcasting
f2 = tf.subtract(trainDataInput, f1)   # pixel-wise difference: (5,500,784)
print(f1.shape)
print(testDataInput.shape)
print(f2.shape)
f3 = tf.compat.v1.reduce_sum(tf.abs(f2), reduction_indices=2)  # L1 distance: (5,500)
f4 = tf.negative(f3)                   # negate so top_k picks the smallest distances
f5, f6 = tf.nn.top_k(f4, k=k)          # values and indices of the k nearest neighbors
f7 = tf.gather(trainLabelInput, f6)    # one-hot labels of those neighbors: (5,k,10)
f8 = tf.compat.v1.reduce_sum(f7, reduction_indices=1)  # vote count per digit: (5,10)
f9 = tf.compat.v1.argmax(f8, axis=1)   # digit with the most votes

Each label has ten slots (one-hot encoding); each slot corresponds to one digit, so together they represent 0-9.
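The one-hot layout, and why summing neighbor labels (as f8 does) counts votes per digit, can be shown with a small NumPy sketch:

```python
import numpy as np

# a one-hot MNIST label: ten slots, a 1 at the index of the digit
label_for_4 = np.zeros(10)
label_for_4[4] = 1.0
print(label_for_4)   # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

# decoding: the position of the 1 is the digit
digit = np.argmax(label_for_4)
print(digit)         # 4

# summing the one-hot labels of the neighbors counts votes per digit
neighbour_labels = np.array([label_for_4, label_for_4, np.eye(10)[7]])
votes = neighbour_labels.sum(axis=0)
print(np.argmax(votes))  # 4 -- digit 4 got 2 votes, digit 7 got 1
```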

5. Fifth step:
Feed in the sample data:

with tf.compat.v1.Session() as sess:
    # f1 <- testData: the 5 test images
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p10 = np.argmax(testLabel[0:5], axis=1)  # true digits of the 5 test images

6. Sixth step:
Compare the true labels with the predicted labels; if they match, the program works as intended.

j = 0
for i in range(0, 5):  # the 5 test images
    if p10[i] == p9[i]:
        j = j + 1
print('ac=', j * 100 / 5)

So we can see the outcome:
p9[]= [2 8 4 0 9 7]
p10[]= [2 8 4 0 9 7]
ac= 100.0
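The accuracy computation above can be illustrated standalone. This sketch uses hypothetical predicted and true digits (not the run above) to show the counting logic:

```python
import numpy as np

predicted = np.array([2, 8, 4, 0, 9])   # hypothetical p9-style predictions
truth     = np.array([2, 8, 4, 0, 7])   # hypothetical ground-truth digits

# count matching positions and express them as a percentage
correct = int(np.sum(predicted == truth))
accuracy = correct * 100 / len(truth)
print('ac=', accuracy)  # ac= 80.0
```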
Conclusion
KNN is a simple algorithm whose key step is finding the training samples nearest to each test sample: if the distance is small, we say the test sample and the training sample belong to the same class. Thanks for reading! Next time I will use a CNN to build a program that recognizes pictures.
Written by Neio
References:
https://www.cnblogs.com/ybjourney/p/4702562.html
https://blog.csdn.net/eeeee123456/article/details/79927128
