Machine Learning | Digit Recognition Based on KNN

Hello, everyone! Last time we talked about Linear Regression; this time I will introduce the K-Nearest Neighbors (KNN) algorithm in Machine Learning.
1. Background
The simplest, most naive classifier records every training example together with its category; a test object is then classified when its attributes exactly match those of a training object. But two problems arise: not every test object has an exact match in the training set, and a test object may match several training objects at once, which would assign it to multiple classes. KNN was developed to address these problems.
KNN classifies by measuring the distance between feature values. The idea is that if most of the k samples most similar to a new sample in feature space (i.e. its nearest neighbors) belong to a certain category, the new sample also belongs to that category, where k is usually an integer no greater than 20. In KNN, the selected neighbors are all correctly classified objects; the method assigns a category based only on the one or few nearest samples.
2. Process
To summarize the idea of the KNN algorithm: given training data with known labels, take a test sample, compare its features with the corresponding features of every sample in the training set, find the K training samples most similar to it, and assign the test sample to the category that appears most frequently among those K samples. The algorithm is described as:

  1. calculate the distance between test data and each training data;
  2. sort according to the increasing relation of distance;
  3. select K points with the smallest distance;
  4. count the frequency of each category among those K points;
  5. return the most frequent category among those K points as the predicted classification of the test data.
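The five steps above can be sketched in plain NumPy, independent of TensorFlow. This is a minimal illustration with hypothetical toy data, using the L1 distance (the same metric the TensorFlow code below uses):

```python
import numpy as np

def knn_predict(train_data, train_labels, test_point, k=3):
    # 1. distance between the test point and every training point (L1 metric)
    dists = np.sum(np.abs(train_data - test_point), axis=1)
    # 2-3. sort by increasing distance and take the k nearest
    nearest = np.argsort(dists)[:k]
    # 4. count how often each class appears among the k neighbors
    votes = np.bincount(train_labels[nearest])
    # 5. return the most frequent class
    return np.argmax(votes)

# toy two-class example: class 0 near the origin, class 1 near (10, 10)
train_data = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [9, 10], [10, 9]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_data, train_labels, np.array([0.5, 0.5])))  # 0
print(knn_predict(train_data, train_labels, np.array([9.5, 9.5])))  # 1
```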
3. Apply in Python
I will use the tensorflow.examples.tutorials API to build this program. Because my TensorFlow version is 2.0.0, there are many version-compatibility issues, handled through tf.compat.v1.
1. First step:
Import the libraries that will be used to build the network.
import tensorflow as tf
import numpy as np
import random
from tensorflow.examples.tutorials.mnist import input_data

# run TF1-style graph/session code under TensorFlow 2.x
tf.compat.v1.disable_eager_execution()

2. Second step:
I use the MNIST input_data helper to load the dataset and split it into images and labels; these two parts serve as the training and test sets.

trainNum = 55000
testNum = 10000
trainSize = 500
testSize = 5
k = 5
# data: load MNIST with one-hot labels
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
trainIndex = np.random.choice(trainNum, trainSize, replace=False)
testIndex = np.random.choice(testNum, testSize, replace=False)
trainData = mnist.train.images[trainIndex]    # train images
trainLabel = mnist.train.labels[trainIndex]   # train labels
testData = mnist.test.images[testIndex]       # test images
testLabel = mnist.test.labels[testIndex]      # test labels
#print(trainIndex)
#print(testIndex)
print('trainData.shape=', trainData.shape)    # 500*784: 500 images, 784 = 28*28 pixels each
print('trainLabel.shape=', trainLabel.shape)  # 500*10
print('testData.shape=', testData.shape)      # 5*784
print('testLabel.shape=', testLabel.shape)    # 5*10
print('testLabel=', testLabel)                # e.g. 4: testData[0], 3: testData[1], 6: testData[2]
# train images & labels; test images & labels

This step prepares the data. We now have two parts, labels and images. The key step that follows is computing the distance between each test image and every training image: the training images with the smallest distances are assumed to belong to the same class as the test image.
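The sampling step above can be illustrated on its own. This standalone sketch mimics how trainIndex is drawn (the seed is added here only to make the draw reproducible):

```python
import numpy as np

np.random.seed(0)  # for a reproducible draw
train_num, train_size = 55000, 500

# draw 500 distinct indices out of 55000, as trainIndex does above
train_index = np.random.choice(train_num, train_size, replace=False)

print(train_index.shape)      # (500,)
print(len(set(train_index)))  # 500 -- replace=False guarantees no duplicates
```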

3. Third step:
Define placeholders to hold the training and test data:

trainDataInput=tf.compat.v1.placeholder(shape=[None,784],dtype=tf.float32)
trainLabelInput = tf.compat.v1.placeholder(shape=[None,10],dtype=tf.float32)
testDataInput = tf.compat.v1.placeholder(shape=[None,784],dtype=tf.float32)
testLabelInput = tf.compat.v1.placeholder(shape=[None,10],dtype=tf.float32)

4. Fourth step:
Compute the nearest distances:

f1 = tf.expand_dims(testDataInput, 1)  # expand dims: (5,784) -> (5,1,784) for broadcasting
f2 = tf.subtract(trainDataInput, f1)   # pixel-wise difference: (5,500,784)
print(f1.shape)
print(testDataInput.shape)
print(f2.shape)
f3 = tf.compat.v1.reduce_sum(tf.abs(f2), reduction_indices=2)  # L1 distance: (5,500)
f4 = tf.negative(f3)                   # negate so top_k picks the smallest distances
f5, f6 = tf.nn.top_k(f4, k=k)          # values and indices of the k nearest neighbors
f7 = tf.gather(trainLabelInput, f6)    # one-hot labels of those neighbors: (5,k,10)
f8 = tf.compat.v1.reduce_sum(f7, reduction_indices=1)  # vote count per digit: (5,10)
f9 = tf.compat.v1.argmax(f8, axis=1)   # digit with the most votes

Each label has ten slots (one-hot encoding); each slot corresponds to one digit, so together they represent 0-9.
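The one-hot layout, and why summing neighbor labels (as f8 does) counts votes per digit, can be shown with a small NumPy sketch:

```python
import numpy as np

# a one-hot MNIST label: ten slots, a 1 at the index of the digit
label_for_4 = np.zeros(10)
label_for_4[4] = 1.0
print(label_for_4)   # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

# decoding: the position of the 1 is the digit
digit = np.argmax(label_for_4)
print(digit)         # 4

# summing the one-hot labels of the neighbors counts votes per digit
neighbour_labels = np.array([label_for_4, label_for_4, np.eye(10)[7]])
votes = neighbour_labels.sum(axis=0)
print(np.argmax(votes))  # 4 -- digit 4 got 2 votes, digit 7 got 1
```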

5. Fifth step:
Feed in the sample data:

with tf.compat.v1.Session() as sess:
    # f1 <- testData: the 5 test images
    p1 = sess.run(f1, feed_dict={testDataInput: testData[0:5]})
    p2 = sess.run(f2, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p3 = sess.run(f3, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p4 = sess.run(f4, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p5, p6 = sess.run((f5, f6), feed_dict={trainDataInput: trainData, testDataInput: testData[0:5]})
    p7 = sess.run(f7, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p8 = sess.run(f8, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p9 = sess.run(f9, feed_dict={trainDataInput: trainData, testDataInput: testData[0:5], trainLabelInput: trainLabel})
    p10 = np.argmax(testLabel[0:5], axis=1)  # true digits of the 5 test images

6. Sixth step:
Compare the true labels with the predicted labels; if they match, the program works as intended.

j = 0
for i in range(0, 5):  # the 5 test images
    if p10[i] == p9[i]:
        j = j + 1
print('ac=', j * 100 / 5)

So we can see the outcome:
p9[]= [2 8 4 0 9 7]
p10[]= [2 8 4 0 9 7]
ac= 100.0
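The accuracy computation above can be illustrated standalone. This sketch uses hypothetical predicted and true digits (not the run above) to show the counting logic:

```python
import numpy as np

predicted = np.array([2, 8, 4, 0, 9])   # hypothetical p9-style predictions
truth     = np.array([2, 8, 4, 0, 7])   # hypothetical ground-truth digits

# count matching positions and express them as a percentage
correct = int(np.sum(predicted == truth))
accuracy = correct * 100 / len(truth)
print('ac=', accuracy)  # ac= 80.0
```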
Conclusion
KNN is a simple algorithm whose key step is finding the training samples nearest to each test sample: if the distance is small, we say the test sample and the training sample belong to the same class. Thanks for reading! Next time I will use a CNN to build a program that recognizes pictures.
Written by Neio
References:
https://www.cnblogs.com/ybjourney/p/4702562.html
https://blog.csdn.net/eeeee123456/article/details/79927128
