ML自学之KNN算法的实现
防挨打声明:笔者初学python和ML,有异议尽管提出,大家共同进步
问题边界分析:
input:
训练集鸢尾花花种类(0,1,2),花数据([‘sepal length (cm)’, ‘sepal width (cm)’, ‘petal length (cm)’, 'petal width (cm)’])
试验集鸢尾花花种类以及花数据
引申出的数据,包括试验集列表大小,训练集列表大小。
k也就是k近邻算法中的k值,根据k来确定花的种类。
process:
首先向类中初始化训练集的数据,之后计算每一个试验集的点距离每一个训练集中点的欧式距离(使用numpy中的方法计算),将各点用计算好的距离进行排序。排序之后选择距实验点最近的k个点,哪种花占的比例越多,就将试验点的花种类定位哪种花(引申问题:如果有两种花种类个数相同,应该如何选)。
output:
输出试验点花的knn预测种类。
输出试验点花的真是种类。
#Writen by mifubo -- 2019.4.11
import numpy as np
from sklearn.datasets import load_iris
#set data
iris = load_iris()
data = iris.data
#iris data,target(0,1,2 on behalf of the class of flowers)
target = iris.target
labels = iris.feature_names
#set train_group and knn_text_group
tg_data = data[:140]
tg_target = target[:140]
kg_data = data[140:]
kg_target = target[140:]
class knn():
def __init__(self, database,k=10):
self._database = database
self._data = database[0]
self._target = database[1]
self._dshape = database[0].shape
self._tshape = database[1].shape
self._k = k
#calculation of distance
def Euclidean_distance(self,x,y):
x = x.reshape(-1)
y = x.reshape(-1)
return np.sum((x-y)**2)
#return & Calculate the space distance
def predict(self,kg_data):
#set a list to save predict target
k_predict = np.zeros(kg_data.shape[0])
for i,item in enumerate(kg_data):
dist = np.zeros([self._dshape[0],2])
count = np.zeros(self._dshape[0])
#dist save the distance between textpoint and trainponin
#count save the times of each point appeared
for j,x in enumerate(self._data):
dist[j] = [self.Euclidean_distance(x,item),self._target[j]]
# print(dist[j])
dist = dist[dist[:,0].argsort()]
#sort ponit by distance
for u in range(self._k):
count[int(dist[u][1])]+=1
k_predict[i] = np.argmax(count)
return k_predict.astype(int)
knn = knn((tg_data,tg_target),k=3)
y = knn.predict(kg_data)
print(y)
print(kg_target)
测试结果