Python and Machine Learning, Chapter 4: The KNN Algorithm (Part 1)

Basics of the KNN (k-nearest neighbors) algorithm

Plotting the training set as a scatter plot

Given a training set, plot it as a scatter plot and mark the point to be predicted on the same plot.

In [170]: import numpy as np
In [171]: import matplotlib.pyplot as plt
In [172]: raw_data_x = [[3.393533211,2.331273381],
     ...:               [3.110074235,1.781534656],
     ...:               [1.328465656,3.368356546],
     ...:               [3.582265476,4.674545656],
     ...:               [2.285754767,2.866456456],
     ...:               [7.423496235,4.657345464],
     ...:               [5.745378782,3.533989345],
     ...:               [9.175767676,2.511111111],
     ...:               [7.796765765,3.067653222],
     ...:               [7.939457677,0.796865211]
     ...: ]
In [173]: raw_data_y = [0,0,0,0,0,1,1,1,1,1]
In [174]: X_train = np.array(raw_data_x)
In [175]: Y_train = np.array(raw_data_y)
# the data point we want to predict
In [179]: z = np.array([8.093607318,3.365731514])
# plot the scatter plot: class 0 in green, class 1 in red, the query point in blue
In [180]: plt.scatter(X_train[Y_train==0,0],X_train[Y_train==0,1],color='g')
     ...: plt.scatter(X_train[Y_train==1,0],X_train[Y_train==1,1],color='r')
     ...: plt.scatter(z[0],z[1],color='blue')

(Figure: scatter plot of the training set)
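If you run these plotting commands in a plain Python script rather than an interactive IPython/Jupyter session, you will likely also need an explicit call to display the figure:

In [181]: plt.show()   # render the scatter plot window when not in interactive mode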

The KNN process

Rough steps of the algorithm:
1. Compute the distance from the query point z to every point in the training set.
2. Take the K points with the smallest distances.
3. Among these K samples, choose the most common class as the predicted class of z.

In [193]: from math import sqrt
In [194]: distances = [sqrt(np.sum((x_train - z) ** 2)) for x_train in X_train]
In [197]: k = 6
In [198]: nearest = np.argsort(distances)
In [200]: topK_y = [Y_train[i] for i in nearest[:k]]
In [201]: topK_y
Out[201]: [1, 1, 1, 1, 1, 0]
In [203]: from collections import Counter
In [204]: votes = Counter(topK_y)
In [205]: votes
Out[205]: Counter({1: 5, 0: 1})
# get the element with the most votes
In [207]: votes.most_common(1)[0][0]
Out[207]: 1
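As an aside (not in the original post), when the labels are small non-negative integers, the majority vote can also be taken with NumPy alone:

In [208]: np.argmax(np.bincount(topK_y))   # bincount counts each label, argmax picks the most frequent
Out[208]: 1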

Wrapping it into a function
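A minimal sketch of such a wrapper, following the same steps as above (the function name kNN_classify and its parameter order are assumptions, not necessarily the author's original code):

import numpy as np
from math import sqrt
from collections import Counter

def kNN_classify(k, X_train, y_train, x):
    """Classify a single point x by majority vote of its k nearest neighbors."""
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must equal the size of y_train"
    assert X_train.shape[1] == x.shape[0], \
        "the feature number of x must equal that of X_train"

    # Euclidean distance from x to every training sample
    distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
    nearest = np.argsort(distances)

    # majority vote among the k nearest samples
    topK_y = [y_train[i] for i in nearest[:k]]
    votes = Counter(topK_y)
    return votes.most_common(1)[0][0]

# usage, with the training data defined above:
# kNN_classify(6, X_train, Y_train, z)   # -> 1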

KNN in scikit-learn

In [209]: from sklearn.neighbors import KNeighborsClassifier
In [210]: KNN_classifier = KNeighborsClassifier(n_neighbors=6)
In [211]: KNN_classifier.fit(X_train,Y_train)
In [213]: x_predict = z.reshape(1,-1)
In [214]: x_predict
Out[214]: array([[8.09360732, 3.36573151]])
In [215]: KNN_classifier.predict(x_predict)
Out[215]: array([1])
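As a side note (not in the original post), KNeighborsClassifier also supports distance-weighted voting through its weights parameter, which can matter when the neighbors lie at very different distances:

In [216]: knn_weighted = KNeighborsClassifier(n_neighbors=6, weights='distance')
In [217]: knn_weighted.fit(X_train, Y_train)
In [218]: knn_weighted.predict(x_predict)   # closer neighbors get a larger say; prediction is unchanged here
Out[218]: array([1])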

Rewriting KNN as a class
knn.py

import numpy as np
from math import sqrt
from collections import Counter


class KNNClassifier:

    def __init__(self, k):
        """Initialize the KNN classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        # a leading underscore marks these as private member variables
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Train the KNN classifier with the training set X_train and y_train."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k."
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Given a data set X_predict to be predicted, return a vector of predicted labels."""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict!"
        # the number of columns, i.e. the number of features, must match
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Given a single point x, return the predicted class of x."""
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"
        distances = [sqrt(np.sum((x_train - x) ** 2))
                     for x_train in self._X_train]
        nearest = np.argsort(distances)
        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]

Running the code above

In [236]: %run knn.py
In [237]: knn_clf = KNNClassifier(k=6)
In [239]: knn_clf.fit(X_train,Y_train)
In [242]: y_predict = knn_clf.predict(x_predict)
In [243]: y_predict
Out[243]: array([1])
In [244]: y_predict[0]
Out[244]: 1
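For larger training sets, the per-sample Python loop in _predict can be replaced by a vectorized NumPy computation. This is an optional optimization, not part of the original post; it assumes the same X_train and z as above:

import numpy as np

# broadcasting computes the distance to every training point at once;
# this is equivalent to the list comprehension used in _predict
distances = np.sqrt(np.sum((X_train - z) ** 2, axis=1))
nearest = np.argsort(distances)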