Improving kNN Python code: optimizing code that mimics the sklearn kNN algorithm

weixin_39839968

Article link: https://blog.csdn.net/weixin_39839968/article/details/112936759

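I wrote a script that performs kNN classification using a home-made function. I compared its performance with a similar script that uses the sklearn package instead.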

Results:

Home-made: ~20 seconds

sklearn: ~2 seconds

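So now I am wondering whether the performance difference is mainly because sklearn runs at a lower level (in C, as far as I know) or because my script is inefficient.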
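One way to start narrowing this down is to measure where the time actually goes, since the timer in the script wraps everything, including imports and plotting. The following is only a minimal sketch using the standard library's `time.perf_counter`, `cProfile` and `pstats`. The function `slow_distance_loop` and the helper `profile_call` are hypothetical stand-ins, not part of the original script.

```python
import cProfile
import io
import math
import pstats
import time


def slow_distance_loop(points, query):
    """Hypothetical stand-in for a pure-Python distance loop."""
    distances = []
    for x, y in points:
        distances.append(math.sqrt((x - query[0]) ** 2 + (y - query[1]) ** 2))
    return distances


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show at most 5 entries
    return result, stream.getvalue()


# Made-up sample data for illustration only.
points = [(i * 0.01, i * 0.02) for i in range(1000)]
query = (0.5, 0.5)

# Time only the section of interest, not imports or plotting.
start = time.perf_counter()
distances = slow_distance_loop(points, query)
elapsed = time.perf_counter() - start

_, report = profile_call(slow_distance_loop, points, query)
print(f"{elapsed:.6f} s for {len(distances)} distances")
print(report)
```

Applied to the real script, the same pattern would wrap only the prediction loop, so that the cost of `plt.savefig` at 600 dpi and the time `plt.show()` spends waiting for the window to close are not counted as kNN time.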

If any of you can point me to references on writing efficient Python scripts and programs, I would be glad to have them.

Here is the data file: DataFile

In both scripts, filename, os.environ['R_HOME'] and os.environ['R_USER'] must be set to match your own directory structure.

My code using the home-made kNN classifier:

# Start timer
import time

tic = time.time()

# Begin script
import os

# Set R_HOME temporarily; a permanent setting is possible but more complicated
os.environ['R_HOME'] = r'C:\Users\MyUser\Documents\R\R-3.4.1'
# Same story for R_USER
os.environ['R_USER'] = r'C:\Users\MyUser\AppData\Local\Programs\Python\Python36\Lib\site-packages\rpy2'

import rpy2.robjects as robjects
import numpy as np
import matplotlib.pyplot as plt

## Read R data from the ESLII book
dir = os.path.dirname(__file__)
filename = os.path.join(dir, '../ESL.mixture.rda')
robjects.r['load'](filename)  # load the rda file into the R workspace
rObject = robjects.r['ESL.mixture']  # read the variable from the R workspace into the Python workspace

# Extract the BLUE and ORANGE class data
# The structure of rObject can be inspected by printing the object to the console;
# numpy is able to convert R data natively
classes = np.array(rObject[0])

BLUE = classes[0:100,:]
# The [:,None] is necessary to make the 1D label array 2D,
# because concatenate requires matching dimensions.
# Other functions exist, such as np.column_stack, but they take more time to execute than a basic concatenate
BLUE = np.concatenate((BLUE,np.zeros(np.size(BLUE,axis=0))[:,None]),axis=1)

ORANGE = classes[100:200]
ORANGE = np.concatenate((ORANGE,np.ones(np.size(ORANGE,axis=0))[:,None]),axis=1)

trainingSet = np.concatenate((BLUE,ORANGE),axis=0)

## Create the meshgrid
minBound = -3
maxBound = 4.5
xmesh = np.linspace(minBound, maxBound, 100)
ymesh = np.linspace(minBound, maxBound, 100)
xv, yv = np.meshgrid(xmesh, ymesh)
gridSet = np.stack((xv.ravel(),yv.ravel())).T

def predict(trainingSet, queryPoint, k):
    # create a list for the distances
    distances = []
    # compute the Euclidean distance from the query point to every training point
    for i in range(np.size(trainingSet,0)):
        distances.append(np.sqrt(np.sum(np.square(trainingSet[i,:-1]-queryPoint))))
    # find the k nearest neighbors of the query point and compute its outcome
    distances = np.array(distances)
    indices = np.argsort(distances)  # indices sorted from shortest to longest distance
    kindices = indices[0:k]
    kNN = trainingSet[kindices,:]
    queryOutput = np.average(kNN[:,2])
    return queryOutput

k = 1

# Add a third column to hold the prediction for each grid point
gridSet = np.concatenate((gridSet,np.zeros(np.size(gridSet,axis=0))[:,None]),axis=1)

i = 0
for point in gridSet[:,:-1]:
    gridSet[i,2] = predict(trainingSet, point, k)
    i += 1

#k = 1
#test = predict(trainingSet, np.array([4.0, 1.2]), k)

col = np.where(gridSet[:,2]<0.5,'b','r').flatten()  # flatten is necessary: 2D arrays are only accepted for RGBA colors
plt.scatter(gridSet[:,0],gridSet[:,1],c=col,s=0.2)

col = np.where(trainingSet[:,2]<0.5,'b','r').flatten()  # flatten is necessary: 2D arrays are only accepted for RGBA colors
plt.scatter(trainingSet[:,0],trainingSet[:,1],c=col,s=1.0)

# Draw the decision boundary as the single contour level 0.5
plt.contour(xv,yv,gridSet[:,2].reshape(xv.shape),levels=[0.5])
plt.savefig('kNN_homeMade.png', dpi=600)
plt.show()

# Stop timer
toc = time.time()
print(toc-tic, 'sec Elapsed')
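For comparison, the per-point Python loop in the script above could in principle be replaced by a single vectorized NumPy computation. The following is a sketch under stated assumptions, not a claim about how sklearn works internally. The function name `predict_all` is hypothetical, and the training data is synthetic stand-in data rather than the ESL mixture set. It relies on broadcasting to compute all query-to-training distances at once and on `np.argpartition` to pick the k smallest without a full sort.

```python
import numpy as np


def predict_all(training_set, queries, k):
    """Average the labels of the k nearest training points for every query.

    training_set: (n, 3) array with columns x, y, label.
    queries: (m, 2) array of query points.
    """
    features = training_set[:, :2]
    labels = training_set[:, 2]
    # Broadcasting gives an (m, n, 2) array of coordinate differences.
    diff = queries[:, None, :] - features[None, :, :]
    # Squared distances are enough for ranking, so the sqrt is skipped.
    sq_dist = (diff ** 2).sum(axis=2)
    # argpartition puts the k smallest distances of each row in the first
    # k columns without fully sorting the row.
    nearest = np.argpartition(sq_dist, k - 1, axis=1)[:, :k]
    return labels[nearest].mean(axis=1)


# Synthetic stand-in data (an assumption, not the ESL mixture data set).
rng = np.random.default_rng(0)
blue = np.column_stack([rng.normal(0.0, 1.0, (100, 2)), np.zeros(100)])
orange = np.column_stack([rng.normal(2.0, 1.0, (100, 2)), np.ones(100)])
training_set = np.concatenate([blue, orange], axis=0)

# Same 100 x 100 grid bounds as in the script above.
xmesh = np.linspace(-3, 4.5, 100)
xv, yv = np.meshgrid(xmesh, xmesh)
grid = np.column_stack([xv.ravel(), yv.ravel()])

grid_pred = predict_all(training_set, grid, k=1)
print(grid_pred.shape)
```

The intermediate `diff` array holds 10000 x 200 x 2 floats here, which is small, but for much larger grids or training sets the queries would likely need to be processed in chunks to keep memory use bounded.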

My code using sklearn kNN:

[sklearn script not preserved in the source]
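The sklearn version of the script is not reproduced here, so the following is only a guess at what the core of such a script might look like. It is not the author's code. It uses `KNeighborsClassifier` from `sklearn.neighbors` on synthetic stand-in data (an assumption; the original reads the ESL mixture data through rpy2) and leaves out the plotting.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption, not the ESL mixture data set).
rng = np.random.default_rng(0)
blue = rng.normal(0.0, 1.0, (100, 2))
orange = rng.normal(2.0, 1.0, (100, 2))
X_train = np.concatenate([blue, orange], axis=0)
y_train = np.concatenate([np.zeros(100), np.ones(100)])

# Same 100 x 100 grid bounds as in the home-made script.
xmesh = np.linspace(-3, 4.5, 100)
xv, yv = np.meshgrid(xmesh, xmesh)
grid = np.column_stack([xv.ravel(), yv.ravel()])

# Fit once, then classify every grid point in a single call.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)
grid_labels = clf.predict(grid)
print(grid_labels.shape)
```

Note that `KNeighborsClassifier` predicts by majority vote among the neighbors, whereas the home-made function averages the neighbor labels and thresholds at 0.5 afterwards. With k = 1 the two give the same class for each point.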
