Getting Started with Machine Learning in Python

Anne Hathaway

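Installing Python on Windows

Python website: https://www.python.org/

My computer is 64-bit. For a 3.x release, choose "Windows x86-64 executable installer". Because 2.x and 3.x are not compatible, and 2.x code has to be modified before it will run on 3.x, I chose the 2.x release instead: "Windows x86-64 MSI installer".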

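Make sure "pip" and "Add python.exe to Path" are selected, then click "Next" through the remaining screens to finish the installation.

By default Python is installed to C:\Python27. Open a Command Prompt window and type python; if the version banner and the >>> prompt appear, Python was installed successfully.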

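If you see instead: 'python' is not recognized as an internal or external command, operable program or batch file

This happens because Windows looks for python.exe in the directories listed in the Path environment variable and reports an error when it cannot find it. If you forgot to tick "Add python.exe to Path" during installation, add the directory that contains python.exe, C:\Python27, to Path by hand.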

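The installer places its entry at the very front of Path, ahead of the Windows system entries, and in that position it did not take effect for me until after a restart. Moving the entry to the end of Path made the restart unnecessary.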
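The lookup described above can be inspected from Python itself. The following is a minimal sketch, assuming Python 3.3 or later (shutil.which does not exist in 2.7); the helper name describe_python_lookup is made up for this illustration.

```python
import os
import shutil
import sys


def describe_python_lookup(command="python"):
    """Report how `command` resolves through the PATH environment variable."""
    # shutil.which searches the directories listed in PATH for the command.
    resolved = shutil.which(command)
    # Split PATH into its individual directories, dropping empty entries.
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return resolved, sys.executable, entries


resolved, running, entries = describe_python_lookup()
print("'python' resolves to:", resolved)  # None means it is not on PATH
print("running interpreter :", running)
print("PATH entries        :", len(entries))
```

If the first line prints None while the interpreter itself runs fine, the installation directory is missing from Path.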

Installing Jupyter Notebook for Python 3

python3 -m pip install --upgrade pip

python3 -m pip install jupyter

Installing Jupyter Notebook for Python 2

python -m pip install --upgrade pip

python -m pip install jupyter

Starting Jupyter Notebook

jupyter notebook

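Installing numpy

The examples involve a lot of matrix computation, so the numpy package is required.

Download address: click to open the link

Pick the wheel that matches your installed Python (version and 32-bit or 64-bit build). In my case that is win32: numpy-1.14.3+mkl-cp27-cp27m-win32.whl

Copy the downloaded wheel into the Scripts folder of the Python installation, C:\Python27\Scripts

Copy the path of the Scripts folder, then go to "right-click Computer > Properties > Advanced system settings > Environment Variables > System variables > Path > Edit" and paste the path in.

Open a command prompt and run pip<version> install followed by the path and file name of the numpy wheel.

For example, mine is: pip2.7 install C:\Python27\Scripts\numpy-1.14.3+mkl-cp27-cp27m-win32.whl

A successful installation ends with the message "Successfully installed".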

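Two unexpected errors came up during installation. The second one goes away after upgrading pip as the message suggests, but what about the first?

It turned out that the Python I had installed only accepts win32 wheels. A 64-bit operating system does not mean you should pick the amd64 wheel; what matters is the Python build. Downloading the win32 numpy wheel solved the problem.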
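To find out which wheel a given interpreter accepts before downloading anything, you can ask the interpreter for its own pointer size. This is a small sketch using only the standard library; the function name interpreter_bits is chosen here for illustration.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


bits = interpreter_bits()
# Wheel platform tags on Windows: win32 for 32-bit builds, win_amd64 for 64-bit.
wheel_tag = "win32" if bits == 32 else "win_amd64"
# The CPython version tag, e.g. cp27 for Python 2.7.
python_tag = "cp{}{}".format(*sys.version_info[:2])
print("Python", platform.python_version(), "is a", bits, "bit build")
print("On Windows, look for wheels tagged", python_tag, "and", wheel_tag)
```

The result describes the interpreter that runs the script, not the operating system, which is exactly the distinction that caused the error above.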

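Installing Matplotlib

This works the same way as numpy: find the Matplotlib wheel, download it into C:\Python27\Scripts under the Python installation, and install it from cmd: pip2.7 install C:\Python27\Scripts\matplotlib-2.2.2-cp27-cp27m-win32.whl

Installing pandas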

pip2.7 install C:\Python27\Scripts\pandas-0.23.0-cp27-cp27m-win32.whl

Installing seaborn

pip install seaborn

Installing scipy

pip2.7 install C:\Python27\Scripts\scipy-1.1.0-cp27-cp27m-win32.whl

Installing scikit-learn (sklearn)

pip2.7 install C:\Python27\Scripts\scikit_learn-0.19.1-cp27-cp27m-win32.whl
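Once the packages are installed, a quick import check confirms that the interpreter you are running can actually see them. The sketch below uses only importlib from the standard library; installed_versions is a hypothetical helper name, and reading __version__ is a convention most of these packages follow rather than a guarantee.

```python
import importlib


def installed_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing package, or an install that fails while importing.
            report[name] = None
            continue
        # Most scientific packages expose __version__; fall back to "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report


packages = ["numpy", "matplotlib", "pandas", "seaborn", "scipy", "sklearn"]
for name, version in installed_versions(packages).items():
    print(name, "->", version if version else "NOT AVAILABLE")
```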

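Applying Euclidean Distance

Sichuan restaurant rankings

----------------------------------------------------------------------------
                | Braised pork | Boiled beef | Fuqi feipian | Mapo tofu |
----------------------------------------------------------------------------
Kitchen God     |              |             |              |           |
----------------------------------------------------------------------------
God of Cookery  |              |             |              |           |
----------------------------------------------------------------------------
God of Gamblers |              |             |              |           |
----------------------------------------------------------------------------
Foodie          |              |             |              |           |
----------------------------------------------------------------------------

Loading the data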

import numpy as np

Restr_1 = [[3.5, 3.0, 3.0, 4.0],

[2.0, 2.5, 2.5, 3.5],

[3.0, 3.5, 3.0, 4.5],

[4.0, 3.0, 3.5, 4.0]]

Restr_2 = [[4.5, 4.0, 4.0, 4.5],

[3.0, 3.5, 3.5, 4.5],

[4.0, 3.5, 4.0, 4.0],

[4.5, 4.0, 4.5, 4.5]]

Restr_3 = [[1.5, 2.0, 2.0, 2.5],

[1.0, 1.5, 1.5, 1.5],

[2.0, 2.5, 2.0, 2.0],

[1.5, 2.0, 2.5, 2.5]]

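Euclidean distance formula

def euclidean_score(param1, param2):
    # element-wise difference between the two rating matrices
    subtracted_diff = np.subtract(param1, param2)
    squared_diff = np.square(subtracted_diff)
    # Euclidean distance: square root of the summed squared differences
    eu_dist = np.sqrt(np.sum(squared_diff))
    # return the distance and a similarity score in (0, 1]
    return eu_dist, 1 / (1 + eu_dist)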

R12, r12= euclidean_score(Restr_1,Restr_2)

R13, r13= euclidean_score(Restr_1,Restr_3)

R23, r23= euclidean_score(Restr_2,Restr_3)

R12=3.4641016151377544

R13=5.916079783099616

R23=8.717797887081348
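The three distances above can be cross-checked without NumPy. The following sketch recomputes them with the standard library only; flat_euclidean and similarity are names introduced here for illustration, and the similarity function mirrors the 1 / (1 + distance) score returned by euclidean_score.

```python
import math

Restr_1 = [[3.5, 3.0, 3.0, 4.0], [2.0, 2.5, 2.5, 3.5],
           [3.0, 3.5, 3.0, 4.5], [4.0, 3.0, 3.5, 4.0]]
Restr_2 = [[4.5, 4.0, 4.0, 4.5], [3.0, 3.5, 3.5, 4.5],
           [4.0, 3.5, 4.0, 4.0], [4.5, 4.0, 4.5, 4.5]]
Restr_3 = [[1.5, 2.0, 2.0, 2.5], [1.0, 1.5, 1.5, 1.5],
           [2.0, 2.5, 2.0, 2.0], [1.5, 2.0, 2.5, 2.5]]


def flat_euclidean(a, b):
    """Euclidean distance between two equally shaped nested lists."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
    return math.sqrt(total)


def similarity(distance):
    """Map a distance in [0, inf) to a score in (0, 1]; identical inputs give 1."""
    return 1.0 / (1.0 + distance)


pairs = [("1 vs 2", Restr_1, Restr_2),
         ("1 vs 3", Restr_1, Restr_3),
         ("2 vs 3", Restr_2, Restr_3)]
for name, a, b in pairs:
    d = flat_euclidean(a, b)
    print(name, "distance:", d, "similarity:", similarity(d))
```

The squared differences sum to 12, 35 and 76, so the distances are the square roots of those values, which matches R12, R13 and R23. By this measure restaurants 1 and 2 are the most alike.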

KNN

from numpy import *

import operator

import time

import matplotlib.pyplot as plt

def kNN(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize,1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()
    # debugging output, disabled:
    # print(distances)
    # print(diffMat)
    # print(sqDiffMat)
    # print(sqDistances)
    # print('index')
    # print(sortedDistIndicies)
    classCount={}
    # vote among the k nearest neighbours
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# kNN Example

group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])

labels = ['A','A','B','B']

Visualizing the data

fig = plt.figure()

ax = fig.add_subplot(111)

ax.scatter(group[:2,0],group[:2,1], s=70, color='b')

ax.scatter(group[2:4,0],group[2:4,1], s=70, color='r')

plt.xlabel('X')

plt.ylabel('Y')

plt.show()

kNN([0.3,0.2],group,labels,3)

#out:'B', so the point [0.3,0.2] belongs to class B

Following the previous example, use the kNN algorithm to classify the movie data in the table below:

group = array([[3.0,104.0],[2.0,100.0],[1,81],[101,10.0],[99,5],[98,2.0]])

labels = ['Romance','Romance','Romance','Action','Action','Action']

kNN([18,90],group,labels,3)

#out:'Romance'
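For readers who want to follow the voting step without NumPy, here is a compact standard-library sketch of the same idea: rank the training points by distance to the query, then take a majority vote among the k nearest. The name knn_classify is introduced here for illustration and is not part of the code above.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k):
    """Majority vote among the k training points nearest to `query`."""
    ranked = []
    for point, label in zip(points, labels):
        # Euclidean distance between the query and this training point.
        dist = math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))
        ranked.append((dist, label))
    ranked.sort()  # nearest first
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


points_ab = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
labels_ab = ['A', 'A', 'B', 'B']
print(knn_classify([0.3, 0.2], points_ab, labels_ab, 3))

movies = [[3.0, 104.0], [2.0, 100.0], [1, 81], [101, 10.0], [99, 5], [98, 2.0]]
genres = ['Romance', 'Romance', 'Romance', 'Action', 'Action', 'Action']
print(knn_classify([18, 90], movies, genres, 3))
```

In the first example the two nearest neighbours of [0.3, 0.2] are both class B, so the vote is 2 to 1 for B. In the movie example all three nearest neighbours are Romance.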

Analyzing and classifying data from a file

from numpy import *

import matplotlib.pyplot as plt

def file2matrix(filename):
    # read all lines once and close the file afterwards
    with open(filename) as fr:
        lines = fr.readlines()
    numberOfLines = len(lines) #get the number of lines in the file
    returnMat = zeros((numberOfLines,3)) #prepare matrix to return
    classLabelVector = [] #prepare labels return
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        # first three columns are features, last column is the class label
        returnMat[index,:] = [float(v) for v in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')

plt.figure(num=None, figsize=(8, 6), dpi=80, facecolor='w', edgecolor='k')

plt.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*array(datingLabels), 15.0*array(datingLabels))

plt.xlabel('Percentage of Time Spent Playing Video Games')

plt.ylabel('Liters of Ice Cream Consumed Per Week')

plt.show()

plt.scatter(datingDataMat[:,0], datingDataMat[:,1], 15.0*array(datingLabels), 15.0*array(datingLabels))

plt.xlabel('Frequent Flyer Miles Earned Per Year')

plt.ylabel('Percentage of Time Spent Playing Video Games')

plt.show()

import numpy as np

import matplotlib.pyplot as plt

from matplotlib.ticker import NullFormatter # useful for `logit` scale

# Fixing random state for reproducibility

np.random.seed(19680801)

# make up some data in the interval ]0, 1[

y = np.random.normal(loc=0.5, scale=0.4, size=1000)

y = y[(y > 0) & (y < 1)]

y.sort()

x = np.arange(len(y))

# plot with various axes scales

plt.figure(1)

# linear

plt.subplot(221)

plt.plot(x, y)

plt.yscale('linear')

plt.title('linear')

plt.grid(True)

# log

plt.subplot(222)

plt.plot(x, y)

plt.yscale('log')

plt.title('log')

plt.grid(True)

# symmetric log

plt.subplot(223)

plt.plot(x, y - y.mean())

plt.yscale('symlog', linthreshy=0.01) # on Matplotlib 3.3 and later the keyword is linthresh

plt.title('symlog')

plt.grid(True)

# logit

plt.subplot(224)

plt.plot(x, y)

plt.yscale('logit')

plt.title('logit')

plt.grid(True)

# Format the minor tick labels of the y-axis into empty strings with

# `NullFormatter`, to avoid cumbering the axis with too many labels.

plt.gca().yaxis.set_minor_formatter(NullFormatter())

# Adjust the subplot layout, because the logit one may take more space

# than usual, due to y-tick labels like "1 - 10^{-3}"

plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25,

wspace=0.35)

plt.show()

Applying the Apriori Algorithm

Write apriori.py based on the Apriori algorithm:

from numpy import *

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def createC1(dataSet):
    C1 = []
    # collect every distinct item as a one-element candidate
    for transaction in dataSet:
        for item in transaction:
            if not [item] in C1:
                C1.append([item])
    C1.sort()
    # use frozenset so each candidate can be used as a key in a dict;
    # return a list so C1 can be scanned more than once on Python 3
    return list(map(frozenset, C1))

def scanD(D, Ck, minSupport):
    ssCnt = {}
    # count how many transactions contain each candidate
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if not can in ssCnt: ssCnt[can]=1
                else: ssCnt[can] += 1
    numItems = float(len(list(D)))
    print("numItems:")
    print(numItems)
    retList = []
    supportData = {}
    for key in ssCnt:
        print(key)
        support = ssCnt[key]/numItems
        if support >= minSupport:
            retList.insert(0,key)
        supportData[key] = support
        print(support)
    return retList, supportData

def aprioriGen(Lk, k): #creates Ck
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1, lenLk):
            # sort before slicing so "the first k-2 items" is well defined for a set
            L1 = sorted(Lk[i])[:k-2]; L2 = sorted(Lk[j])[:k-2]
            if L1==L2: #if first k-2 elements are equal
                retList.append(Lk[i] | Lk[j]) #set union
    return retList

def apriori(dataSet, minSupport = 0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    # grow the itemsets by one item per pass until none are frequent
    while (len(L[k-2]) > 0):
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)#scan DB to get Lk
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData

def generateRules(L, supportData, minConf=0.7): #supportData is a dict coming from scanD
    bigRuleList = []
    for i in range(1, len(L)):#only get the sets with two or more items
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if (i > 1):
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = [] #create new list to return
    for conseq in H:
        conf = supportData[freqSet]/supportData[freqSet-conseq] #calc confidence
        if conf >= minConf:
            print(freqSet-conseq,'-->',conseq,'conf:',conf)
            brl.append((freqSet-conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if (len(freqSet) > (m + 1)): #try further merging
        Hmp1 = aprioriGen(H, m+1)#create Hm+1 new candidates
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if (len(Hmp1) > 1): #need at least two sets to merge
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)

def pntRules(ruleList, itemMeaning):
    for ruleTup in ruleList:
        for item in ruleTup[0]:
            print(itemMeaning[item])
        print(" -------->")
        for item in ruleTup[1]:
            print(itemMeaning[item])
        print("confidence: %f" % ruleTup[2])
        print(" ") #print a blank line

Loading the data

import apriori

dataSet = [["cakes", "beer", "bread"],

["cakes", "beer", "bread", "donuts"],

["beer", "bread", "pizza"],

["cakes", "bread", "donuts", "pizza"],

["donuts", "pizza"]]

C1 = apriori.createC1(dataSet)

list(C1)

C2 = [frozenset({'cakes', 'beer'}),

frozenset({'cakes', 'beer', 'bread'}),

frozenset({'cakes', 'beer', 'bread', 'donuts'})]

C3 =[frozenset({'beer', 'bread'}),

frozenset({'cakes', 'beer', 'bread'}),

frozenset({'cakes', 'beer', 'bread', 'donuts'}),

frozenset({'beer', 'bread', 'pizza'})]

D = list(map(set, dataSet))

D

Computing support counts

L2, suppData = apriori.scanD(D, C2, 0)

L2

numItems:

5.0

frozenset({'beer', 'cakes'})

0.4

frozenset({'beer', 'bread', 'cakes'})

0.4

frozenset({'donuts', 'beer', 'bread', 'cakes'})

0.2
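The support values printed above can be reproduced by hand. The sketch below is a standalone illustration, not part of apriori.py: support counts the fraction of transactions containing an itemset, and confidence applies the same ratio that calcConf uses, support(whole set) / support(antecedent). Both function names are introduced here for the example.

```python
dataSet = [["cakes", "beer", "bread"],
           ["cakes", "beer", "bread", "donuts"],
           ["beer", "bread", "pizza"],
           ["cakes", "bread", "donuts", "pizza"],
           ["donuts", "pizza"]]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / float(len(transactions))


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent --> consequent."""
    whole = frozenset(antecedent) | frozenset(consequent)
    return support(whole, transactions) / support(antecedent, transactions)


print(support({"cakes", "beer"}, dataSet))
print(support({"cakes", "beer", "bread"}, dataSet))
print(support({"cakes", "beer", "bread", "donuts"}, dataSet))
print(confidence({"cakes", "beer"}, {"bread"}, dataSet))
```

Two of the five transactions contain both cakes and beer, and the same two also contain bread, so the rule {cakes, beer} --> {bread} has confidence 1.0 on this data.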

Applying Decision Trees

Write trees.py based on the decision tree algorithm:

from math import log

import operator

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet: #count the unique class labels and their occurrences
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts.keys(): labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key])/numEntries
        shannonEnt -= prob * log(prob,2) #log base 2
    return shannonEnt

def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis] #chop out axis used for splitting
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1 #the last column is used for the labels
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0; bestFeature = -1
    for i in range(numFeatures): #iterate over all the features
        featList = [example[i] for example in dataSet]#create a list of all the examples of this feature
        uniqueVals = set(featList) #get a set of unique values
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet)/float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy #calculate the info gain; ie reduction in entropy
        print("#", i)
        print("infoGain: ", infoGain)
        print(" ")
        if (infoGain > bestInfoGain): #compare this to the best gain so far
            bestInfoGain = infoGain #if better than current best, set to best
            bestFeature = i
    return bestFeature #returns an integer

def majorityCnt(classList):
    classCount={}
    for vote in classList:
        if vote not in classCount.keys(): classCount[vote] = 0
        classCount[vote] += 1
    # items() works on both Python 2 and 3; iteritems() exists only on Python 2
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet,labels):
    classList = [example[-1] for example in dataSet]
    if classList.count(classList[0]) == len(classList):
        return classList[0]#stop splitting when all of the classes are equal
    if len(dataSet[0]) == 1: #stop splitting when there are no more features in dataSet
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel:{}}
    del(labels[bestFeat]) #note: this modifies the list passed in by the caller
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:] #copy all of labels, so trees don't mess up existing labels
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value),subLabels)
    return myTree

def classify(inputTree,featLabels,testVec):
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    key = testVec[featIndex]
    valueOfFeat = secondDict[key]
    # descend until a leaf (a non-dict value) is reached
    if isinstance(valueOfFeat, dict):
        classLabel = classify(valueOfFeat, featLabels, testVec)
    else: classLabel = valueOfFeat
    return classLabel

def storeTree(inputTree,filename):
    import pickle
    # pickle needs a binary file handle on Python 3
    with open(filename,'wb') as fw:
        pickle.dump(inputTree,fw)

def grabTree(filename):
    import pickle
    with open(filename,'rb') as fr:
        return pickle.load(fr)

Read the data file and build a decision tree with the algorithm above

import trees

with open('lenses.txt') as fr:
    lenses = [inst.strip().split('\t') for inst in fr.readlines()]

# feature names, one per column
lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']

# build the decision tree

lensesTree = trees.createTree(lenses, lensesLabels)
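createTree chooses each split by information gain, which is the drop in Shannon entropy after partitioning the data. The following standard-library sketch works through that arithmetic on a made-up list of five labels (not taken from lenses.txt); shannon_entropy and information_gain are names introduced here for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy in bits of a list of class labels."""
    counts = Counter(labels)
    total = float(len(labels))
    return -sum((c / total) * math.log(c / total, 2) for c in counts.values())


def information_gain(parent, partitions):
    """Entropy of `parent` minus the size-weighted entropy of its partitions."""
    total = float(len(parent))
    weighted = sum(len(p) / total * shannon_entropy(p) for p in partitions)
    return shannon_entropy(parent) - weighted


# Hypothetical labels: 2 'yes' and 3 'no'.
parent = ['yes', 'yes', 'no', 'no', 'no']
# A hypothetical feature that splits them into these two groups.
partitions = [['yes', 'yes', 'no'], ['no', 'no']]

print("entropy before split:", shannon_entropy(parent))
print("information gain    :", information_gain(parent, partitions))
```

The parent entropy is about 0.971 bits. After the split the second group is pure, so only the first group contributes, and the gain comes out at roughly 0.42 bits.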

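Visualizing the decision tree (the listing below is the treePlotter module imported further down, so save it as treePlotter.py)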

import matplotlib.pyplot as plt

decisionNode = dict(boxstyle="sawtooth", fc="0.8")

leafNode = dict(boxstyle="round4", fc="0.8")

arrow_args = dict(arrowstyle="<-")

def getNumLeafs(myTree):
    numLeafs = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes
            numLeafs += getNumLeafs(secondDict[key])
        else: numLeafs +=1
    return numLeafs

def getTreeDepth(myTree):
    maxDepth = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else: thisDepth = 1
        if thisDepth > maxDepth: maxDepth = thisDepth
    return maxDepth

def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    # draw a boxed label at centerPt with an arrow coming from parentPt
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType, arrowprops=arrow_args )

def plotMidText(cntrPt, parentPt, txtString):
    # place the branch label halfway between parent and child
    xMid = (parentPt[0]-cntrPt[0])/2.0 + cntrPt[0]
    yMid = (parentPt[1]-cntrPt[1])/2.0 + cntrPt[1]
    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)

def plotTree(myTree, parentPt, nodeTxt):#the first key tells you what feat was split on
    numLeafs = getNumLeafs(myTree) #this determines the x width of this tree
    depth = getTreeDepth(myTree)
    firstStr = list(myTree.keys())[0] #the text label for this node should be this
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs))/2.0/plotTree.totalW, plotTree.yOff)
    plotMidText(cntrPt, parentPt, nodeTxt)
    plotNode(firstStr, cntrPt, parentPt, decisionNode)
    secondDict = myTree[firstStr]
    plotTree.yOff = plotTree.yOff - 1.0/plotTree.totalD
    for key in secondDict.keys():
        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes
            plotTree(secondDict[key],cntrPt,str(key)) #recursion
        else: #it's a leaf node, so draw the leaf node
            plotTree.xOff = plotTree.xOff + 1.0/plotTree.totalW
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
    plotTree.yOff = plotTree.yOff + 1.0/plotTree.totalD
#if you do get a dictionary you know it's a tree, and the first element will be another dict

def createPlot(inTree):
    fig = plt.figure(1, facecolor='white')
    fig.clf()
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops) #no ticks
    #createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo purposes
    plotTree.totalW = float(getNumLeafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5/plotTree.totalW; plotTree.yOff = 1.0;
    plotTree(inTree, (0.5,1.0), '')
    plt.show()

#def createPlot():
#    fig = plt.figure(1, facecolor='white')
#    fig.clf()
#    createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo purposes
#    plotNode('a decision node', (0.5, 0.1), (0.1, 0.5), decisionNode)
#    plotNode('a leaf node', (0.8, 0.1), (0.3, 0.8), leafNode)
#    plt.show()

def retrieveTree(i):
    listOfTrees =[{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}},
                  {'no surfacing': {0: 'no', 1: {'flippers': {0: {'head': {0: 'no', 1: 'yes'}}, 1: 'no'}}}}
                  ]
    return listOfTrees[i]

#createPlot(thisTree)

import treePlotter

treePlotter.createPlot(lensesTree)

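Applying K-Means and KNN

1. Implement the K-Means and KNN algorithms in any programming language;

use the K-Means algorithm to cluster the first 6 movies in the experimental data above;

then input the final "movie to classify" entry from Table 2 and assign it to a cluster based on the result of the previous step.

Shot-count statistics used to classify the movies

Write kMeans.py based on the K-Means algorithm: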

from numpy import *

def loadDataSet(fileName): #general function to parse tab-delimited floats
    dataMat = [] #assume last column is target value
    with open(fileName) as fr:
        for line in fr.readlines():
            curLine = line.strip().split('\t')
            fltLine = list(map(float,curLine)) #map all elements to float()
            dataMat.append(fltLine)
    return dataMat

def distEclud(vecA, vecB):
    return sqrt(sum(power(vecA - vecB, 2))) #la.norm(vecA-vecB)

# randCent is used as a default argument below but was missing from the listing.
# This version is an assumption: it draws k random centroids inside the
# bounding box of the data, which is a common way to initialize K-Means.
def randCent(dataSet, k):
    n = shape(dataSet)[1]
    centroids = mat(zeros((k,n)))
    for j in range(n):
        minJ = dataSet[:,j].min()
        rangeJ = float(dataSet[:,j].max() - minJ)
        centroids[:,j] = mat(minJ + rangeJ * random.rand(k,1))
    return centroids

def kMeans(dataSet, k, distMeas=distEclud, createCent=randCent):
    m = shape(dataSet)[0]
    clusterAssment = mat(zeros((m,2)))#create mat to assign data points
                                      #to a centroid, also holds SE of each point
    centroids = createCent(dataSet, k)
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        for i in range(m):#for each data point assign it to the closest centroid
            minDist = inf; minIndex = -1
            for j in range(k):
                distJI = distMeas(centroids[j,:],dataSet[i,:])
                if distJI < minDist:
                    minDist = distJI; minIndex = j
            if clusterAssment[i,0] != minIndex: clusterChanged = True
            clusterAssment[i,:] = minIndex,minDist**2
        print(centroids)
        for cent in range(k):#recalculate centroids
            ptsInClust = dataSet[nonzero(clusterAssment[:,0].A==cent)[0]]#get all the point in this cluster
            centroids[cent,:] = mean(ptsInClust, axis=0) #assign centroid to mean
    return centroids, clusterAssment

2. Load the data

import kMeans

import numpy as np

dataMat= np.mat([[3,104],[2,100],[1,81],[101,10],[99,5],[98,2],[18,90]])

3. Cluster the experimental data above with the K-Means algorithm

kMeans.distEclud(dataMat[0],dataMat[1])

myCentroids, clustAssing = kMeans.kMeans(dataMat,2)

4. Display the clusters

A = np.asarray(dataMat[:,0])

B = np.asarray(dataMat[:,1])

CX = np.asarray(myCentroids[:,0])

CY = np.asarray(myCentroids[:,1])

import matplotlib.pyplot as plt

fig = plt.figure()

ax = fig.add_subplot(111)

ax.scatter(A, B, s=50, color='b')

ax.scatter(CX, CY, s=1000, marker = '+', color='r')

plt.xlabel('X')

plt.ylabel('Y')

plt.show()
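Because the centroids are initialized randomly, each run of the code above can print different intermediate values. To trace the algorithm by hand, here is a plain-Python sketch of the same assign-then-update loop on the seven movies. It is an illustration under stated assumptions rather than a replacement for kMeans.py: the function name k_means is made up, and the starting centroids are fixed to the first Romance and first Action movie so the run is reproducible.

```python
import math


def _dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, centroids, max_iter=100):
    """Assign-then-update K-Means loop with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = []
        for p in points:
            distances = [_dist(p, c) for c in centroids]
            new_assignment.append(distances.index(min(distances)))
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / float(len(members))
                                for col in zip(*members)]
    return centroids, assignment


movies = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2], [18, 90]]
# Assumed starting centroids: the first and fourth movie.
final_centroids, clusters = k_means(movies, [movies[0], movies[3]])
print(clusters)
print(final_centroids)
```

With this start the first three movies and the unclassified movie [18, 90] end up in cluster 0, whose centroid is (6.0, 93.75), and the other three movies form cluster 1.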

5. Write the KNN algorithm to classify the final "movie to classify" entry

from numpy import *

import operator

import time

import matplotlib.pyplot as plt

def kNN(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize,1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()
    classCount={}
    # vote among the k nearest neighbours
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# the six labelled movies from the earlier kNN example
group = array([[3.0,104.0],[2.0,100.0],[1,81],[101,10.0],[99,5],[98,2.0]])

labels = ['Romance','Romance','Romance','Action','Action','Action']

kNN([18,90],group,labels,3)

Classification result: 'Romance'
