Machine Learning in Action, Chapter 2, Section 2.1: k-Nearest Neighbors

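This Machine Learning in Action blog series is mainly about implementing and understanding the code in the book, so it amounts to a set of reading notes. After all, you cannot learn hands-on material by reading alone; once you start coding you run into all sorts of odd problems. The posts are fairly rough and are best read alongside the book. I am learning as I look things up and my knowledge is limited, so please point out any problems in the comments. The book's code and data are widely available online for download.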

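The k-nearest neighbors algorithm classifies a sample by measuring the distances between its feature values and those of the labeled samples.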

2.1.1 Importing the data

 #coding=utf-8
from numpy import *
import operator
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    # array is NumPy's representation of a multidimensional array
    labels = ['A', 'A', 'B', 'B']
    return group, labels

Output when run from the command line

>>> import kNN
>>> group , labels = kNN.createDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
['A', 'A', 'B', 'B']

2.1.2 Implementing the kNN classification algorithm

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # compute the Euclidean distance from inX to every training sample
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    # sort by distance: indices of the samples, nearest first
    sortedDistIndicies = distances.argsort()
    # count the classes of the k nearest points
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # iteritems() is Python 2 only; on Python 3 use classCount.items()
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    # return the most frequent class among the k nearest points
    return sortedClassCount[0][0]

Sample command-line input

>>> kNN.classify0([0,0],group,labels,3)
'B'
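
To see why the query point [0, 0] comes out as 'B', it can help to redo the calculation by hand. The following is only a rough sketch in Python 3 using the standard library instead of NumPy; the function name classify_plain is my own and is not from the book.

```python
import math
from collections import Counter

# Same toy data as createDataSet(), written as plain lists.
group = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']

def classify_plain(in_x, data_set, labels, k):
    """Hypothetical NumPy-free counterpart of classify0 (illustration only)."""
    # Euclidean distance from in_x to every training point.
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(in_x, row)))
        for row in data_set
    ]
    # Indices of the training points, nearest first.
    order = sorted(range(len(distances)), key=distances.__getitem__)
    # Majority vote among the labels of the k nearest points.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0], distances

predicted, distances = classify_plain([0, 0], group, labels, 3)
print([round(d, 3) for d in distances])  # [1.487, 1.414, 0.0, 0.1]
print(predicted)                         # B
```

The three nearest points are the two 'B' samples (distances 0.0 and 0.1) and one 'A' sample (distance about 1.414), so 'B' wins the vote two to one.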

2.1.3 Notes on the functions used

  • shape function: reads the dimensions of a matrix
>>> from numpy import *
>>> shape([1])
(1L,)
>>> shape([[1],[2]])
(2L, 1L)
>>> shape([[1,2]])
(1L, 2L)
>>> shape(3)
()
>>> e = eye(3)
>>> e
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> e.shape
(3L, 3L)
>>> e.shape[0]
3L
  • tile function: repeats an array n times to build a new array
>>> from numpy import*
>>> a = [0,1,2]
>>> b = tile(a,2)
>>> b
array([0, 1, 2, 0, 1, 2])
>>> b = tile(a,[1,2])
>>> b
array([[0, 1, 2, 0, 1, 2]])
>>> b = tile(a,[2,1])
>>> b
array([[0, 1, 2],
       [0, 1, 2]])
>>> b = tile(a,[2,2])
>>> b
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
  • sum function: computes sums
>>> from numpy import*
>>> sum([1,2])
3
>>> sum([[1,2],[2,3],[3,4]])
15
>>> sum([[1,2],[2,3],[3,4]],axis=0)  # sum down each column
array([6, 9])
>>> sum([[1,2],[2,3],[3,4]],axis=1)  # sum across each row
array([3, 5, 7])
  • argsort function: returns the indices that would sort the array in ascending order
>>> from numpy import*
>>> x = [4,2,5]
>>> argsort(x)
array([1, 0, 2], dtype=int64)
>>> x = ([[2,3],[-5,8]])
>>> argsort(x,axis=0)   # sort along each column
array([[1, 0],
       [0, 1]], dtype=int64)
>>> argsort(x,axis=1)   # sort along each row
array([[0, 1],
       [0, 1]], dtype=int64)
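
If the meaning of the returned indices is still fuzzy, a plain-Python imitation may help. This is just a sketch in Python 3 of the one-dimensional case, not how NumPy implements it: sort the positions 0..n-1 by the values stored at those positions.

```python
x = [4, 2, 5]

# Sort the index positions by the value found at each position.
order = sorted(range(len(x)), key=lambda i: x[i])
print(order)                  # [1, 0, 2]

# Reading x in that order gives the values in ascending order.
print([x[i] for i in order])  # [2, 4, 5]
```

In classify0 this is what lets the labels of the nearest points be looked up: the first k entries of the index array point at the k smallest distances.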
  • range function (returns a list in Python 2)
>>> range(1,5)
[1, 2, 3, 4]
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(1,5,2)
[1, 3]
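  • Dictionaries: d = {key1 : value1, key2 : value2 }. Each key is separated from its value by a colon, the pairs are separated by commas, and the whole thing is enclosed in curly braces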
ab = {       'Swaroop'   : 'swaroopch@byteofpython.info',
             'Larry'     : 'larry@wall.org',             
             'Matsumoto' : 'matz@ruby-lang.org',             
             'Spammer'   : 'spammer@hotmail.com'     }
print "Swaroop's address is %s" % ab['Swaroop']
# Adding a key/value pair 
ab['Guido'] = 'guido@python.org'
# Deleting a key/value pair 
del ab['Spammer']
print 'There are %d contacts in the address-book' % len(ab) 
for name, address in ab.items():    
    print 'Contact %s at %s' % (name, address)
if 'Guido' in ab: 
# OR ab.has_key('Guido')    
    print "Guido's address is %s" % ab['Guido'] 
Swaroop's address is swaroopch@byteofpython.info
There are 4 contacts in the address-book
Contact Swaroop at swaroopch@byteofpython.info
Contact Matsumoto at matz@ruby-lang.org
Contact Larry at larry@wall.org
Contact Guido at guido@python.org
Guido's address is guido@python.org
[Finished in 0.4s]
  • Dictionary get() method: returns the value for the given key, or a default value if the key is not in the dictionary. Syntax: dict.get(key, default=None)
 #coding=utf-8
dict = {'Name': 'Zara','Age':27}
print "Value : %s" %  dict.get('Age')
print "Value : %s" %  dict.get('Sex')
print "Value : %s" %  dict.get('Sex','guy')
Value : 27
Value : None
Value : guy
[Finished in 0.2s]
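
This is exactly the trick the voting loop in 2.1.2 relies on: get(label, 0) supplies 0 the first time a label is seen, so no separate "is the key there yet" check is needed. A small sketch in Python 3, where the list nearest_labels is made up for illustration:

```python
# Labels of the k nearest points (hypothetical example values).
nearest_labels = ['B', 'B', 'A']

class_count = {}
for label in nearest_labels:
    # A missing key counts as 0, so the first occurrence becomes 1.
    class_count[label] = class_count.get(label, 0) + 1

print(class_count)  # {'B': 2, 'A': 1}
```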
  • Several ways to iterate over a dictionary, including the iteritems method
 #coding=utf-8
dict = {'Name': 'Zara','Age':27}
print '****Method one****'
for key in dict:
    print key ,dict[key]
    print key + str(dict[key])
print '****Method two****'
for (k,v) in dict.items():
    print "dict[%s]="%k,v
print '****Method three****'
#items() returns a list, whereas iteritems() returns an iterator
#an iterator yields one item at a time until none are left
for k,v in dict.iteritems():
    print "dict[%s]="%k,v
print '****Method four****'
for i in dict.iteritems():
    print i
****Method one****
Age 27
Age27
Name Zara
NameZara
****Method two****
dict[Age]= 27
dict[Name]= Zara
****Method three****
dict[Age]= 27
dict[Name]= Zara
****Method four****
('Age', 27)
('Name', 'Zara')
[Finished in 0.3s]
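
A note for anyone running these examples on Python 3: iteritems() no longer exists there, and items() returns a view object rather than a list. The sketch below assumes Python 3.7 or later; the variable name info is mine, chosen to avoid shadowing the built-in dict.

```python
info = {'Name': 'Zara', 'Age': 27}

# On Python 3, items() takes over the role of iteritems().
for key, value in info.items():
    print("info[%s]= %s" % (key, value))

# The Python 2 method is gone.
print(hasattr(info, 'iteritems'))  # False

# items() is a live view: it reflects later changes to the dictionary.
pairs = info.items()
info['Sex'] = 'guy'
print(len(pairs))  # 3
```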
  • operator.itemgetter function
>>> import operator
>>> a = [[1,2],[3,4],'hh']
>>> b = operator.itemgetter(2,1)  # fetch the items at index 2 and index 1
>>> b(a)
('hh', [3, 4])
  • sorted function: sorted(iterable, cmp=None, key=None, reverse=False).
>>> students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
>>> sorted(students, key=lambda student : student[2])
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(students, key=lambda student : student[2],reverse=0)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(students, key=lambda student : student[2],reverse=1)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
>>> sorted(students, key=operator.itemgetter(2)) 
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(students, key=operator.itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

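key is a function that selects which part of each element to sort on.
sorted(students, key=operator.itemgetter(1,2)) sorts by the second field first and then by the third field.
reverse is a bool: the default, False, sorts in ascending order, and True sorts in descending order.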
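
Putting sorted and itemgetter together gives the last step of the classifier in 2.1.2: order the (label, count) pairs by count, largest first, and take the label of the first pair. A minimal sketch in Python 3, with vote counts made up for illustration:

```python
import operator

# Hypothetical vote counts for the k nearest points.
class_count = {'A': 1, 'B': 2}

# Sort the (label, count) pairs by count (index 1), largest first.
sorted_class_count = sorted(class_count.items(),
                            key=operator.itemgetter(1),
                            reverse=True)

print(sorted_class_count)        # [('B', 2), ('A', 1)]
print(sorted_class_count[0][0])  # B
```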
