【Machine Learning in Action】Chap1|Classification|kNN





Comprehension of Listing 2.1


The inputs are:

inX is [0,0];

dataSet is:


labels is:


k is 3


Line 1:

Using the "type" function, we can see that 'dataSet' is a numpy.ndarray. numpy.ndarray.shape[0] returns the number of rows of the matrix, 4 in this example.

so dataSetSize is 4.

Line 2:

‘tile’ is a function from the numpy module. We use ‘tile’ to repeat a given pattern. Here we repeat inX once per row of dataSet so that the two matrices have the same shape and can be subtracted.

After this line, diffMat holds the element-wise difference between inX and every row in dataSet:
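A minimal sketch of this step. The screenshot of dataSet is not available here, so the four points below are an assumption (they are the sample points from the book's example data); the name `tiled` is only for illustration.

```python
import numpy as np

# Assumed sample data: the four points from the book's example data set.
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = [0, 0]

dataSetSize = dataSet.shape[0]           # number of rows: 4
tiled = np.tile(inX, (dataSetSize, 1))   # inX repeated 4 times down, 1 time across
diffMat = tiled - dataSet                # element-wise difference, shape (4, 2)

# NumPy broadcasting gives the same result without an explicit tile.
same = np.array_equal(diffMat, np.asarray(inX) - dataSet)
```

The sign of each difference does not matter for the final result, because every element is squared in the next step.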

 

Line 3:

‘**’ means ‘to the power of’. For instance, 2**4 stands for 2 to the 4th power.

This line of code calculates the 2nd power of every component in the matrix.

 

 

Line 4:

numpy.ndarray.sum calculates the sum, as its name indicates. It can also take the axis along which to add: ‘axis = 0’ adds along the vertical direction (one sum per column), and ‘axis = 1’ adds along the horizontal direction (one sum per row).

So the result shows the sum of every row in the matrix:
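To make the two axis values concrete, here is a small sketch on a made-up 3x2 matrix (the matrix `m` and the variable names are illustrative, not taken from the listing):

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])

col_sums = m.sum(axis=0)   # add down each column  -> [9, 12]
row_sums = m.sum(axis=1)   # add across each row   -> [3, 7, 11]
```

In the kNN listing each row is one data point, so summing the squared differences with axis = 1 yields one squared distance per point.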

 

 

Line 5:

Similarly, raising to the power 0.5 takes the square root of every component, which turns the summed squares into the distance from inX to each row.

 

 

Line 6:

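This line sorts the numpy.ndarray of distances. The result holds the indices of the elements in ascending order of distance, not the sorted distances themselves.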

 

This means that the distance at index 2 is the smallest, the one at index 3 is the second smallest, and so on. We can check that this is true.
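A short sketch of index sorting with `argsort`. The distance values are rounded numbers assumed for illustration (they correspond to the book's sample points measured from [0,0]); `sorted_idx` is a name chosen here.

```python
import numpy as np

# Assumed, rounded distances from [0, 0] to the four sample points.
distances = np.array([1.4866, 1.4142, 0.0, 0.1])

sorted_idx = distances.argsort()      # indices in ascending order of distance
nearest_first = distances[sorted_idx] # the distances themselves, now sorted
# sorted_idx is [2, 3, 1, 0]: index 2 is closest, index 3 is next, and so on.
```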

 

Line 7~10: the for loop

classCount is a dictionary.

The for loop counts how many times each label appears among the k points with the smallest distances.
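The counting loop can be sketched as follows. The labels and the sorted indices are assumptions (the book's sample labels and the index order described above), and the variable names are illustrative.

```python
labels = ['A', 'A', 'B', 'B']   # assumed sample labels
sorted_idx = [2, 3, 1, 0]       # indices ordered from nearest to farthest
k = 3

classCount = {}
for i in range(k):
    voteLabel = labels[sorted_idx[i]]
    # dict.get returns 0 the first time a label is seen
    classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

# classCount is now {'B': 2, 'A': 1}
```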

 

Line 11:

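First, let’s check out the prototype of ‘sorted’:

Syntax of sorted:

sorted(iterable[, cmp[, key[, reverse]]])

Parameters:

· iterable -- an iterable object.

· cmp -- a comparison function that takes two arguments, both drawn from the iterable. It must return 1 if the first is greater, -1 if it is smaller, and 0 if they are equal.

· key -- a function of one argument, applied to each element of the iterable, that returns the value to sort by.

· reverse -- the sort order: reverse = True sorts in descending order, reverse = False in ascending order (the default).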

Then, we compare the usage in our example:

 

We can see that the first input parameter is “classCount.iteritems()”, which corresponds to the iterable object in the prototype;

The second input parameter is “key = operator.itemgetter(1)”. Because it is passed by keyword, it does not fill the ‘cmp’ position in the prototype; it is the ‘key’. So we leave out ‘cmp’ and specify ‘key’ directly.

The third input parameter ‘reverse=True’ means the sorting is in descending order.
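A small sketch of this call, written for Python 3, where `dict.items()` replaces `iteritems()` and `sorted` no longer accepts `cmp`. The dictionary contents are assumed for illustration.

```python
import operator

classCount = {'A': 1, 'B': 2}   # assumed vote counts

# itemgetter(1) picks the count out of each (label, count) pair,
# so the pairs are ordered by count, largest first.
sortedClassCount = sorted(classCount.items(),
                          key=operator.itemgetter(1),
                          reverse=True)

top_label = sortedClassCount[0][0]
# sortedClassCount is [('B', 2), ('A', 1)] and top_label is 'B'
```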

 

Line 12:

We return the label that has the largest count.
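Putting the twelve lines together, the whole listing might look like the sketch below. This is a Python 3 reconstruction from the walkthrough, not a verbatim copy of the book's code; the function name `classify0` follows the book, and the sample data is assumed because the screenshots are not available.

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]                      # Line 1
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet  # Line 2
    sqDiffMat = diffMat ** 2                            # Line 3
    sqDistances = sqDiffMat.sum(axis=1)                 # Line 4
    distances = sqDistances ** 0.5                      # Line 5
    sortedIdx = distances.argsort()                     # Line 6
    classCount = {}                                     # Lines 7~10
    for i in range(k):
        voteLabel = labels[sortedIdx[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.items(),       # Line 11
                              key=operator.itemgetter(1),
                              reverse=True)
    return sortedClassCount[0][0]                       # Line 12

# Assumed sample data from the book's example.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
result = classify0([0, 0], group, labels, 3)   # 'B'
```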


********************************************************************************

Comprehension of Listing 2.2

 


 

Line 1:

Opens the file whose path is given by the input parameter.

 

Line 2:

‘readlines’ returns a list, and all the content of the file is stored in this list, one line per element.

 

Line 3:

Creates a matrix whose dimensions match the input file: one row for each line of the file, and one column for each of the 3 features.

 

Line 4~5:

We call ‘open’ again because we have already called ‘readlines’ once, and that call leaves the file pointer at the end of the file. Since we need to call ‘readlines’ again, we open the file a second time.
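The file-pointer behaviour can be checked with a small sketch. It writes a hypothetical two-line temporary file so that it runs on its own; the file contents and variable names are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1\t2\t3\tx\n4\t5\t6\ty\n")
    path = tmp.name

fr = open(path)
first = fr.readlines()    # reads every line; the pointer is now at the end
second = fr.readlines()   # nothing left to read, so this is an empty list
fr.seek(0)                # rewinding is an alternative to reopening
third = fr.readlines()    # the full content again
fr.close()
os.remove(path)
```

Since ‘readlines’ already returns the whole file as a list, another option is to keep that list in a variable and loop over it, which avoids the second read entirely.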

 

Later part:

Each iteration of the for loop takes one line from the file and splits it into fields according to the file format, which we already know.

We put the first 3 fields into ‘returnMat’, and we put the last field (given by the index ‘-1’) into classLabelVector.

Caution! A string label cannot be converted with int(), so the code in the book raises an error; the int() call should be removed.
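A sketch of the parsing loop with the int() call removed. The two sample lines are hypothetical, and tab-separated fields are an assumption about the file format; the explicit float() conversion is a choice made here for clarity.

```python
import numpy as np

# Hypothetical lines: three numeric features, then a text label.
arrayOLines = ["100\t1.5\t0.2\tlabelA\n",
               "200\t2.5\t0.4\tlabelB\n"]

returnMat = np.zeros((len(arrayOLines), 3))
classLabelVector = []
for index, line in enumerate(arrayOLines):
    listFromLine = line.strip().split('\t')   # assumes tab-separated fields
    returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
    classLabelVector.append(listFromLine[-1]) # keep the label as a string

# int() on a text label fails, which is the error described above.
int_failed = False
try:
    int("labelA")
except ValueError:
    int_failed = True
```

If the last column holds digits instead of text, int() works and can be kept.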


