Machine Learning in Action (《机器学习实战》), Peter Harrington (哈林顿): book source code
Chapter 2, error in the kNN code, section 2.2.1 Prepare: parsing data from a text file
from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))  # raises ValueError when the label is text
        index += 1
    return returnMat,classLabelVector
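The failure is easy to reproduce in isolation. The sketch below assumes a hypothetical sample line laid out like datingTestSet.txt (three tab-separated numeric features followed by a text label; the values are illustrative only) and applies the same `int()` call that the original `file2matrix` uses on the last column:

```python
# Hypothetical sample line in the datingTestSet.txt layout:
# three tab-separated numeric features, then a text class label.
line = "40920\t8.326976\t0.953952\tlargeDoses\n"
listFromLine = line.strip().split('\t')

error_message = None
try:
    # Same conversion the original file2matrix applies to the label column
    int(listFromLine[-1])
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Under Python 3 this prints `invalid literal for int() with base 10: 'largeDoses'`, which is the error the original code hits on the first text label.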
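Because the fourth column of the accompanying file datingTestSet.txt contains text strings, the int conversion in classLabelVector.append(int(listFromLine[-1])) raises a ValueError when the code is debugged under Python 3, so the label should be kept as str instead. If it is always kept as str, however, files whose last column is numeric no longer get integer labels. The fix is therefore to check whether listFromLine[-1] consists of digits or of letters and convert it selectively. The modified code is as follows: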
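from numpy import zeros

def file2matrix(filename):
    # Read all lines once; the with-block closes the file afterwards
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = zeros((numberOfLines, 3))  # one row of 3 features per line
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        label = listFromLine[-1]
        if label.isdigit():
            classLabelVector.append(int(label))  # numeric label, e.g. "3"
        elif label.isalpha():
            classLabelVector.append(label)       # text label, already a str
        else:
            # Neither branch matched: fail loudly instead of silently
            # dropping the label and misaligning labels with rows
            raise ValueError('unrecognised class label: %r' % label)
        index += 1
    return returnMat, classLabelVector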
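The isdigit()/isalpha() dispatch only accepts labels made entirely of digits or entirely of letters. The sketch below uses a hypothetical helper, `parse_label`, that mirrors those two branches so you can check which label strings each one accepts; the sample labels are made up for illustration:

```python
def parse_label(label):
    """Hypothetical helper mirroring the isdigit()/isalpha() branches.

    Returns an int for all-digit labels, the string itself for all-letter
    labels, and None when neither branch would accept the label.
    """
    if label.isdigit():
        return int(label)
    elif label.isalpha():
        return label
    return None  # falls through both checks


samples = ['3', 'largeDoses', '-1', '2.0', 'small_doses']
results = {s: parse_label(s) for s in samples}
for s, r in results.items():
    print(repr(s), '->', repr(r))
```

Only `'3'` and `'largeDoses'` are accepted. A negative number, a decimal, or a label containing an underscore matches neither branch, so labels like these need separate handling if a data file can contain them.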