20191010: Classification and Regression, Using KNN for Regression


宫城诗


Category: Python machine learning

Article link: https://blog.csdn.net/qq_36344771/article/details/102482613


Classification and regression problems

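Regression: the prediction is the average of the values of the top-k nearest neighbors.
Classification: the prediction is the class that appears most often among the top-k nearest neighbors.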
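The two decision rules above differ only in how the k neighbors' labels are combined. A minimal sketch, assuming the k nearest labels have already been found; the helper names `knn_regress` and `knn_classify` and the sample labels are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_regress(neighbor_values):
    # Regression: the prediction is the mean of the k neighbors' values
    return float(np.mean(neighbor_values))


def knn_classify(neighbor_classes):
    # Classification: the prediction is the most frequent class (majority vote)
    return Counter(neighbor_classes).most_common(1)[0][0]


print(knn_regress([200, 210, 215]))    # average of three made-up prices
print(knn_classify(["a", "b", "a"]))   # "a" wins the vote two to one
```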

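NumPy supports broadcasting, so a single vector can be added to or subtracted from every row of a matrix directly, without writing a loop.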
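To make the broadcasting step concrete, here is a small sketch on the five sample points from this post; `np.tile` appears only to show the explicit version of what broadcasting does implicitly.

```python
import numpy as np

# Five samples with two features each: shape (5, 2)
feature = np.array([[-121, 47], [-121.2, 46.5], [-122, 46.3], [-120.9, 46.7], [-120.1, 46.2]])
# One query point: shape (2,)
predict_point = np.array([-121, 46])

# Broadcasting stretches predict_point across all 5 rows automatically
diff = feature - predict_point
print(diff.shape)  # (5, 2)

# Explicit equivalent: copy the point into a (5, 2) matrix first, then subtract
diff_explicit = feature - np.tile(predict_point, (feature.shape[0], 1))
print(np.array_equal(diff, diff_explicit))  # True
```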

import numpy as np

feature = np.array([
    [-121,47],
    [-121.2,46.5],
    [-122,46.3],
    [-120.9,46.7],
    [-120.1,46.2]
])

label = np.array([
    200,210,250,215,232
])
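# predictPoint is the point whose value we want to predict
predictPoint = np.array([-121,46])
# Broadcasting subtracts predictPoint from every row of feature
matrixtemp = feature - predictPoint
# np.square squares every element
matrixtemp2 = np.square(matrixtemp)
print(matrixtemp2)
# Sum along each row (axis=1): sum of squared differences per sample
print(np.sum(matrixtemp2, axis=1))
# Take the square root to get the Euclidean distance
distances = np.sqrt(np.sum(matrixtemp2, axis=1))
print(distances)

# Indices of the samples ordered from nearest to farthest
sortindex = np.argsort(distances)
labelsort = label[sortindex]
print(labelsort)

k = 3
# Regression: average the labels of the k nearest neighbors
price = np.sum(labelsort[0:k])/k
print("The predicted house price is {} (in units of 10,000)".format(price))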
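The square, row-sum and square-root steps above together compute the Euclidean (L2) norm of each difference row. As a sketch, the same distances can be obtained with `np.linalg.norm(..., axis=1)`; this is an alternative NumPy call, not the post's own code, shown on the same five sample points.

```python
import numpy as np

feature = np.array([[-121, 47], [-121.2, 46.5], [-122, 46.3], [-120.9, 46.7], [-120.1, 46.2]])
label = np.array([200, 210, 250, 215, 232])
predict_point = np.array([-121, 46])

# Step by step, as in the post: square, sum each row, square root
manual = np.sqrt(np.sum(np.square(feature - predict_point), axis=1))
# One call: L2 norm of every row
with_norm = np.linalg.norm(feature - predict_point, axis=1)
print(np.allclose(manual, with_norm))  # True

k = 3
# Mean label of the k nearest samples
prediction = label[np.argsort(with_norm)[:k]].mean()
print(prediction)  # 219.0
```

The three nearest points are the second, fourth and fifth samples (labels 210, 215 and 232), so both versions predict 219.0.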

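In a standard dataset file, the first row holds the column names;
only the rows after it contain the data.

So np.loadtxt needs skiprows=1
to skip that first row of column names.

usecols selects which columns to load (0-based column indices).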
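A small sketch of `skiprows` and `usecols` on an in-memory CSV, assuming a made-up four-column file (the column names and numbers are illustrative only). Without `skiprows=1`, `np.loadtxt` would try to parse the header text as numbers and raise a `ValueError`.

```python
import io

import numpy as np

# Made-up CSV: the first line holds column names, not numbers
csv_text = """id,price,lat,long
1,200,47.5,-122.2
2,500,47.7,-122.3
"""

# skiprows=1 skips the header; usecols picks columns by 0-based index
feature = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=(2, 3))
label = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=1)
print(feature.shape)  # (2, 2)
print(label)          # [200. 500.]
```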

import numpy as np

def knn(k, predictPoint, feature, label):
    # Squared differences between every sample and predictPoint (broadcasting)
    matrixtemp2 = np.square(feature - predictPoint)
    # Row-wise sum, then square root: Euclidean distance to each sample
    distances = np.sqrt(np.sum(matrixtemp2, axis=1))
    print(distances)
    # Labels reordered from the nearest sample to the farthest
    labelsort = label[np.argsort(distances)]
    print(labelsort)
    # Regression: average label of the k nearest neighbors
    price = np.sum(labelsort[0:k]) / k
    return price

if __name__ == "__main__":
    # skiprows=1 skips the header; columns 17, 18, 6 are lat, long, sqft_lot
    feature = np.loadtxt("kc_house_data.csv", delimiter=",", skiprows=1, usecols=(17, 18, 6))
    # Column 2 is the sale price, used as the label
    label = np.loadtxt("kc_house_data.csv", delimiter=",", skiprows=1, usecols=2)
    print(feature)
    print(label)
    # Query point in the same column order: lat, long, sqft_lot
    predictPoint = np.array([47.5112, -122.257, 5650])
    price = knn(450, predictPoint, feature, label)
    print(price)

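When normalization (min-max scaling) cannot be used,
and in fact in most cases, standardization (z-score scaling) is the better choice.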
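Scaling matters for KNN because a feature measured in thousands (such as a lot size) swamps features measured in small units (such as latitude and longitude) in the Euclidean distance. A minimal sketch of z-score standardization before KNN, assuming the query point is scaled with the training features' mean and standard deviation; the helper `standardize` and the four data points are hypothetical.

```python
import numpy as np


def standardize(feature, predict_point):
    # z-score: subtract each column's mean, divide by its standard deviation.
    # The query point is scaled with the *training* statistics.
    # Assumes no column is constant (std of zero would divide by zero).
    mean = feature.mean(axis=0)
    std = feature.std(axis=0)
    return (feature - mean) / std, (predict_point - mean) / std


def knn(k, predict_point, feature, label):
    # Euclidean distance from the query point to every sample
    distances = np.linalg.norm(feature - predict_point, axis=1)
    # Regression: mean label of the k nearest samples
    return label[np.argsort(distances)[:k]].mean()


# Hypothetical data: column 0 drives the label, column 1 is on a much larger scale
feature = np.array([[1.0, 5000.0], [2.0, 5400.0], [9.0, 5600.0], [10.0, 4700.0]])
label = np.array([100.0, 110.0, 300.0, 320.0])
predict_point = np.array([9.5, 5100.0])

raw = knn(2, predict_point, feature, label)
scaled_feature, scaled_point = standardize(feature, predict_point)
scaled = knn(2, scaled_point, scaled_feature, label)
print(raw, scaled)  # 105.0 310.0
```

On the raw data the large second column decides which points are "near", giving 105.0; after standardization both columns contribute comparably and the two points with similar first-column values (labels 300 and 320) are chosen, giving 310.0.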


As mentioned before, avoid missing values and outliers.
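A short sketch of why both problems matter here: a single outlier squeezes every other value near 0 under min-max normalization, and NaN values propagate through min/max and distance calculations. Dropping incomplete rows, as below, is just one simple option assumed for illustration; the data and the helper `min_max` are hypothetical.

```python
import numpy as np


def min_max(column):
    # Min-max normalization: rescale values into [0, 1]
    return (column - column.min()) / (column.max() - column.min())


clean = np.array([10.0, 20.0, 30.0, 40.0])
with_outlier = np.array([10.0, 20.0, 30.0, 40.0, 1000.0])

print(min_max(clean))             # spread evenly over [0, 1]
print(min_max(with_outlier)[:4])  # the same four values squeezed near 0

# Missing values: find rows containing NaN and keep only complete rows
data = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
rows_with_nan = np.isnan(data).any(axis=1)
complete = data[~rows_with_nan]
print(complete)
```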

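Normalization can be used if the data is normally distributed,
whereas standardization is workable for processing essentially any dataset, which makes it the safer default.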
