Recommender System Notes (3)

This post covers regression models in supervised learning, including linear regression (univariate and multivariate) and its closed-form solution via least squares. It also introduces nonlinear regression and the application of gradient descent to linear regression. Finally, it briefly mentions the k-nearest neighbors (kNN) algorithm among classification models. Python examples show how to implement these models and evaluate their performance.

Supervised Learning

Regression Models

  • Linear regression model: f(x) = w1x1+w2x2+…+wdxd+b
    Univariate (simple) linear regression
    Multivariate linear regression
  • Nonlinear regression models
  • Least squares: a solution method based on minimizing the mean squared error. Choose the unknown parameters so that the sum of squared differences between the predicted and observed values is as small as possible.
    Least squares for univariate linear regression
import numpy as np
import matplotlib.pyplot as plt

points = np.genfromtxt('data.csv',delimiter=',')
#extract the two columns from points
x = points[:,0]
y = points[:,1]

#draw a scatter plot with plt
plt.scatter(x,y)
plt.show()

Define the functions from the formulas

#define the loss function
def compute_cost(w,b,points):
    total_cost = 0
    M = len(points)
    #accumulate the squared error at each point, then take the mean
    for i in range(M):
        x = points[i,0]
        y = points[i,1]
        total_cost += (y-w*x-b)**2
    return total_cost/M

#first define a helper that computes the mean
def average(data):
    total = 0
    num = len(data)
    for i in range(num):
        total += data[i]
    return total/num
#define the core fitting function
def fit(points):
    M = len(points)
    x_bar = average(points[:,0])
    sum_yx = 0
    sum_x2 = 0
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_yx += y*(x-x_bar)
        sum_x2 += x**2
    #compute w from the closed-form formula
    w = sum_yx/(sum_x2-M*(x_bar**2))
    sum_delta = 0
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_delta += (y-w*x)
    b = sum_delta/M
    return w,b
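The closed-form expressions that `fit` implements follow from setting the derivatives of the mean squared error to zero. Writing M for the number of points and x̄ for the mean of the x values, they are:

```latex
w = \frac{\sum_{i=1}^{M} y_i (x_i - \bar{x})}{\sum_{i=1}^{M} x_i^2 - M\bar{x}^2},
\qquad
b = \frac{1}{M} \sum_{i=1}^{M} (y_i - w x_i)
```

The loop variables `sum_yx` and `sum_x2` accumulate the numerator and the first sum in the denominator, and `sum_delta` accumulates the sum used for b.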

Test

#test
w,b = fit(points)
print("w is ", w)
print("b is ", b)
cost = compute_cost(w,b,points)
print("cost is ",cost)

plt.scatter(x,y)
pred_y = w*x+b
plt.plot(x,pred_y,c='r')
plt.show()

Gradient Descent for Linear Regression (the example below is univariate; the method extends directly to the multivariate case)

import numpy as np
import matplotlib.pyplot as plt

points = np.genfromtxt('data.csv',delimiter=',')
#extract the two columns from points
x = points[:,0]
y = points[:,1]

#draw a scatter plot with plt
plt.scatter(x,y)
plt.show()
#define the loss function
def compute_cost(w,b,points):
    total_cost = 0
    M = len(points)
    #accumulate the squared error at each point, then take the mean
    for i in range(M):
        x = points[i,0]
        y = points[i,1]
        total_cost += (y-w*x-b)**2
    return total_cost/M


#define the model hyperparameters
alpha = 0.0001
initial_w = 0
initial_b = 0
num_iter = 10
#define the core gradient descent function
def grad_desc(points, initial_w, initial_b,alpha,num_iter):
    w = initial_w
    b = initial_b
    #keep every loss value in a list to visualize the descent
    cost_list = []
    for i in range(num_iter):
        cost_list.append(compute_cost(w,b,points))
        w,b = step_grad_desc(w,b,alpha,points)
    return [w,b,cost_list]

def step_grad_desc(cur_w,cur_b,alpha,points):
    sum_grad_w = 0
    sum_grad_b = 0
    M = len(points)
    for i in range(M):
        x = points[i,0]
        y = points[i,1]
        sum_grad_w += (cur_w*x+cur_b-y)*x
        sum_grad_b += cur_w*x+cur_b-y

    grad_w = 2/M*sum_grad_w
    grad_b = 2/M*sum_grad_b

    updated_w = cur_w-alpha*grad_w
    updated_b = cur_b-alpha*grad_b

    return updated_w,updated_b
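The update in `step_grad_desc` uses the partial derivatives of the mean squared error loss L(w, b) = (1/M) Σᵢ (w xᵢ + b - yᵢ)²:

```latex
\frac{\partial L}{\partial w} = \frac{2}{M} \sum_{i=1}^{M} (w x_i + b - y_i)\,x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{M} \sum_{i=1}^{M} (w x_i + b - y_i)
```

Each step moves the parameters against the gradient, scaled by the learning rate alpha, which is why the code subtracts `alpha*grad_w` and `alpha*grad_b`.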

#test: compute the optimal w, b
w,b,cost_list = grad_desc(points, initial_w, initial_b,alpha,num_iter)
print("w is ", w)
print("b is ", b)
plt.plot(cost_list)
plt.show()
cost = compute_cost(w,b,points)
print("cost is ",cost)

#plot the fitted line
plt.scatter(x,y)
pred_y = w*x+b
plt.plot(x,pred_y,c='r')
plt.show()

sklearn

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

points = np.genfromtxt('data.csv',delimiter=',')
#extract the two columns from points
x = points[:,0]
y = points[:,1]

#draw a scatter plot with plt
plt.scatter(x,y)
plt.show()

#define the loss function
def compute_cost(w,b,points):
    total_cost = 0
    M = len(points)
    #accumulate the squared error at each point, then take the mean
    for i in range(M):
        x = points[i,0]
        y = points[i,1]
        total_cost += (y-w*x-b)**2
    return total_cost/M

lr = LinearRegression()
x_new = x.reshape(-1,1)
y_new = y.reshape(-1,1)
lr.fit(x_new,y_new)

w = lr.coef_[0][0]
b = lr.intercept_[0]
print("w is ", w)
print("b is ", b)
cost = compute_cost(w,b,points)
print("cost is ",cost)

plt.scatter(x,y)
pred_y = w*x+b
plt.plot(x,pred_y,c='r')
plt.show()

Classification Models

  • k-nearest neighbors (kNN)
  • Decision trees
  • Logistic regression
    k-nearest neighbors (kNN): if the majority of the k most similar samples to a given sample in feature space belong to one class, that sample is assigned to the same class. In kNN, the neighbors are drawn from samples whose classes are already known.
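The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production classifier (in practice sklearn's KNeighborsClassifier is the usual choice); the function name and toy data are my own:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training sample
    dists = np.sqrt(((X_train - x)**2).sum(axis=1))
    # indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# toy data: two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # → 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # → 1
```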

Unsupervised Learning

Clustering

  • k-means
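A minimal sketch of the k-means idea: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. This toy implementation and data are my own illustration; sklearn's KMeans is the practical choice:

```python
import numpy as np

def kmeans(X, k, num_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct random samples
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(num_iter):
        # assign each point to its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :])**2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
centroids, labels = kmeans(X, k=2)
print(labels)  # each point tagged with its cluster index
```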

Dimensionality Reduction
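The notes list no algorithms under this heading. As one common example (my addition, not from the original notes), principal component analysis (PCA) projects data onto the directions of greatest variance, and is available in sklearn:

```python
import numpy as np
from sklearn.decomposition import PCA

# toy 2-D data that mostly varies along a single direction
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 0.5*t]) + 0.01*rng.normal(size=(100, 2))

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # project onto the top principal component
print(X_reduced.shape)                # (100, 1)
print(pca.explained_variance_ratio_)  # close to 1: one component captures almost all variance
```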
