Simple Linear Regression (Least Squares)

一只小菜皮卡丘

Category: 机器学习之路 (The Road to Machine Learning). Tags: machine learning, beginner, basic algorithms

Article link: https://blog.csdn.net/weixin_40836993/article/details/91129119

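I have just started learning machine learning, so I am writing down the algorithms I implement to make them easy to review later.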

Algorithm Analysis

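Assume the model y = w*x + b, where w and b are the unknowns. The least squares method looks for the line that minimizes the sum of squared vertical distances (residuals) from the samples to the line, that is:

$$(w^*, b^*) = \arg\min_{(w,b)} \sum_{i=1}^{m} (y_i - w x_i - b)^2$$

In other words, we minimize the squared error. Take the partial derivatives with respect to w and b, set both of them to zero, and solve to obtain the least-squares formulas:

$$w = \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - m\bar{x}^2}, \qquad b = \frac{1}{m}\sum_{i=1}^{m} (y_i - w x_i)$$

where m is the number of samples and \bar{x} is the mean of the x_i.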
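The step from "set the partial derivatives to zero" to the closed-form solution can be filled in as follows. This is a reconstruction under the usual squared-error objective, not a copy of the original figures; the symbols E, m, \bar{x}, and \bar{y} are notation introduced here for illustration.

```latex
\begin{align}
E(w,b) &= \sum_{i=1}^{m} (y_i - w x_i - b)^2 \\
\frac{\partial E}{\partial w} &= 2\left( w\sum_{i=1}^{m} x_i^2 - \sum_{i=1}^{m} (y_i - b)\,x_i \right) = 0 \\
\frac{\partial E}{\partial b} &= 2\left( m b - \sum_{i=1}^{m} (y_i - w x_i) \right) = 0 \\
\text{from the second equation:}\quad b &= \frac{1}{m}\sum_{i=1}^{m} (y_i - w x_i) = \bar{y} - w\bar{x} \\
\text{substituting into the first:}\quad w\left(\sum_{i=1}^{m} x_i^2 - m\bar{x}^2\right) &= \sum_{i=1}^{m} x_i y_i - m\bar{x}\bar{y} = \sum_{i=1}^{m} y_i (x_i - \bar{x}) \\
w &= \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - m\bar{x}^2}
\end{align}
```

The denominator is zero only when every x_i is identical, in which case no unique line exists.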

Implementation

1. Import packages

import numpy as np
import matplotlib.pyplot as plt

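2. Load the data (only a handful of rows here, just for illustration)

# The data has exactly two columns: the first is x, the second is y
points = np.genfromtxt('data.csv', delimiter=',')
points

# Extract the two columns; points[:, 0] means column 0 of every row
x = points[:, 0]
y = points[:, 1]

# Draw a scatter plot with plt
plt.scatter(x, y)
plt.show()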
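The contents of data.csv are not shown in the post, so here is a minimal sketch of the layout the loading code appears to expect: one "x,y" pair per line with no header row. The four rows below are made-up values for illustration only, and an in-memory string stands in for the file.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for data.csv (made-up values, not the author's data):
# one "x,y" pair per line, comma separated, no header row.
csv_text = "1.0,2.1\n2.0,3.9\n3.0,6.2\n4.0,7.8\n"

# np.genfromtxt accepts a file-like object as well as a file name.
points = np.genfromtxt(StringIO(csv_text), delimiter=',')

x = points[:, 0]  # first column
y = points[:, 1]  # second column

print(points.shape)
print(x, y)
```

If the real file had a header row, `skip_header=1` would be needed; whether it does is not stated in the post.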

[Figure: scatter plot of the data]
3. Compute the loss function (mean squared error is used here)

# The loss is a function of the coefficients; the data is passed in as well
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)

    # Accumulate the squared error point by point, then take the mean
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (y - w * x - b) ** 2

    return total_cost / M
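The same mean squared error can be written without an explicit loop. The sketch below is an alternative formulation, not the author's code; the name `compute_cost_vectorized` and the three sample points are made up for illustration.

```python
import numpy as np

def compute_cost_vectorized(w, b, points):
    """Mean squared error of the line y = w*x + b over all rows of points."""
    x = points[:, 0]
    y = points[:, 1]
    residuals = y - (w * x + b)  # one residual per sample
    return np.mean(residuals ** 2)

# Made-up points that lie exactly on y = 2x + 1.
sample = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])

perfect = compute_cost_vectorized(2.0, 1.0, sample)  # the true line
offset = compute_cost_vectorized(2.0, 0.0, sample)   # intercept off by 1

print(perfect, offset)
```

With the true line every residual is zero, and with the intercept shifted by 1 every residual equals 1, so the two costs are easy to verify by hand.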

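4. Implement the least-squares algorithm (translating the formulas into code)

# First define a helper that computes the mean
def average(data):
    total = 0  # named total to avoid shadowing the built-in sum()
    num = len(data)
    for i in range(num):
        total += data[i]
    return total / num

# Core fitting function
def fit(points):
    M = len(points)
    x_bar = average(points[:, 0])

    # Accumulate the numerator and the sum of x^2 needed for w
    sum_yx = 0
    sum_x2 = 0
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]

        sum_yx += y * (x - x_bar)
        sum_x2 += x ** 2
    # w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - M * x_bar^2)
    w = sum_yx / (sum_x2 - M * (x_bar**2))

    # b is the mean of y_i - w * x_i
    sum_b = 0
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]

        sum_b += y - w * x
    b = sum_b / M

    return w, b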
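As a cross-check on the closed-form formulas, here is a vectorized sketch of the same computation compared against `np.polyfit` with degree 1. The function name `fit_vectorized` and the sample points are made up for illustration; this is one possible formulation, not the author's implementation.

```python
import numpy as np

def fit_vectorized(points):
    """Closed-form least-squares slope and intercept for two-column data."""
    x = points[:, 0]
    y = points[:, 1]
    m = len(x)
    x_bar = x.mean()
    # w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - m * x_bar^2)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - m * x_bar ** 2)
    # b = mean(y_i - w * x_i)
    b = np.mean(y - w * x)
    return w, b

# Made-up sample data, roughly following y = 2x.
sample = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])

w, b = fit_vectorized(sample)

# np.polyfit returns coefficients from the highest power down: [slope, intercept].
w_ref, b_ref = np.polyfit(sample[:, 0], sample[:, 1], deg=1)

print(w, b)
print(w_ref, b_ref)
```

Both approaches should agree to within floating-point error. Note that the denominator becomes zero when all x values are equal, so a guard would be needed for such data.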

At this point w and b have been computed. Next, test the fit.
5. Test

# Test
w, b = fit(points)

print("w is ", w)
print("b is ", b)

cost = compute_cost(w, b, points)

print("cost is ", cost)
#%%
# Plot the fitted line
plt.scatter(x, y)
pred_y = w * x + b
plt.plot(x, pred_y, c='r')
plt.show()

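[Figure: scatter plot with the fitted line in red]
There is too little data, but the effect is roughly visible. The point here is the analysis.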
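With so few points the plot is hard to judge by eye, so a numerical check can help: at the least-squares solution both partial derivatives of the squared error should be (numerically) zero. The sketch below assumes made-up sample data and uses `np.polyfit` to obtain a reference solution; the helper name `squared_error_gradient` is hypothetical.

```python
import numpy as np

def squared_error_gradient(w, b, points):
    """Partial derivatives of sum((y - w*x - b)^2) with respect to w and b."""
    x = points[:, 0]
    y = points[:, 1]
    residuals = y - w * x - b
    grad_w = -2.0 * np.sum(residuals * x)
    grad_b = -2.0 * np.sum(residuals)
    return grad_w, grad_b

# Made-up sample data for illustration.
sample = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])

# Reference least-squares solution: [slope, intercept].
w, b = np.polyfit(sample[:, 0], sample[:, 1], deg=1)

grad_w, grad_b = squared_error_gradient(w, b, sample)

# Moving the slope away from the optimum makes the gradient clearly non-zero.
grad_w_off, grad_b_off = squared_error_gradient(w + 1.0, b, sample)

print(grad_w, grad_b)
print(grad_w_off, grad_b_off)
```

The same check can be run on the w and b returned by the fit function above, using the real data.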

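The scikit-learn (sklearn) machine learning library also provides a ready-made implementation.

Using the library directly

Here is the complete code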

import numpy as np
import matplotlib.pyplot as plt

points = np.genfromtxt('data.csv', delimiter=',')
points

# Extract the two columns of points as x and y
x = points[:, 0]
y = points[:, 1]

# Draw a scatter plot with plt
plt.scatter(x, y)
plt.show()

# The loss is a function of the coefficients; the data is passed in as well
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)

    # Accumulate the squared error point by point, then take the mean
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (y - w * x - b) ** 2

    return total_cost / M
   
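# Linear regression with scikit-learn
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

# scikit-learn expects x as a 2-D array of shape (n_samples, n_features)
x_new = x.reshape(-1, 1)
y_new = y.reshape(-1, 1)
lr.fit(x_new, y_new)  # fit the model to x and y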
#%%
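# Extract the coefficient and intercept from the fitted model.
# y was passed as a column vector, so coef_ has shape (1, 1) and
# intercept_ has shape (1,); index them to get plain scalars.
w = lr.coef_[0][0]
b = lr.intercept_[0]

print("w is ", w)
print("b is ", b)

cost = compute_cost(w, b, points)

print("cost is ", cost)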

pred_y = w * x + b
plt.scatter(x, y)
plt.plot(x, pred_y, c='r')
plt.show()
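If scikit-learn is not installed, NumPy's own least-squares solver is another ready-made route to the same line. The sketch below is an alternative to the code above, not part of the original post; it uses made-up sample data and builds a design matrix with a column of ones so that the intercept is estimated along with the slope.

```python
import numpy as np

# Made-up sample data for illustration.
sample = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])
x = sample[:, 0]
y = sample[:, 1]

# Design matrix: one column for x, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns the solution, the sum of squared residuals, the rank of A,
# and the singular values of A.
(w, b), residual_ss, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Mean squared error of the fitted line, for comparison with compute_cost.
mse = np.mean((y - (w * x + b)) ** 2)

print("w is ", w)
print("b is ", b)
print("cost is ", mse)
```

Dividing the returned sum of squared residuals by the number of samples gives the same mean squared error as the explicit computation.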