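Case study
Fitting a linear regression to the relationship between floor area and price
Basic workflow
1. Build the base model
2. Construct the loss function and derive its partial derivatives
3. Update the parameters with gradient descent
4. Evaluate the loss function to decide whether to stop
5. Iterate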
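The steps above can be written out as formulas. This is a sketch assuming the usual squared-error loss for a one-variable linear model, which matches the `1/(2*m)` factor used in the code of this post; the symbol $h_\theta$ is conventional notation and does not appear in the original.

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x \\
J(\theta_0, \theta_1) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl(h_\theta(x_i) - y_i\bigr)^2 \\
\frac{\partial J}{\partial \theta_0} &= \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x_i) - y_i\bigr) \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x_i) - y_i\bigr)\, x_i \\
\theta_j &:= \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}, \qquad j \in \{0, 1\}
\end{aligned}
```

Here $m$ is the number of data points and $\alpha$ is the learning rate.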
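Code walkthrough
1. Set alpha, the step size (learning rate) used by gradient descent, and the iteration count times
alpha = 0.1  # learning rate
times = 10000  # number of iterations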
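To get a feel for why the step size matters, here is a minimal sketch on a toy function, f(t) = t², rather than on the housing data. The function `descend` and the second value `alpha = 1.1` are illustrative assumptions, not part of the original code.

```python
def descend(alpha, times, start=1.0):
    """Minimize the toy function f(t) = t**2 by gradient descent (hypothetical helper)."""
    t = start
    for _ in range(times):
        gradient = 2 * t          # derivative of t**2
        t = t - alpha * gradient  # one gradient-descent step
    return t


print(abs(descend(alpha=0.1, times=50)))  # shrinks toward 0
print(abs(descend(alpha=1.1, times=50)))  # grows: every step overshoots the minimum
```

Each step multiplies t by (1 - 2 * alpha), so the iteration only settles when that factor has magnitude below 1.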
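2. Update theta0 and theta1 with gradient descent on every pass
for i in range(m):  # m is the number of data points
    error = theta0 + theta1 * x[i] - y[i]  # residual, computed once so both updates use the same parameters
    theta0 = theta0 - alpha * (error / m)
    theta1 = theta1 - alpha * (error / m) * x[i]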
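The loop above nudges the parameters once per data point. A common alternative is batch gradient descent: accumulate the full gradient first, then update both parameters together. The sketch below assumes that variant; the function name `batch_step` is hypothetical, while `deriv0` and `deriv1` reuse the accumulator names from the full listing.

```python
def batch_step(theta0, theta1, x, y, alpha):
    """One batch gradient-descent step for y = theta0 + theta1 * x (hypothetical helper)."""
    m = len(x)
    deriv0 = 0.0  # partial derivative of the loss with respect to theta0
    deriv1 = 0.0  # partial derivative of the loss with respect to theta1
    for xi, yi in zip(x, y):
        error = theta0 + theta1 * xi - yi
        deriv0 += error / m
        deriv1 += error * xi / m
    # Both parameters are updated from the same gradient (simultaneous update).
    return theta0 - alpha * deriv0, theta1 - alpha * deriv1


# Toy data lying exactly on y = 2x, used only for illustration.
print(batch_step(0.0, 0.0, [1, 2], [2, 4], alpha=0.1))
```

Returning the new pair instead of mutating variables in place makes it hard to mix an updated theta0 into the theta1 update by accident.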
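3. Compute the total value of the loss function
diss = 0  # reset the accumulated loss before summing
for i in range(m):
    diss = diss + (1 / (2 * m)) * pow((theta0 + theta1 * x[i] - y[i]), 2)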
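The same loss can be wrapped in a small function so it can be checked on its own. This is a sketch; the name `cost` is an assumption and is not used in the original code.

```python
def cost(theta0, theta1, x, y):
    """Squared-error loss J = 1/(2m) * sum((theta0 + theta1*x_i - y_i)**2) (hypothetical helper)."""
    m = len(x)
    squared_errors = ((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y))
    return sum(squared_errors) / (2 * m)


# Toy data on the line y = 2x: a perfect fit has zero loss.
print(cost(0.0, 2.0, [1, 2], [2, 4]))
print(cost(0.0, 0.0, [1, 2], [2, 4]))
```

With theta0 = theta1 = 0 the residuals are -2 and -4, so the loss is (4 + 16) / (2 * 2).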
4. Check the exit condition; if it is not met, keep iterating
if diss <= 100:
    break
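A fixed threshold such as 100 depends on the scale of the data. Another option, shown here only as a sketch, is to stop once the loss changes by less than a small tolerance between iterations. The function `fit`, the parameter `tol`, and the batch-style update are assumptions made for this example and differ from the per-sample loop used in this post.

```python
def fit(x, y, alpha=0.1, times=10000, tol=1e-9):
    """Batch gradient descent that stops when the loss stops changing (hypothetical helper)."""
    m = len(x)
    theta0 = theta1 = 0.0
    previous = float("inf")
    loss = previous
    step = 0
    for step in range(1, times + 1):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        theta0 -= alpha * sum(errors) / m
        theta1 -= alpha * sum(e * xi for e, xi in zip(errors, x)) / m
        loss = sum((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)
        if abs(previous - loss) < tol:  # loss has effectively stopped changing
            break
        previous = loss
    return theta0, theta1, loss, step


# Toy data on the line y = 1 + 2x, used only for illustration.
print(fit([0, 1, 2], [1, 3, 5]))
```

The iteration cap `times` still guards against a run that never meets the tolerance.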
Source code
from math import pow
x0 = [150, 200, 250, 300, 350, 400, 600]
y0 = [6450, 7450, 8450, 9450, 11450, 15450, 18450]
# To simplify the arithmetic, scale all data down by a factor of 100
x = [1.50, 2.00, 2.50, 3.00, 3.50, 4.00, 6.00]
y = [64.50, 74.50, 84.50, 94.50, 114.50, 154.50, 184.50]
# The linear regression function is y = theta0 + theta1 * x
# Parameter definitions
theta0 = 0.1  # initial value of theta0
theta1 = 0.1  # initial value of theta1
alpha = 0.1  # learning rate
times = 10000  # maximum number of iterations
m = len(x)
count0 = 0  # number of iterations actually run
theta0_list = []
theta1_list = []
for num in range(times):
    count0 += 1
    diss = 0  # loss
    deriv0 = 0  # derivative with respect to theta0
    deriv1 = 0  # derivative with respect to theta1
    # Compute the derivatives (batch gradient at the start of this pass;
    # the update below works sample by sample and does not read them)
    for i in range(m):
        deriv0 += (theta0 + theta1 * x[i] - y[i]) / m
        deriv1 += ((theta0 + theta1 * x[i] - y[i]) / m) * x[i]
    # Update theta0 and theta1
    for i in range(m):
        error = theta0 + theta1 * x[i] - y[i]  # computed once so both updates use the same parameters
        theta0 = theta0 - alpha * (error / m)
        theta1 = theta1 - alpha * (error / m) * x[i]
    # Compute the loss function J(θ)
    for i in range(m):
        diss = diss + (1 / (2 * m)) * pow((theta0 + theta1 * x[i] - y[i]), 2)
    theta0_list.append(theta0 * 100)
    theta1_list.append(theta1)
    # Exit the loop once the loss is small enough
    if diss <= 100:
        break
# x and y were both scaled down by 100, so theta0 must be scaled back up by 100; theta1 is unchanged
theta0 = theta0 * 100
print("Final result: theta0={}, theta1={}".format(theta0, theta1))
print("The fitted regression function is: y={}+{}*x".format(theta0, theta1))
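As a sanity check on whatever the loop above prints, a one-variable least-squares fit also has a closed-form solution. The sketch below applies it to the unscaled data `x0` and `y0`; the function name `least_squares` is an assumption for this example. Because the loop exits as soon as the loss drops to 100 or below, its output may differ noticeably from the closed-form values.

```python
def least_squares(x, y):
    """Closed-form fit of y = theta0 + theta1 * x by ordinary least squares (hypothetical helper)."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    theta1 = sxy / sxx                  # slope
    theta0 = mean_y - theta1 * mean_x   # intercept
    return theta0, theta1


# Unscaled data from the listing above.
x0 = [150, 200, 250, 300, 350, 400, 600]
y0 = [6450, 7450, 8450, 9450, 11450, 15450, 18450]
best_theta0, best_theta1 = least_squares(x0, y0)
print("closed form: y={}+{}*x".format(best_theta0, best_theta1))
```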
Figure 1
Figure 2