The screenshots of the English documentation are all taken from the original assignment PDF: https://www.coursera.org/learn/machine-learning/programming/8f3qT/linear-regression
Linear regression with one variable
Univariate linear regression, using the dataset ex1data1.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
path = r'D:\Ninachen\wg_machinelearning\machine-learning-ex1\ex1\ex1data1.txt'
data = pd.read_csv(path, header=None, names=['population', 'profit'])
data.plot.scatter(x = 'population', y = 'profit', marker = '+')
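Before plotting, it can help to sanity-check what `read_csv` produced. A small self-contained sketch (using an inline synthetic stand-in for `ex1data1.txt`, which has the same headerless two-column layout):

```python
import io
import pandas as pd

# Synthetic stand-in for ex1data1.txt: two comma-separated columns, no header.
csv = io.StringIO("6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n")
data = pd.read_csv(csv, header=None, names=['population', 'profit'])

print(data.shape)          # (rows, 2)
print(list(data.columns))  # ['population', 'profit']
```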
A first look at the raw data suggests fitting a first-degree (linear) function:
X matrix: each row is one sample plus a constant –> n samples with k features plus 1 constant column (i.e. x0) -> an n-row, (k+1)-column matrix
θ vector: the coefficients (how many depends on the form of the hypothesis function) –> a column vector
Cost function (matrix-vector form)
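Written out (this is the standard cost from the course, with m the number of samples; the original notes showed it as a screenshot):

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
          = \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y)
```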
The number of iterations and the learning rate are given via input, and a new array is created to store the cost value at each step:
# cost_table = np.empty([iterations], dtype = float)
# cost_table[0] = cost_function(X, y, theta)
Placed at the very beginning:
datamat = data.values #convert to an array
X = np.column_stack((datamat[:,0], np.ones((len(datamat[:,0]),1)))) #append a column of ones (the constant term)
y = datamat[:,-1] #the last column
theta = np.zeros((2,1)) #column vector of the 2 coefficients
iterations = int(input('please input the iterations:')) #number of iterations
alpha = float(input('please input the learning rate alpha:')) #learning rate
cost_table = np.empty([iterations]) #stores the cost value at each step
m = X.shape[0] #m is the number of samples
def cost_function(X, y, theta): #the cost function J(θ)
    Y = y.reshape((m, 1))
    J_theta = np.sum(np.power((X.dot(theta) - Y), 2)) / (2 * m)
    return J_theta #J_theta is a scalar
cost_table[0] = cost_function(X, y, theta) #initial value of the cost
*Note: y must be turned into the matrix Y with reshape, otherwise the dimensions are wrong and the matrix multiplication fails
Y = y.reshape((m, 1))
Y.shape
(97, 1) #a 2-D array
y.shape
(97,) #a 1-D vector
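The shape difference is not cosmetic: subtracting a 1-D vector from an (m, 1) column broadcasts to an (m, m) matrix, silently producing a wrong cost. A minimal demo with synthetic shapes (not the ex1 data):

```python
import numpy as np

m = 5
X = np.ones((m, 2))
theta = np.zeros((2, 1))
y = np.arange(m, dtype=float)             # shape (m,)

wrong = X.dot(theta) - y                  # (m,1) - (m,) broadcasts to (m, m)
right = X.dot(theta) - y.reshape((m, 1))  # (m,1) - (m,1) stays (m, 1)
print(wrong.shape)  # (5, 5)
print(right.shape)  # (5, 1)
```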
Cost computed from the initial θ:
cost_function(X, y, theta)
32.072733877455676
Next, the update rule for θ:
Still computed with matrices
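For reference, the step that `delta` and the update loop implement is the standard batch gradient-descent rule from the course (the original notes showed it as a screenshot):

```latex
\theta := \theta - \frac{\alpha}{m}\, X^\top (X\theta - y)
```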
def delta(X, y, theta, alpha):
    Y = y.reshape((m, 1))
    s1 = (X @ theta - Y).T @ X  #((Xθ - Y)ᵀ X), shape (1, 2)
    return alpha / m * s1.T     #α/m · Xᵀ(Xθ - Y), a (2, 1) column
Iteratively update θ:
for i in range(1, iterations):
    theta = theta - delta(X, y, theta, alpha)
    cost_table[i] = cost_function(X, y, theta)
# when the loop finishes, theta holds the final optimized θ
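Putting the pieces above together, here is an end-to-end sketch of the same pipeline on synthetic data (the file path and `input()` prompts from the notes are replaced by fixed illustrative values; the data distribution is an assumption, not the ex1 file):

```python
import numpy as np

# Synthetic stand-in for the population/profit data.
rng = np.random.default_rng(0)
pop = rng.uniform(5, 12, size=97)                  # stand-in feature column
profit = 1.2 * pop - 4 + rng.normal(0, 1, 97)      # roughly linear target

X = np.column_stack((pop, np.ones(len(pop))))      # feature column + constant column
y = profit
m = X.shape[0]
theta = np.zeros((2, 1))
alpha, iterations = 0.01, 5000                     # illustrative choices

def cost_function(X, y, theta):
    Y = y.reshape((m, 1))
    return np.sum((X.dot(theta) - Y) ** 2) / (2 * m)

def delta(X, y, theta, alpha):
    Y = y.reshape((m, 1))
    return alpha / m * ((X @ theta - Y).T @ X).T

cost_table = np.empty(iterations)
cost_table[0] = cost_function(X, y, theta)
for i in range(1, iterations):
    theta = theta - delta(X, y, theta, alpha)
    cost_table[i] = cost_function(X, y, theta)

print(theta.ravel())  # approaches the generating [1.2, -4] up to noise
print(cost_table[-1] < cost_table[0])  # the cost decreased
```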
*At first this raised a warning:
<input>:4: RuntimeWarning: overflow encountered in power
D:\Program Files\Python39\lib\site-packages\numpy\core\fromnumeric.py:87: RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
After modifying sigmoid(), it runs normally (this code was added):
def sigmoid(inx):