[深度之眼 Andrew Ng Machine Learning, 4th Session] Notes (Part 3)

树天先森

Category: Andrew Ng Machine Learning. Tags: machine learning

Article link: https://blog.csdn.net/qq_40923177/article/details/104083735

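Assignment 1 (linear regression) and Assignment 2 (logistic regression). PS: a gripe at myself: the comments in the original code were already very detailed, so I honestly don't know why I wrote these notes...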

Linear Regression

Univariate linear regression

Preparation

# Import the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
path = 'ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()  # return the first rows of the data (5 by default)
data.describe()  # generate descriptive statistics

Data preview:

	Population	Profit
0	6.1101	17.5920
1	5.5277	9.1302
2	8.5186	13.6620
3	7.0032	11.8540
4	5.8598	6.8233

Summary statistics:

	Population	Profit
count	97.000000	97.000000
mean	8.159800	5.839135
std	3.869884	5.510262
min	5.026900	-2.680700
25%	5.707700	1.986900
50%	6.589400	4.562300
75%	8.578100	7.046700
max	22.203000	24.147000

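Visualize the data with a scatter plot:

# kind='scatter' selects a scatter plot, x and y name the columns for the two axes, figsize sets the figure size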
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))
plt.show()

[Figure: scatter plot of Profit against Population]

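# Insert a column of ones (Ones) at position 0, so that x0 is always 1
data.insert(0, 'Ones', 1)

# Split the dataset into X and y
cols = data.shape[1]
X = data.iloc[:, 0:cols-1]  # X is data without its last column
y = data.iloc[:, cols-1:cols]  # y is the last column of data
# Check that X (training features) and y (target variable) look right
X.head()
y.head()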

# Convert to NumPy matrices
X = np.matrix(X.values)
y = np.matrix(y.values)
# Initialize theta as a zero vector
theta = np.zeros(shape=(1,X.shape[1]))
# Show theta (array([[0., 0.]]))
theta
# Check the dimensions ((97, 2), (1, 2), (97, 1))
X.shape, theta.shape, y.shape
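
NumPy's documentation no longer recommends `np.matrix`, so the same preparation can also be done with plain arrays. The following is a minimal sketch, not the assignment's own code: the five rows are copied from the data preview above as a stand-in for `ex1data1.txt`, and the names `X_arr`, `y_arr`, `theta_arr` and `predictions` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for ex1data1.txt: only the five rows shown in the data preview.
data = pd.DataFrame({
    'Population': [6.1101, 5.5277, 8.5186, 7.0032, 5.8598],
    'Profit': [17.5920, 9.1302, 13.6620, 11.8540, 6.8233],
})
data.insert(0, 'Ones', 1)  # x0 is always 1

# Plain ndarrays instead of np.matrix (illustrative names).
X_arr = data.iloc[:, :-1].to_numpy(dtype=float)  # shape (m, 2)
y_arr = data.iloc[:, -1:].to_numpy(dtype=float)  # shape (m, 1)
theta_arr = np.zeros((1, X_arr.shape[1]))        # shape (1, 2)

# With ndarrays, @ is matrix multiplication; * would be element-wise.
predictions = X_arr @ theta_arr.T                # shape (m, 1)
print(X_arr.shape, theta_arr.shape, y_arr.shape, predictions.shape)
```

Keeping `y_arr` two-dimensional (a slice `-1:` rather than a single column) avoids accidental broadcasting when it is subtracted from the `(m, 1)` predictions.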

# Compute the cost (32.07273387745567); computeCost is defined below and must be run first
computeCost(X, y, theta)

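The cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2, \qquad h_\theta(x) = \theta^{T}x$$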

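def computeCost(X, y, theta):
    # Residuals: predictions X * theta^T minus the targets y
    dif = np.dot(X,theta.T)-y
    # Sum of squared residuals divided by 2m, where m = len(X)
    cost = np.dot(dif.T,dif)[0,0]/(2*len(X))
    return cost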
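
To see what `computeCost` returns, it can help to work through a tiny case by hand. The sketch below is an illustration under assumptions: the three-point dataset and the names `compute_cost_sketch`, `X_toy`, `y_toy` are hypothetical, and plain arrays are used in place of `np.matrix`. With theta at zero the residuals are -1, -2, -3, so the cost should be (1 + 4 + 9) / (2 * 3) = 14/6.

```python
import numpy as np

def compute_cost_sketch(X, y, theta):
    """Squared-error cost: sum of squared residuals divided by 2m."""
    dif = X @ theta.T - y                       # residuals, shape (m, 1)
    return float((dif.T @ dif)[0, 0] / (2 * len(X)))

# Hypothetical toy data lying exactly on the line y = x.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is x0 = 1
y_toy = np.array([[1.0], [2.0], [3.0]])

cost_at_zero = compute_cost_sketch(X_toy, y_toy, np.array([[0.0, 0.0]]))
cost_at_fit = compute_cost_sketch(X_toy, y_toy, np.array([[0.0, 1.0]]))
print(cost_at_zero, cost_at_fit)
```

The second call uses the theta that fits the toy line exactly, so its cost is zero.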

Gradient descent

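The gradient descent update rule (applied to every j simultaneously):

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$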

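# iters is the number of iterations
def gradientDescent(X, y, theta, alpha, iters):
    # Holds the updated theta
    temp = np.matrix(np.zeros(theta.shape))
    # Number of elements in theta
    parameters = int(theta.shape[1])
    # Cost after each iteration, kept for plotting
    cost = np.zeros(iters)

    # Iteration loop
    for i in range(iters):
        # Error between the predictions and the true values
        error = np.dot(X,theta.T)-y
        # Update each element of theta in turn
        for j in range(parameters):
            # Partial derivative of the cost with respect to theta_j
            round_theta = np.dot(error.T,X[:,j])[0,0]/len(X)
            temp[0,j]=theta[0,j]-alpha*round_theta
        # Copy so that theta and temp never refer to the same matrix
        theta = temp.copy()
        cost[i]=computeCost(X,y,theta)

    return theta, cost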
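
The inner loop over `j` can be replaced by a single matrix product, since the partial derivatives for all `j` are the entries of `X.T @ error / m`. The following is a sketch under assumptions rather than the assignment's reference solution: the function name `gradient_descent_vec`, the noise-free synthetic data on the line y = 1 + 2x, and the chosen `alpha` and `iters` are all made up for illustration.

```python
import numpy as np

def gradient_descent_vec(X, y, theta, alpha, iters):
    """Batch gradient descent with all theta_j updated in one step."""
    m = X.shape[0]
    theta = theta.astype(float).copy()   # do not modify the caller's array
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta.T - y          # shape (m, 1)
        grad = (X.T @ error).T / m       # shape (1, n), one entry per theta_j
        theta = theta - alpha * grad     # simultaneous update
        dif = X @ theta.T - y
        cost[i] = float((dif.T @ dif)[0, 0] / (2 * m))
    return theta, cost

# Hypothetical noise-free data on the line y = 1 + 2x.
x_syn = np.linspace(0.0, 1.0, 20)
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])
y_syn = (1.0 + 2.0 * x_syn).reshape(-1, 1)

theta_fit, cost_hist = gradient_descent_vec(
    X_syn, y_syn, np.zeros((1, 2)), alpha=0.1, iters=2000)
print(theta_fit, cost_hist[-1])
```

On this synthetic data the fitted parameters should end up close to the intercept 1 and slope 2 used to generate it, and the recorded cost should fall from one iteration to the next.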

# Set the hyperparameters
alpha = 0.01
iters = 1000

# Use gradient descent to find theta
g, cost = gradientDescent(X, y, theta, alpha, iters)
# The resulting theta is matrix([[-3.24140214,  1.1272942 ]])
g

# Compute the cost for this theta (4.515955503078913)
computeCost(X, y, g)
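
One way to judge whether a given number of iterations was enough is to compare the gradient descent result with a closed-form least-squares fit. This is only a sketch under assumptions: it uses hypothetical synthetic data rather than `ex1data1.txt`, swaps in `np.linalg.lstsq` as the reference (the notes above do not use it), and the names `run_gd`, `X_chk`, `y_chk` are invented for the example.

```python
import numpy as np

# Hypothetical synthetic data: y = -4 + 1.2 * x plus noise.
rng = np.random.default_rng(0)
x_chk = rng.uniform(5.0, 10.0, size=97)
X_chk = np.column_stack([np.ones_like(x_chk), x_chk])
y_chk = (-4.0 + 1.2 * x_chk + rng.normal(0.0, 1.0, size=97)).reshape(-1, 1)

# Closed-form least-squares solution, used only as a reference point.
theta_ref, *_ = np.linalg.lstsq(X_chk, y_chk, rcond=None)  # shape (2, 1)

def run_gd(X, y, alpha, iters):
    """Plain batch gradient descent starting from theta = 0."""
    m = X.shape[0]
    theta = np.zeros((X.shape[1], 1))
    for _ in range(iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
    return theta

# Largest absolute difference from the reference after each run.
gap_1000 = np.abs(run_gd(X_chk, y_chk, 0.01, 1000) - theta_ref).max()
gap_20000 = np.abs(run_gd(X_chk, y_chk, 0.01, 20000) - theta_ref).max()
print(gap_1000, gap_20000)
```

If the gap after 1000 iterations is still noticeable, running more iterations or trying a different learning rate may be worth it.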

# Plot the linear model together with the data to check the fit
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)
fig, ax = plt.subplots(figsize=(12,8))
# Draw the regression line in red
ax.plot(x, f, 'r', label='Prediction')
# Draw the original data
ax.scatter(data.Population, data.Profit