Linear Regression Notes
Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load the data
path='ex1data1.txt'
data=pd.read_csv(path,header=None,names=['Population','Profit'])
data.head()
head() returns the first few rows of the data, 5 by default
data.describe()
describe() generates a table of summary statistics
data.plot(kind='scatter',x='Population',y='Profit',figsize=(12,8))
plt.show()
Plot a scatter plot of the data
data.insert(0, 'Ones', 1)
Insert a column of ones named 'Ones' at the far left of data, so the intercept can be handled as theta0*x0 with x0 = 1 in the vectorized computation
Description of the data.insert parameters
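As a quick illustration (a minimal sketch with made-up numbers, not taken from the data set), the extra column of ones lets the hypothesis be computed as a single matrix product:
X_demo = np.array([[1.0, 6.1], [1.0, 5.5], [1.0, 8.5]])   # hypothetical design matrix: x0 = 1, then the feature
theta_demo = np.array([[-3.6, 1.2]])                      # [theta0, theta1]
predictions = X_demo @ theta_demo.T                       # theta0*1 + theta1*x for every sample at once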
cols=data.shape[1]
X=data.iloc[:,0:cols-1]
Y=data.iloc[:,cols-1:cols]
Split the data into X (inputs) and Y (outputs)
The iloc function versus the loc function
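A short reminder of the difference (a sketch against this data frame): iloc selects by integer position, loc selects by label.
data.iloc[:, 0:2]            # first two columns by position ('Ones', 'Population')
data.loc[:, ['Population']]  # columns selected by name
data.iloc[0]                 # first row by position
data.loc[0]                  # row whose index label is 0 (identical here, since the index is 0..n-1)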
X=np.matrix(X.values)
Y=np.matrix(Y.values)
Convert X and Y to NumPy matrix objects
matrix operations
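An aside on np.matrix (not from the original exercise): for matrix objects the * operator is matrix multiplication, while for plain ndarrays it is element-wise; NumPy's documentation now discourages np.matrix in favour of regular arrays with the @ operator.
A = np.matrix([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])
A * A   # matrix product, because A is an np.matrix
B * B   # element-wise product, because B is an ndarray
B @ B   # matrix product for ndarrays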
theta=np.zeros(shape=(1,X.shape[1]))
Initialize theta as a zero vector; it is updated repeatedly afterwards
def computeCost(X, Y, theta):
    # vectorized cost: J(theta) = sum((X·theta^T - Y)^2) / (2m)
    dif = np.dot(X, theta.T) - Y
    cost = np.dot(dif.T, dif)[0, 0] / (2 * len(X))
    return cost
computeCost(X, Y, theta)
Define the cost function and evaluate it at the initial theta
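A tiny hand-checkable example (made-up numbers, only to verify the formula J(theta) = sum((X·theta^T - Y)^2) / (2m)):
X_toy = np.matrix([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Y_toy = np.matrix([[0.0], [1.0], [2.0]])
computeCost(X_toy, Y_toy, np.matrix([[0.0, 1.0]]))   # perfect fit, cost = 0
computeCost(X_toy, Y_toy, np.matrix([[0.0, 0.0]]))   # (0 + 1 + 4) / (2*3) = 5/6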
# alpha is the learning rate; iters is the number of iterations
def gradientDescent(X, Y, theta, alpha, iters):
    # zero matrix used to hold the updated theta during each iteration
    temp = np.matrix(np.zeros(theta.shape))
    # number of parameters in theta, used as the inner-loop count
    parameters = int(theta.shape[1])
    # array of iters zeros, recording the cost after every iteration
    cost = np.zeros(iters)
    for i in range(iters):
        # error between predictions and true values
        error = np.dot(X, theta.T) - Y
        # update each parameter theta_j
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - (alpha / len(X)) * np.sum(term)
        theta = temp
        cost[i] = computeCost(X, Y, theta)
    return theta, cost
Batch gradient descent
Matrix multiplication
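The inner loop over parameters can also be written as one matrix product; a minimal sketch of an equivalent fully vectorized update (gradientDescentVectorized is a name introduced here, not in the original notes):
def gradientDescentVectorized(X, Y, theta, alpha, iters):
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta.T - Y                               # m x 1 residuals
        theta = theta - (alpha / len(X)) * (error.T @ X)      # update all parameters at once
        cost[i] = computeCost(X, Y, theta)
    return theta, cost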
alpha=0.01
iters=1000
Initialize the learning rate and the number of iterations
theta,cost=gradientDescent(X,Y,theta,alpha,iters)
computeCost(X,Y,theta)
Run gradient descent and compute the cost with the learned theta
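As an optional cross-check (assuming scikit-learn is available; not part of the original notes), an off-the-shelf least-squares fit gives the exact solution that gradient descent approaches as the number of iterations grows:
from sklearn.linear_model import LinearRegression
reg = LinearRegression(fit_intercept=False)       # the Ones column already acts as the intercept
reg.fit(np.asarray(X), np.asarray(Y).ravel())
reg.coef_                                         # compare with the theta learned above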
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = theta[0, 0] + (theta[0, 1] * x)
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
Plot the fitted regression line
The meaning of fig, ax
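Briefly: plt.subplots returns a Figure and an Axes object; the figure is the whole canvas (size, saving), while the axes is the plotting area that owns the plot/scatter/label calls. A minimal sketch (the file name is made up):
fig, ax = plt.subplots()    # fig: the whole figure, ax: one set of axes
ax.plot([0, 1], [0, 1])     # drawing calls go to the axes
fig.savefig('demo.png')     # figure-level operations such as saving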
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
Plot the cost against the number of iterations
# A housing price data set with 2 features (house size, number of bedrooms) and a target (house price)
path = 'ex1data2.txt'
data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
data2.head()
# Feature normalization: subtract the mean and divide by the standard deviation
data2 = (data2 - data2.mean()) / data2.std()
data2.head()
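One caveat worth noting (an assumption about later use, not covered by the original exercise): the mean and standard deviation should be kept, so that new samples can be normalized the same way and predictions can be mapped back to the original price scale. A variant of the normalization line above that keeps them (it would replace data2 = (data2 - data2.mean()) / data2.std()):
mu, sigma = data2.mean(), data2.std()     # compute before overwriting data2
data2 = (data2 - mu) / sigma
# for a hypothetical new sample x_new and a normalized prediction price_norm:
# x_new_norm = (x_new - mu[['Size', 'Bedrooms']]) / sigma[['Size', 'Bedrooms']]
# price = price_norm * sigma['Price'] + mu['Price']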
data2.insert(0, 'Ones', 1)
cols = data2.shape[1]
X2 = data2.iloc[:,0:cols-1]
y2 = data2.iloc[:,cols-1:cols]
X2 = np.matrix(X2.values)
y2 = np.matrix(y2.values)
theta2 = np.matrix(np.array([0,0,0]))
theta2, cost2 = gradientDescent(X2, y2, theta2, alpha, iters)
computeCost(X2, y2, theta2)
# Plot the cost curve
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost2, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
Multivariate linear regression
The normal equation sets the derivative of the cost function to zero and solves for theta directly: theta = (X^T X)^(-1) X^T Y
def normalEqn(X, Y):
    # closed-form solution: theta = (X^T X)^(-1) X^T Y
    inv = (X.T @ X).I
    theta = inv @ X.T @ Y
    return theta
final_theta2=normalEqn(X, Y)
final_theta2.T
matrix([[-3.89578088, 1.19303364]])
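When X^T X is ill-conditioned or singular, an explicit inverse is fragile; a pseudo-inverse or a direct least-squares solve is a common alternative (a sketch, not from the original notes):
theta_pinv = np.linalg.pinv(X.T @ X) @ X.T @ Y                                 # Moore-Penrose pseudo-inverse
theta_lstsq, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(Y), rcond=None)    # solve min ||X·theta - Y|| directly
Both should reproduce the values printed above (up to transposition).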