Introduction to Machine Learning and Implementing Linear Regression in Python

不摇-碧莲

Last modified 2022-04-16 23:46:17

First published 2020-09-30 18:01:31

Article link: https://blog.csdn.net/yu17671636097/article/details/108888800

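Definition of Machine Learning

Machine learning is a very broad field, and it is hard to give it a single precise definition. Two definitions are widely quoted.

Arthur Samuel: the field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

Generally speaking, any machine learning problem can be assigned to one of two categories: supervised learning and unsupervised learning.

Supervised learning: there is a known relationship between the input and the output; these problems are further divided into regression and classification.
Unsupervised learning: there is no feedback based on the prediction results.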

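Model and Cost Function

Gradient Descent

Gradient descent is an algorithm for finding a minimum of the cost function using the function's derivatives. The slope of the tangent at a point is the derivative at that point, and it gives a direction to move in. The algorithm steps down the cost function in the direction of steepest descent, and the size of each step is determined by the learning rate $\alpha$.

The gradient descent algorithm is shown below; repeat until convergence: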
$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$
In each iteration, the parameters $\theta_0$ and $\theta_1$ must be updated simultaneously, according to the following rule:
$temp0:=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$

$temp1:=\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$

$\theta_0:=temp0$

$\theta_1:=temp1$
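The simultaneous update can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the exercise: the bowl-shaped function $J(\theta_0,\theta_1)=(\theta_0-3)^2+(\theta_1+1)^2$, the helper names `gradient_descent` and `grad_J`, and the values of `alpha` and `num_iters` are all assumptions chosen for the example.

```python
def gradient_descent(grad, theta0, theta1, alpha=0.1, num_iters=200):
    """Minimize a two-parameter function given a function returning its gradient."""
    for _ in range(num_iters):
        # Both partial derivatives are evaluated at the OLD parameter values
        d0, d1 = grad(theta0, theta1)
        temp0 = theta0 - alpha * d0
        temp1 = theta1 - alpha * d1
        # Only now are the parameters overwritten (simultaneous update)
        theta0, theta1 = temp0, temp1
    return theta0, theta1


def grad_J(theta0, theta1):
    """Gradient of the illustrative cost J = (theta0 - 3)^2 + (theta1 + 1)^2."""
    return 2 * (theta0 - 3), 2 * (theta1 + 1)


theta0, theta1 = gradient_descent(grad_J, 0.0, 0.0)
print(theta0, theta1)  # approaches the minimizer (3, -1)
```

Assigning `theta0` before computing the derivative for `theta1` would mix old and new values, which is exactly what the `temp0` and `temp1` variables are there to prevent.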

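Gradient Descent for Linear Regression

Applying gradient descent to linear regression yields a more specific form of the update rule. Substituting the actual cost function and hypothesis function into the gradient descent formula gives: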
$\begin{aligned} \text{repeat until convergence: } \lbrace & \\ \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \\ \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \\ \rbrace& \end{aligned}$
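where $m$ is the size of the training set, $\theta_0$ and $\theta_1$ are updated simultaneously, and the values $x_i$ and $y_i$ are given by the training set.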
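The two update formulas translate almost directly into NumPy. The sketch below runs them on a small noise-free dataset generated from $y = 2 + 3x$; the data, the learning rate `alpha = 0.05`, and the iteration count are assumptions made for illustration and are unrelated to the exercise files.

```python
import numpy as np

# Hypothetical noise-free training set generated from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
m = len(x)  # number of training examples

theta0, theta1 = 0.0, 0.0
alpha = 0.05  # learning rate (assumed value)

for _ in range(5000):
    error = (theta0 + theta1 * x) - y    # h_theta(x_i) - y_i for every i
    grad0 = error.sum() / m              # (1/m) * sum of errors
    grad1 = (error * x).sum() / m        # (1/m) * sum of errors times x_i
    # Tuple assignment updates both parameters simultaneously
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # should recover the generating values 2 and 3
```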

The partial derivative is derived as follows, for a single training example:
$\begin{aligned} \frac{\partial}{\partial\theta_j}J(\theta)&=\frac{\partial}{\partial\theta_j}\frac{1}{2}(h_\theta(x)-y)^2 \\ &= 2 \cdot\frac{1}{2}(h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}(h_\theta(x)-y)\\ &= (h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}(\sum_{i=0}^n \theta_ix_i-y)\\ &= (h_\theta(x)-y)x_j \end{aligned}$
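If we start from an initial guess and repeatedly apply the gradient descent update, the hypothesis becomes more and more accurate, provided the learning rate is small enough.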
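A derived gradient such as $(h_\theta(x)-y)x_j$ can be checked numerically by comparing it with a central finite difference of the cost. The following sketch does this for one training example; the helper names `cost_single` and `analytic_grad` and all numeric values are hypothetical and chosen only for the check.

```python
import numpy as np


def cost_single(theta, x, y):
    """J(theta) = 1/2 * (h_theta(x) - y)^2 for a single training example."""
    return 0.5 * (theta @ x - y) ** 2


def analytic_grad(theta, x, y):
    """Result of the derivation: (h_theta(x) - y) * x_j for every j."""
    return (theta @ x - y) * x


# Arbitrary illustrative values; x[0] = 1 plays the role of x_0
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, -2.0])
y = 4.0

eps = 1e-6
numeric_grad = np.zeros_like(theta)
for j in range(len(theta)):
    step = np.zeros_like(theta)
    step[j] = eps
    # Central difference approximation of dJ/dtheta_j
    numeric_grad[j] = (cost_single(theta + step, x, y)
                       - cost_single(theta - step, x, y)) / (2 * eps)

print(analytic_grad(theta, x, y))
print(numeric_grad)
```

The two printed vectors should agree to several decimal places; a mismatch would point to an error in the derivation or in its implementation.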

Linear Regression With One Variable

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

path = 'ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'profit'])
data.head()

	Population	profit
0	6.1101	17.5920
1	5.5277	9.1302
2	8.5186	13.6620
3	7.0032	11.8540
4	5.8598	6.8233

data.describe()

	Population	profit
count	97.000000	97.000000
mean	8.159800	5.839135
std	3.869884	5.510262
min	5.026900	-2.680700
25%	5.707700	1.986900
50%	6.589400	4.562300
75%	8.578100	7.046700
max	22.203000	24.147000

Visualize the data with a scatter plot

data.plot(x='Population', y='profit', kind='scatter', figsize=(12, 8))
plt.show()

Figure: scatter plot of the data

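We can use a cost function to measure the accuracy of the hypothesis function. It takes half the average of the squared differences between the hypothesis's predictions for the inputs x and the actual outputs y.
$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(\widehat y_i-y_i)^2=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x_i)-y_i)^2$
This function is also called the squared error function or mean squared error. Halving the mean is a convenience for gradient descent, because the derivative of the square term cancels the factor $\frac{1}{2}$.
where: $h_\theta(x)=\theta^Tx=\theta_0x_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$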

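# Cost function
def computeCost(X: np.ndarray, y: np.ndarray, theta: np.ndarray) -> float:
    """Squared-error cost J(theta) for linear regression."""
    m = len(X)  # number of training examples, taken from X rather than from a global
    # X, y and theta are np.matrix objects, so * is matrix multiplication
    J = np.power(((X * theta.T) - y), 2)
    return np.sum(J) / (2 * m)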
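As a sanity check of the squared-error formula, the cost can be worked out by hand on a tiny dataset and compared with a direct loop. The three training examples and the helper name `cost` below are hypothetical; the loop mirrors $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x_i)-y_i)^2$ term by term.

```python
# Three hypothetical training examples lying exactly on y = x
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def cost(theta0, theta1):
    """Squared-error cost of h(x) = theta0 + theta1 * x on the data above."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x
        total += (prediction - y) ** 2
    return total / (2 * m)


print(cost(0.0, 1.0))  # perfect fit, so the cost is 0.0
print(cost(0.0, 0.5))  # errors -0.5, -1, -1.5 give (0.25 + 1 + 2.25) / 6
```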

Add a column of ones to the dataset so that the intercept term can be handled by matrix operations

data.insert(0, 'ones', 1)

cols = data.shape[1]
X = data.iloc[:, 0:cols-1]
y = data.iloc[:, cols-1:cols]
m = len(y)

Check that the processed data looks right

X.head()

	ones	Population
0	1	6.1101
1	1	5.5277
2	1	8.5186
3	1	7.0032
4	1	5.8598

y.head()

	profit
0	17.5920
1	9.1302
2	13.6620
3	11.8540
4	6.8233

Convert X and y to NumPy matrices

X = np.matrix(X)
y = np.matrix(y)
theta = np.matrix(np.array([0, 0]))

theta is a (1, 2) matrix

theta

matrix([[0, 0]])

Check the dimensions of the data

X.shape, theta.shape, y.shape

((97, 2), (1, 2), (97, 1))

Compute the cost function

computeCost(X, y, theta)

32.072733877455676

# Gradient descent
def gradientDescent(X, y, theta, alpha: float, num_iters: int):
    """Batch gradient descent; returns the fitted theta and the cost after each iteration."""
    parameters = int(theta.ravel().shape[1])  # ravel flattens theta; this is the number of parameters to fit
    J_history = np.zeros(num_iters)  # cost after each iteration

    for i in range(num_iters):
        temp = np.matrix(np.zeros(theta.shape))  # fresh buffer so every theta_j is updated simultaneously
        error = (X * theta.T) - y
        for j in range(parameters):
            term = np.multiply(error, X[:, j])  # element-wise (h_theta(x) - y) * x_j
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        J_history[i] = computeCost(X, y, theta)  # record the cost

    return theta, J_history

Set the basic parameters for gradient descent

# Some gradient descent settings
iterations = 1500
alpha = 0.01

Now let's run the gradient descent algorithm to fit the parameters θ to the training set.

g, cost = gradientDescent(X, y, theta, alpha, iterations)
g

matrix([[-3.63029144,  1.16636235]])

Finally, we can compute the cost (error) of the trained model using the fitted parameters.

computeCost(X, y, g)

4.483388256587726
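One way to sanity-check a gradient descent fit is to compare it with the closed-form least-squares solution. The sketch below does this on synthetic data rather than on ex1data1.txt; the generating line, the noise level, the learning rate, and the use of `np.linalg.lstsq` as the reference solver are assumptions of this example and are not part of the original exercise.

```python
import numpy as np

# Hypothetical synthetic data: y = -4 + 1.2 x plus Gaussian noise
rng = np.random.default_rng(0)
m = 50
x = rng.uniform(0.0, 2.0, size=m)
y = -4.0 + 1.2 * x + rng.normal(0.0, 0.5, size=m)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(m), x])

# Reference answer: closed-form least squares
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# Batch gradient descent on the same squared-error cost
theta_gd = np.zeros(2)
alpha = 0.1  # assumed learning rate
for _ in range(5000):
    theta_gd = theta_gd - alpha / m * (X.T @ (X @ theta_gd - y))

print(theta_exact)
print(theta_gd)
```

If gradient descent has converged, the two parameter vectors agree closely; a visible gap usually means too few iterations or a poorly chosen learning rate.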

Plot the linear model together with the data to check the fit

x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

Figure: fit of the linear model to the data

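Since the gradient descent function also returns a vector with the cost at each training iteration, we can plot that as well. Note that the cost decreases at every iteration; this is what we expect for a convex optimization problem when the learning rate is suitably small.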

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iterations), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
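Whether the cost keeps decreasing depends on the learning rate. The sketch below records the cost history for two learning rates on a tiny made-up dataset; the data, the helper name `cost_history`, and both values of `alpha` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
m = len(y)


def cost_history(alpha, num_iters=50):
    """Run batch gradient descent from theta = 0 and return the cost after each step."""
    theta = np.zeros(2)
    history = []
    for _ in range(num_iters):
        theta = theta - alpha / m * (X.T @ (X @ theta - y))
        history.append(((X @ theta - y) ** 2).sum() / (2 * m))
    return history


small = cost_history(alpha=0.05)  # cost shrinks at every step
large = cost_history(alpha=0.5)   # step too large: the cost blows up

print(small[0], small[-1])
print(large[0], large[-1])
```

A cost curve that rises or oscillates is therefore a signal to reduce the learning rate rather than evidence that the problem is not convex.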

Linear Regression With Multiple Variables

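Linear regression with multiple variables is also known as multivariate linear regression. We now introduce notation for equations in which there can be any number of input variables.
$\begin{aligned} x^{(i)}_j &= \text{value of feature } j \text{ in the } i^{\text{th}} \text{ training example} \\ x^{(i)} &= \text{the input (features) of the } i^{\text{th}} \text{ training example} \\ m &= \text{the number of training examples} \\ n &= \text{the number of features} \end{aligned}$
The hypothesis function for multivariate linear regression is: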
$h_{\theta}(x) = \theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3+...+\theta_nx_n$
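To build intuition for this function, we can think of $\theta_0$ as the base price of a house, $\theta_1$ as the price per square metre, $\theta_2$ as the price per floor, and so on. $x_1$ is then the area of the house in square metres, $x_2$ the number of floors, and so on.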

In matrix form, the multivariate hypothesis can be written as:
$h_\theta(x)=[\theta_0\ \theta_1\ ...\ \theta_n]\begin{bmatrix}x_0\\x_1\\ \vdots\\x_n\end{bmatrix}=\theta^Tx$
Note: for convenience we assume $x_0^{(i)}=1$ for $i\in 1,\dots,m$. This lets us apply matrix operations to $\theta$ and $x$, because $\theta$ and $x^{(i)}$ then have the same number of elements ($n+1$).
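The matrix form $h_\theta(x)=\theta^Tx$ can be evaluated for several examples at once by stacking the inputs into a matrix with a leading column of ones. The parameter values and the two houses below are made up to match the house-price intuition above; they are not taken from any dataset.

```python
import numpy as np

# Hypothetical parameters: base price, price per square metre, price per floor
theta = np.array([50.0, 2.0, 10.0])

# Two hypothetical houses, each described by (area in square metres, floors)
features = np.array([[100.0, 2.0],
                     [80.0, 1.0]])

# Prepend x_0 = 1 to every example so that theta_0 acts as the intercept
X = np.column_stack([np.ones(len(features)), features])

# One matrix-vector product evaluates theta^T x for every row of X
predictions = X @ theta
print(predictions)  # 50 + 2*100 + 10*2 = 270 and 50 + 2*80 + 10*1 = 220
```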

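Exercise 1 also includes a housing price dataset with 2 variables (the size of the house and the number of bedrooms) and a target (the price of the house, which the code below loads into a column named Profit). We analyse this dataset with the techniques we have already applied.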

# Linear regression with multiple variables
path = "ex1data2.txt"
data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Profit'])
data2.head()

	Size	Bedrooms	Profit
0	2104	3	399900
1	1600	3	329900
2	2400	3	369000
3	1416	2	232000
4	3000	4	539900

data2.describe()

	Size	Bedrooms	Profit
count	47.000000	47.000000	47.000000
mean	2000.680851	3.170213	340412.659574
std	794.702354	0.760982	125039.899586
min	852.000000	1.000000	169900.000000
25%	1432.000000	3.000000	249900.000000
50%	1888.000000	3.000000	299900.000000
75%	2269.000000	4.000000	384450.000000
max	4478.000000	5.000000	699900.000000

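We can speed up gradient descent by bringing every input value into roughly the same range. $\theta$ descends quickly on small ranges and slowly on large ranges, so gradient descent becomes very inefficient when the variables have very different scales.

Two techniques are commonly used: feature scaling and mean normalization.

Feature scaling: divide the input values by the range of the input variable (maximum minus minimum), so that the new range is 1.
Mean normalization: subtract the mean of the input variable from the input values, so that the new mean is 0.

Both can be applied to the input values with the following formula:
$x_i:=\frac{x_i-\mu_i}{s_i}$
where $\mu_i$ is the mean of all values of feature $i$ and $s_i$ is either the range of the feature or its standard deviation.

Note: dividing by the range and dividing by the standard deviation give different results.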
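The difference between the two choices of $s_i$ is easy to see on a handful of values. The sketch below applies $x_i:=\frac{x_i-\mu_i}{s_i}$ to five house sizes picked for illustration; `ddof=1` is used for the standard deviation because that is the default of pandas' `std()`, while NumPy's own default is `ddof=0`.

```python
import numpy as np

# Five illustrative house sizes (square feet)
sizes = np.array([852.0, 1416.0, 2104.0, 3000.0, 4478.0])

mean = sizes.mean()
value_range = sizes.max() - sizes.min()
std = sizes.std(ddof=1)  # sample standard deviation, matching pandas' default

scaled_by_range = (sizes - mean) / value_range  # new range is exactly 1
scaled_by_std = (sizes - mean) / std            # new standard deviation is exactly 1

print(scaled_by_range)
print(scaled_by_std)
```

Both results have mean 0, but the values themselves differ, so the same choice of $s_i$ must be used for training and for any later prediction.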

data2 = (data2 - data2.mean()) / data2.std()

# Add a column of ones to the data
data2.insert(0, 'Ones', 1)  # insert() returns nothing; it modifies data2 in place

Repeat the preprocessing steps used for single-variable linear regression

# Split the data into input variables and target values
cols = data2.shape[1]
X2 = data2.iloc[:, 0:cols-1]
y2 = data2.iloc[:, cols-1:cols]
X2.head()

	Ones	Size	Bedrooms
0	1	0.130010	-0.223675
1	1	-0.504190	-0.223675
2	1	0.502476	-0.223675
3	1	-0.735723	-1.537767
4	1	1.257476	1.090417

y2.head()

	Profit
0	0.475747
1	-0.084074
2	0.228626
3	-0.867025
4	1.595389

# Convert the data to matrices
X2 = np.matrix(X2.values)
y2 = np.matrix(y2.values)
theta = np.matrix(np.array([0, 0, 0]))
m = len(y2)  # number of training examples in this dataset (47, not the 97 of ex1data1.txt)

# Run gradient descent on the dataset
g2, cost2 = gradientDescent(X2, y2, theta, alpha, iterations)

# Get the cost (error) of the model
computeCost(X2, y2, g2)

≈ 0.1307
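Because both the inputs and the target were standardized, the fitted parameters live in standardized units, and a prediction has to be converted back before it can be read as a price. The sketch below shows the round trip on hypothetical, exactly linear data; the sizes, prices, learning rate, and the query value `new_size` are assumptions and do not come from ex1data2.txt.

```python
import numpy as np

# Hypothetical houses whose price is exactly 50000 + 100 * size
size = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = 50000.0 + 100.0 * size

# Keep the statistics used for standardization (ddof=1 as in pandas)
size_mean, size_std = size.mean(), size.std(ddof=1)
price_mean, price_std = price.mean(), price.std(ddof=1)

X = np.column_stack([np.ones(len(size)), (size - size_mean) / size_std])
y = (price - price_mean) / price_std

# Batch gradient descent in standardized units
theta = np.zeros(2)
for _ in range(2000):
    theta = theta - 0.1 / len(y) * (X.T @ (X @ theta - y))

# Predict for a new house: standardize the input, then undo the target scaling
new_size = 1800.0
x_new = np.array([1.0, (new_size - size_mean) / size_std])
predicted_price = (x_new @ theta) * price_std + price_mean
print(predicted_price)  # close to 50000 + 100 * 1800 = 230000
```

The key design point is that the training means and standard deviations are stored and reused; recomputing them from new data would put the query on a different scale from the one the model was trained on.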

We can also take a quick look at the training progress for this model.

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iterations), cost2, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
