Linear Regression Study Notes

npupengsir

Article link: https://blog.csdn.net/u012897374/article/details/75137358

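Regression is mainly divided into linear regression and logistic regression. Linear regression predicts continuous values, while logistic regression addresses classification problems. Because logistic regression outputs the probability of belonging to a class, it is also often used for ranking.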

1. Principle of linear regression

Assume that the input $\chi$ and the output $y$ are linearly related. Linear regression learns a mapping

$f: \chi \to y$

and then, for a given sample $x$, predicts its output:

$\hat y=f(x)$

Now suppose $x=(x_0,x_1,\dots,x_n)$. The predicted value is:

$h_\theta(x)=\sum_{i=0}^n\theta_ix_i=\theta^Tx$

Here an extra component $x_0=1$ is added to the features $x$ to represent the intercept, i.e.:

$f(x)=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$
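The prediction formula above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `add_intercept` and `predict` and the numeric values of `theta` are assumptions made for the example, not part of the original notes.

```python
import numpy as np

def add_intercept(features):
    """Prepend x_0 = 1 to a feature vector (x_1, ..., x_n)."""
    return np.concatenate(([1.0], np.asarray(features, dtype=float)))

def predict(theta, features):
    """Return h_theta(x) = theta^T x for a single sample."""
    x = add_intercept(features)
    return float(np.dot(theta, x))

# Illustrative parameters: theta_0 = 1 (intercept), theta_1 = 2, theta_2 = 3
theta = np.array([1.0, 2.0, 3.0])
print(predict(theta, [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```

Folding the intercept into $\theta$ via $x_0=1$ lets the whole prediction be a single dot product.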

2. Loss function

To find the best weight parameters $\theta$, denote the mapping from $X$ to $y$ by

$f(x)=h_\theta(x)$

where

$\theta=(\theta_0, \theta_1,\dots,\theta_n)$

To evaluate how well the model fits, for a given dataset $(X,y)$ we define a loss function that measures the error between the predicted and true values:

$J_\theta(X)=J_{(\theta_0, \theta_1\dots\theta_n)}(X)=\frac 1{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

That is, the overall error is half the mean of the squared errors over all $m$ samples, where $(x^{(i)},y^{(i)})$ denotes the $i$-th sample. Given the dataset $(X, y)$, the goal is to find the $\theta$ that minimizes $J_\theta(X)$, i.e.:

$\theta = \arg\min_\theta \{\frac 1{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2\}$
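As a rough sketch, the loss can be computed as follows. The function name `loss` and the toy data are assumptions for illustration; `X` is assumed to hold one sample per row with a leading column of ones for $x_0=1$.

```python
import numpy as np

def loss(theta, X, y):
    """J_theta(X) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = X.shape[0]
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every sample
    return float(residuals @ residuals) / (2 * m)

# Toy data generated from y = 1 + 2x (first column of X is x_0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(loss(np.array([1.0, 2.0]), X, y))  # exact fit, so the loss is 0.0
print(loss(np.array([0.0, 0.0]), X, y))  # (1 + 9 + 25) / 6, about 5.83
```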

3. Gradient descent

Suppose we have a set of samples $(x^{(1)}, y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$, and define a function $h_\theta(x)$ to approximate $y$. Let the final fitted function be $f(X)=h_\theta(X)$. The loss function is then:

$J(\theta)=\frac 1{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

First initialize $\theta$, either randomly or simply by setting $\vec\theta=\vec 0$.
Then repeatedly change $\vec \theta$ so that $J(\theta)$ becomes smaller and smaller. The update rule, applied to every $\theta_i$ simultaneously, is:

$\theta_i:=\theta_i-\alpha\frac {\partial J(\theta)}{\partial \theta_i}$
where $\alpha$ is the learning rate (step size). Differentiating $J(\theta)$ with respect to $\theta_i$ gives:
$\frac {\partial J(\theta)}{\partial \theta_i}=\frac 1m\sum_{j=1}^m(h_\theta(x^{(j)})-y^{(j)})x_i^{(j)}$
So, summing over all $m$ samples, the update becomes:
$\theta_i:=\theta_i-\frac \alpha m\sum_{j=1}^m[(h_\theta(x^{(j)})-y^{(j)})\cdot x_i^{(j)}]$
Here $(x^{(j)}, y^{(j)})$ denotes the $j$-th sample, $x^{(j)}$ is a vector, and $x_i^{(j)}$ is the $i$-th component of $x^{(j)}$, a scalar. The constant factor $\frac 1m$ is sometimes absorbed into $\alpha$.
Repeat this process until convergence (for example, until the loss $J_\theta(X)$ essentially stops changing).

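Throughout this process, $\theta$, $h_\theta(x)$ and $J_\theta(X)$ all keep changing. With a suitably small $\alpha$, $h_\theta(x)$ fits $y$ more and more closely, so $J_\theta(X)$ keeps decreasing and approaches its minimum, which is 0 only when the data can be fitted exactly.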
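The procedure above can be sketched as batch gradient descent. This is a minimal illustration under stated assumptions: the function name `gradient_descent`, the values of `alpha`, `max_iters` and `tol`, and the toy data are all chosen for the example, and the gradient includes the $\frac 1m$ factor that comes from the $\frac 1{2m}$ in the loss.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, max_iters=10000, tol=1e-12):
    """Minimize J(theta) = 1/(2m) * ||X theta - y||^2 by batch gradient descent."""
    m, d = X.shape
    theta = np.zeros(d)                       # start from theta = 0
    prev_loss = np.inf
    for _ in range(max_iters):
        residuals = X @ theta - y             # h_theta(x^(j)) - y^(j)
        current_loss = float(residuals @ residuals) / (2 * m)
        if abs(prev_loss - current_loss) < tol:
            break                             # loss has essentially stopped changing
        prev_loss = current_loss
        gradient = X.T @ residuals / m        # vector of dJ/dtheta_i
        theta = theta - alpha * gradient      # update all theta_i simultaneously
    return theta

# Toy data generated from y = 1 + 2x (first column of X is x_0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
print(theta)  # close to [1, 2]
```

If `alpha` is too large the loss can grow instead of shrinking, so in practice the step size usually has to be tuned for the data at hand.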

4. Computing $\theta$ by least-squares fitting

Stack the $m$ samples as the rows of a matrix $X$:

$X= \begin{bmatrix} (x^{(1)})^T\\ (x^{(2)})^T\\ \vdots \\ (x^{(m)})^T\\ \end{bmatrix}$

$X\cdot \theta= \begin{bmatrix} (x^{(1)})^T \theta\\ (x^{(2)})^T \theta\\ \vdots \\ (x^{(m)})^T\theta\\ \end{bmatrix}= \begin{bmatrix} h_\theta(x^{(1)})\\ h_\theta(x^{(2)})\\ \vdots \\ h_\theta(x^{(m)})\\ \end{bmatrix}$

$y= \begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\\ \end{bmatrix}$

$X\cdot \theta-y= \begin{bmatrix} h_\theta(x^{(1)})-y^{(1)}\\ h_\theta(x^{(2)})-y^{(2)}\\ \vdots \\ h_\theta(x^{(m)})-y^{(m)}\\ \end{bmatrix}$
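The matrix form above maps directly onto array operations. The sketch below uses made-up numbers (`raw`, `y` and `theta` are hypothetical values) to show how the rows of $X$, the product $X\theta$ and the residual vector $X\theta-y$ line up.

```python
import numpy as np

# Hypothetical raw features: 3 samples, 2 features each
raw = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])

# Row i of X is (x^(i))^T, with x_0 = 1 prepended for the intercept
X = np.hstack([np.ones((raw.shape[0], 1)), raw])

theta = np.array([1.0, 2.0, 1.0])   # illustrative parameters
predictions = X @ theta             # entry i is h_theta(x^(i))
residuals = predictions - y         # entry i is h_theta(x^(i)) - y^(i)

print(X.shape)       # (3, 3): m = 3 samples, n + 1 = 3 columns
print(predictions)   # values 8, 14, 20
print(residuals)     # values -2, -6, -10
```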
To bring $J_\theta(X)$ as close to its minimum as possible within a given number of steps, each step moves from the current point in the direction of steepest descent. That direction is opposite to the gradient:

$(\frac {\partial J_\theta(X)}{\partial \theta_0}, \frac {\partial J_\theta(X)}{\partial \theta_1},\dots, \frac {\partial J_\theta(X)}{\partial \theta_n})$

Suppose $z$ is a vector, $z= \begin{pmatrix} z_1\\ z_2\\ \vdots \\ z_n\\ \end{pmatrix}$. Then $z^Tz=\sum_{i=1}^nz_i^2$.

Hence, taking $z=X\theta-y$:

$(X\theta -y)^T(X\theta -y)=\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

Dropping the constant factor $\frac 1m$, which does not change the minimizing $\theta$, the loss can be written as:

$J(\theta)=\frac 12(X\theta -y)^T(X\theta -y)$
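A quick numerical check can confirm that the vectorized form of the loss agrees with the explicit sum over samples. The random data, the seed and the sizes `m` and `n` below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed for reproducibility
m, n = 5, 2                          # 5 samples, 2 features (illustrative sizes)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

# Vectorized form: J = 1/2 * (X theta - y)^T (X theta - y)
z = X @ theta - y
vectorized = 0.5 * float(z @ z)

# Explicit form: J = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2
elementwise = 0.5 * sum((float(X[i] @ theta) - y[i]) ** 2 for i in range(m))

print(np.isclose(vectorized, elementwise))  # True
```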
To find the minimum, set the gradient to zero:

$\nabla_\theta J(\theta)= \vec 0$

$\nabla_\theta J(\theta)= \nabla_\theta \frac 12(X\theta-y)^T(X\theta-y)=X^TX\theta-X^Ty=\vec 0$

Solving for $\theta$ gives:

$\vec \theta=(X^TX)^{-1}X^Ty$
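The gradient used in this step can be worked out by expanding the quadratic form, writing the data matrix as $X$. The derivation relies on two standard identities: $\nabla_\theta(\theta^TA\theta)=2A\theta$ for a symmetric matrix $A$, and $\nabla_\theta(b^T\theta)=b$.

$J(\theta)=\frac 12(X\theta-y)^T(X\theta-y)=\frac 12(\theta^TX^TX\theta-2y^TX\theta+y^Ty)$

$\nabla_\theta J(\theta)=\frac 12(2X^TX\theta-2X^Ty)=X^TX\theta-X^Ty$

Here $X^TX$ is symmetric, and $y^TX\theta=(X^Ty)^T\theta$, so the two identities apply term by term.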
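The resulting $\vec \theta$ is an $(n+1)\times 1$ vector, with one entry for each of $\theta_0,\dots,\theta_n$. For simple linear regression problems, this means the iterative method described earlier is not needed.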
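The closed-form solution can be sketched as below. The function name `normal_equation` and the toy data are assumptions for the example. Note that the sketch solves the linear system $X^TX\theta=X^Ty$ with `np.linalg.solve` instead of forming the inverse explicitly; the result is the same $\theta$ whenever $X^TX$ is invertible.

```python
import numpy as np

def normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data generated from y = 1 + 2x (first column of X is x_0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = normal_equation(X, y)
print(theta)        # approximately [1, 2]
print(theta.shape)  # (2,): one entry per column of X
```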

If $X^TX$ is not invertible, this typically means the features in $X$ are redundant (some columns are linearly dependent on others), and some of them need to be removed.
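Redundancy of this kind can be detected numerically. In the hypothetical data below the third column is exactly twice the second, so the rank of $X^TX$ is smaller than its number of columns; dropping the duplicated column restores a solvable system. The data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: the third column is exactly 2 * the second column
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated from y = 1 + 2 * x_1

gram = X.T @ X
print(np.linalg.matrix_rank(gram))   # 2, fewer than the 3 columns: not invertible

# Remove the redundant feature and solve the reduced system
X_reduced = X[:, :2]
theta = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)
print(theta)                         # approximately [1, 2]
```

As an alternative to removing features by hand, `np.linalg.lstsq` returns a minimum-norm least-squares solution even when the columns are linearly dependent; that is a different technique from the one described in these notes.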
