Goal: for a linear regression problem, find the parameters that minimize the loss function.
I. Defining the loss function
- Linear model: $y=ax+b$
- For each sample point $x^{(i)}$, the predicted value is $\hat y^{(i)}=ax^{(i)}+b$
- For each sample point $x^{(i)}$, the true value is $y^{(i)}$
- The loss for a single sample is then $loss=\left(y^{(i)}-\hat y^{(i)}\right)^2$ (the squared difference is used because it is differentiable everywhere, unlike an absolute difference, which makes taking derivatives straightforward)
- The loss over all sample points is therefore:
$$loss=\sum_{i=1}^n \left(y^{(i)}-\hat y^{(i)}\right)^2$$
- Substituting $\hat y^{(i)}=ax^{(i)}+b$ into the formula above gives:
$$loss=\sum_{i=1}^n \left(y^{(i)}-ax^{(i)}-b\right)^2$$
- Since $x^{(i)}$ and $y^{(i)}$ are known quantities, the formula above is a function of the unknowns $a$ and $b$ only:
$$J\left(a,b\right)=\sum_{i=1}^n \left(y^{(i)}-ax^{(i)}-b\right)^2$$
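As a quick illustration of $J(a,b)$, the following is a minimal sketch in plain Python. The function name `squared_error_loss` and the sample data are assumptions made up for this example; they do not come from the original text.

```python
def squared_error_loss(a, b, xs, ys):
    """Return J(a, b): the sum of squared residuals y_i - (a * x_i + b)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (made up for this sketch), lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(squared_error_loss(2.0, 1.0, xs, ys))  # 0.0: the line passes through every point
print(squared_error_loss(1.0, 0.0, xs, ys))  # 30.0: residuals 1, 2, 3, 4 squared and summed
```

The data are fixed, so the loss changes only with $a$ and $b$. This is why $J$ is treated as a function of those two parameters.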
II. Least squares (equation form)
- Find the $a$ and $b$ that make $J\left(a,b\right)$ as small as possible. At the minimum both partial derivatives vanish, so set them to zero and solve for $a$ and $b$:
$$\frac{\partial J\left(a,b\right)}{\partial a}=0,\qquad \frac{\partial J\left(a,b\right)}{\partial b}=0$$
- Taking the partial derivative with respect to $b$ gives:
$$b=\bar y-a\bar x$$
$$
\begin{aligned}
\frac{\partial J\left(a,b\right)}{\partial b} &=\sum_{i=1}^n 2\left(y^{(i)}-ax^{(i)}-b\right)\left(-1\right)=0\\
\Longrightarrow\quad 0&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-b\right)&&\text{divide both sides by }-2\\
&=\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}-\sum_{i=1}^nb\\
&=\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}-nb\\
\\
b&=\frac{\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}}{n}&&\text{solve the previous line for }b\\
&=\bar y-a\bar x&&\text{with }\bar x=\tfrac{1}{n}\sum_{i=1}^nx^{(i)},\ \bar y=\tfrac{1}{n}\sum_{i=1}^ny^{(i)}
\end{aligned}
$$
- Taking the partial derivative with respect to $a$ gives:
$$a=\frac{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y\right)}{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)^2}$$
$$
\begin{aligned}
\frac{\partial J\left(a,b\right)}{\partial a} &=\sum_{i=1}^n 2\left(y^{(i)}-ax^{(i)}-b\right)\left(-x^{(i)}\right)=0\\
\Longrightarrow\quad 0&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-b\right)x^{(i)}&&\text{divide both sides by }-2\\
&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-\bar y+a\bar x\right)x^{(i)}&&\text{substitute }b=\bar y-a\bar x\\
&=\sum_{i=1}^n\left(x^{(i)}y^{(i)}-a\left(x^{(i)}\right)^2-x^{(i)}\bar y+a\bar xx^{(i)}\right)&&\text{expand}\\
&=\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y\right)-a\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}\right)\\
\\
a&=\frac{\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y\right)}{\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}\right)}&&\text{solve the previous line for }a\\
&=\frac{\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y-\bar xy^{(i)}+\bar x\cdot\bar y\right)}{\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}-\bar xx^{(i)}+\bar x^2\right)}&&\text{add terms that sum to zero}\\
&=\frac{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y\right)}{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)^2}&&\text{final result for }a
\end{aligned}
$$
The terms added in the last rewriting of $a$ sum to zero because of the following equalities, all of which follow from $\sum_{i=1}^nx^{(i)}=n\bar x$ and $\sum_{i=1}^ny^{(i)}=n\bar y$:
$$\bar y\sum_{i=1}^nx^{(i)}=n\bar y\cdot\bar x=\bar x\sum_{i=1}^ny^{(i)}=\sum_{i=1}^nx^{(i)}\bar y=\sum_{i=1}^n\bar x\cdot\bar y=\sum_{i=1}^ny^{(i)}\bar x$$
Likewise, in the denominator, $\sum_{i=1}^n\bar xx^{(i)}=n\bar x^2=\sum_{i=1}^n\bar x^2$.
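The closed-form results for $a$ and $b$ can be checked numerically. The sketch below is plain Python. The function name `fit_line` and the sample data are assumptions introduced for illustration, and the guard against a zero denominator is an addition that the derivation above does not discuss.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b using the closed-form formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # a = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    if denominator == 0:
        # Assumption for this sketch: reject data where every x is the same.
        raise ValueError("all x values are identical; the slope is undefined")
    a = numerator / denominator
    # b = y_mean - a * x_mean
    b = y_mean - a * x_mean
    return a, b


# Illustrative data (made up for this sketch), roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
a, b = fit_line(xs, ys)

# Both sums from the derivation should vanish (up to rounding) at the fitted a, b.
residual_sum = sum(y - a * x - b for x, y in zip(xs, ys))
weighted_residual_sum = sum((y - a * x - b) * x for x, y in zip(xs, ys))
print(a, b)
print(residual_sum, weighted_residual_sum)
```

For this data the formulas give $a\approx 1.96$ and $b\approx 0.22$. The two printed sums are the zero conditions from the $b$ and $a$ derivations, so they should be zero up to floating-point rounding.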
III. Least squares (matrix form)
###### To be continued