Deriving the Cost Function for Multivariate Linear Regression
Decision (hypothesis) function:
$$h_{\theta}(x)=\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n=\sum_{i=1}^{n}\theta_ix_i=\theta^Tx$$
Suppose there are $m$ samples. For each sample:
$$y^{(i)}=h_{\theta}(x^{(i)})+\epsilon^{(i)}\tag{1}$$
$\epsilon^{(i)}$ denotes the error between the true value and the predicted value. We typically assume the $\epsilon^{(i)}$ are independent and identically distributed, following a Gaussian distribution with mean $0$ and variance $\sigma^2$.
Therefore:
$$p(\epsilon^{(i)})=\frac{1}{\sqrt{2\pi}\sigma}\exp\!\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)\tag{2}$$
Substituting (1) into (2) gives:
$$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\sigma}\exp\!\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}\right)$$
Hence the likelihood function is:
$$L(\theta)=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\sigma}\exp\!\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}\right)$$
To make differentiation easier, take the log-likelihood:
$$\ln L(\theta)=\ln\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\sigma}\exp\!\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}\right)$$
$$=m\ln\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^Tx^{(i)})^2$$
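As a quick numerical sanity check, the simplification above can be verified on synthetic data (the dataset, random seed, and $\sigma$ below are made up for illustration):

```python
import numpy as np

# Hypothetical small dataset, purely for illustration
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
theta = np.array([1.0, -2.0, 0.5])
sigma = 1.3
y = X @ theta + rng.normal(scale=sigma, size=m)

# Direct log-likelihood: sum of log Gaussian densities of the residuals
residuals = y - X @ theta
log_lik_direct = np.sum(
    np.log(1.0 / (np.sqrt(2 * np.pi) * sigma))
    - residuals**2 / (2 * sigma**2)
)

# Simplified form: m * ln(1/(sqrt(2*pi)*sigma)) - (1/sigma^2) * (1/2) * sum of squares
J = 0.5 * np.sum(residuals**2)
log_lik_simplified = m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma)) - J / sigma**2

print(np.isclose(log_lik_direct, log_lik_simplified))  # True
```

The two values agree to floating-point precision, confirming the algebraic simplification.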
We now want the parameters $\hat\theta$ that maximize the likelihood $L(\theta)$. The first term above does not depend on $\theta$, so maximizing $\ln L(\theta)$ is equivalent to minimizing
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^Tx^{(i)})^2 \quad \text{(least squares)}$$
As the simplified expression shows, the smaller this quantity, the larger the likelihood.
In Andrew Ng's course the cost function is written as:
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(y^{(i)}-\theta^Tx^{(i)})^2$$
The extra factor of $\frac{1}{m}$ averages over the samples; it does not change the minimizer, and it makes computation more convenient.
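A minimal NumPy sketch of this averaged cost function (the function name `cost` and the tiny dataset are illustrative, not from the original):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (y_i - theta^T x_i)^2."""
    m = len(y)
    residuals = y - X @ theta
    return residuals @ residuals / (2 * m)

# Tiny example: two samples, one feature
X = np.array([[1.0], [2.0]])
y = np.array([2.0, 4.0])
print(cost(np.array([2.0]), X, y))  # exact fit -> 0.0
print(cost(np.array([1.0]), X, y))  # residuals [1, 2] -> (1 + 4) / (2 * 2) = 1.25
```

Because $\frac{1}{m}$ is a positive constant, gradient descent on this averaged cost converges to the same $\hat\theta$ as on the unaveraged least-squares form.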