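I couldn't find a proof online that the cost function of multivariate linear regression is convex, so I decided to write one myself. If you spot any mistakes, please point them out in the comments. Thanks!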
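First, a lemma: a twice-differentiable function on an open convex set is convex if and only if its Hessian matrix is positive semidefinite everywhere, and it is strictly convex if its Hessian is positive definite everywhere. The second condition is sufficient but not necessary: $f(x)=x^4$ is strictly convex even though $f''(0)=0$.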
Write out the cost function:
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
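Here $\theta=(\theta_0,\theta_1,\dots,\theta_n)^T$ and $x^{(i)}=(x_0^{(i)},x_1^{(i)},\dots,x_n^{(i)})^T$ are vectors (conventionally $x_0^{(i)}=1$ for the intercept term), while $y^{(i)}$ is a scalar. The hypothesis is linear: $h_\theta(x^{(i)})=\theta^Tx^{(i)}=\sum_{j=0}^{n}\theta_jx_j^{(i)}$.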
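For convenience, we work with the function below instead; it differs from $J(\theta)$ only by the positive factor $2m$, so the two have the same convexity: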
$$P(\theta)=2mJ(\theta)=\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
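To make the two definitions concrete, here is a minimal pure-Python sketch, assuming the linear hypothesis $h_\theta(x)=\theta^Tx$ with $x_0=1$. The helper names (`h`, `cost_J`, `cost_P`) and the toy data are hypothetical; the point is only that $P(\theta)$ and $2mJ(\theta)$ coincide.

```python
def h(theta, x):
    # Linear hypothesis h_theta(x) = sum_j theta_j * x_j (x[0] is assumed to be 1)
    return sum(t * xj for t, xj in zip(theta, x))


def cost_J(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(X)
    return sum((h(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def cost_P(theta, X, y):
    # P(theta) = 2m * J(theta): the same sum of squares without the 1/(2m) factor
    return sum((h(theta, x) - yi) ** 2 for x, yi in zip(X, y))


# Toy data (hypothetical): each row is (x_0, x_1) with x_0 = 1
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 3.0, 4.0]

print(cost_J([1.0, 1.0], X, y))  # 0.0, since y = 1 + x_1 exactly

theta = [0.0, 0.0]
# The two values agree up to floating-point rounding
print(cost_P(theta, X, y), 2 * len(X) * cost_J(theta, X, y))
```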
Next, expand the square:
$$P(\theta)=\sum_{i=1}^{m}\sum_{j=0}^{n}\left(x_j^{(i)}\theta_j\right)^2+\sum_{i=1}^{m}\sum_{j<k}2x_j^{(i)}x_k^{(i)}\theta_j\theta_k-\sum_{i=1}^{m}\sum_{j=0}^{n}2y^{(i)}x_j^{(i)}\theta_j+\sum_{i=1}^{m}\left(y^{(i)}\right)^2$$
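Here $m$ is the number of training examples and $n$ is the number of input features, so each $x^{(i)}$ has the $n+1$ components $x_0^{(i)},\dots,x_n^{(i)}$; the sum over $j<k$ runs over all pairs $0\le j<k\le n$.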
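The expansion can be sanity-checked numerically. This sketch (hypothetical helper names, random toy data) evaluates $P(\theta)$ both directly and as the four-term sum above, and checks that the two agree up to floating-point rounding.

```python
import random


def P_direct(theta, X, y):
    # P(theta) = sum_i (theta^T x^(i) - y^(i))^2
    return sum((sum(t * xj for t, xj in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y))


def P_expanded(theta, X, y):
    # The four terms of the expansion, with indices j, k in 0..n
    n1 = len(theta)  # n + 1 components
    squares = sum((x[j] * theta[j]) ** 2 for x in X for j in range(n1))
    cross = sum(2 * x[j] * x[k] * theta[j] * theta[k]
                for x in X for j in range(n1) for k in range(j + 1, n1))
    linear = sum(2 * yi * x[j] * theta[j]
                 for x, yi in zip(X, y) for j in range(n1))
    const = sum(yi ** 2 for yi in y)
    return squares + cross - linear + const


# Random data (hypothetical): m = 5 examples, n = 3 features plus x_0 = 1
random.seed(0)
m, n = 5, 3
X = [[1.0] + [random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
y = [random.uniform(-2, 2) for _ in range(m)]
theta = [random.uniform(-1, 1) for _ in range(n + 1)]

diff = abs(P_direct(theta, X, y) - P_expanded(theta, X, y))
print(diff < 1e-9)  # True: both forms agree up to rounding
```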
Writing $P(\theta)$ more compactly:
$$P(\theta)=\sum_{i=0}^{n}\frac{1}{2}a_{ii}\theta_i^2+\sum_{i<j}a_{ij}\theta_i\theta_j-\sum_{i=1}^{m}\sum_{j=0}^{n}2y^{(i)}x_j^{(i)}\theta_j+\sum_{i=1}^{m}\left(y^{(i)}\right)^2$$
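where

$$a_{ij}=\sum_{k=1}^{m}2x_i^{(k)}x_j^{(k)}$$

In the first two sums of this compact form, $i$ and $j$ index the components of $\theta$ (from $0$ to $n$) and $k$ runs over the training examples; in particular $a_{ij}=a_{ji}$.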
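A sketch of the coefficients $a_{ij}$ and the compact form of $P(\theta)$, again with hypothetical names and small integer data. It builds the matrix of $a_{ij}$ (which is $2X^TX$ if the rows of $X$ are the $(x^{(i)})^T$) and compares the compact form with the direct sum of squares.

```python
def coeff_matrix(X):
    # a_ij = sum_k 2 * x_i^(k) * x_j^(k) for i, j in 0..n (k runs over examples)
    n1 = len(X[0])
    return [[sum(2 * x[i] * x[j] for x in X) for j in range(n1)]
            for i in range(n1)]


def P_shorthand(theta, X, y):
    # sum_i a_ii/2 theta_i^2 + sum_{i<j} a_ij theta_i theta_j - linear + constant
    A = coeff_matrix(X)
    n1 = len(theta)
    quad = sum(0.5 * A[i][i] * theta[i] ** 2 for i in range(n1))
    quad += sum(A[i][j] * theta[i] * theta[j]
                for i in range(n1) for j in range(i + 1, n1))
    linear = sum(2 * yi * x[j] * theta[j]
                 for x, yi in zip(X, y) for j in range(n1))
    return quad - linear + sum(yi ** 2 for yi in y)


def P_direct(theta, X, y):
    # P(theta) = sum_i (theta^T x^(i) - y^(i))^2
    return sum((sum(t * xj for t, xj in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y))


# Small integer data (hypothetical); rows are (x_0, x_1, x_2) with x_0 = 1
X = [[1, 1, 2], [1, 0, -1], [1, 3, 1]]
y = [1, 2, 0]
theta = [0.5, -1.0, 2.0]

A = coeff_matrix(X)
print(A)                                            # [[6, 8, 4], [8, 20, 10], [4, 10, 12]]
print(P_shorthand(theta, X, y), P_direct(theta, X, y))  # 18.75 18.75
```

For this data the matrix is symmetric, as expected from $a_{ij}=a_{ji}$.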
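Differentiating twice, the linear and constant terms drop out, and $\partial^2P/\partial\theta_i^2=a_{ii}$, $\partial^2P/\partial\theta_i\partial\theta_j=a_{ij}$ for $i\neq j$. So the Hessian matrix $H$ of $P(\theta)$ is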
$$\begin{pmatrix} a_{00} & a_{01} & a_{02} & \cdots & a_{0n} \\ a_{10} & a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n0} & a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$
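To see that this matrix really is the Hessian, and that it does not change with $\theta$, one can compare it with a finite-difference Hessian of $P$ at two different points. This is a sketch with hypothetical helper names and data; because $P$ is quadratic, the central second-difference formula used here is exact for any step size, so the only possible discrepancy is rounding.

```python
def P(theta, X, y):
    # P(theta) = sum_i (theta^T x^(i) - y^(i))^2
    return sum((sum(t * xj for t, xj in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y))


def shifted(theta, i, j, di, dj):
    # Copy of theta with di added to component i and dj added to component j
    t = list(theta)
    t[i] += di
    t[j] += dj
    return t


def numerical_hessian(f, theta, step=0.5):
    # Central second differences; exact (up to rounding) when f is quadratic
    n1 = len(theta)
    s = step
    H = [[0.0] * n1 for _ in range(n1)]
    for i in range(n1):
        for j in range(n1):
            H[i][j] = (f(shifted(theta, i, j, s, s))
                       - f(shifted(theta, i, j, s, -s))
                       - f(shifted(theta, i, j, -s, s))
                       + f(shifted(theta, i, j, -s, -s))) / (4 * s * s)
    return H


def coeff_matrix(X):
    # a_ij = sum_k 2 * x_i^(k) * x_j^(k)
    n1 = len(X[0])
    return [[sum(2 * x[i] * x[j] for x in X) for j in range(n1)]
            for i in range(n1)]


X = [[1, 1, 2], [1, 0, -1], [1, 3, 1]]  # hypothetical data, x_0 = 1
y = [1, 2, 0]
A = coeff_matrix(X)
n1 = len(A)

for theta in ([0.0, 0.0, 0.0], [3.0, -2.0, 1.5]):
    H = numerical_hessian(lambda t: P(t, X, y), theta)
    err = max(abs(H[i][j] - A[i][j]) for i in range(n1) for j in range(n1))
    print(theta, err)  # err is (numerically) zero at both points
```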
For an arbitrary vector $\theta$, the quadratic form of $H$ expands as
$$\theta^TH\theta=\sum_{i=0}^{n}a_{ii}\theta_i^2+\sum_{i\neq j}a_{ij}\theta_i\theta_j=\theta^T\left(\sum_{i=1}^{m}B_i\right)\theta$$
where $B_i$ is the matrix
$$\begin{pmatrix} 2x_0^{(i)}x_0^{(i)} & 2x_0^{(i)}x_1^{(i)} & 2x_0^{(i)}x_2^{(i)} & \cdots & 2x_0^{(i)}x_n^{(i)} \\ 2x_1^{(i)}x_0^{(i)} & 2x_1^{(i)}x_1^{(i)} & 2x_1^{(i)}x_2^{(i)} & \cdots & 2x_1^{(i)}x_n^{(i)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 2x_n^{(i)}x_0^{(i)} & 2x_n^{(i)}x_1^{(i)} & 2x_n^{(i)}x_2^{(i)} & \cdots & 2x_n^{(i)}x_n^{(i)} \end{pmatrix}$$
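The decomposition $H=\sum_i B_i$ can be checked directly. In this sketch (hypothetical names, integer data, so the arithmetic is exact) each $B_i$ is built as $2x^{(i)}(x^{(i)})^T$ and the running sum is compared entry by entry with the matrix of $a_{jk}$.

```python
def B(x):
    # B_i = 2 x^(i) (x^(i))^T: entry (j, k) is 2 * x_j^(i) * x_k^(i)
    return [[2 * xj * xk for xk in x] for xj in x]


def hessian_H(X):
    # H = (a_jk) with a_jk = sum_i 2 * x_j^(i) * x_k^(i)
    n1 = len(X[0])
    return [[sum(2 * x[j] * x[k] for x in X) for k in range(n1)]
            for j in range(n1)]


X = [[1, 1, 2], [1, 0, -1], [1, 3, 1]]  # hypothetical data, x_0 = 1
n1 = len(X[0])

# Accumulate sum_i B_i entry by entry
S = [[0] * n1 for _ in range(n1)]
for x in X:
    Bi = B(x)
    for j in range(n1):
        for k in range(n1):
            S[j][k] += Bi[j][k]

print(S == hessian_H(X))  # True
```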
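Each $B_i=2x^{(i)}\left(x^{(i)}\right)^T$ is positive semidefinite, since for any $\theta$ we have $\theta^TB_i\theta=2\left(\theta^Tx^{(i)}\right)^2\geq 0$. Therefore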
$$\theta^TH\theta=\theta^T\left(\sum_{i=1}^{m}B_i\right)\theta\geq 0$$
Hence $H$ is positive semidefinite as well.
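A quick numerical illustration of why a single $B_i$ is only positive semidefinite: its quadratic form equals $2(\theta^Tx^{(i)})^2$, which is never negative but vanishes for any $\theta$ orthogonal to $x^{(i)}$. The example vector, the helper name `quad_form`, and the random samples are hypothetical.

```python
import random


def quad_form(M, v):
    # v^T M v
    n1 = len(v)
    return sum(v[j] * M[j][k] * v[k] for j in range(n1) for k in range(n1))


x = [1, 3, 1]                             # one example x^(i) (hypothetical)
Bi = [[2 * a * b for b in x] for a in x]  # B_i = 2 x x^T

random.seed(1)
samples = [[random.uniform(-5, 5) for _ in x] for _ in range(1000)]
values = [quad_form(Bi, v) for v in samples]

# theta^T B_i theta should equal 2 (theta^T x)^2 and never be negative
identity_ok = all(
    abs(q - 2 * sum(a * b for a, b in zip(x, v)) ** 2) < 1e-8
    for q, v in zip(values, samples))
print(identity_ok, min(values) >= -1e-9)  # True True

# A direction orthogonal to x gives 0, so B_i alone is not positive definite
v0 = [3, -1, 0]
print(quad_form(Bi, v0))  # 0
```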
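Since $J(\theta)=P(\theta)/(2m)$, its Hessian is $H/(2m)$, the same positive semidefinite matrix at every $\theta$. A twice-differentiable function whose Hessian is positive semidefinite everywhere is convex, so $J(\theta)$ is at least convex. Strict convexity requires $H$ to be positive definite. Because $\theta^TH\theta=2\sum_{i=1}^{m}\left(\theta^Tx^{(i)}\right)^2$, equality $\theta^TH\theta=0$ holds for some $\theta\neq 0$ exactly when $\theta^Tx^{(i)}=0$ for every training example, i.e. when the columns of the design matrix $X$ (whose $i$-th row is $(x^{(i)})^T$) are linearly dependent. This happens, for example, when $m<n+1$ or when one feature is a linear combination of the others. With enough training examples and no redundant features this is usually not the case in practice, so $H$ is positive definite, and by the lemma at the beginning $J(\theta)$ is strictly convex. This completes the proof.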
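Whether $H$ is positive definite can be tested with a Cholesky factorization, which produces strictly positive pivots exactly for symmetric positive definite matrices. The sketch below uses hypothetical names and data, and its absolute tolerance `tol` is an arbitrary choice rather than a robust numerical-rank test. It compares a design matrix with independent columns against one with a redundant feature and one with fewer examples than parameters.

```python
import math


def hessian_H(X):
    # H = 2 X^T X, i.e. a_jk = sum_i 2 * x_j^(i) * x_k^(i)
    n1 = len(X[0])
    return [[sum(2 * x[j] * x[k] for x in X) for k in range(n1)]
            for j in range(n1)]


def is_positive_definite(M, tol=1e-10):
    # Cholesky factorization M = L L^T; a symmetric M is positive definite
    # iff every pivot is strictly positive (tol guards against rounding)
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


full_rank = [[1, 1, 2], [1, 2, 1], [1, 3, 5]]  # linearly independent columns
dependent = [[1, 1, 2], [1, 2, 4], [1, 3, 6]]  # x_2 = 2 * x_1 in every example
too_few = [[1, 1, 2], [1, 2, 1]]               # m = 2 < n + 1 = 3

for name, X in [("full_rank", full_rank), ("dependent", dependent),
                ("too_few", too_few)]:
    print(name, is_positive_definite(hessian_H(X)))
```

The first case prints `True` and the other two print `False`, matching the condition above that $H$ fails to be positive definite exactly when the columns of $X$ are linearly dependent.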