Determining the Convexity of a Function of Two Variables
Convexity test for a function of two variables:
Let $f(x,y)$ have continuous second-order partial derivatives on a convex region $D$, and at each point $(x,y)\in D$ write
$$A=f_{xx}''(x,y),\quad B=f_{xy}''(x,y),\quad C=f_{yy}''(x,y)$$
Then:
(1) If $A>0$ and $AC-B^2\geq0$ hold at every point of $D$, then $f$ is a convex function on $D$.
(2) If $A<0$ and $AC-B^2\geq0$ hold at every point of $D$, then $f$ is a concave function on $D$.
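Note: "convex" here means bulging downward, like $x^2$, with the graph lying below its chords. Some Chinese calculus textbooks call this shape "concave" (凹函数); machine learning follows the English-language convention, in which it is called convex.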
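Conditions (1) and (2) can be turned into a small numerical check. The following is a minimal sketch assuming NumPy; the helper names `classify_point` and `classify_on_grid` and the three example functions are hypothetical. Because it only samples a grid, it can suggest but not prove that a condition holds at every point of $D$, and since (1) and (2) are sufficient rather than necessary conditions, "inconclusive" does not by itself mean the function is neither convex nor concave.

```python
import numpy as np

def classify_point(A, B, C):
    """Conditions (1)/(2) from the text, evaluated at a single point."""
    det = A * C - B ** 2
    if A > 0 and det >= 0:
        return "convex"
    if A < 0 and det >= 0:
        return "concave"
    return "inconclusive"

def classify_on_grid(second_partials, xs, ys):
    """Apply the test at every grid point; this only samples D,
    so it is a numerical sanity check, not a proof."""
    labels = {classify_point(*second_partials(x, y)) for x in xs for y in ys}
    return labels.pop() if len(labels) == 1 else "inconclusive"

# Hypothetical examples: second partials (A, B, C) written out by hand.
bowl = lambda x, y: (2.0, 1.0, 2.0)          # f = x^2 + x*y + y^2
cap = lambda x, y: (-np.exp(x), 0.0, -2.0)   # f = -e^x - y^2
saddle = lambda x, y: (2.0, 0.0, -2.0)       # f = x^2 - y^2

grid = np.linspace(-2.0, 2.0, 21)
print(classify_on_grid(bowl, grid, grid))    # convex
print(classify_on_grid(cap, grid, grid))     # concave
print(classify_on_grid(saddle, grid, grid))  # inconclusive
```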
Finding the extremum of a convex (or concave) function of two variables:
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open convex region $D$, and let $(x_0,y_0)\in D$ satisfy $f_x'(x_0,y_0)=0$ and $f_y'(x_0,y_0)=0$. Then $f(x_0,y_0)$ is the minimum (or maximum) of $f(x,y)$ on $D$.
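To see this result in action, here is a small sketch on a hypothetical convex function $f(x,y)=x^2+xy+y^2-3x$ (so $A=2$, $B=1$, $C=2$ and $AC-B^2=3>0$). It solves $f_x'=f_y'=0$ as a linear system and spot-checks that randomly sampled points never fall below the stationary value; the function, the sampling range and the random seed are assumptions for illustration only.

```python
import numpy as np

def f(x, y):
    # Hypothetical convex example: A = 2, B = 1, C = 2, so AC - B^2 = 3 > 0.
    return x**2 + x * y + y**2 - 3 * x

# Stationary point: f_x = 2x + y - 3 = 0 and f_y = x + 2y = 0.
x0, y0 = np.linalg.solve([[2.0, 1.0], [1.0, 2.0]], [3.0, 0.0])
f_min = f(x0, y0)
print(x0, y0, f_min)  # approximately 2, -1, -3

# Spot check on random points: none should fall below f(x0, y0).
rng = np.random.default_rng(0)
pts = rng.uniform(-10, 10, size=(10_000, 2))
print(bool(np.all(f(pts[:, 0], pts[:, 1]) >= f_min - 1e-12)))  # True
```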
The function at hand is the sum of squared errors over the $m$ samples $(x_i,y_i)$:
$$E(w,b)=\sum_{i=1}^{m}(y_i-wx_i-b)^2\tag{1}$$
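For concreteness, Eq. (1) can be written as a short NumPy function. The name `squared_error` and the toy data (chosen to lie exactly on $y=2x+1$) are hypothetical.

```python
import numpy as np

def squared_error(w: float, b: float, x: np.ndarray, y: np.ndarray) -> float:
    """E(w, b) = sum_i (y_i - w*x_i - b)^2, as in Eq. (1)."""
    residuals = y - w * x - b
    return float(np.sum(residuals ** 2))

# Hypothetical toy data lying exactly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(squared_error(2.0, 1.0, x, y))  # 0.0
print(squared_error(0.0, 0.0, x, y))  # 164.0
```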
Taking the partial derivatives of $E(w,b)$ with respect to $w$ and $b$ gives:
$$\frac{\partial E(w,b)}{\partial w}=2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\tag{2}$$
$$\frac{\partial E(w,b)}{\partial b}=2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\tag{3}$$
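One way to sanity-check Eq. (2) and Eq. (3) is to compare them against central finite differences. This is a sketch assuming NumPy and synthetic data; `grad_E`, `numeric_grad`, the step size `h` and the data-generating line are illustrative choices, not part of the original derivation.

```python
import numpy as np

def E(w, b, x, y):
    """Eq. (1)."""
    return float(np.sum((y - w * x - b) ** 2))

def grad_E(w, b, x, y):
    """Analytic partial derivatives from Eq. (2) and Eq. (3)."""
    m = len(x)
    dE_dw = 2 * (w * np.sum(x ** 2) - np.sum((y - b) * x))
    dE_db = 2 * (m * b - np.sum(y - w * x))
    return dE_dw, dE_db

def numeric_grad(w, b, x, y, h=1e-6):
    """Central finite differences, used only as a cross-check."""
    dw = (E(w + h, b, x, y) - E(w - h, b, x, y)) / (2 * h)
    db = (E(w, b + h, x, y) - E(w, b - h, x, y)) / (2 * h)
    return dw, db

rng = np.random.default_rng(1)
x = rng.normal(size=20)
y = 1.5 * x - 0.5 + rng.normal(scale=0.1, size=20)  # hypothetical data
print(grad_E(0.3, 0.2, x, y))
print(numeric_grad(0.3, 0.2, x, y))  # should agree up to rounding error
```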
Starting from Eq. (2):
$$\frac{\partial^2E(w,b)}{\partial w^2}=\frac{\partial}{\partial w}\left(\frac{\partial E(w,b)}{\partial w}\right)=\frac{\partial}{\partial w}\left(2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)$$
$$=\frac{\partial}{\partial w}\left(2w\sum_{i=1}^{m}x_i^2\right)=2\sum_{i=1}^{m}x_i^2\tag{4}$$
$$\Longrightarrow A=E_{ww}''(w,b)=2\sum_{i=1}^{m}x_i^2$$
$$\frac{\partial^2E(w,b)}{\partial w\,\partial b}=\frac{\partial}{\partial b}\left(\frac{\partial E(w,b)}{\partial w}\right)=\frac{\partial}{\partial b}\left(2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)$$
$$=\frac{\partial}{\partial b}\left(-2\sum_{i=1}^{m}(y_i-b)x_i\right)=2\sum_{i=1}^{m}x_i\tag{5}$$
$$\Longrightarrow B=E_{wb}''(w,b)=2\sum_{i=1}^{m}x_i$$
Starting from Eq. (3):
$$\frac{\partial^2E(w,b)}{\partial b^2}=\frac{\partial}{\partial b}\left(\frac{\partial E(w,b)}{\partial b}\right)=\frac{\partial}{\partial b}\left(2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\right)=2m\tag{6}$$
$$\Longrightarrow C=E_{bb}''(w,b)=2m$$
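Collecting Eq. (4) to Eq. (6), the Hessian of $E$ is $\begin{pmatrix}A&B\\B&C\end{pmatrix}$, and it depends only on the $x_i$, not on $w$, $b$ or $y_i$. The sketch below (NumPy assumed; `hessian_E`, `numeric_hessian` and the data are hypothetical) builds it directly and compares it with a finite-difference estimate taken at an arbitrary point.

```python
import numpy as np

def hessian_E(x):
    """[[A, B], [B, C]] from Eq. (4)-(6); independent of w, b and y."""
    m = len(x)
    A = 2 * np.sum(x ** 2)
    B = 2 * np.sum(x)
    C = 2 * m
    return np.array([[A, B], [B, C]])

def E(w, b, x, y):
    """Eq. (1)."""
    return float(np.sum((y - w * x - b) ** 2))

def numeric_hessian(w, b, x, y, h=1e-3):
    """Central second differences, as a cross-check only."""
    f = lambda w_, b_: E(w_, b_, x, y)
    d_ww = (f(w + h, b) - 2 * f(w, b) + f(w - h, b)) / h**2
    d_bb = (f(w, b + h) - 2 * f(w, b) + f(w, b - h)) / h**2
    d_wb = (f(w + h, b + h) - f(w + h, b - h)
            - f(w - h, b + h) + f(w - h, b - h)) / (4 * h**2)
    return np.array([[d_ww, d_wb], [d_wb, d_bb]])

x = np.array([0.5, 1.0, 2.0, 3.5])  # hypothetical data
y = np.array([1.0, 2.2, 3.9, 7.1])
print(hessian_E(x))  # A = 35, B = 14, C = 8 for this x
print(numeric_hessian(0.7, -0.4, x, y))
```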
$$AC-B^2=4m\sum_{i=1}^{m}x_i^2-\left[2\sum_{i=1}^{m}x_i\right]^2=4m\sum_{i=1}^{m}x_i^2-4m\cdot\frac{1}{m}\sum_{i=1}^{m}x_i\cdot\sum_{i=1}^{m}x_i=4m\left(\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}x_i\bar{x}\right)$$
$$=4m\sum_{i=1}^{m}\left(x_i^2-x_i\bar{x}-x_i\bar{x}+x_i\bar{x}\right)=4m\sum_{i=1}^{m}\left(x_i^2-2x_i\bar{x}+\bar{x}^2\right)=4m\sum_{i=1}^{m}(x_i-\bar{x})^2\geq0\tag{7}$$
Note: the derivation above uses $\frac{1}{m}\sum_{i=1}^{m}x_i=\bar{x}$, and in the second line it replaces the last $+x_i\bar{x}$ with $+\bar{x}^2$ inside the sum, which is valid because
$$\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\cdot m\cdot\frac{1}{m}\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2$$
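The identity in Eq. (7) and the substitution in the note can be checked numerically on any sample. A minimal sketch, assuming NumPy and randomly generated $x_i$ (the seed and range are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-5, 5, size=50)  # hypothetical sample
m = len(x)
x_bar = x.mean()

# A, B, C from Eq. (4)-(6).
A = 2 * np.sum(x ** 2)
B = 2 * np.sum(x)
C = 2 * m

lhs = A * C - B ** 2
rhs = 4 * m * np.sum((x - x_bar) ** 2)  # last form of Eq. (7)
print(np.isclose(lhs, rhs))  # True

# The substitution from the note: sum_i x_i * x_bar == m * x_bar^2
print(np.isclose(np.sum(x * x_bar), m * x_bar ** 2))  # True
```

Since $\sum_{i=1}^{m}(x_i-\bar{x})^2$ vanishes only when all $x_i$ are equal, $AC-B^2$ is strictly positive for any sample containing at least two distinct $x_i$.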
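This proves that $E(w,b)$ is convex: $A=2\sum_{i=1}^{m}x_i^2>0$ (unless every $x_i$ is 0) and $AC-B^2\geq0$ everywhere, so condition (1) holds. By the extremum result above, any point where Eq. (2) and Eq. (3) both equal 0 is the minimum of $E(w,b)$, which is what makes convex optimization applicable here.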
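Because $E(w,b)$ is convex, its minimizer can be found by setting Eq. (2) and Eq. (3) to zero. Rearranged, the two conditions form a $2\times2$ linear system in $w$ and $b$ whose matrix is half the Hessian. The sketch below solves that system with `numpy.linalg.solve` and cross-checks the result against `numpy.polyfit` with `deg=1`; the helper name `fit_line` and the data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Solve dE/dw = 0 and dE/db = 0 (Eq. (2) and Eq. (3) set to zero).

    Rearranged, the two conditions are the linear system
        w * sum(x^2) + b * sum(x) = sum(x*y)
        w * sum(x)   + b * m      = sum(y)
    whose matrix is half the Hessian [[A, B], [B, C]].
    """
    m = len(x)
    lhs = np.array([[np.sum(x ** 2), np.sum(x)],
                    [np.sum(x), m]])
    rhs = np.array([np.sum(x * y), np.sum(y)])
    w, b = np.linalg.solve(lhs, rhs)  # singular only if all x_i are equal
    return w, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
w, b = fit_line(x, y)
print(w, b)                       # approximately 1.96 0.14
print(np.polyfit(x, y, deg=1))    # [slope, intercept]; should match (w, b)
```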