Logistic regression (LR) is a classic machine learning task. It can be written as the following optimization problem:

$$\min_w f(w)$$
where
$$
\begin{aligned}
f(w) &= \frac{\lambda}{2}\|w\|^2+\frac{1}{n}\sum_{i=1}^{n}\ln\left(1+e^{-y_ix_i^Tw}\right)\\
&= \frac{\lambda}{2}\|w\|^2+\frac{1}{n}\sum_{i=1}^{n}f_i(w),\\
f_i(w) &= \ln\left(1+e^{-y_ix_i^Tw}\right),\\
\nabla f_i(w) &= -\frac{e^{-y_ix_i^Tw}}{1+e^{-y_ix_i^Tw}}\cdot y_ix_i,
\end{aligned}
$$
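and $x_i$, $y_i$ are the feature vector and the label of the $i$-th sample. The labels are taken to be $y_i\in\{-1,+1\}$, as this form of the loss assumes.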
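The formulas above translate fairly directly into NumPy. The following is a minimal sketch, not a reference implementation: the function names `objective` and `gradient`, the random test data, and the label convention $y_i\in\{-1,+1\}$ are assumptions made for illustration. It evaluates $f(w)$ and $\nabla f(w)=\lambda w+\frac{1}{n}\sum_i\nabla f_i(w)$, then compares the analytic gradient with a central finite difference.

```python
import numpy as np

def objective(w, X, y, lam):
    """f(w) = lam/2 * ||w||^2 + (1/n) * sum_i ln(1 + exp(-y_i * x_i^T w))."""
    margins = y * (X @ w)
    # logaddexp(0, -m) equals ln(1 + exp(-m)) and avoids overflow for large |m|
    return 0.5 * lam * (w @ w) + np.mean(np.logaddexp(0.0, -margins))

def gradient(w, X, y, lam):
    """grad f(w) = lam * w - (1/n) * sum_i c_i * y_i * x_i, with c_i below."""
    margins = y * (X @ w)
    # c_i = e^{-m} / (1 + e^{-m}) = 1 / (1 + e^{m}), written via tanh for stability
    coeff = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return lam * w - (X.T @ (coeff * y)) / X.shape[0]

# Hypothetical random data, labels assumed to be in {-1, +1}
rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))          # rows are the feature vectors x_i
y = rng.choice([-1.0, 1.0], size=n)
w = rng.normal(size=d)

# Central finite-difference approximation of the gradient, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (objective(w + eps * e, X, y, lam) - objective(w - eps * e, X, y, lam)) / (2 * eps)
    for e in np.eye(d)
])
max_err = np.max(np.abs(numeric - gradient(w, X, y, lam)))
print("max abs difference between analytic and numeric gradient:", max_err)
```

A small `max_err` suggests that the sign and the scaling of $\nabla f_i(w)$ have been transcribed correctly.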
A differentiable function $f_i$ is said to have an $L$-Lipschitz continuous gradient (to be $L$-smooth) if there exists a constant $L$ such that, for all $a,b\in\operatorname{dom}(f_i)$,

$$\|\nabla f_i(a)-\nabla f_i(b)\|\le L\|a-b\|.$$
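As a quick illustration of this definition (a worked example added here, with $r$ a name chosen only for this purpose), consider the quadratic $r(w)=\frac{\lambda}{2}\|w\|^2$ with $\lambda>0$:

$$
\nabla r(w)=\lambda w,\qquad \|\nabla r(a)-\nabla r(b)\|=\|\lambda a-\lambda b\|=\lambda\|a-b\|.
$$

So $r$ is $\lambda$-smooth, and since the bound holds with equality, no smaller constant works.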
In the LR problem, $f(w)$ has an $L$-Lipschitz continuous gradient with $L=\lambda+\frac{1}{4n}\|X\|_F^2$, where $X=[x_1; x_2; \cdots; x_n]$ is the data matrix and $\|\cdot\|_F$ is the Frobenius norm of a matrix.
Proof: using $|y_i|=1$ and the Cauchy-Schwarz inequality,

$$
\begin{aligned}
\|\nabla f_i(a)-\nabla f_i(b)\|
&=\left|\frac{e^{-y_ix_i^Ta}}{1+e^{-y_ix_i^Ta}}-\frac{e^{-y_ix_i^Tb}}{1+e^{-y_ix_i^Tb}}\right|\cdot \|x_i\| \\
&=\left|\sigma(y_ix_i^Ta)-\sigma(y_ix_i^Tb)\right|\cdot \|x_i\| \\
&\le \frac{1}{4}\left|y_ix_i^T(a-b)\right|\cdot \|x_i\| \\
&= \frac{1}{4}\left|x_i^T(a-b)\right|\cdot \|x_i\| \\
&\le \frac{1}{4}\|x_i\|\cdot\|a-b\|\cdot \|x_i\| \\
&= \frac{1}{4}\|x_i\|^2\cdot\|a-b\|,
\end{aligned}
$$
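where the sigmoid function is defined as $\sigma(z)=\frac{1}{1+e^{-z}}$ and is Lipschitz continuous with Lipschitz constant $\frac{1}{4}$. The second equality holds because $\frac{e^{-z}}{1+e^{-z}}=1-\sigma(z)$, so the constant terms cancel in the difference.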
Take $L_i=\frac{1}{4}\|x_i\|^2$, so that $\|\nabla f_i(a)-\nabla f_i(b)\|\le L_i\|a-b\|$, and let

$$L=\lambda+\frac{1}{n}\sum_{i=1}^{n}L_i=\lambda+\frac{1}{4n}\sum_{i=1}^{n}\|x_i\|^2=\lambda+\frac{1}{4n}\|X\|^2_F,$$
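Then, since the regularizer $\frac{\lambda}{2}\|w\|^2$ has gradient $\lambda w$, the triangle inequality gives

$$\|\nabla f(a)-\nabla f(b)\|\le \lambda\|a-b\|+\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_i(a)-\nabla f_i(b)\|\le L\|a-b\|.$$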
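The bound can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the helper name `full_gradient`, the random data, and the labels in $\{-1,+1\}$ are all hypothetical. It computes $L=\lambda+\frac{1}{4n}\|X\|_F^2$ and compares it with the largest observed ratio $\|\nabla f(a)-\nabla f(b)\|/\|a-b\|$ over random pairs $(a,b)$.

```python
import numpy as np

def full_gradient(w, X, y, lam):
    """grad f(w) for the regularized logistic loss with labels in {-1, +1}."""
    margins = y * (X @ w)
    # e^{-m} / (1 + e^{-m}) written via tanh to avoid overflow
    coeff = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return lam * w - (X.T @ (coeff * y)) / X.shape[0]

# Hypothetical random problem instance
rng = np.random.default_rng(1)
n, d, lam = 200, 10, 0.01
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Smoothness constant from the text: L = lam + ||X||_F^2 / (4n)
L = lam + np.linalg.norm(X, "fro") ** 2 / (4 * n)

# Largest gradient-difference ratio seen over random pairs of points
worst_ratio = 0.0
for _ in range(1000):
    a, b = rng.normal(size=d), rng.normal(size=d)
    diff = full_gradient(a, X, y, lam) - full_gradient(b, X, y, lam)
    worst_ratio = max(worst_ratio, np.linalg.norm(diff) / np.linalg.norm(a - b))

print("largest observed ratio:", worst_ratio)
print("bound L:", L)
```

The observed ratio should never exceed $L$. Random sampling only probes the inequality; it does not show how tight the constant is.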
We now prove the property used above: $\sigma(z)=\frac{1}{1+e^{-z}}$ is Lipschitz continuous, that is,

$$|\sigma(z)-\sigma(z')|\le \frac{1}{4}|z-z'|.$$
Proof: for a differentiable function, any upper bound on the absolute value of its first derivative is a Lipschitz constant (by the mean value theorem). We have $0<\sigma(x)<1$, so

$$
\begin{aligned}
\sigma'(x)&=\sigma(x)\bigl(1-\sigma(x)\bigr), \\
|\sigma'(x)|&\le\frac{1}{4},
\end{aligned}
$$

where the bound follows from $t(1-t)\le\frac{1}{4}$ for all $t\in(0,1)$, with equality at $t=\frac{1}{2}$, that is, at $x=0$.
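The same property can be spot-checked numerically. This is a small sketch under assumptions of my own choosing (the grid range, the sample size, and the helper name `sigmoid` are illustrative): it evaluates $\sigma'(z)=\sigma(z)(1-\sigma(z))$ on a grid and tests the Lipschitz inequality on random pairs.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), written via tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

# Derivative sigma'(z) = sigma(z) * (1 - sigma(z)) on a grid around zero
z = np.linspace(-20.0, 20.0, 4001)
s = sigmoid(z)
deriv = s * (1.0 - s)
max_deriv = deriv.max()
argmax_z = z[deriv.argmax()]

# Lipschitz inequality |sigma(z1) - sigma(z2)| <= |z1 - z2| / 4 on random pairs
rng = np.random.default_rng(2)
z1, z2 = rng.uniform(-10.0, 10.0, size=(2, 100000))
gap = 0.25 * np.abs(z1 - z2) - np.abs(sigmoid(z1) - sigmoid(z2))
min_gap = gap.min()

print("max of sigma' on the grid:", max_deriv, "attained near z =", argmax_z)
print("smallest slack in the Lipschitz inequality:", min_gap)
```

The slack `min_gap` should be nonnegative up to floating-point rounding, and the derivative should peak at the grid point closest to $z=0$.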