These notes come from one chapter of a book. It deals with image denoising: to recover the original image X, the chapter introduces a new method that brings in a Lipschitz-based approximation during the early processing stage and uses a Taylor expansion to transform the original objective into a convex function plus a penalty term. The problem is then split into two parts: one part can still be handled with gradient descent, while the other is set aside and treated separately.
In addition, the image handling differs from earlier approaches: previous methods usually assumed a single large image, mostly square in shape, whereas the paper discussed here allows more diverse image shapes and sizes.
Compressive Sensing Methodology
For a noisy image, Compressive Sensing can be expressed as:

$$y = \Phi \Psi s + \omega \tag{1}$$
where:

- $\omega$ is an $M$-dimensional measurement-noise vector
- $\Psi$ is an $N \times N$ orthogonal basis matrix
- $\Phi$ is an $M \times N$ random measurement matrix ($M < N$)
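As a concrete illustration of the measurement model in (1), here is a minimal NumPy sketch. The sizes `N`, `M`, the DCT basis for $\Psi$, and the Gaussian $\Phi$ are assumptions made for the example, not choices from the book:

```python
import numpy as np
from scipy.fft import dct

# Assumed sizes for illustration only: N-dimensional signal, M measurements (M < N).
N, M = 256, 64
rng = np.random.default_rng(0)

# Psi: N x N orthogonal basis (an orthonormal DCT basis, chosen for the example).
Psi = dct(np.eye(N), norm='ortho')

# Phi: M x N random measurement matrix (i.i.d. Gaussian entries, a common choice).
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# A sparse coefficient vector s and the noisy measurement y = Phi Psi s + omega.
s = np.zeros(N)
support = rng.choice(N, size=8, replace=False)
s[support] = rng.standard_normal(8)
omega = 0.01 * rng.standard_normal(M)  # measurement noise
y = Phi @ Psi @ s + omega
```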
The signal $s$ in (1) can be estimated from the measurement $y$ by solving the following convex minimization problem:
$$\arg\min_x \; \|\Phi x - y\|_2^2 + \lambda \|x\|_1 \tag{2}$$
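For reference, the objective of (2) is straightforward to evaluate; a small sketch (the name `lam` for $\lambda$ is mine):

```python
import numpy as np

def objective(x, Phi, y, lam):
    """Objective of (2): ||Phi x - y||_2^2 + lam * ||x||_1."""
    r = Phi @ x - y
    return r @ r + lam * np.abs(x).sum()
```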
Problem (2) is an unconstrained minimization of a convex function. We can solve it with a gradient-based method, which generates a sequence $x_k$ via:
$$x_0 \in \mathbb{R}^N, \qquad x_k = x_{k-1} - t_k \nabla g(x_{k-1})$$
where $g(x)$ is a convex function and $t_k$ is the step size.
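A minimal sketch of this iteration for the smooth part, assuming $g(x) = \|\Phi x - y\|_2^2$ (so $\nabla g(x) = 2\Phi^T(\Phi x - y)$); the fixed step size and iteration count are illustrative:

```python
import numpy as np

def gradient_descent(Phi, y, t=1e-3, iters=100):
    """Plain gradient descent on the smooth part g(x) = ||Phi x - y||_2^2."""
    x = np.zeros(Phi.shape[1])               # x_0 in R^N
    for _ in range(iters):
        grad = 2.0 * Phi.T @ (Phi @ x - y)   # grad g(x_{k-1})
        x = x - t * grad                     # x_k = x_{k-1} - t_k * grad g(x_{k-1})
    return x
```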
For (2), the objective function can be rewritten as $F(x) = g(x) + f(x)$, where $g(x) = \|\Phi x - y\|_2^2$ is a smooth convex function and $f(x) = \lambda \|x\|_1$.
The function $g(x)$ can then be approximated near $x_{k-1}$ by a quadratic model:

$$Q(x, x_{k-1}) = g(x_{k-1}) + \langle x - x_{k-1},\, \nabla g(x_{k-1}) \rangle + \frac{1}{2t_k} \|x - x_{k-1}\|_2^2$$
In this model, the step size $t_k$ can be replaced by a constant $1/L$, where $L$ is the Lipschitz constant of $\nabla g$.
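For the quadratic data term assumed above, $L$ has an explicit value: since $\nabla g(x) - \nabla g(z) = 2\Phi^T\Phi(x - z)$, we get $L = 2\|\Phi^T\Phi\|_2$. A sketch (reusing `Phi` from the first code example):

```python
import numpy as np

# For g(x) = ||Phi x - y||_2^2: grad g(x) - grad g(z) = 2 Phi^T Phi (x - z),
# so the Lipschitz constant of grad g is 2 * ||Phi^T Phi||_2.
L = 2.0 * np.linalg.norm(Phi.T @ Phi, ord=2)  # = 2 * sigma_max(Phi)^2
t_k = 1.0 / L                                 # the constant step size 1/L
```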
Combined with other papers I have read, the same idea applies to the non-smooth $\ell_1$-norm regularized problem

$$\min F(x) = \min \left\{ g(x) + \lambda \|x\|_1 \right\},$$

which leads to the following iterative scheme:
$$x_k = \arg\min_x \left\{ g(x_{k-1}) + \langle x - x_{k-1},\, \nabla g(x_{k-1}) \rangle + \frac{1}{2t_k} \|x - x_{k-1}\|_2^2 + \lambda \|x\|_1 \right\}$$
After ignoring the constant terms and completing the square in the quadratic, we get:
$$x_k = \arg\min_x \left( \frac{1}{2t_k} \left\| x - \left( x_{k-1} - t_k \nabla g(x_{k-1}) \right) \right\|_2^2 + \lambda \|x\|_1 \right)$$
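This subproblem has a well-known closed-form solution: componentwise soft-thresholding (shrinkage) of the gradient step. A minimal sketch, again assuming $g(x) = \|\Phi x - y\|_2^2$:

```python
import numpy as np

def soft_threshold(v, tau):
    """Closed-form solution of argmin_x (1/2)||x - v||_2^2 + tau * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_step(x_prev, Phi, y, lam, t):
    """One iteration of the scheme above: gradient step, then shrinkage."""
    v = x_prev - t * 2.0 * Phi.T @ (Phi @ x_prev - y)  # x_{k-1} - t_k grad g(x_{k-1})
    return soft_threshold(v, lam * t)                  # threshold tau = lam * t_k
```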
By the Lipschitz condition on the gradient,

$$\|\nabla g(x) - \nabla g(y)\| \leq L \|x - y\| \quad \text{for all } x, y,$$
we know that when $x$ is close to $y$, the ratio $\dfrac{\|\nabla g(x) - \nabla g(y)\|}{\|x - y\|}$ is an approximation of the second derivative $g''(x)$ at the point $x$.
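As a quick one-dimensional sanity check: for $g(x) = x^2$ we have $\nabla g(x) = 2x$, so for any $x \neq y$

$$\frac{|\nabla g(x) - \nabla g(y)|}{|x - y|} = \frac{2|x - y|}{|x - y|} = 2 = g''(x),$$

and the smallest valid Lipschitz constant $L = 2$ coincides exactly with the second derivative.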
So the objective function can be approximated by:

$$F(x) \approx g(x_{k-1}) + \langle x - x_{k-1},\, \nabla g(x_{k-1}) \rangle + \frac{L}{2} \|x - x_{k-1}\|_2^2 + f(x)$$
$$x_k = \arg\min_x \left( \frac{L}{2} \left\| x - \left( x_{k-1} - \frac{1}{L} \nabla g(x_{k-1}) \right) \right\|_2^2 + \lambda \|x\|_1 \right)$$
Or equivalently:
$$x_k = \arg\min_x \left( \frac{L}{2} \|x - d_k\|_2^2 + \lambda \|x\|_1 \right)$$
Recalling equation (1) from before, we know that:
$$y = \Phi \Psi s + \omega$$

$$g(x) = \|\Phi x - y\|_2^2 = \|\Phi \Psi s - y\|_2^2$$

$$d_k = x_{k-1} - \frac{1}{L} \nabla g(x_{k-1})$$

$$d_k = x_{k-1} - \frac{1}{L} (\Phi \Psi^T)^T (\Phi \Psi^T x_{k-1} - y)$$

where $\frac{1}{L}$ is the step size.
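Putting the pieces together, a minimal ISTA-style sketch of the full recovery loop. It reuses `Phi`, `Psi`, and `y` from the first sketch; `A = Phi @ Psi` stands in for the $\Phi\Psi^T$ of the $d_k$ formula (for an orthonormal $\Psi$ this is a convention choice), and the explicit factor 2 in the gradient comes from $g(x) = \|Ax - y\|_2^2$; texts that define $g$ with a $\frac{1}{2}$ factor drop it:

```python
import numpy as np

def soft_threshold(v, tau):
    # componentwise shrinkage, the prox of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, iters=200):
    """Iterative shrinkage for min_x ||A x - y||_2^2 + lam * ||x||_1."""
    L = 2.0 * np.linalg.norm(A.T @ A, ord=2)   # Lipschitz constant of grad g
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - y)         # grad g(x_{k-1})
        d = x - grad / L                       # d_k = x_{k-1} - (1/L) grad g(x_{k-1})
        x = soft_threshold(d, lam / L)         # argmin (L/2)||x - d_k||^2 + lam||x||_1
    return x

# Usage: recover the sparse coefficients, then map back to the image domain.
A = Phi @ Psi          # plays the role of Phi Psi^T in the notes
s_hat = ista(A, y, lam=0.05)
x_hat = Psi @ s_hat    # reconstructed signal/image
```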
The Adaptive Block CS with Sparsity
Two sparsity measures are defined on the coefficients $c_j$:

$$l_0 = \#\{\, j : c_j = 0 \,\}$$

$$l_\varepsilon^0 = \#\{\, j : |c_j| \leq \varepsilon \,\}$$

that is, $l_0$ counts the exactly-zero coefficients, while $l_\varepsilon^0$ counts the coefficients whose magnitude is at most $\varepsilon$.
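A minimal sketch of these two counts (the function name and `eps` are mine):

```python
import numpy as np

def sparsity_counts(c, eps):
    """l0 = #{j : c_j = 0};  l_eps^0 = #{j : |c_j| <= eps}."""
    l0 = int(np.count_nonzero(c == 0))
    l0_eps = int(np.count_nonzero(np.abs(c) <= eps))
    return l0, l0_eps
```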
THE END of notes.