Convex Optimization Basics — Approximation & Fitting

This article covers the bi-criterion problem in numerical optimization: finding a vector that makes both the residual and the norm of the solution as small as possible. By introducing regularization, the bi-criterion problem can be converted into a single-objective optimization; Tikhonov regularization, for example, trades off the residual against the Euclidean norm of the solution to find a balance point. Regularization also comes in other forms, such as smoothing regularization, which accounts for the continuity or smoothness of the variable. In signal reconstruction and de-noising problems, regularization is used to remove noise while preserving signal detail; total variation regularization, for instance, removes noise while preserving abrupt changes in the signal.


Approximation & Fitting

6.3.1 Bi-criterion formulation

In the basic form of regularized approximation, the goal is to find a vector $x$ that is small (if possible), and also makes the residual $Ax - b$ small. This is naturally described as a (convex) vector optimization problem with two objectives, $\|Ax-b\|$ and $\|x\|$:

$$P6.7:\quad \min\ (\text{w.r.t. } \mathbf{R}_+^2)\ \ (\|Ax-b\|,\ \|x\|).$$

The two norms can be different: the first, used to measure the size of the residual, is on $\mathbf{R}^m$; the second, used to measure the size of $x$, is on $\mathbf{R}^n$.

The optimal trade-off between the two objectives can be found using several methods. The optimal trade-off curve of $\|Ax-b\|$ versus $\|x\|$, which shows how large one of the objectives must be made to have the other one small, can then be plotted. One endpoint of the optimal trade-off curve between $\|Ax-b\|$ and $\|x\|$ is easy to describe. The minimum value of $\|x\|$ is zero, and is achieved only when $x = 0$. For this value of $x$, the residual norm has the value $\|b\|$.

The other endpoint of the trade-off curve is more complicated to describe. Let $C$ denote the set of minimizers of $\|Ax-b\|$ (with no constraint on $\|x\|$). Then any minimum norm point in $C$ is Pareto optimal, corresponding to the other endpoint of the trade-off curve. In other words, the Pareto optimal points at this endpoint are given by the minimum norm minimizers of $\|Ax-b\|$. If both norms are Euclidean, this Pareto optimal point is unique and given by $x = A^\dagger b$, where $A^\dagger$ is the pseudo-inverse of $A$.
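As a quick sanity check of this endpoint, here is a minimal NumPy sketch (the matrix `A` and vector `b` are made-up illustration data) comparing the pseudo-inverse solution with the $x = 0$ endpoint:

```python
import numpy as np

# Hypothetical illustration data: a wide system, so the least-squares
# minimizer is non-unique and the minimum-norm choice matters.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
b = rng.standard_normal(5)

x_min_norm = np.linalg.pinv(A) @ b   # x = A^† b, minimum-norm minimizer of ||Ax - b||_2
print("||Ax - b||_2 at x = A^† b:", np.linalg.norm(A @ x_min_norm - b))
print("||x||_2 at that endpoint:  ", np.linalg.norm(x_min_norm))
print("||Ax - b||_2 at x = 0:     ", np.linalg.norm(b))   # the other endpoint
```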


6.3.2 Regularization

Regularization is a common scalarization method used to solve the bi-criterion problem ($P6.7$). One form of regularization is to minimize the weighted sum of the objectives:

$$P6.8:\quad \min\ \|Ax-b\| + \gamma\|x\|,$$

where $\gamma > 0$ is a problem (regularization) parameter. As $\gamma$ varies over $(0,\infty)$, the solution of ($P6.8$) traces out the optimal trade-off curve.
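As a hedged sketch (not from the original text), $P6.8$ can be solved for a grid of $\gamma$ values with CVXPY; `A` and `b` are made-up data, and the residual is measured in the Euclidean norm while $x$ is measured in the $\ell_1$ norm, simply to illustrate that the two norms may differ:

```python
import cvxpy as cp
import numpy as np

# Made-up problem data for illustration.
rng = np.random.default_rng(1)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)   # regularization weight gamma > 0
objective = cp.Minimize(cp.norm(A @ x - b, 2) + gamma * cp.norm(x, 1))
prob = cp.Problem(objective)

# Sweeping gamma traces out (points on) the optimal trade-off curve.
for g in [0.1, 1.0, 10.0]:
    gamma.value = g
    prob.solve()
    print(f"gamma={g:5.1f}  ||Ax-b||_2={cp.norm(A @ x - b, 2).value:.3f}  "
          f"||x||_1={cp.norm(x, 1).value:.3f}")
```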

Another common method of regularization, especially when the Euclidean norm is used, is to minimize the weighted sum of squared norms, i.e.,

$$\min\ \|Ax-b\|^2 + \delta\|x\|^2,$$

for a variety of values of $\delta > 0$.

These regularized approximation problems each solve the bi-criterion problem of making both $\|Ax-b\|$ and $\|x\|$ small, by adding an extra term, or penalty, associated with the norm of $x$.

Interpretations

Regularization is used in several contexts. In an estimation setting, the extra term penalizing large $\|x\|$ can be interpreted as our prior knowledge that $\|x\|$ is not too large. In an optimal design setting, the extra term adds the cost of using large values of the design variables to the cost of missing the target specifications.

The constraint that $\|x\|$ be small can also reflect a modeling issue. It might be, for example, that $y = Ax$ is only a good approximation of the true relationship $y = f(x)$ between $x$ and $y$. In order to have $f(x) \approx b$, we want $Ax \approx b$, and also need $x$ small in order to ensure that $f(x) \approx Ax$.

Tikhonov regularization

The most common form of regularization is based on the weighted sum of squared norms above, with Euclidean norms, which results in a (convex) quadratic optimization problem:

$$\min\ \|Ax-b\|_2^2 + \delta\|x\|_2^2 = x^T(A^TA+\delta I)x - 2b^TAx + b^Tb.$$

This Tikhonov regularization problem has the analytical solution

$$x = (A^TA + \delta I)^{-1}A^Tb.$$

Since $A^TA + \delta I \succ 0$ for any $\delta > 0$, the Tikhonov regularized least-squares solution requires no rank (or dimension) assumptions on the matrix $A$.
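A minimal NumPy sketch of this closed-form solution, using made-up data and an illustrative weight `delta`:

```python
import numpy as np

# Made-up data; A may be rank-deficient or have fewer rows than columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
b = rng.standard_normal(4)
delta = 0.1                       # regularization weight delta > 0

n = A.shape[1]
# x = (A^T A + delta I)^{-1} A^T b; solve the linear system instead of forming the inverse.
x = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)
print("residual norm:", np.linalg.norm(A @ x - b))
print("solution norm:", np.linalg.norm(x))
```

Solving the positive definite system directly is preferable to computing the matrix inverse explicitly.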

Smoothing regularization

The idea of regularization, i.e., adding to the objective a term that penalizes large $x$, can be extended in several ways. In one useful extension we add a regularization term of the form $\|Dx\|$ in place of $\|x\|$. In many applications, the matrix $D$ represents an approximate differentiation or second-order differentiation operator, so $\|Dx\|$ represents a measure of the variation or smoothness of $x$.

For example, suppose that the vector $x \in \mathbf{R}^n$ represents the value of some continuous physical parameter, say, temperature, along the interval $[0,1]$: $x_i$ is the temperature at the point $i/n$. A simple approximation of the gradient or first derivative of the parameter near $i/n$ is given by $n(x_{i+1} - x_i)$, and a simple approximation of its second derivative is given by the second difference

$$n\bigl(n(x_{i+1} - x_i) - n(x_i - x_{i-1})\bigr) = n^2(x_{i+1} - 2x_i + x_{i-1}).$$

If $\Delta$ is the (tridiagonal, Toeplitz) matrix

$$\Delta = n^{2}\left[\begin{array}{rrrrrrrrr} 1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -2 & \cdots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & -2 & 1 \end{array}\right] \in \mathbf{R}^{(n-2)\times n},$$

then $\Delta x$ represents an approximation of the second derivative of the parameter, so $\|\Delta x\|_2^2$ represents a measure of the mean-square curvature of the parameter over the interval $[0,1]$.
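To make the construction concrete, a short NumPy sketch (with a made-up grid size `n`) builds $\Delta$ and checks it on samples of a quadratic, whose second derivative is constant:

```python
import numpy as np

n = 100
t = np.arange(1, n + 1) / n   # grid points i/n on [0, 1]
x = 3 * t**2                  # samples of a quadratic; its second derivative is 6

# Tridiagonal Toeplitz second-difference matrix Delta in R^{(n-2) x n}.
Delta = n**2 * (np.eye(n - 2, n, k=0) - 2 * np.eye(n - 2, n, k=1) + np.eye(n - 2, n, k=2))

# The second difference of a quadratic is exact, so every entry of Delta @ x equals 6.
print(np.allclose(Delta @ x, 6.0))
```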
The Tikhonov regularized problem

$$\min\ \|Ax-b\|_2^2 + \delta\|\Delta x\|_2^2$$

can be used to trade off the objective $\|Ax-b\|_2^2$, which might represent a measure of fit, or consistency with experimental data, and the objective $\|\Delta x\|_2^2$, which is (approximately) the mean-square curvature of the underlying physical parameter. The parameter $\delta$ is used to control the amount of regularization required, or to plot the optimal trade-off curve of fit versus smoothness.

We can also add several regularization terms. For example, we can add terms associated with smoothness ($\|\Delta x\|$) and size ($\|x\|$), as in

$$P6.9:\quad \min\ \|Ax-b\|_2^2 + \delta\|\Delta x\|_2^2 + \eta\|x\|_2^2.$$

Here the parameter $\delta \ge 0$ is used to control the smoothness of the approximate solution, and the parameter $\eta \ge 0$ is used to control its size.
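A hedged NumPy sketch of solving $P6.9$ through its normal equations; the data `A`, `b` and the weights `delta`, `eta` are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 100
A = rng.standard_normal((m, n))   # made-up measurement matrix
b = rng.standard_normal(m)        # made-up measurements
delta, eta = 1e-6, 1e-2           # smoothness and size weights (illustrative values)

# Second-difference matrix Delta in R^{(n-2) x n}, as defined above.
Delta = n**2 * (np.eye(n - 2, n) - 2 * np.eye(n - 2, n, k=1) + np.eye(n - 2, n, k=2))

# Minimize ||Ax-b||_2^2 + delta ||Delta x||_2^2 + eta ||x||_2^2:
# setting the gradient to zero gives (A^T A + delta Delta^T Delta + eta I) x = A^T b.
x = np.linalg.solve(A.T @ A + delta * Delta.T @ Delta + eta * np.eye(n), A.T @ b)
print("fit:", np.linalg.norm(A @ x - b), " curvature:", np.linalg.norm(Delta @ x))
```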


6.3.3 Reconstruction, smoothing, and de-noising

In reconstruction problems, we start with a signal represented by a vector $x \in \mathbf{R}^n$. It is usually assumed that the signal does not vary too rapidly, so that we have $x_i \approx x_{i+1}$. (In this section we consider signals in one dimension, e.g., audio signals, but the same ideas can be applied to signals in two or more dimensions, e.g., images or video.)

The signal $x$ is corrupted by an additive noise $v$:

$$x_{cor} = x + v.$$

The goal is to form an estimate $\hat{x}$ of the original signal $x$, given the corrupted signal $x_{cor}$. This process is called signal reconstruction (since we are trying to reconstruct the original signal from the corrupted version) or de-noising (since we are trying to remove the noise from the corrupted signal). Most reconstruction methods end up performing some sort of smoothing operation on $x_{cor}$ to produce $\hat{x}$, so the process is also called smoothing.

One simple formulation of the reconstruction problem is the bi-criterion problem

$$\min\ (\text{w.r.t. } \mathbf{R}_+^2)\ \ (\|\hat{x} - x_{cor}\|_2,\ \phi(\hat{x})),$$

where $\hat{x}$ is the variable and $x_{cor}$ is a problem parameter. The function $\phi:\mathbf{R}^n \rightarrow \mathbf{R}$ is convex, and is called the regularization function or smoothing objective.

Quadratic smoothing

The simplest reconstruction method uses the quadratic smoothing function
$$\phi_{\mathrm{quad}}(x) = \sum_{i=1}^{n-1}(x_{i+1} - x_i)^2 = \|Dx\|_2^2,$$

where $D \in \mathbf{R}^{(n-1)\times n}$ is the bidiagonal matrix

$$D=\left[\begin{array}{rrrrrrr} -1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{array}\right].$$

We can obtain the optimal trade-off between $\|\hat{x} - x_{cor}\|_2$ and $\|D\hat{x}\|_2$ by minimizing

$$\|\hat{x} - x_{cor}\|_2^2 + \delta\|D\hat{x}\|_2^2,$$

where $\delta > 0$ parametrizes the optimal trade-off curve. The solution of this quadratic problem,

$$\hat{x} = (I + \delta D^TD)^{-1}x_{cor},$$

can be computed very efficiently since $I + \delta D^TD$ is tridiagonal.
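A brief sketch of this smoothing step, assuming SciPy is available; the banded Cholesky solver exploits the tridiagonal structure of $I + \delta D^TD$, and the noisy signal and weight `delta` are synthetic:

```python
import numpy as np
from scipy.linalg import solveh_banded

n = 500
t = np.linspace(0, 1, n)
# Synthetic corrupted signal: a smooth sinusoid plus noise.
x_cor = np.sin(4 * np.pi * t) + 0.3 * np.random.default_rng(4).standard_normal(n)
delta = 50.0                      # illustrative smoothing weight

# I + delta * D^T D is symmetric tridiagonal: diagonal 1 + delta*[1,2,...,2,1], off-diagonal -delta.
main = 1.0 + delta * np.concatenate(([1.0], 2.0 * np.ones(n - 2), [1.0]))
off = -delta * np.ones(n - 1)

ab = np.zeros((2, n))             # upper banded storage expected by solveh_banded
ab[0, 1:] = off                   # superdiagonal (first entry unused)
ab[1, :] = main                   # main diagonal
x_hat = solveh_banded(ab, x_cor)  # solves (I + delta D^T D) x_hat = x_cor
print("||x_hat - x_cor||_2 =", np.linalg.norm(x_hat - x_cor))
```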

Total variation reconstruction

Simple quadratic smoothing works well as a reconstruction method when the original signal is very smooth, and the noise is rapidly varying. But any rapid variations in the original signal will, obviously, be attenuated or removed by quadratic smoothing.

In this section we describe a reconstruction method that can remove much of the noise, while still preserving occasional rapid variations in the original signal. The method is based on the smoothing function
$$\phi_{\mathrm{tv}}(\hat{x}) = \sum_{i=1}^{n-1}\left|\hat{x}_{i+1} - \hat{x}_i\right| = \|D\hat{x}\|_1,$$

which is called the total variation of $x \in \mathbf{R}^n$. Like the quadratic smoothness measure $\phi_{\mathrm{quad}}$, the total variation function assigns large values to rapidly varying $\hat{x}$. The total variation measure, however, assigns relatively less penalty to large values of $|x_{i+1} - x_i|$.
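A hedged CVXPY sketch of total variation reconstruction; the piecewise-constant test signal, noise level, and weight `delta` are made up for illustration:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n = 400
x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])  # signal with one abrupt jump
x_cor = x_true + 0.2 * rng.standard_normal(n)                     # corrupted by additive noise
delta = 2.0                                                        # illustrative trade-off weight

x_hat = cp.Variable(n)
# Scalarized bi-criterion problem: fit to x_cor plus total variation penalty ||D x_hat||_1.
tv = cp.norm(cp.diff(x_hat), 1)
prob = cp.Problem(cp.Minimize(cp.sum_squares(x_hat - x_cor) + delta * tv))
prob.solve()

# Unlike quadratic smoothing, the jump in the reconstruction is largely preserved.
print("estimated jump height:", x_hat.value[-1] - x_hat.value[0])
```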

