Approximation & Fitting
6.3.1 Bi-criterion formulation
In the basic form of regularized approximation, the goal is to find a vector $x$ that is small (if possible), and that also makes the residual $Ax - b$ small. This is naturally described as a (convex) vector optimization problem with two objectives, $\|Ax - b\|$ and $\|x\|$:
$$ \mathrm{P6.7}: \quad \min \ (\text{w.r.t. } \mathbf{R}_+^2) \quad (\|Ax - b\|,\ \|x\|). $$

The two norms can be different: the first, used to measure the size of the residual, is on $\mathbf{R}^m$; the second, used to measure the size of $x$, is on $\mathbf{R}^n$.
The optimal trade-off between the two objectives can be found using several methods. The optimal trade-off curve of $\|Ax - b\|$ versus $\|x\|$, which shows how large one of the objectives must be made to have the other one small, can then be plotted. One endpoint of the optimal trade-off curve between $\|Ax - b\|$ and $\|x\|$ is easy to describe. The minimum value of $\|x\|$ is zero, and is achieved only when $x = 0$. For this value of $x$, the residual norm has the value $\|b\|$.
The other endpoint of the trade-off curve is more complicated to describe. Let $C$ denote the set of minimizers of $\|Ax - b\|$ (with no constraint on $\|x\|$). Then any minimum norm point in $C$ is Pareto optimal, corresponding to the other endpoint of the trade-off curve. In other words, Pareto optimal points at this endpoint are given by minimum norm minimizers of $\|Ax - b\|$. If both norms are Euclidean, this Pareto optimal point is unique, and given by $x = A^\dagger b$, where $A^\dagger$ is the pseudo-inverse of $A$.
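This endpoint can be checked numerically. Below is a minimal NumPy sketch (with hypothetical data chosen so that $A$ is rank deficient, making the minimizer of $\|Ax - b\|$ non-unique): the pseudo-inverse picks out the minimum-norm minimizer, and shifting along the null space of $A$ leaves the residual unchanged while increasing $\|x\|$.

```python
import numpy as np

# Hypothetical rank-deficient data: A has a repeated column, so the
# least-squares minimizer of ||Ax - b|| is not unique.
A = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# The pseudo-inverse gives the minimum Euclidean norm minimizer of
# ||Ax - b||, i.e. the Pareto-optimal endpoint of the trade-off curve.
x_min_norm = np.linalg.pinv(A) @ b

# Any other minimizer, obtained by moving along the null space of A,
# has the same residual but a larger norm.
null_dir = np.array([1.0, -1.0])  # A @ null_dir == 0
x_other = x_min_norm + 0.5 * null_dir

assert np.allclose(A @ null_dir, 0)
assert np.isclose(np.linalg.norm(A @ x_min_norm - b),
                  np.linalg.norm(A @ x_other - b))
assert np.linalg.norm(x_min_norm) < np.linalg.norm(x_other)
```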
6.3.2 Regularization
Regularization is a common scalarization method used to solve the bi-criterion problem (P6.7). One form of regularization is to minimize the weighted sum of the objectives:

$$ \mathrm{P6.8}: \quad \min \ \|Ax - b\| + \gamma \|x\|, $$

where $\gamma > 0$ is a problem (regularization) parameter. As $\gamma$ varies over $(0, \infty)$, the solution of (P6.8) traces out the optimal trade-off curve.
Another common method of regularization, especially when the Euclidean norm is used, is to minimize the weighted sum of squared norms:

$$ \min \ \|Ax - b\|^2 + \delta \|x\|^2, $$

for a variety of values of $\delta > 0$. These regularized approximation problems each solve the bi-criterion problem of making both $\|Ax - b\|$ and $\|x\|$ small, by adding an extra term or penalty associated with the norm of $x$.
Interpretations
Regularization is used in several contexts. In an estimation setting, the extra term penalizing large $\|x\|$ can be interpreted as our prior knowledge that $\|x\|$ is not too large. In an optimal design setting, the extra term adds the cost of using large values of the design variables to the cost of missing the target specifications.
The constraint that $\|x\|$ be small can also reflect a modeling issue. It might be, for example, that $y = Ax$ is only a good approximation of the true relationship $y = f(x)$ between $x$ and $y$. In order to have $f(x) \approx b$, we want $Ax \approx b$, and we also need $x$ small in order to ensure that $f(x) \approx Ax$.
Tikhonov regularization
The most common form of regularization is based on the weighted sum of squared norms above, with Euclidean norms, which results in a (convex) quadratic optimization problem:

$$ \min \ \|Ax - b\|_2^2 + \delta \|x\|_2^2 = x^T (A^T A + \delta I) x - 2 b^T A x + b^T b. $$
This Tikhonov regularization problem has the analytical solution

$$ x = (A^T A + \delta I)^{-1} A^T b. $$

Since $A^T A + \delta I \succ 0$ for any $\delta > 0$, the Tikhonov regularized least-squares solution requires no rank (or dimension) assumptions on the matrix $A$.
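As a quick illustration of this rank-independence, the sketch below (with randomly generated data, purely hypothetical) applies the analytical formula to a "wide" matrix with $m < n$, where plain least squares would be underdetermined, and verifies optimality by checking that the gradient of the objective vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately wide matrix (m < n): unregularized least squares would be
# underdetermined, but Tikhonov regularization is still well posed.
m, n, delta = 5, 8, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Analytical solution x = (A^T A + delta*I)^{-1} A^T b; we solve the linear
# system rather than forming the inverse explicitly.
x = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

# The gradient of ||Ax - b||^2 + delta*||x||^2 vanishes at the minimizer.
grad = 2 * A.T @ (A @ x - b) + 2 * delta * x
assert np.allclose(grad, 0)
```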
Smoothing regularization
The idea of regularization, i.e., adding to the objective a term that penalizes large $x$, can be extended in several ways. In one useful extension we add a regularization term of the form $\|Dx\|$, in place of $\|x\|$. In many applications, the matrix $D$ represents an approximate differentiation or second-order differentiation operator, so $\|Dx\|$ represents a measure of the variation or smoothness of $x$.
For example, suppose that the vector $x \in \mathbf{R}^n$ represents the value of some continuous physical parameter, say, temperature, along the interval $[0, 1]$: $x_i$ is the temperature at the point $i/n$. A simple approximation of the gradient or first derivative of the parameter near $i/n$ is given by $n(x_{i+1} - x_i)$, and a simple approximation of its second derivative is given by the second difference

$$ n\bigl(n(x_{i+1} - x_i) - n(x_i - x_{i-1})\bigr) = n^2 (x_{i+1} - 2x_i + x_{i-1}). $$
If $\Delta$ is the (tridiagonal, Toeplitz) matrix

$$ \Delta = n^2 \left[\begin{array}{rrrrrrrrr} 1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -2 & \cdots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & -2 & 1 \end{array}\right] \in \mathbf{R}^{(n-2) \times n}, $$
then $\Delta x$ represents an approximation of the second derivative of the parameter, so $\|\Delta x\|_2^2$ represents a measure of the mean-square curvature of the parameter over the interval $[0, 1]$.
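The matrix $\Delta$ is easy to build with shifted identity matrices. The sketch below constructs it in NumPy and checks it on the (hypothetical) profile $x_i = (i/n)^2$, whose true second derivative is the constant $2$; for a quadratic, the second difference is exact.

```python
import numpy as np

n = 100
# Second-difference matrix Delta in R^{(n-2) x n}, scaled by n^2 as in the
# text: each row computes n^2 * (x_{i-1} - 2*x_i + x_{i+1}).
Delta = n**2 * (np.eye(n - 2, n, k=0)
                - 2 * np.eye(n - 2, n, k=1)
                + np.eye(n - 2, n, k=2))

# Sample a smooth "temperature profile" x_i = f(i/n) with f(t) = t^2,
# whose second derivative is identically 2.
t = np.arange(1, n + 1) / n
x = t**2

# The second difference of a quadratic is exact, not just approximate.
assert Delta.shape == (n - 2, n)
assert np.allclose(Delta @ x, 2.0)
```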
The Tikhonov regularized problem

$$ \min \ \|Ax - b\|_2^2 + \delta \|\Delta x\|_2^2 $$

can be used to trade off the objective $\|Ax - b\|_2^2$, which might represent a measure of fit, or consistency with experimental data, and the objective $\|\Delta x\|_2^2$, which is (approximately) the mean-square curvature of the underlying physical parameter. The parameter $\delta$ is used to control the amount of regularization, or to plot the optimal trade-off curve of fit versus smoothness.
We can also add several regularization terms. For example, we can add terms associated with smoothness ($\|\Delta x\|$) and size ($\|x\|$), as in

$$ \mathrm{P6.9}: \quad \min \ \|Ax - b\|_2^2 + \delta \|\Delta x\|_2^2 + \eta \|x\|_2^2. $$

Here the parameter $\delta \geq 0$ is used to control the smoothness of the approximate solution, and the parameter $\eta \geq 0$ is used to control its size.
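With Euclidean norms, P6.9 is itself an ordinary least-squares problem: stacking $A$, $\sqrt{\delta}\,\Delta$, and $\sqrt{\eta}\,I$ into one tall matrix gives an equivalent single-norm problem. A sketch with randomly generated (hypothetical) data, using an unscaled second-difference matrix for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Second-difference matrix (unscaled here, for simplicity).
Delta = (np.eye(n - 2, n, k=0) - 2 * np.eye(n - 2, n, k=1)
         + np.eye(n - 2, n, k=2))

delta, eta = 2.0, 0.5

# Stack the three terms: minimizing ||M x - rhs||_2^2 is exactly P6.9.
M = np.vstack([A, np.sqrt(delta) * Delta, np.sqrt(eta) * np.eye(n)])
rhs = np.concatenate([b, np.zeros(n - 2), np.zeros(n)])
x, *_ = np.linalg.lstsq(M, rhs, rcond=None)

# Optimality: the gradient of the P6.9 objective vanishes at x.
grad = 2 * (A.T @ (A @ x - b) + delta * Delta.T @ (Delta @ x) + eta * x)
assert np.allclose(grad, 0)
```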
6.3.3 Reconstruction, smoothing, and de-noising
In reconstruction problems, we start with a signal represented by a vector $x \in \mathbf{R}^n$. It is usually assumed that the signal does not vary too rapidly, which means that usually we have $x_i \approx x_{i+1}$. (In this section we consider signals in one dimension, e.g., audio signals, but the same ideas can be applied to signals in two or more dimensions, e.g., images or video.)
The signal $x$ is corrupted by an additive noise $v$:

$$ x_{\mathrm{cor}} = x + v. $$

The goal is to form an estimate $\hat{x}$ of the original signal $x$, given the corrupted signal $x_{\mathrm{cor}}$. This process is called signal reconstruction (since we are trying to reconstruct the original signal from the corrupted version) or de-noising (since we are trying to remove the noise from the corrupted signal). Most reconstruction methods end up performing some sort of smoothing operation on $x_{\mathrm{cor}}$ to produce $\hat{x}$, so the process is also called smoothing.
One simple formulation of the reconstruction problem is the bi-criterion problem

$$ \min \ (\text{w.r.t. } \mathbf{R}_+^2) \quad (\|\hat{x} - x_{\mathrm{cor}}\|_2,\ \phi(\hat{x})), $$

where $\hat{x}$ is the variable and $x_{\mathrm{cor}}$ is a problem parameter. The function $\phi: \mathbf{R}^n \rightarrow \mathbf{R}$ is convex, and is called the regularization function or smoothing objective.
Quadratic smoothing
The simplest reconstruction method uses the quadratic smoothing function

$$ \phi_{\mathrm{quad}}(x) = \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2 = \|Dx\|_2^2, $$

where $D \in \mathbf{R}^{(n-1) \times n}$ is the bidiagonal matrix

$$ D = \left[\begin{array}{rrrrrrr} -1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{array}\right]. $$
We can obtain the optimal trade-off between $\|\hat{x} - x_{\mathrm{cor}}\|_2$ and $\|D\hat{x}\|_2$ by minimizing

$$ \|\hat{x} - x_{\mathrm{cor}}\|_2^2 + \delta \|D\hat{x}\|_2^2, $$

where $\delta > 0$ parametrizes the optimal trade-off curve.
The solution of this quadratic problem,

$$ \hat{x} = (I + \delta D^T D)^{-1} x_{\mathrm{cor}}, $$

can be computed very efficiently since $I + \delta D^T D$ is tridiagonal.
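The formula can be tried end to end on synthetic data. The sketch below (hypothetical signal and noise level, and $\delta = 10$ chosen by hand rather than from the trade-off curve) smooths a noisy sinusoid; for clarity it uses a dense solve, though in practice the tridiagonal structure of $I + \delta D^T D$ admits an $O(n)$ banded solver.

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 200, 10.0

# A slowly varying signal plus noise (hypothetical data for illustration).
t = np.arange(n) / n
x_true = np.sin(2 * np.pi * t)
x_cor = x_true + 0.3 * rng.standard_normal(n)

# First-difference matrix D in R^{(n-1) x n}.
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n, k=0)

# Quadratic smoothing: x_hat = (I + delta * D^T D)^{-1} x_cor.
# (Dense solve for clarity; a banded solver would exploit tridiagonality.)
x_hat = np.linalg.solve(np.eye(n) + delta * D.T @ D, x_cor)

# The smoothed estimate is closer to the true signal than the corrupted one.
assert np.linalg.norm(x_hat - x_true) < np.linalg.norm(x_cor - x_true)
```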
Total variation reconstruction
Simple quadratic smoothing works well as a reconstruction method when the original signal is very smooth, and the noise is rapidly varying. But any rapid variations in the original signal will, obviously, be attenuated or removed by quadratic smoothing.
In this section we describe a reconstruction method that can remove much of the noise, while still preserving occasional rapid variations in the original signal. The method is based on the smoothing function

$$ \phi_{\mathrm{tv}}(\hat{x}) = \sum_{i=1}^{n-1} \left| \hat{x}_{i+1} - \hat{x}_i \right| = \|D\hat{x}\|_1, $$

which is called the total variation of $x \in \mathbf{R}^n$. Like the quadratic smoothness measure $\phi_{\mathrm{quad}}$, the total variation function assigns large values to rapidly varying $\hat{x}$. The total variation measure, however, assigns relatively less penalty to large values of $|\hat{x}_{i+1} - \hat{x}_i|$.
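This difference in how large jumps are penalized can be seen directly. In the sketch below (a hypothetical step signal versus a ramp covering the same range), total variation charges both the same, since only the sum of absolute increments matters, while the quadratic measure penalizes the single large jump far more:

```python
import numpy as np

n = 100
# A signal with one large jump: constant 0, then constant 1.
step = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
# A ramp covering the same range in many small increments.
ramp = np.linspace(0.0, 1.0, n)

# First-difference matrix D in R^{(n-1) x n}.
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n, k=0)

phi_tv = lambda x: np.abs(D @ x).sum()      # total variation ||Dx||_1
phi_quad = lambda x: ((D @ x) ** 2).sum()   # quadratic measure ||Dx||_2^2

# Total variation charges the jump and the ramp the same amount...
assert np.isclose(phi_tv(step), phi_tv(ramp))
# ...while the quadratic measure penalizes the single large jump far more.
assert phi_quad(step) > 10 * phi_quad(ramp)
```

This is why total variation reconstruction can preserve occasional sharp transitions that quadratic smoothing would wash out.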