Convex Optimization Reading Notes (5)

Chapter 6: Approximation and fitting

6.1 Norm approximation

6.1.1 Basic norm approximation problem

The simplest norm approximation problem is an unconstrained problem of the form
$$
{\rm minimize} \quad \|Ax - b\|
$$
where $A \in \mathbf{R}^{m\times n}$, $b \in \mathbf{R}^m$, $x \in \mathbf{R}^n$, and $\|\cdot\|$ is a norm on $\mathbf{R}^m$. A solution of the norm approximation problem is sometimes called an approximate solution of $Ax \approx b$, in the norm $\|\cdot\|$. The vector
$$
r = Ax - b
$$
is called the residual for the problem; its components are sometimes called the individual residuals associated with $x$.
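
To see how the choice of norm affects the approximate solution, the following sketch (not from the book) solves the problem for the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms with CVXPY; the sizes of $A$ and $b$ and the random seed are arbitrary.

```python
# Minimal sketch: norm approximation min ||Ax - b|| for p = 1, 2, inf,
# on made-up data. Requires numpy and cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 100, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
for p in [1, 2, "inf"]:
    cp.Problem(cp.Minimize(cp.norm(A @ x - b, p))).solve()
    r = A @ x.value - b
    print(f"p = {p}: ||r||_p = {np.linalg.norm(r, np.inf if p == 'inf' else p):.3f}")
```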

6.1.2 Penalty function approximation

In $\ell_p$-norm approximation, for $1 \leq p < \infty$, the objective is
$$
(|r_1|^p + \cdots + |r_m|^p)^{\frac{1}{p}}
$$
As in least-squares problems, we can consider the equivalent problem with objective
$$
|r_1|^p + \cdots + |r_m|^p
$$
The penalty function approximation problem has the form
$$
\begin{aligned}
{\rm minimize} \quad & \phi(r_1) + \cdots + \phi(r_m) \\
{\rm subject\ to} \quad & r = Ax - b
\end{aligned}
$$
where $\phi : \mathbf{R} \to \mathbf{R}$ is called the (residual) penalty function. We assume that $\phi$ is convex, so the penalty function approximation problem is a convex optimization problem.

deadzone-linear

$$
\phi(u) = \begin{cases} 0 & |u| \leq a \\ |u| - a & |u| > a \end{cases}
$$

log barrier

$$
\phi(u) = \begin{cases} -a^2 \log\bigl(1 - (u/a)^2\bigr) & |u| < a \\ \infty & |u| \geq a \end{cases}
$$

The log barrier penalty behaves very much like the $\ell_2$-norm penalty for small residuals, but puts very strong weight on residuals approaching the limit $a$ (roughly, larger than $0.8a$), and infinite weight on residuals with magnitude $a$ or more.
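
As a concrete illustration (not from the book), the deadzone-linear penalty above can be written directly in CVXPY as $\phi(u) = \max(|u| - a, 0)$; the data $A$, $b$ and the width $a$ below are made up.

```python
# Sketch: penalty function approximation with the deadzone-linear penalty
# phi(u) = max(|u| - a, 0), summed over the residuals r = Ax - b.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 100, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
a = 0.5                                          # dead-zone width (arbitrary)

x = cp.Variable(n)
r = A @ x - b
cp.Problem(cp.Minimize(cp.sum(cp.maximum(cp.abs(r) - a, 0)))).solve()
```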

6.1.3 Approximation with constraints

It is possible to add constraints to the basic norm approximation problem. When these constraints are convex, the resulting problem is convex.

6.2 Least-norm problems

The basic least-norm problem has the form
$$
\begin{aligned}
{\rm minimize} \quad & \|x\| \\
{\rm subject\ to} \quad & Ax = b
\end{aligned}
$$
where $A \in \mathbf{R}^{m\times n}$, $b \in \mathbf{R}^m$, $x \in \mathbf{R}^n$, and $\|\cdot\|$ is a norm on $\mathbf{R}^n$.
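
For the Euclidean norm this problem has the well-known analytical solution $x = A^T(AA^T)^{-1}b$ (the pseudoinverse solution); a minimal numpy sketch, assuming $A$ has full row rank and using made-up data:

```python
# Sketch: least Euclidean-norm solution of the underdetermined system Ax = b,
# assuming A (m x n, m < n) has full row rank.
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_ln = A.T @ np.linalg.solve(A @ A.T, b)          # x = A^T (A A^T)^{-1} b
assert np.allclose(x_ln, np.linalg.pinv(A) @ b)   # same as pseudoinverse solution
print(np.linalg.norm(A @ x_ln - b))               # ~0: the constraint holds
```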

6.3 Regularized approximation

6.3.1 Bi-criterion formulation

In the basic form of regularized approximation, the goal is to find a vector $x$ that is small (if possible), and that also makes the residual $Ax - b$ small. This is naturally described as a (convex) vector optimization problem with two objectives, $\|Ax - b\|$ and $\|x\|$:
$$
{\rm minimize\ (w.r.t.\ }\mathbf{R}^2_+) \quad (\|Ax - b\|, \|x\|)
$$
The two norms can be different: the first, used to measure the size of the residual, is on $\mathbf{R}^m$; the second, used to measure the size of $x$, is on $\mathbf{R}^n$.

6.3.2 Regularization

Regularization is a common scalarization method used to solve the bi-criterion problem. One form of regularization is to minimize the weighted sum of the objectives:
$$
{\rm minimize} \quad \|Ax - b\| + \gamma \|x\|
$$
where $\gamma > 0$ is a problem parameter. As $\gamma$ varies over $(0, \infty)$, the solution of the regularized problem traces out the optimal trade-off curve.

Another common method of regularization, especially when the Euclidean norm is used, is to minimize the weighted sum of squared norms:
$$
{\rm minimize} \quad \|Ax - b\|^2 + \delta \|x\|^2
$$
for a variety of values of $\delta > 0$.
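
For the Euclidean norm, each value of $\delta$ gives a (Tikhonov-)regularized least-squares problem with the closed-form solution $x = (A^TA + \delta I)^{-1}A^Tb$; a short numpy sketch on made-up data:

```python
# Sketch: sweeping delta in ||Ax - b||_2^2 + delta * ||x||_2^2 and reporting
# the two objectives, which gives points on the trade-off curve.
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

for delta in [0.01, 0.1, 1.0, 10.0]:
    x = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)
    print(f"delta = {delta:5.2f}  ||Ax-b|| = {np.linalg.norm(A @ x - b):.3f}"
          f"  ||x|| = {np.linalg.norm(x):.3f}")
```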

6.3.3 Reconstruction, smoothing, and de-noising

Quadratic smoothing

The simplest reconstruction method uses the quadratic smoothing function
$$
\phi_{\rm quad}(x) = \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2
$$
We can obtain the optimal trade-off between $\|\hat{x} - x_{\rm cor}\|_2^2$ and $\|D\hat{x}\|_2^2$ by minimizing
$$
\|\hat{x} - x_{\rm cor}\|_2^2 + \delta \|D\hat{x}\|_2^2
$$
where $D \in \mathbf{R}^{(n-1)\times n}$ is the bidiagonal difference matrix with $(D\hat{x})_i = \hat{x}_{i+1} - \hat{x}_i$.
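
Since both terms are quadratic, the minimizer satisfies $(I + \delta D^TD)\hat{x} = x_{\rm cor}$; a numpy sketch with a made-up noisy signal:

```python
# Sketch: quadratic smoothing of a noisy signal. D is the (n-1) x n
# forward-difference matrix, and xhat = (I + delta * D^T D)^{-1} x_cor.
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 4 * np.pi, n)
x_cor = np.sin(t) + 0.2 * rng.standard_normal(n)   # corrupted signal

D = np.diff(np.eye(n), axis=0)                     # rows are e_{i+1} - e_i
delta = 10.0
x_hat = np.linalg.solve(np.eye(n) + delta * D.T @ D, x_cor)
```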

But any rapid variations in the original signal will, obviously, be attenuated or removed by quadratic smoothing.

Total variation reconstruction

The method is based on the smoothing function
$$
\phi_{\rm tv}(x) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|
$$
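Because $\phi_{\rm tv}$ is an $\ell_1$-type penalty on the differences, it preserves sharp jumps much better than quadratic smoothing. A CVXPY sketch on a made-up piecewise-constant signal:

```python
# Sketch: total variation reconstruction, trading off fit against ||D xhat||_1.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 200
x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])  # step signal
x_cor = x_true + 0.1 * rng.standard_normal(n)

D = np.diff(np.eye(n), axis=0)           # forward differences
x_hat = cp.Variable(n)
delta = 1.0
cp.Problem(cp.Minimize(cp.sum_squares(x_hat - x_cor)
                       + delta * cp.norm(D @ x_hat, 1))).solve()
```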

6.4 Robust approximation

6.4.1 Stochastic robust approximation

We assume that $A$ is a random variable taking values in $\mathbf{R}^{m\times n}$ with mean $\bar{A}$, so we can write
$$
A = \bar{A} + U
$$
where $U$ is a random matrix with zero mean. The stochastic robust approximation problem (with the Euclidean norm) is
$$
{\rm minimize} \quad \mathbb{E}\,\|Ax - b\|_2^2
$$
The objective can be expressed as
$$
\|\bar{A}x - b\|_2^2 + \|P^{\frac{1}{2}}x\|_2^2
$$
where $P = \mathbb{E}[U^T U]$. The solution is
$$
x = (\bar{A}^T\bar{A} + P)^{-1}\bar{A}^T b
$$
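A minimal numpy sketch of this closed-form solution, assuming for concreteness that the entries of $U$ are i.i.d. zero-mean with variance $\sigma^2$ (so $P = m\sigma^2 I$); the data are made up:

```python
# Sketch: stochastic robust least squares, A = Abar + U with E U = 0.
# For i.i.d. zero-mean entries with variance sigma^2, P = E[U^T U] = m*sigma^2*I.
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 10
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
sigma = 0.5
P = m * sigma**2 * np.eye(n)

x_rob = np.linalg.solve(Abar.T @ Abar + P, Abar.T @ b)   # robust solution
x_ls = np.linalg.lstsq(Abar, b, rcond=None)[0]           # ignores uncertainty
print(np.linalg.norm(x_rob), np.linalg.norm(x_ls))       # x_rob is shrunk
```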

6.4.2 Worst-case robust approximation

The (worst-case) robust approximation problem is to minimize the worst-case error:
$$
{\rm minimize} \quad e_{\rm wc}(x) = \sup\{ \|Ax - b\| \mid A \in \mathcal{A} \}
$$
where $\mathcal{A}$ is the set of possible values of $A$. The robust approximation problem is always a convex optimization problem, but its tractability depends on the norm used and the description of the uncertainty set $\mathcal{A}$.
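
One tractable case is a finite uncertainty set $\mathcal{A} = \{A_1, \ldots, A_k\}$, where the worst-case error is a pointwise maximum of norms; a CVXPY sketch on made-up data:

```python
# Sketch: worst-case robust approximation over a finite set of matrices.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 20, 10, 5
As = [rng.standard_normal((m, n)) for _ in range(k)]
b = rng.standard_normal(m)

x = cp.Variable(n)
e_wc = cp.maximum(*[cp.norm(Ai @ x - b, 2) for Ai in As])   # worst-case error
cp.Problem(cp.Minimize(e_wc)).solve()
```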

6.5 Function fitting and interpolation

6.5.1 Function families

We consider a family of functions $f_1, \cdots, f_n : \mathbf{R}^k \to \mathbf{R}$, with common domain $\mathbf{dom}\ f_i = D$. With each $x \in \mathbf{R}^n$ we associate the function $f : \mathbf{R}^k \to \mathbf{R}$ given by
$$
f(u) = x_1 f_1(u) + \cdots + x_n f_n(u)
$$
The family $\{f_1, \cdots, f_n\}$ is sometimes called the set of basis functions (for the fitting problem), even when the functions are not independent. The vector $x \in \mathbf{R}^n$, which parametrizes the subspace of functions, is our optimization variable and is sometimes called the coefficient vector.
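
A concrete instance (not from the book): with the polynomial basis $f_i(u) = u^{i-1}$, fitting data $(u_j, y_j)$ in the least-squares sense is an ordinary linear least-squares problem in the coefficient vector $x$.

```python
# Sketch: least-squares fit over the span of the basis f_i(u) = u^(i-1).
import numpy as np

rng = np.random.default_rng(0)
u = np.linspace(-1, 1, 50)
y = np.sin(2 * u) + 0.05 * rng.standard_normal(50)   # made-up data

n = 6                                    # number of basis functions
F = np.vander(u, n, increasing=True)     # F[j, i] = f_{i+1}(u_j) = u_j**i
x, *_ = np.linalg.lstsq(F, y, rcond=None)
f_hat = F @ x                            # fitted values f(u_j)
```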

6.5.2 Constraints

Interpolation and inequalities

Interpolation conditions
$$
f(v_j) = z_j, \quad j = 1, \ldots, m,
$$
which require the function $f$ to have the values $z_j \in \mathbf{R}$ at specified points $v_j \in D$, form a set of linear equalities in $x$.

Derivative constraints

The gradient
$$
\nabla f(v) = \sum_{i=1}^{n} x_i \nabla f_i(v)
$$
is a linear function of $x$, so interpolation conditions on the gradient of $f$ at $v$ reduce to linear equality constraints on $x$. A bound on the gradient norm at $v$,
$$
\|\nabla f(v)\| = \left\|\sum_{i=1}^{n} x_i \nabla f_i(v)\right\| \leq M,
$$
is a convex constraint on $x$.

Integral constraints

Any linear functional of the form $\mathcal{L}(f) = \int_D \phi(u) f(u)\,du$ can be expressed as a linear function of $x$:
$$
\mathcal{L}(f) = c^T x, \qquad c_i = \int_D \phi(u) f_i(u)\, du,
$$
so constraints of the form $\mathcal{L}(f) = a$ are linear equality constraints on $x$.

6.5.3 Fitting and interpolation problems

6.5.4 Sparse descriptions and basis pursuit

In basis pursuit, there is a very large number of basis functions, and the goal is to find a good fit of the given data as a linear combination of a small number of the basis functions.

Thus we seek a function $f \in \mathcal{F}$ that fits the data well,
$$
f(u_i) \approx y_i, \quad i = 1, \ldots, m,
$$
with a sparse coefficient vector $x$, i.e., $\mathbf{card}(x)$ small, so that
$$
f = \sum_{i \in \mathcal{B}} x_i f_i
$$
where $\mathcal{B} = \{i \mid x_i \neq 0\}$ is the (small) set of basis functions actually used.
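
Finding a coefficient vector with smallest $\mathbf{card}(x)$ is combinatorial; a widely used convex heuristic (sketched below on made-up data) replaces the cardinality by the $\ell_1$ norm and solves an $\ell_1$-regularized least-squares problem:

```python
# Sketch: l1-norm heuristic for basis pursuit; the regularization encourages
# a sparse coefficient vector x.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 40, 200                           # far more basis functions than data
F = rng.standard_normal((m, n))          # F[j, i] = f_i(u_j)
y = F[:, :5] @ rng.standard_normal(5)    # data explained by 5 basis functions

x = cp.Variable(n)
gamma = 0.1
cp.Problem(cp.Minimize(cp.sum_squares(F @ x - y) + gamma * cp.norm(x, 1))).solve()
print(int(np.sum(np.abs(x.value) > 1e-4)))   # rough card(x) after thresholding
```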

6.5.5 Interpolation with convex functions

There exists a convex function $f : \mathbf{R}^k \to \mathbf{R}$, with $\mathbf{dom}\ f = \mathbf{R}^k$, that satisfies the interpolation conditions
$$
f(u_i) = y_i, \quad i = 1, \ldots, m,
$$
if and only if there exist $g_1, \ldots, g_m$ such that
$$
y_j \geq y_i + g_i^T(u_j - u_i), \quad i, j = 1, \ldots, m.
$$
For example, one such function is $f(z) = \max_{i=1,\ldots,m}\left(y_i + g_i^T(z - u_i)\right)$.
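
The condition above is a set of linear inequalities in the unknowns $g_1, \ldots, g_m$, so checking whether convex interpolation is possible is a feasibility problem; a CVXPY sketch with made-up data sampled from a convex function:

```python
# Sketch: convex-interpolation feasibility: find g_1, ..., g_m satisfying
# y_j >= y_i + g_i^T (u_j - u_i) for all i, j.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, k = 15, 2
U = rng.standard_normal((m, k))          # points u_i
y = np.sum(U**2, axis=1)                 # y_i = ||u_i||^2, a convex function

G = cp.Variable((m, k))                  # row i is g_i
cons = [float(y[j]) >= float(y[i]) + G[i, :] @ (U[j] - U[i])
        for i in range(m) for j in range(m) if i != j]
prob = cp.Problem(cp.Minimize(0), cons)
prob.solve()
print(prob.status)                       # "optimal" => interpolation is possible
```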
