Convex Optimization Reading Notes (6)

Chapter 7: Statistical estimation

7.1 Parametric distribution estimation

7.1.1 Maximum likelihood estimation

Define the log-likelihood function, denoted $l$:
$$l(x)=\log p_x(y)$$
A widely used method, called maximum likelihood (ML) estimation, is to estimate $x$ as
$$\hat{x}_{\rm ml}=\arg\max_x p_x(y)=\arg\max_x l(x)$$
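For concreteness (this example is not in the notes), suppose the measurements follow a linear model $y = Ax + v$ with IID Gaussian noise; then $l(x)$ is a negatively scaled sum of squares plus a constant, and the ML estimate is the least-squares solution. A minimal numpy sketch, with $A$, the true $x$, and the noise level all made up for illustration:

```python
import numpy as np

# Hypothetical linear Gaussian model y = A x + v, v_i ~ N(0, sigma^2) IID.
# Then l(x) = -(1 / (2 sigma^2)) * ||A x - y||_2^2 + const, so maximizing
# the log-likelihood is the same as solving a least-squares problem.
rng = np.random.default_rng(0)
n, m, sigma = 3, 50, 0.1
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + sigma * rng.standard_normal(m)

x_ml, *_ = np.linalg.lstsq(A, y, rcond=None)
print("ML (least-squares) estimate:", x_ml)
```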

7.1.2 Maximum a posteriori probability estimation

Maximum a posteriori probability (MAP) estimation can be considered a Bayesian version of maximum likelihood estimation, with a prior probability density on the underlying parameter $x$. We assume that $x$ (the vector to be estimated) and $y$ (the observation) are random variables with a joint probability density $p(x,y)$.
The prior density of $x$ is given by
$$p_x(x)=\int p(x,y)\,dy$$
Similarly,
$$p_y(y)=\int p(x,y)\,dx$$
The conditional density of $y$, given $x$, is given by
$$p_{y\mid x}(x,y)=\frac{p(x,y)}{p_x(x)}$$
In the MAP estimation method, our estimate of $x$, given the observation $y$, is given by
$$\begin{aligned} \hat{x}_{\rm map} &= \arg\max_x p_{x\mid y}(x,y)\\ &= \arg\max_x p_{y\mid x}(x,y)\,p_x(x)\\ &= \arg\max_x p(x,y) \end{aligned}$$
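Continuing the same made-up Gaussian linear model from above, adding a Gaussian prior $x \sim \mathcal{N}(0, \tau^2 I)$ turns the MAP problem into Tikhonov-regularized (ridge) least squares. A sketch under those assumptions:

```python
import numpy as np

# Same hypothetical model y = A x + v with v_i ~ N(0, sigma^2), plus a prior
# x ~ N(0, tau^2 I).  Maximizing log p(x, y) = log p_{y|x} + log p_x gives
#   minimize  (1 / sigma^2) ||A x - y||^2 + (1 / tau^2) ||x||^2,
# whose solution is the regularized normal equation below.
rng = np.random.default_rng(0)
n, m, sigma, tau = 3, 50, 0.1, 1.0
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + sigma * rng.standard_normal(m)

lam = (sigma / tau) ** 2  # relative weight of the prior term
x_map = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
print("MAP (ridge) estimate:", x_map)
```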

7.2 Nonparametric distribution estimation

7.3 Optimal detector design and hypothesis testing

Suppose $X$ is a random variable with values in $\{1,\dots,n\}$, with a distribution that depends on a parameter $\theta \in \{1,\dots,m\}$. The distributions of $X$, for the $m$ possible values of $\theta$, can be represented by a matrix $P \in \mathbf{R}^{n\times m}$, with elements
$$p_{kj}=\mathbf{prob}(X=k\mid \theta=j)$$
The $j$th column of $P$ gives the probability distribution associated with the parameter value $\theta = j$. The $m$ values of $\theta$ are called hypotheses, and guessing which hypothesis is correct is called hypothesis testing.

7.3.1 Deterministic and randomized detectors

A (deterministic) estimator or detector is a function $\psi$ from $\{1,\dots,n\}$ (the set of possible observed values) into $\{1,\dots,m\}$ (the set of hypotheses).
A randomized detector of $\theta$ is a random variable $\hat{\theta} \in \{1,\dots,m\}$. A randomized detector can be defined in terms of a matrix $T\in\mathbf{R}^{m\times n}$ with
$$t_{ik} = \mathbf{prob}(\hat{\theta}=i\mid X=k)$$

7.3.2 Detection probability matrix

For the randomized detector defined by the matrix $T$, we define the detection probability matrix as $D = TP$. We have
$$D_{ij}=(TP)_{ij}=\mathbf{prob}(\hat{\theta}=i\mid \theta=j)$$
so $D_{ij}$ is the probability of guessing $\hat{\theta} = i$, when in fact $\theta = j$.
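A tiny numerical illustration (the matrices are made up): with $n = 3$ outcomes and $m = 2$ hypotheses, encode a deterministic maximum-likelihood detector as a 0/1 matrix $T$ and compute $D = TP$:

```python
import numpy as np

# Column j of P is the distribution of X under hypothesis theta = j.
P = np.array([[0.70, 0.10],
              [0.20, 0.10],
              [0.10, 0.80]])
n, m = P.shape

# Deterministic ML detector: for each outcome k, guess the hypothesis with
# the largest likelihood P[k, j]; encoded as a 0/1 matrix T of shape (m, n).
T = np.zeros((m, n))
T[np.argmax(P, axis=1), np.arange(n)] = 1.0

D = T @ P  # D[i, j] = prob(theta_hat = i | theta = j)
print(D)   # diagonal: detection probabilities, off-diagonal: error probabilities
```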

7.3.3 Optimal detector design

7.3.4 Multicriterion formulation and scalarization

The optimal detector design problem can be considered a multicriterion problem, with the $m(m-1)$ objectives given by the off-diagonal entries of $D$, which are the probabilities of the different types of detection error:
$$\begin{aligned} {\rm minimize\ (w.r.t.\ } \mathbf{R}^{m(m-1)}_+) \quad & D_{ij},\ \ i,j=1,\dots,m,\ \ i\ne j \\ {\rm subject\ to} \quad & t_k\succeq0,\ \ \mathbf{1}^T t_k=1,\ \ k=1,\dots,n \end{aligned}$$
Here $t_k$ denotes the $k$th column of $T$.
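Scalarizing with a positive weight on every off-diagonal entry of $D$ gives a linear program in the entries of $T$. A hedged sketch using the cvxpy modeling package (cvxpy is not mentioned in the notes), with the same made-up $P$ as above and equal weights on both error types:

```python
import cvxpy as cp
import numpy as np

P = np.array([[0.70, 0.10],
              [0.20, 0.10],
              [0.10, 0.80]])
n, m = P.shape
W = np.ones((m, m)) - np.eye(m)         # equal weight on every off-diagonal D_ij

T = cp.Variable((m, n), nonneg=True)    # column t_k is a distribution over guesses
D = T @ P                               # detection probability matrix
constraints = [cp.sum(T, axis=0) == 1]  # 1^T t_k = 1 for each outcome k
cp.Problem(cp.Minimize(cp.sum(cp.multiply(W, D))), constraints).solve()
print(np.round(D.value, 3))
```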

7.3.5 Binary hypothesis testing

As an illustration, we consider the special case $m = 2$, which is called binary hypothesis testing.

7.3.6 Robust detectors

We define the worst-case detection probability matrix $D^{\rm wc}$ as
$$D^{\rm wc}_{ij}=\sup_{P\in\mathcal{P}}D_{ij},\ \ i,j=1,\dots,m,\ \ i\neq j$$
and
$$D^{\rm wc}_{ii}=\inf_{P \in \mathcal{P}}D_{ii},\ \ i=1,\dots,m$$

7.4 Chebyshev and Chernoff bounds

7.4.1 Chebyshev bounds

Chebyshev bounds give an upper bound on the probability of a set based on known expected values of certain functions. If $X$ is a random variable on $\mathbf{R}$ with $\mathbb{E}X=\mu$ and $\mathbb{E}(X-\mu)^2 =\sigma^2$, then we have $\mathbf{prob}(|X-\mu|\geq 1)\leq\sigma^2$, no matter what the distribution of $X$ is.
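A quick Monte Carlo sanity check of this bound, with an arbitrarily chosen exponential distribution (my own example, not from the book):

```python
import numpy as np

# Check prob(|X - mu| >= 1) <= sigma^2 for an exponential distribution with
# scale 0.5 (so mu = 0.5 and sigma^2 = 0.25), by simple simulation.
rng = np.random.default_rng(0)
X = rng.exponential(scale=0.5, size=1_000_000)
mu, sigma2 = X.mean(), X.var()
empirical = np.mean(np.abs(X - mu) >= 1)
print(f"prob(|X - mu| >= 1) ~ {empirical:.4f}  <=  sigma^2 ~ {sigma2:.4f}")
```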

7.4.2 Chernoff bounds

Let $X$ be a random variable on $\mathbf{R}$. The Chernoff bound states that
$$\mathbf{prob}(X\geq u)\leq \inf_{\lambda\geq 0}\mathbb{E}\,e^{\lambda(X-u)}$$
which can be expressed as
$$\log\mathbf{prob}(X\geq u)\leq \inf_{\lambda\geq 0} \{-\lambda u +\log \mathbb{E}\, e^{\lambda X} \}$$
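As a worked special case (not in the notes): for a standard Gaussian $X$, $\mathbb{E}\,e^{\lambda X} = e^{\lambda^2/2}$, and for $u \ge 0$ the infimum over $\lambda \ge 0$ is attained at $\lambda = u$, giving $\mathbf{prob}(X \ge u) \le e^{-u^2/2}$. A short comparison against the exact Gaussian tail:

```python
import numpy as np
from scipy.stats import norm

# Chernoff bound for X ~ N(0, 1): prob(X >= u) <= exp(-u^2 / 2) for u >= 0.
for u in [1.0, 2.0, 3.0]:
    bound = np.exp(-u**2 / 2)
    exact = norm.sf(u)  # exact upper tail probability
    print(f"u = {u}:  Chernoff bound {bound:.4e},  exact tail {exact:.4e}")
```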

7.4.3 Example

7.5 Experiment design

We consider the problem of estimating a vector $x \in \mathbf{R}^n$ from measurements or experiments
$$y_i=a_i^T x+w_i,\ \ i=1,\dots,m$$
where $w_i$ is measurement noise. The associated estimation error $e = \hat{x} - x$ (for the least-squares estimate $\hat{x}$, with IID unit-variance noise) has zero mean and covariance matrix
$$E=\mathbb{E}\,ee^T=\Bigl(\sum_{i=1}^{m}a_ia_i^T\Bigr)^{-1}$$
We suppose that the vectors $a_1,\dots,a_m$, which characterize the measurements, can be chosen among $p$ possible test vectors $v_1,\dots,v_p \in \mathbf{R}^n$. The goal of experiment design is to choose the vectors $a_i$, from among the possible choices, so that the error covariance $E$ is small (in some sense).
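For a fixed choice of the $a_i$ (here simply drawn at random for illustration, with unit noise variance assumed), the error covariance is a direct computation:

```python
import numpy as np

# Error covariance E = (sum_i a_i a_i^T)^{-1} for a made-up set of
# measurement vectors a_i (rows of A), assuming unit-variance noise.
rng = np.random.default_rng(0)
n, m = 3, 20
A = rng.standard_normal((m, n))
E = np.linalg.inv(A.T @ A)
print("error covariance E:\n", np.round(E, 4))
```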

7.5.1 The relaxed experiment design problem

In the case when $m$ is large compared to $n$, however, a good approximate solution can be found by ignoring, or relaxing, the constraint that the $m_i$ are integers. Here $m_i$ denotes the number of experiments for which the test vector $v_i$ is used, and $\lambda_i = m_i/m$ is the fraction of the total experiments allocated to $v_i$. The relaxed experiment design problem is
$$\begin{aligned} {\rm minimize\ (w.r.t.\ } \mathbf{S}^{n}_+) \quad & E=\frac{1}{m}\Bigl(\sum_{i=1}^p\lambda_iv_iv_i^T\Bigr)^{-1} \\ {\rm subject\ to} \quad & \lambda\succeq0,\ \ \mathbf{1}^T\lambda=1 \end{aligned}$$

7.5.2 Scalarizations

$D$-optimal design

The most widely used scalarization is called $D$-optimal design, in which we minimize the determinant of the error covariance matrix $E$:
$$\begin{aligned} {\rm minimize} \quad & \log \det \Bigl(\sum_{i=1}^p\lambda_iv_iv_i^T\Bigr)^{-1} \\ {\rm subject\ to} \quad & \lambda\succeq0,\ \ \mathbf{1}^T\lambda=1 \end{aligned}$$
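A hedged cvxpy sketch of this problem (cvxpy and the random test vectors are my own additions); the objective is written equivalently as $-\log\det\bigl(\sum_i \lambda_i v_i v_i^T\bigr)$:

```python
import cvxpy as cp
import numpy as np

# Relaxed D-optimal design: minimize -log det(sum_i lambda_i v_i v_i^T)
# over the probability simplex, for made-up test vectors v_i.
rng = np.random.default_rng(0)
n, p = 3, 10
V = rng.standard_normal((p, n))   # rows are the candidate test vectors v_i

lam = cp.Variable(p, nonneg=True)
G = sum(lam[i] * np.outer(V[i], V[i]) for i in range(p))
cp.Problem(cp.Minimize(-cp.log_det(G)), [cp.sum(lam) == 1]).solve()
print("D-optimal weights:", np.round(lam.value, 3))
```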

$E$-optimal design

In $E$-optimal design, we minimize the norm of the error covariance matrix, i.e., the maximum eigenvalue of $E$. The $E$-optimal experiment design problem can be cast as an SDP:
$$\begin{aligned} {\rm maximize} \quad & t \\ {\rm subject\ to} \quad & \sum_{i=1}^p\lambda_iv_iv_i^T \succeq tI \\ & \lambda\succeq0,\ \ \mathbf{1}^T\lambda=1 \end{aligned}$$
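The same setup in cvxpy form (again an illustrative sketch with made-up test vectors, not the book's code), using the semidefinite constraint directly:

```python
import cvxpy as cp
import numpy as np

# Relaxed E-optimal design as an SDP: maximize t subject to
# sum_i lambda_i v_i v_i^T >= t I (in the PSD sense), lambda on the simplex.
rng = np.random.default_rng(0)
n, p = 3, 10
V = rng.standard_normal((p, n))

lam = cp.Variable(p, nonneg=True)
t = cp.Variable()
G = sum(lam[i] * np.outer(V[i], V[i]) for i in range(p))
constraints = [G - t * np.eye(n) >> 0, cp.sum(lam) == 1]
cp.Problem(cp.Maximize(t), constraints).solve()
print("E-optimal weights:", np.round(lam.value, 3))
```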

$A$-optimal design

In $A$-optimal experiment design, we minimize $\mathbf{tr}\,E$, the trace of the covariance matrix:
$$\begin{aligned} {\rm minimize} \quad & \mathbf{tr} \Bigl(\sum_{i=1}^p\lambda_iv_iv_i^T\Bigr)^{-1} \\ {\rm subject\ to} \quad & \lambda\succeq0,\ \ \mathbf{1}^T\lambda=1 \end{aligned}$$
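A corresponding cvxpy sketch (same made-up test vectors); here $\mathbf{tr}(G^{-1})$ is expressed with the matrix_frac atom, since matrix_frac$(I, G) = \mathbf{tr}(I^T G^{-1} I) = \mathbf{tr}(G^{-1})$:

```python
import cvxpy as cp
import numpy as np

# Relaxed A-optimal design: minimize tr((sum_i lambda_i v_i v_i^T)^{-1}),
# written via matrix_frac(I, G) = tr(G^{-1}).
rng = np.random.default_rng(0)
n, p = 3, 10
V = rng.standard_normal((p, n))

lam = cp.Variable(p, nonneg=True)
G = sum(lam[i] * np.outer(V[i], V[i]) for i in range(p))
cp.Problem(cp.Minimize(cp.matrix_frac(np.eye(n), G)), [cp.sum(lam) == 1]).solve()
print("A-optimal weights:", np.round(lam.value, 3))
```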
