BAYESIAN OPTIMIZATION
Consider the following problem of finding a global minimizer (or maximizer) of an unknown objective function:
$$\bm{x}^{\ast} = \arg\underset{\bm{x} \in \mathcal{X}}{\min} \; f(\bm{x})$$
Note that $f$ may be non-convex and that gradient information is unavailable. $\mathcal{X}$ is the (hyper-parameter) search space.
Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it.
- The prior is a belief about how the function behaves.
- After collecting function evaluations, which serve as data, the prior is updated to form a posterior distribution over the objective function.
- The posterior is then used to construct an acquisition function that determines the next query point.
The widely used Gaussian prior:
$$f(\bm{x}_{1:t}) = [f(\bm{x}_{1}), \cdots, f(\bm{x}_{t})]^{T} \sim \mathcal{N}(\bm{0}, \bm{K}_t)$$
- $\bm{K}_t$ is the $t \times t$ kernel matrix with entries $\bm{K}_t(i,j) = k(\Vert \bm{x}_i - \bm{x}_j \Vert)$
Two popular kernels (common parameterizations are sketched after this list):
- the squared exponential (SE) kernel (the radial basis function kernel, also known as the Gaussian kernel)
- the Matérn kernel, whose definition involves:
  - $\Gamma(\cdot)$: the Gamma function
  - $B_{\nu}(\cdot)$: the $\nu$-th order Bessel function
  - $h$: a hyper-parameter
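Since the original formulas are not reproduced here, the following is a common parameterization of both kernels, assuming $h$ is a length-scale and writing $r = \Vert \bm{x}_i - \bm{x}_j \Vert$:

$$k_{\text{SE}}(r) = \exp\left( -\frac{r^{2}}{2h^{2}} \right), \qquad k_{\text{Matérn}}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, r}{h} \right)^{\nu} B_{\nu}\left( \frac{\sqrt{2\nu}\, r}{h} \right)$$

In the usual statement of the Matérn kernel, $B_{\nu}$ denotes the modified Bessel function of the second kind.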
In Bayesian optimization, at the $t$-th iteration:
- samples $\mathcal{D}_{1:t} = \left\{ \big( \bm{x}_i, f(\bm{x}_i) \big) \right\}_{i=1}^{t}$ have been collected
- at the next query point $\bm{x}_{t+1}$, we infer the value of $f(\bm{x}_{t+1})$
Under the Gaussian prior assumption:
- $\bm{k}_{t+1} = \left[ k(\Vert \bm{x}_{t+1} - \bm{x}_1 \Vert), \cdots, k(\Vert \bm{x}_{t+1} - \bm{x}_t \Vert) \right]^{T}$
- Since $\begin{bmatrix} f(\bm{x}_{1:t}) \\ f(\bm{x}_{t+1}) \end{bmatrix}$ is jointly Gaussian, the conditional distribution $f(\bm{x}_{t+1}) \vert f(\bm{x}_{1:t})$ must also be Gaussian, and we can use the standard formulas for the mean and variance of this conditional distribution.
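For a zero-mean GP prior, those standard conditional-Gaussian formulas read as follows (presumably these are the equations (6) and (7) referenced next):

$$\mu_{t+1}(\bm{x}_{t+1}) = \bm{k}_{t+1}^{T} \bm{K}_{t}^{-1} f(\bm{x}_{1:t}), \qquad \sigma_{t+1}^{2}(\bm{x}_{t+1}) = k(0) - \bm{k}_{t+1}^{T} \bm{K}_{t}^{-1} \bm{k}_{t+1}$$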
Reduce the complexity
Note that computing (6) and (7) is expensive, because the dimensions of the matrix and vectors grow with $t$.
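As a minimal NumPy sketch of this computation (assumptions not from the original: 1-D inputs, noise-free observations, SE kernel with length-scale `h`; `gp_posterior` and its arguments are illustrative names):

```python
import numpy as np

def se_kernel(r, h=1.0):
    # Squared exponential kernel evaluated on (arrays of) distances r.
    return np.exp(-r**2 / (2.0 * h**2))

def gp_posterior(X, y, x_new, h=1.0, jitter=1e-9):
    # Posterior mean and variance of f(x_new) given samples (X, y),
    # using the conditional-Gaussian formulas above (zero-mean prior).
    K_t = se_kernel(np.abs(X[:, None] - X[None, :]), h) + jitter * np.eye(len(X))
    k_new = se_kernel(np.abs(X - x_new), h)
    # Solving K_t a = k_{t+1} is the bottleneck: the cost grows as O(t^3).
    a = np.linalg.solve(K_t, k_new)
    mean = a @ y
    var = se_kernel(0.0, h) - k_new @ a
    return mean, var

# Toy usage on f(x) = sin(x): the posterior mean should be close to sin(1.5).
X = np.array([0.0, 1.0, 2.0, 3.0])
mean, var = gp_posterior(X, np.sin(X), x_new=1.5)
print(mean, var)
```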
Acquisition function
Now that we have a model of the function and its uncertainty, we will use this to choose which point to sample next.
- The acquisition function takes the posterior mean and variance at each point of the search space and computes a value indicating how desirable it is to sample next at that position.
- A good acquisition function should trade off exploration and exploitation.
Four popular acquisition functions:
- the upper confidence bound
- expected improvement
- probability of improvement
- Thompson sampling
Upper confidence bound
Direct balance between exploration and exploitation:
This acquisition function is defined as:
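The original equation is missing here; the standard GP-UCB form, with an assumed exploration weight $\kappa > 0$, is

$$\alpha_{\text{UCB}}(\bm{x}) = \mu_t(\bm{x}) + \kappa \, \sigma_t(\bm{x})$$

for maximization; for the minimization problem above one instead minimizes the lower confidence bound $\mu_t(\bm{x}) - \kappa \, \sigma_t(\bm{x})$. A larger $\kappa$ weights the uncertainty term more heavily, hence more exploration.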
Expected Improvement
- Perhaps the most widely used acquisition function.
- It is too greedy in some problems. It can be made more explorative by adding an 'explorative' parameter, as sketched below.
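A minimal sketch of expected improvement for the minimization setting above, assuming the 'explorative' parameter is a nonnegative offset `xi` subtracted from the current best value (the names `expected_improvement`, `xi`, and `f_best` are illustrative, not from the original):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    # EI for minimization: expected amount by which f at a candidate
    # improves on the best observed value f_best. Larger xi -> more exploration.
    sigma = np.maximum(sigma, 1e-12)   # guard against zero posterior variance
    gain = f_best - mu - xi            # shifted improvement over the incumbent
    z = gain / sigma
    return gain * norm.cdf(z) + sigma * norm.pdf(z)

# Two candidates with equal posterior means but different uncertainty:
# the more uncertain one receives the larger EI.
print(expected_improvement(np.array([0.5, 0.5]), np.array([0.1, 1.0]), f_best=0.6))
```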
Reference: https://www.cnblogs.com/marsggbo/p/9866764.html