Bayesian Optimization

Consider the following problem of finding a global minimizer (or maximizer) of an unknown objective function:
$$\bm{x}^{\ast} = \arg\min_{\bm{x} \in \mathcal{X}} f(\bm{x})$$

Note that $f$ may be non-convex and that gradient information is unavailable. $\mathcal{X}$ is the hyper-parameter search space.

Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it.

  • The prior encodes beliefs about the behavior of the function.
  • Once function evaluations have been collected as data, the prior is updated to form a posterior distribution over the objective function.
  • The posterior distribution is then used to construct an acquisition function that determines the next query point.

A widely used choice is the Gaussian prior:

$$f(\bm{x}_{1:t}) = [f(\bm{x}_{1}), \cdots, f(\bm{x}_{t})]^{T} \sim \mathcal{N}(\bm{0}, \bm{K}_t)$$

  • $\bm{K}_t$ is the $t \times t$ kernel matrix with entries $\bm{K}_t(i,j) = k(\Vert \bm{x}_i - \bm{x}_j \Vert)$

Two popular kernels:

  • the squared exponential (SE) kernel (also known as the radial basis function or Gaussian kernel)
  • the Matérn kernel

$$k_{\mathrm{SE}}(r) = \exp\!\left(-\frac{r^{2}}{2h^{2}}\right), \qquad k_{\text{Matérn}}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,r}{h}\right)^{\nu} B_{\nu}\!\left(\frac{\sqrt{2\nu}\,r}{h}\right), \qquad r = \Vert \bm{x}_i - \bm{x}_j \Vert \quad (1)$$

  • $\Gamma(\cdot)$ is the Gamma function
  • $B_{\nu}(\cdot)$ is the $\nu$-th order Bessel function
  • $h$ is a hyper-parameter
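
A minimal NumPy/SciPy sketch of the two kernels and of building $\bm{K}_t$; it reads $B_{\nu}$ as the modified Bessel function of the second kind (`scipy.special.kv`) and treats $h$ as a length scale, both assumptions the text leaves open.

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def se_kernel(r, h=1.0):
    """Squared exponential (RBF / Gaussian) kernel as a function of distance r."""
    r = np.asarray(r, dtype=float)
    return np.exp(-r**2 / (2.0 * h**2))

def matern_kernel(r, nu=2.5, h=1.0):
    """Matérn kernel of eq. (1); B_nu is read as the modified Bessel
    function of the second kind (an assumption)."""
    r = np.asarray(r, dtype=float)
    z = np.sqrt(2.0 * nu) * r / h
    out = np.ones_like(z)            # k(0) = 1 is the limit as r -> 0
    nz = z > 0
    out[nz] = (2.0 ** (1.0 - nu) / gamma(nu)) * z[nz] ** nu * kv(nu, z[nz])
    return out

def kernel_matrix(X, kernel=matern_kernel):
    """K_t with entries K_t(i, j) = k(||x_i - x_j||) for the rows of X."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return kernel(dists)
```

A draw from the Gaussian prior above is then `np.random.multivariate_normal(np.zeros(len(X)), kernel_matrix(X))`.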

In Bayesian optimization, at the $t$-th iteration:

  • we have samples $\mathcal{D}_{1:t} = \left\{ \big( \bm{x}_i, f(\bm{x}_i) \big) \right\}_{i=1}^{t}$
  • we want to infer the value of $f(\bm{x}_{t+1})$ at the next query point $\bm{x}_{t+1}$

Under the Gaussian prior assumption,

$$\begin{bmatrix} f(\bm{x}_{1:t}) \\ f(\bm{x}_{t+1}) \end{bmatrix} \sim \mathcal{N}\left( \bm{0}, \begin{bmatrix} \bm{K}_t & \bm{k}_{t+1} \\ \bm{k}_{t+1}^{T} & k(0) \end{bmatrix} \right) \quad (2)$$

  • $\bm{k}_{t+1} = \left[ k(\Vert \bm{x}_{t+1} - \bm{x}_1 \Vert), \cdots, k(\Vert \bm{x}_{t+1} - \bm{x}_t \Vert) \right]^{T}$

  • Since $\begin{bmatrix} f(\bm{x}_{1:t}) \\ f(\bm{x}_{t+1}) \end{bmatrix}$ is jointly Gaussian, the conditional distribution $f(\bm{x}_{t+1}) \mid f(\bm{x}_{1:t})$ must also be Gaussian, so we can use the standard formulas for the mean and variance of this conditional distribution:

$$f(\bm{x}_{t+1}) \mid f(\bm{x}_{1:t}) \sim \mathcal{N}\big( \mu_t(\bm{x}_{t+1}),\, \sigma_t^2(\bm{x}_{t+1}) \big)$$

$$\mu_t(\bm{x}_{t+1}) = \bm{k}_{t+1}^{T} \bm{K}_t^{-1} f(\bm{x}_{1:t}) \quad (3)$$

$$\sigma_t^2(\bm{x}_{t+1}) = k(0) - \bm{k}_{t+1}^{T} \bm{K}_t^{-1} \bm{k}_{t+1} \quad (4)$$
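
A sketch of how (3) and (4) translate into code, reusing the kernel helpers above; the small jitter added to $\bm{K}_t$ is an assumption for numerical stability:

```python
def gp_posterior(X_t, f_t, x_new, kernel=matern_kernel):
    """Posterior mean (3) and variance (4) of f(x_new) given data (X_t, f_t)."""
    K_t = kernel_matrix(X_t, kernel)
    K_t += 1e-8 * np.eye(len(X_t))                        # jitter (an assumption)
    k_new = kernel(np.linalg.norm(X_t - x_new, axis=1))   # the vector k_{t+1}
    K_inv = np.linalg.inv(K_t)
    mu = k_new @ K_inv @ f_t                              # eq. (3)
    var = kernel(np.array(0.0)) - k_new @ K_inv @ k_new   # eq. (4)
    return mu, float(max(var, 0.0))
```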

Reduce the complexity

Note that evaluating (3) and (4) becomes expensive, since the dimensions of the matrix and vectors grow with $t$.
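
The text does not say how to reduce the cost; one standard option (a sketch, not necessarily what the author had in mind) is to factor $\bm{K}_t$ once with a Cholesky decomposition and reuse the factor, instead of forming $\bm{K}_t^{-1}$ repeatedly:

```python
from scipy.linalg import cho_factor, cho_solve

def gp_posterior_chol(X_t, f_t, x_new, kernel=matern_kernel):
    """Same quantities as (3) and (4), but via one Cholesky factorization:
    O(t^3) once per dataset, then O(t^2) per triangular solve."""
    K_t = kernel_matrix(X_t, kernel) + 1e-8 * np.eye(len(X_t))
    c_and_low = cho_factor(K_t)
    k_new = kernel(np.linalg.norm(X_t - x_new, axis=1))
    alpha = cho_solve(c_and_low, f_t)        # K_t^{-1} f(x_{1:t})
    v = cho_solve(c_and_low, k_new)          # K_t^{-1} k_{t+1}
    mu = k_new @ alpha
    var = kernel(np.array(0.0)) - k_new @ v
    return mu, float(max(var, 0.0))
```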

Acquisition function

Now that we have a model of the function and its uncertainty, we will use this to choose which point to sample next.

  • The acquisition function takes the posterior mean and variance at each point and computes a value indicating how desirable it is to sample there next.

  • A good acquisition function should trade off exploration and exploitation.

Four popular acquisition functions:

  • the upper confidence bound
  • expected improvement
  • probability of improvement
  • Thompson sampling

Each of these turns the posterior into a score $\alpha_t$, and the next query point is chosen by maximizing it:

$$\bm{x}_{t+1} = \arg\max_{\bm{x} \in \mathcal{X}} \alpha_t(\bm{x}) \quad (5)$$
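
In practice (5) is solved only approximately. A simple stand-in for a proper inner optimizer, assuming a box-shaped search space, is to score random candidates; `propose_next` here is a hypothetical helper, and any of the acquisition functions below can be passed in as `acquisition`:

```python
def propose_next(acquisition, bounds, n_candidates=2048, seed=0):
    """Approximate x_{t+1} = argmax_x alpha_t(x) by scoring random candidates
    inside the box `bounds` (a (d, 2) array of [low, high] rows)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1],
                       size=(n_candidates, bounds.shape[0]))
    scores = np.array([acquisition(x) for x in cand])
    return cand[np.argmax(scores)]
```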

Upper confidence bound

This acquisition function directly balances exploration and exploitation. It is defined as:

$$\alpha_{\mathrm{UCB}}(\bm{x}) = \mu_t(\bm{x}) + \kappa\, \sigma_t(\bm{x}) \quad (6)$$

where $\kappa \ge 0$ controls the amount of exploration.
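
A direct translation of (6) into code, building on the posterior helper above; the default `kappa` is an arbitrary choice:

```python
def ucb(x, X_t, f_t, kappa=2.0):
    """alpha_UCB(x) = mu_t(x) + kappa * sigma_t(x); larger kappa explores more.
    (For the minimization problem above, one would instead minimize the
    lower confidence bound mu_t(x) - kappa * sigma_t(x).)"""
    mu, var = gp_posterior_chol(X_t, f_t, x)
    return mu + kappa * np.sqrt(var)
```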

Expected Improvement
  • Perhaps the most widely used acquisition function.
  • It is too greedy in some problems; it can be made more explorative by adding an 'explorative' parameter $\xi$:
$$\mathrm{EI}(\bm{x}) = \mathbb{E}\left[ \max\big( f(\bm{x}^{+}) - f(\bm{x}) - \xi,\, 0 \big) \right] \quad (7)$$

where $\bm{x}^{+}$ is the best point observed so far (written here for minimization).

Under the Gaussian posterior this expectation has a closed form:

$$\mathrm{EI}(\bm{x}) = \big( f(\bm{x}^{+}) - \mu_t(\bm{x}) - \xi \big)\, \Phi(Z) + \sigma_t(\bm{x})\, \varphi(Z), \qquad Z = \frac{f(\bm{x}^{+}) - \mu_t(\bm{x}) - \xi}{\sigma_t(\bm{x})} \quad (8)$$

where $\Phi$ and $\varphi$ are the standard normal CDF and PDF, and $\mathrm{EI}(\bm{x}) = 0$ when $\sigma_t(\bm{x}) = 0$.
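
A sketch of the closed form (8) for the minimization setting, using `scipy.stats.norm`; the default `xi` is arbitrary:

```python
from scipy.stats import norm

def expected_improvement(x, X_t, f_t, xi=0.01):
    """Closed-form EI of eq. (8) for minimization; xi is the explorative parameter."""
    mu, var = gp_posterior_chol(X_t, f_t, x)
    sigma = np.sqrt(var)
    if sigma == 0.0:
        return 0.0                    # no uncertainty -> no expected improvement
    f_best = np.min(f_t)              # f(x^+), best (lowest) value seen so far
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```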

Reference: https://www.cnblogs.com/marsggbo/p/9866764.html

