Paper notes: Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start...

This paper proposes a computational architecture that combines a feed-forward neural network with Bayesian estimation to perform multi-task learning (transfer learning).

Background on Bayesian estimation:
https://www.cnblogs.com/joezou/p/10658883.html

Related companion papers:

  • Scalable Bayesian Optimization Using Deep Neural Networks
  • Scalable Hyperparameter Transfer Learning

Although it is already given in the BO algorithm's objective function, this paper offers another, more intuitive formulation:
(Equation images from the original post are omitted here.)

At the same time, this rewrites the original objective function (equation image in the original post) into a transformed form (equation image in the original post).
At this point the algorithmic complexity drops further, from N(D^3 + D^2) to ND^2 + D^3 (here N = T): accumulating the D x D Gram matrix over N points costs O(ND^2), and the single D x D inversion costs O(D^3). (The original author notes the complexity figures still need to be verified.)
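As a rough illustration of where the ND^2 + D^3 figure comes from, here is a minimal sketch of a standard Bayesian linear regression posterior (my own notation, not necessarily the paper's): building the D x D Gram matrix costs O(ND^2), and the single matrix inversion costs O(D^3), independent of N.

```python
import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=1.0):
    """Posterior over the weights of standard Bayesian linear regression.

    Phi   : (N, D) feature matrix
    y     : (N,) targets
    alpha : prior precision of the weights
    beta  : observation noise precision
    """
    D = Phi.shape[1]
    K = alpha * np.eye(D) + beta * Phi.T @ Phi  # building Phi^T Phi costs O(N * D^2)
    K_inv = np.linalg.inv(K)                    # inverting the D x D matrix costs O(D^3)
    mean = beta * K_inv @ Phi.T @ y             # posterior mean of the weights
    return mean, K_inv                          # K_inv is the posterior covariance

# Toy usage: N can grow large while the expensive inversion stays D x D.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(5000, 20))
y = Phi @ rng.normal(size=20) + 0.1 * rng.normal(size=5000)
m, S = blr_posterior(Phi, y, alpha=1.0, beta=100.0)
print(m.shape, S.shape)  # (20,) (20, 20)
```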

The intent of this paper is to achieve multi-task learning while keeping the complexity cubic in the feature dimension and linear in the number of samples. The contribution comes in two steps: combining a feed-forward neural network (NN) with BO to enable multi-task learning, and coupling the BO estimate back into the NN so that it in turn strengthens the NN's learning of the latent representation vectors.
The schematic is shown below (architecture diagram from the authors; image omitted here).

As shown above, the sub-task datasets Di are fed into the feed-forward network NN simultaneously. This relies on the premise that the basis vectors (latent representations) of the target functions to be fitted share some similarity, so the NN uses shared weights. Its multiple outputs are passed to each task's own BO warm-start estimator, which computes the GP parameters, the error, and the partial derivatives; the gradients with respect to the NN weights (summed or weighted) are then fed back into the NN, completing one training step. The result is that the NN's latent-representation coefficients are learned, along with each sub-task's GP parameters.
The benefit is that, for a new sub-task, the NN has already been pre-trained and can therefore speed up optimization; this is what is called "multi-task transfer learning".
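A minimal sketch of the training step described above (a simplified stand-in, not the authors' code): a shared one-hidden-layer feature map replaces the paper's NN, each task gets its own Bayesian linear regression head, and the scalar fed back to the shared weights is the sum of per-task negative log marginal likelihoods. In the paper this sum is differentiated with respect to the shared weights by backpropagation and minimized with L-BFGS; only the forward computation is shown here.

```python
import numpy as np

def features(X, W, b):
    """Shared feature map phi(x): a one-hidden-layer stand-in for the paper's NN."""
    return np.tanh(X @ W + b)

def neg_log_marginal_likelihood(Phi, y, alpha, beta):
    """Standard BLR negative log marginal likelihood, computed via the D x D matrix K."""
    N, D = Phi.shape
    K = alpha * np.eye(D) + beta * Phi.T @ Phi
    Phi_y = Phi.T @ y
    _, logdet_K = np.linalg.slogdet(K)
    # log-determinant of the N x N covariance, rewritten via the matrix determinant lemma
    logdet_C = logdet_K - N * np.log(beta) - D * np.log(alpha)
    # quadratic term y^T C^{-1} y, rewritten via the Woodbury identity
    quad = beta * (y @ y) - beta**2 * Phi_y @ np.linalg.solve(K, Phi_y)
    return 0.5 * (N * np.log(2.0 * np.pi) + logdet_C + quad)

def joint_objective(tasks, W, b, alphas, betas):
    """Sum of per-task NLMLs: the scalar differentiated w.r.t. the shared W, b."""
    return sum(
        neg_log_marginal_likelihood(features(X, W, b), y, a_t, b_t)
        for (X, y), a_t, b_t in zip(tasks, alphas, betas)
    )

# Toy usage with T = 3 related tasks sharing one feature map.
rng = np.random.default_rng(0)
d_in, d_feat, T = 4, 16, 3
W, b = rng.normal(size=(d_in, d_feat)), np.zeros(d_feat)
tasks = []
for _ in range(T):
    X = rng.uniform(-1.0, 1.0, size=(50, d_in))
    y = (X**2).sum(axis=1) + 0.05 * rng.normal(size=50)
    tasks.append((X, y))
print(joint_objective(tasks, W, b, alphas=[1.0] * T, betas=[100.0] * T))
```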

The paper points out that random Fourier features offer better multi-task scalability, while the learned NN basis is more robust.
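For comparison, random Fourier features give a fixed (not learned) basis that approximates an RBF kernel; below is a minimal sketch (my own, with an assumed lengthscale parameter) of such a feature map, which can be plugged into the same Bayesian linear regression machinery in place of the NN features.

```python
import numpy as np

def random_fourier_features(X, D_feat=100, lengthscale=1.0, seed=0):
    """Random Fourier feature map approximating an RBF kernel (Rahimi & Recht).

    X : (N, d) inputs -> (N, D_feat) features. The basis is drawn once and
    kept fixed, so nothing is learned, unlike the NN basis in ABLR.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, D_feat))  # spectral frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D_feat)             # random phases
    return np.sqrt(2.0 / D_feat) * np.cos(X @ W + b)

# Usage: the resulting Phi can feed the same D x D Bayesian linear regression.
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(200, 3))
Phi = random_fourier_features(X)
print(Phi.shape)  # (200, 100)
```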

Experimental setup:
Test environments:
(1) Fitting quadratic functions, whose target-function parameters are randomly generated; the train/test split used is 29:1 (see the sketch after this list).
(2) Fitting on LIBSVM:
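The quadratic benchmark might be set up roughly as below; the exact parameterization and input ranges are my assumptions, not taken from the paper. Each task draws its own random quadratic, and a 29:1 train/test split is applied per task.

```python
import numpy as np

def make_quadratic_task(rng, dim=3, n_points=30):
    """One synthetic task: a randomly parameterized quadratic (assumed form)."""
    a, b, c = rng.uniform(0.1, 10.0, size=3)          # random task-specific coefficients
    X = rng.uniform(-5.0, 5.0, size=(n_points, dim))
    y = 0.5 * a * (X**2).sum(axis=1) + b * X.sum(axis=1) + c
    # 29:1 train/test split, as described in the experimental setup above
    n_test = max(1, n_points // 30)
    return (X[n_test:], y[n_test:]), (X[:n_test], y[:n_test])

rng = np.random.default_rng(0)
(train_X, train_y), (test_X, test_y) = make_quadratic_task(rng)
print(train_X.shape, test_X.shape)  # (29, 3) (1, 3)
```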

Q&A with the author:

  1. Would you mind sharing your idea about why your algorithm works more robustly compared with the one in [21], as you mention in the sixth line on page 3?
  • What we meant there is that L-BFGS does not come with hyperparameters such as the SGD stepsize (note that [21] uses SGD). This is an advantage, as you would have to set the stepsize for each specific BO problem, and if you just fix the stepsize to be the same for all BO problems, then your algorithm may not perform as robustly.
  2. I wonder if I missed the proof that ABLR can scale linearly, or whether you consider it prerequisite knowledge and did not mention it. Would you mind pointing out where I can find it?
  • The idea is that instead of inverting an N x N matrix when computing the predictive mean and variance, you invert a D x D matrix, so the scaling is D^3 instead of N^3. This can be observed directly from looking at equations (1), (2), (3), where the most expensive operation is the matrix inversion.
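For reference, the standard Bayesian linear regression predictive equations have this structure (my notation, which may differ from the paper's Eqs. (1)-(3); the point is that the only matrix ever inverted is the D x D matrix K_t):

```latex
K_t = \beta_t \Phi_t^\top \Phi_t + \alpha_t I_D \quad (D \times D)

\mu_t(\mathbf{x}_*) = \beta_t \, \phi(\mathbf{x}_*)^\top K_t^{-1} \Phi_t^\top \mathbf{y}_t

\sigma_t^2(\mathbf{x}_*) = \phi(\mathbf{x}_*)^\top K_t^{-1} \phi(\mathbf{x}_*) + \beta_t^{-1}
```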
Thanks to Valerio Perrone for answering the questions posed on this page.

Reposted from: https://www.cnblogs.com/joezou/p/10658905.html
