n-armed bandit_Gittins index

Latest recommended article published 2020-01-15 16:28:11

ep_mashiro


Category: Recommender Systems; Tags: Reinforcement Learning

Article link: https://blog.csdn.net/tinkle181129/article/details/50393865


The complexity of solving the MAB (multi-armed bandit) problem with Markov decision theory grows exponentially with the number of bandit processes.
Instead of solving the n-dimensional MDP over the joint state space $\prod_{i=1}^n \chi^i$, the optimal solution (the Gittins index policy) is obtained by solving n one-dimensional optimization problems, one per arm.
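To make the scaling concrete, here is a minimal sketch that compares the number of joint states with the total number of per-arm states. It assumes every arm has a finite state space; the function names and the example sizes are illustrative and do not come from the text.

```python
from math import prod

def joint_state_count(sizes):
    """Size of the product space: every combination of per-arm states."""
    return prod(sizes)

def per_arm_state_count(sizes):
    """Total number of states when each arm is handled on its own."""
    return sum(sizes)

# Hypothetical example: 5 arms with 10 states each.
sizes = [10] * 5
print(joint_state_count(sizes))    # 100000
print(per_arm_state_count(sizes))  # 50
```

Adding one more 10-state arm multiplies the first number by 10 but only adds 10 to the second.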
The index is given by

$\nu^i(x^i)=\sup_{\tau>0}\frac {E[\sum_{t=0}^{\tau-1} \beta^t r^i(X_t^i)|X_0^i=x^i]}{E[\sum_{t=0}^{\tau-1} \beta^t|X_0^i=x^i]}$

where the supremum is taken over stopping times $\tau$ and $\beta$ is the discount factor.
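For a small finite-state arm, the ratio can be evaluated by brute force. The sketch below relies on the standard result that it is enough to consider rules of the form "keep playing while the state is in a set C", where the play at t = 0 always happens. It assumes a transition matrix `P` (list of rows) and a reward vector `r`; all function names are hypothetical, and fixed-point iteration stands in for an exact linear solve.

```python
from itertools import combinations

def discounted_sums(P, r, beta, cont, start, tol=1e-12):
    """Numerator and denominator of the ratio for the rule
    'continue while the current state is in cont', started in `start`."""
    m = len(r)
    d = [0.0] * m  # expected discounted reward collected before stopping
    b = [0.0] * m  # expected discounted number of plays before stopping
    while True:
        new_d = [r[a] + beta * sum(P[a][c] * d[c] for c in cont) for a in range(m)]
        new_b = [1.0 + beta * sum(P[a][c] * b[c] for c in cont) for a in range(m)]
        err = max(max(abs(x - y) for x, y in zip(new_d, d)),
                  max(abs(x - y) for x, y in zip(new_b, b)))
        d, b = new_d, new_b
        if err < tol:  # the update is a contraction because beta < 1
            return d[start], b[start]

def brute_force_index(P, r, beta, start):
    """Maximize the ratio over all 2^m continuation sets (small m only)."""
    m = len(r)
    best = float("-inf")
    for size in range(m + 1):
        for cont in combinations(range(m), size):
            num, den = discounted_sums(P, r, beta, cont, start)
            best = max(best, num / den)
    return best

# Hypothetical two-state arm: state 0 pays 1, state 1 pays 0.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
print(brute_force_index(P, r, 0.9, 0))
print(brute_force_index(P, r, 0.9, 1))
```

The enumeration is exponential in the number of states, so it is only useful as a cross-check for an efficient algorithm.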

Off-Line Algorithm for computing Gittins Index

1. Largest-Remaining-Index Algorithm

Initialization: identify the state $\alpha_1$ with the highest Gittins index.
Since no state has a higher index than $\alpha_1$, its stopping set is the whole state space, $S(\alpha_1)=\chi$, and therefore $\nu(\alpha_1)=r(\alpha_1)=r_{\alpha_1}$.
Choose: $\alpha_1=\arg\max_{\alpha\in\chi} r_{\alpha}$
The corresponding Gittins index is: $\nu(\alpha_1)=r_{\alpha_1}$
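The initialization step amounts to a single argmax over the reward vector. A minimal sketch, assuming the states are numbered 0 to m-1 and `r` is a plain list of one-step rewards (the function name is hypothetical):

```python
def initialization_step(r):
    """Return (alpha_1, nu(alpha_1)): the state with the largest one-step
    reward and its Gittins index, which equals that reward."""
    alpha_1 = max(range(len(r)), key=lambda a: r[a])
    return alpha_1, r[alpha_1]

# Hypothetical reward vector for a three-state arm.
print(initialization_step([0.2, 1.0, 0.5]))  # (1, 1.0)
```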
Recursion step:
Define the $m\times m$ matrix $Q^{(k)}$ by, for all $a,b\in\chi$,

$Q_{a,b}^{(k)}= \begin{cases} P_{a,b}& \text{if } b\in C(\alpha_k)\\ 0& \text{otherwise} \end{cases}$
and define the $m\times 1$ vectors:
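The vector definitions are not reproduced here, so the sketch below fills them in from the standard largest-index-first formulation usually attributed to Varaiya, Walrand and Buyukkoc, as an assumption rather than as this article's own method. In that formulation $C(\alpha_k)=\{\alpha_1,\dots,\alpha_{k-1}\}$, and the two vectors (called `d` and `b` here, names chosen for illustration) solve $d = r + \beta Q^{(k)} d$ and $b = \mathbf{1} + \beta Q^{(k)} b$. The helper names are hypothetical, and fixed-point iteration replaces a matrix inverse to keep the example dependency-free.

```python
def masked_transition_matrix(P, cont):
    """Q[a][b] = P[a][b] if b is in the continuation set, else 0."""
    m = len(P)
    return [[P[a][b] if b in cont else 0.0 for b in range(m)] for a in range(m)]

def solve_discounted(Q, c, beta, tol=1e-12):
    """Solve x = c + beta * Q x by fixed-point iteration (needs beta < 1)."""
    m = len(c)
    x = [0.0] * m
    while True:
        new_x = [c[a] + beta * sum(Q[a][j] * x[j] for j in range(m)) for a in range(m)]
        err = max(abs(u - v) for u, v in zip(new_x, x))
        x = new_x
        if err < tol:
            return x

def largest_remaining_index(P, r, beta):
    """Return (order, index): states in decreasing index order and the
    index of every state, under the assumptions stated above."""
    m = len(r)
    order = []            # alpha_1, alpha_2, ... as they are identified
    index = [None] * m
    remaining = set(range(m))
    while remaining:
        Q = masked_transition_matrix(P, set(order))
        d = solve_discounted(Q, r, beta)          # discounted reward vector
        b = solve_discounted(Q, [1.0] * m, beta)  # discounted time vector
        best = max(sorted(remaining), key=lambda a: d[a] / b[a])
        index[best] = d[best] / b[best]
        order.append(best)
        remaining.remove(best)
    return order, index

# Hypothetical two-state arm: state 0 pays 1, state 1 pays 0.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
print(largest_remaining_index(P, r, 0.9))
```

On the first pass the continuation set is empty, so `Q` is the zero matrix, `d` equals `r`, and the choice reduces to the initialization step.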
