Initialize replay memory $M$ to capacity $N$
Initialize action-value function $Q$ with random weights
for each episode do
&emsp;Reset the model, set $D_l \leftarrow \varnothing$, and shuffle $D$ randomly
&emsp;for $i = 1, |D|$ do
&emsp;&emsp;Construct the state $s_i$ using $x_i$
&emsp;&emsp;With probability $\epsilon$ select a random action $a_i$
&emsp;&emsp;Otherwise select $a_i = \arg\max_a Q^{\pi}(s_i, a; \theta)$
&emsp;&emsp;if $a_i = 1$ then
&emsp;&emsp;&emsp;Obtain the annotation $y_i$
&emsp;&emsp;&emsp;$D_l \leftarrow D_l + (x_i, y_i)$
&emsp;&emsp;&emsp;Update the model $\phi$ based on $D_l$
&emsp;&emsp;end if
&emsp;&emsp;Receive a reward $r_i$ from the test data
&emsp;&emsp;if $|D_l| = B$ then
&emsp;&emsp;&emsp;Store the transition in $M$
&emsp;&emsp;&emsp;Break
&emsp;&emsp;end if
&emsp;&emsp;Construct the new state $s_{i+1}$
&emsp;&emsp;Store the transition $(s_i, a_i, r_i, s_{i+1})$ in $M$
&emsp;&emsp;Sample a random minibatch of transitions from $M$, and perform a gradient descent step on $L(\theta)$
&emsp;&emsp;Update the policy $\pi$ with $\theta$
&emsp;end for
end for
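The episode loop above can be sketched in plain Python. Everything concrete here is an assumption for illustration, not the paper's implementation: the linear stand-in for $Q(s, a; \theta)$, the `annotate` and `reward_fn` callbacks, and the one-feature state are all placeholders, and the minibatch gradient step on $L(\theta)$ is only noted in a comment.

```python
import random
from collections import deque

# Hedged sketch of the algorithm above: a DQN-style active learner that,
# for each instance x_i, decides whether to request its label (action 1)
# or skip it (action 0), until the annotation budget B is spent.
# Model, state features, and reward are stubbed out.

class ActiveLearner:
    def __init__(self, budget, epsilon=0.1, memory_capacity=1000):
        self.B = budget                         # annotation budget B
        self.epsilon = epsilon                  # exploration rate
        self.M = deque(maxlen=memory_capacity)  # replay memory M
        # Toy stand-in for Q(s, a; theta): one random weight per action.
        self.theta = [random.random(), random.random()]

    def q_values(self, state):
        # Score each action by weight * (sum of state features).
        s = sum(state)
        return [self.theta[0] * s, self.theta[1] * s]

    def select_action(self, state):
        # Epsilon-greedy: random action w.p. epsilon, else argmax_a Q.
        if random.random() < self.epsilon:
            return random.randint(0, 1)
        q = self.q_values(state)
        return 0 if q[0] >= q[1] else 1

    def run_episode(self, D, annotate, reward_fn):
        """One episode over the (shuffled) pool D, stopping at budget B."""
        D_l = []                                 # labelled set D_l
        random.shuffle(D)
        for i, x in enumerate(D):
            s = [x]                              # state s_i built from x_i
            a = self.select_action(s)
            if a == 1:
                y = annotate(x)                  # obtain annotation y_i
                D_l.append((x, y))               # D_l <- D_l + (x_i, y_i)
                # A real agent would also retrain the model phi on D_l here.
            r = reward_fn(D_l)                   # reward r_i from held-out data
            if len(D_l) == self.B:
                self.M.append((s, a, r, None))   # terminal transition
                break
            s_next = [D[i + 1]] if i + 1 < len(D) else None
            self.M.append((s, a, r, s_next))     # store transition in M
            # A real agent would now sample a minibatch from M and take a
            # gradient step on L(theta); omitted in this sketch.
        return D_l
```

With a 20-item pool, a budget of 3, and dummy callbacks, one episode labels at most 3 instances and fills the replay memory with (state, action, reward, next-state) tuples.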