Reading Notes on DAPG: Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Words and Expressions

  • dexterous: nimble, skillful
  • to combat distribution drift
  • in principle
  • … which necessitate …
  • in the order of …: roughly, approximately …
  • to combat that
  • We asymptotically decay the auxiliary objective. (refers to DAPG's decaying demonstration term; see the note after this list)
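
The last phrase describes how DAPG weights its demonstration (auxiliary) term. As I read the paper, the augmented gradient adds a behavior-cloning-like term over the demo data $\rho_D$, weighted by $\lambda_0 \lambda_1^k$ so that it decays geometrically with the training iteration $k$ (here $\rho_\pi$ is the set of on-policy samples and $A^\pi$ the advantage):

$$g_{aug} = \sum_{(s,a)\in\rho_\pi} \nabla_\theta \ln \pi_\theta(a|s)\, A^\pi(s,a) \;+\; \sum_{(s,a)\in\rho_D} \nabla_\theta \ln \pi_\theta(a|s)\, \lambda_0 \lambda_1^k \max_{(s',a')\in\rho_\pi} A^\pi(s',a')$$

Demonstrations therefore dominate early in training, and the standard policy gradient takes over asymptotically.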

Demo Augmented Policy Gradient (DAPG)

Demonstrations
  • kinesthetic teaching? (the paper instead collects demos with an updated version of the Mujoco HAPTIX system)
  • CyberGlove III system for recording the fingers
  • HTC Vive tracker for tracking the base of the hand
  • HTC Vive headset for stereoscopic visualization
  • 25 successful demonstrations, with noise
RL preliminaries

MDP: $\mathcal{M}=\{\mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}, \rho_0, \gamma\}$

Demonstration data (used to reduce sample complexity): $\rho_D=\{(s_t^{(i)}, a_t^{(i)}, s_{t+1}^{(i)}, r_t^{(i)})\}$

Policy: $\pi_\theta: \mathcal{S}\times\mathcal{A}\to \mathbb{R}_+$

$$\eta(\pi)=\mathbb{E}_{\pi,\mathcal{M}}\Big[\sum_{t=0}^\infty \gamma^t r_t\Big], \qquad V^\pi(s)=\mathbb{E}_{\pi,\mathcal{M}}\Big[\sum_{t=0}^\infty \gamma^t r_t \,\Big|\, s_0=s\Big]$$
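
To make the preliminaries concrete, here is a minimal Monte Carlo sketch of $\eta(\pi)$: roll out the policy from $s_0 \sim \rho_0$, accumulate discounted rewards, and average over trajectories (the infinite sum is truncated at a finite horizon). The `env.reset()`, `env.step()`, and `policy.sample_action()` interfaces are hypothetical stand-ins, not from the paper's code.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Truncated estimate of sum_t gamma^t r_t for one trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def estimate_eta(env, policy, gamma=0.995, n_rollouts=100, horizon=200):
    """Monte Carlo estimate of eta(pi) = E[ sum_t gamma^t r_t ],
    averaging discounted returns over rollouts whose start states
    follow rho_0 (implicit in env.reset())."""
    returns = []
    for _ in range(n_rollouts):
        s = env.reset()                      # s_0 ~ rho_0
        rewards = []
        for _ in range(horizon):             # truncate the infinite sum
            a = policy.sample_action(s)      # a ~ pi_theta(. | s)
            s, r, done = env.step(a)         # assumed (state, reward, done) interface
            rewards.append(r)
            if done:
                break
        returns.append(discounted_return(rewards, gamma))
    return np.mean(returns)
```

Estimating $V^\pi(s)$ is the same computation with rollouts started from the fixed state $s$ instead of sampling from $\rho_0$.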
