Words and Expressions
- dexterous: nimble, skillful
- to combat distribution drift
- in principle
- … which necessitate …
- in the order of …: approximately …
- to combat that
- We asymptotically decay the auxiliary objective.
Demo Augmented Policy Gradient (DAPG)
Demonstrations
- kinesthetic teaching ?
updated version of the MuJoCo HAPTIX system:
- CyberGlove III system for recording fingers
- HTC vive tracker for tracking base of hand
- HTC vive headset for stereoscopic visualization
25 successful demonstrations with noise
RL preliminaries
- MDP: $\mathcal{M}=\{\mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}, \rho_0, \gamma\}$
- demonstration data (to reduce sample complexity): $\rho_D=\{(s_t^{(i)}, a_t^{(i)}, s_{t+1}^{(i)}, r_t^{(i)})\}$
- policy: $\pi_\theta: \mathcal{S}\times\mathcal{A}\to \mathbb{R}_+$
- $\eta(\pi)=\mathbb{E}_{\pi,\mathcal{M}}\Big[\sum_{t=0}^\infty \gamma^t r_t\Big]$
- $V^\pi(s)=\mathbb{E}_{\pi,\mathcal{M}}\Big[\sum_{t=0}^\infty \gamma^t r_t \mid s_0=s\Big]$
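To make the return and value definitions above concrete, here is a minimal Python sketch. It assumes finite reward sequences (the infinite sum is truncated at the end of each rollout) and approximates the expectation by averaging over sampled rollouts. The function names and the reward values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * r_t for one finite reward sequence."""
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_value(rollouts: Sequence[Sequence[float]], gamma: float) -> float:
    """Estimate V^pi(s) as the mean discounted return over rollouts
    that all start in the same state s (assumption: rollouts is non-empty)."""
    returns = [discounted_return(rewards, gamma) for rewards in rollouts]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from three rollouts starting in the same state.
rollouts = [[1.0, 0.0, 2.0], [0.0, 1.0], [1.0, 1.0, 1.0]]
v_estimate = monte_carlo_value(rollouts, gamma=0.5)
```

Averaging the same quantity over rollouts whose start states are drawn from $\rho_0$ instead of a fixed $s$ gives a sample estimate of $\eta(\pi)$.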