Paper reading
Mr丶Caleb
Graduate student at the University of Science and Technology of China
Two-Stream Convolutional Networks for Action Recognition in Videos
Download: http://www.datascienceassn.org/sites/default/files/Two-Stream%20Convolutional%20Networks%20for%20Action%20Recognition%20in%20Videos.pdf
What's the problem? This paper mainly introduces a new way of applying convolutional networks to video, and evaluates it on the UCF-101 and HMDB-51 datasets…
Original post · 2016-12-16 15:07:26 · 2575 views · 0 comments
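For orientation, here is a minimal sketch of the two-stream idea, assuming a PyTorch-style setup: a spatial CNN sees one RGB frame, a temporal CNN sees a stack of optical-flow fields, and the class scores are fused by averaging. The tiny `make_stream` helper and all layer sizes are illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int, num_classes: int) -> nn.Sequential:
    """Tiny stand-in for the per-stream CNN; real streams are much deeper."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

num_classes = 101                        # e.g. UCF-101
spatial  = make_stream(3, num_classes)   # one RGB frame
temporal = make_stream(20, num_classes)  # 10 stacked flow fields (x and y)

rgb  = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 20, 224, 224)
# Late fusion: average the per-stream softmax scores.
scores = (spatial(rgb).softmax(-1) + temporal(flow).softmax(-1)) / 2
```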
3D Convolutional Neural Networks for Human Action Recognition
Reposted from http://blog.csdn.net/zouxy09
1. Overview. In real-world settings, scenes come with cluttered backgrounds, occlusion, viewpoint changes, and so on. A person can recognize actions easily, but for a computer this is no simple matter. Earlier Human Action Recognition methods rested on assumptions that are demanding for the application scenario, such as small scale variation of the target and small changes in viewpoint, which are hard to satisfy in the real world. At present, most work in this area…
Original post · 2016-12-14 16:24:42 · 4847 views · 4 comments
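As background for the technique named in the title, a minimal sketch of a 3D convolution under assumed illustrative shapes: unlike a 2D convolution, the kernel also slides along the time axis, so it can pick up motion between neighbouring frames.

```python
import torch
import torch.nn as nn

# One 3D convolution over a short clip: (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)
conv3d = nn.Conv3d(in_channels=3, out_channels=16,
                   kernel_size=(3, 7, 7))   # 3 frames x 7x7 spatial window
features = conv3d(clip)
print(features.shape)                        # torch.Size([1, 16, 14, 106, 106])
```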
Reinforcement Learning: A3C, GA3C
1. Problem and contributions. Existing problems: different types of deep neural networks provide efficient representations for the policy optimization task in DRL. To ease the instability that arises when traditional policy gradient methods are combined with neural networks, various deep policy gradient methods (such as DDPG and SVG) adopt an experience replay mechanism to break the correlations in the training data. However, experience replay has two problems: every real-time interaction between the agent and the environment consumes a lot of memory and compute, and the experience replay mechanism requires the agent…
Reposted · 2017-08-11 21:00:02 · 6730 views · 0 comments
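To make the memory point concrete, a back-of-the-envelope estimate with illustrative DQN-style numbers (one million transitions of 84×84 uint8 frame stacks, states stored naively). These figures are assumptions for illustration, not numbers from the A3C paper.

```python
# Rough memory footprint of a naive experience replay buffer.
capacity    = 1_000_000
frame_bytes = 84 * 84                      # one uint8 grayscale frame
stack       = 4                            # frames per state
per_transition = 2 * stack * frame_bytes   # state and next state, stored separately
print(f"{capacity * per_transition / 2**30:.1f} GiB")  # ~52.6 GiB
```

A3C sidesteps this cost entirely by running many actors in parallel and using their decorrelated on-policy experience instead of a replay buffer.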
PR17.10.2:Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
What are the problem and challenges? There are many sources of possible instability and variance that can lead to difficulties with reproducing deep policy gradient methods such as DDPG and TRPO. What's the pr…
Original post · 2017-10-03 13:41:28 · 553 views · 0 comments
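A minimal sketch of the evaluation hygiene such reproducibility studies argue for: run the same algorithm under several random seeds and report mean and spread rather than a single curve. `train` here is a hypothetical stand-in for any DDPG/TRPO training run that returns a final score.

```python
import random
import statistics

def train(seed: int) -> float:
    random.seed(seed)                  # in practice also seed numpy, torch, and the env
    return 100 + random.gauss(0, 15)   # placeholder for a real training run

scores = [train(seed) for seed in range(5)]
print(f"mean {statistics.mean(scores):.1f} "
      f"± {statistics.stdev(scores):.1f} over {len(scores)} seeds")
```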
PR17.10.4:Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
What's the problem? A major obstacle facing deep RL in the real world is its high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often req…
Original post · 2017-10-06 16:23:24 · 727 views · 0 comments
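As background for the variance claim, a toy illustration of how subtracting a baseline shrinks the variance of a REINFORCE-style estimate, under the simplifying assumption that returns and score terms are independent. This shows the generic control-variate idea only, not Q-Prop's specific off-policy-critic construction.

```python
import numpy as np

rng = np.random.default_rng(0)
glogp   = rng.normal(size=10_000)         # toy per-sample grad-log-prob terms
returns = 10.0 + rng.normal(size=10_000)  # returns centred far from zero

g_plain    = returns * glogp                    # vanilla REINFORCE contribution
g_baseline = (returns - returns.mean()) * glogp # baseline-subtracted version
print(g_plain.var(), g_baseline.var())          # variance drops roughly 100x
```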
PR10.10:#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
What's the problem? Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision…
Original post · 2017-10-12 10:51:04 · 1252 views · 0 comments
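A minimal sketch of a count-based bonus in the tabular case, assuming a bonus of the form β/√N(s): rarely visited states pay out more, which drives exploration. The value of β is an illustrative assumption, and the state hashing the paper studies for continuous domains is omitted.

```python
from collections import Counter
from math import sqrt

counts = Counter()
beta = 0.5  # illustrative exploration coefficient

def augmented_reward(state, reward: float) -> float:
    """Add an exploration bonus that decays with the visit count of `state`."""
    counts[state] += 1
    return reward + beta / sqrt(counts[state])

print(augmented_reward("s0", 0.0))  # 0.5 on the first visit
print(augmented_reward("s0", 0.0))  # ~0.354 on the second
```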
PR10.21:Trust Region Policy Optimization
What's the problem? Under the policy gradient method, the parameter update equation is $\theta_{new} = \theta_{old} + \alpha \nabla_{\theta} J$. The weak point of policy gradient algorithms is the update step size $\alpha$: when the step size is unsuitable, the updated parameters correspond to a worse policy, and when that worse policy is then used to sample and learn, the next update makes the parameters worse still, so it is very easy to…
Original post · 2017-10-21 12:12:20 · 718 views · 0 comments
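A minimal sketch of the update rule above, assuming a flat NumPy parameter vector: plain gradient ascent on $J$ with a fixed step size $\alpha$, with nothing preventing a bad step from producing a worse policy. This compounding failure mode is exactly what TRPO's trust-region (KL-bounded) step is designed to prevent.

```python
import numpy as np

def update(theta: np.ndarray, grad_J: np.ndarray, alpha: float) -> np.ndarray:
    # theta_new = theta_old + alpha * grad_J, with no improvement guarantee:
    # an oversized alpha can land on a worse policy, whose samples then
    # corrupt the next gradient estimate.
    return theta + alpha * grad_J

theta  = np.zeros(4)
grad_J = np.array([0.1, -0.3, 0.2, 0.0])   # illustrative gradient estimate
theta  = update(theta, grad_J, alpha=0.01)
```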