Reinforcement Learning
Heuristic Algorithm; Decision making
No Knownledge
One more thing
A Simple DQN Implementation in the Gym Environment
DQN; Gym; CartPole-v0 · Original · 2024-06-27 19:17:42 · 170 views · 0 comments
Analyzing Loss and Reward to Improve DQN Training
DQN; Loss; Reward · Original · 2024-06-26 10:04:42 · 388 views · 0 comments
DQN Structure: Evaluation Network and Target Network
DQN; Evaluation Network and Target Network · Original · 2024-06-24 16:52:19 · 937 views · 0 comments
Independent and Identically Distributed (IID)
IID; machine learning; reinforcement learning · Original · 2024-06-24 16:15:59 · 624 views · 0 comments
Understanding the Bellman Equation in Q-Learning
MDP; Q-Learning; Bellman Equation · Original · 2024-06-22 17:16:54 · 88 views · 0 comments
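Combining the Markov Property and Q-Learning in Reinforcement Learning
Markov property; Q-Learning; reinforcement learning · Original · 2023-06-18 23:49:13 · 305 views · 0 comments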
Imitation Learning
Imitation Learning · Original · 2023-09-12 16:34:31 · 52 views · 0 comments
Value-based vs Policy-based Reinforcement Learning
reinforcement learning; Value-based Reinforcement Learning; Policy-based Reinforcement Learning · Original · 2023-08-14 16:38:57 · 100 views · 0 comments
Value-Based Reinforcement Learning
reinforcement learning; value-based learning · Original · 2023-08-13 23:21:05 · 45 views · 0 comments
Policy-Based Reinforcement Learning
reinforcement learning; policy-based learning · Original · 2023-08-13 22:23:12 · 264 views · 0 comments
Trust Region Policy Optimization (TRPO)
Trust Region Policy Optimization; TRPO · Original · 2023-08-13 21:03:42 · 165 views · 0 comments
Softmax Strategy
reinforcement learning; intelligent decision-making · Original · 2023-08-13 15:53:44 · 796 views · 0 comments
The Epsilon-Greedy Algorithm
machine learning; decision-making · Original · 2023-08-13 15:38:34 · 104 views · 0 comments
Exploration vs Exploitation (Multi-arm Bandit Problem)
strategy formulation; UCB algorithm · Repost · 2022-10-09 21:24:11 · 143 views · 0 comments