Reinforcement Learning
连理o
Self-styled "expert" in negative optimization
deep Q-network (DQN)
Human-level control through deep reinforcement learning; source code. Reference: 解析 DeepMind 深度强化学习 (an analysis of DeepMind's deep reinforcement learning). Contents: DQN; Q-network; Challenges; Experience replay; Fixed Q-Targets; Training algorithm for deep Q-networks. DQN: deep Q-network (DQN). DQN can learn successful policies directly.
Original · 2020-11-15 12:53:53 · 562 views · 0 comments
RL (Chapter 1): The Reinforcement Learning Problem
These are reading notes on Reinforcement Learning: An Introduction. Contents: 1.1 Reinforcement Learning. The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning. Whether we are learning to drive a c...
Original · 2020-10-24 22:31:44 · 269 views · 0 comments
RL(Chapter 1): Tic-Tac-Toe (井字棋)
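These are reinforcement learning notes, based mainly on the following: David Silver's reinforcement learning course and an excellent course summary on Zhihu; Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: An Extended Example: Tic-Tac-Toe; Code (Python); `State` class; `HumanPlayer` class; `Player` class; `Judger` class; Training and play. An Extended Example: Tic-...
Original · 2020-10-25 23:04:10 · 880 views · 0 comments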
RL(Chapter 2): Multi-arm Bandits (多臂赌博机)
These are reinforcement learning notes, based mainly on the following: David Silver's reinforcement learning course and an excellent course summary on Zhihu; Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: An n-Armed Bandit Problem; Explore & Exploit; Action-Value Methods; Sample-average method; Greedy Method; ε...
Original · 2020-10-29 08:51:36 · 799 views · 0 comments
RL(Chapter 3): Finite Markov Decision Processes (有限马尔可夫决策过程)
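These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction; Morvan Python's (莫烦 Python) reinforcement learning tutorial. All code is from GitHub; for exercise solutions, see GitHub. There are also two open courses that are probably good but that I have not watched yet: Hung-yi Lee's (李宏毅) 2020 deep reinforcement learning course, and David Silver's reinforcement learning course together with an excellent course summary on Zhihu. Contents: The Agent–Environment Interface; Goals and Rewards; Returns and Episodes; U...
Original · 2020-11-01 20:45:15 · 693 views · 0 comments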
RL(Chapter 3): GridWorld
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: GridWorld; Code; Settings; Environment; Visualization; Question (a); Question (b); Exercise; Exercise; Exercise 3.24. GridWorld: Figure 3.2 (left) shows a rectangular gridworld represent...
Original · 2020-11-01 20:50:41 · 1113 views · 1 comment
RL(Chapter 4): Dynamic Programming (DP) (动态规划)
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Policy Evaluation (Prediction). DP refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment...
Original · 2020-11-04 14:44:19 · 696 views · 0 comments
RL(Chapter 4): Gridworld
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Code; Settings; Environment; Visualization; Iterative policy evaluation. Consider the 4×4 gridworld shown below. The nonterminal states are S = {1, 2, ..., 14}...
Original · 2020-11-03 11:40:16 · 672 views · 0 comments
RL(Chapter 4): Jack’s Car Rental
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Q1; Code; Settings; Environment; Policy Iteration; Q2. Q1: Jack manages two locations for a nationwide car rental company. Each day, some number of customers arrive at each loc...
Original · 2020-11-03 16:58:18 · 3389 views · 0 comments
RL(Chapter 4): Gambler’s Problem
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Gambler’s Problem; Exercise; Code. Gambler’s Problem: A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads,...
Original · 2020-11-04 14:40:02 · 696 views · 0 comments
RL(Chapter 5): Monte Carlo Methods (MC) (蒙特卡洛方法)
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Monte Carlo Prediction. Monte Carlo methods are ways of solving the reinforcement learning problem based on averaging sample returns, thus requiring only experience...
Original · 2020-11-11 20:55:18 · 1224 views · 0 comments
RL(Chapter 5): Blackjack (二十一点)
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Blackjack. Blackjack: The object of the popular casino card game of blackjack is to obtain cards the sum of whose numerical values is as great as possible without exc...
Original · 2020-11-12 09:13:54 · 1556 views · 0 comments
RL(Chapter 6): Temporal-Difference Learning (TD learning) (时序差分学习)
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: TD Prediction. TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas. Like Monte Carlo methods, TD methods can learn directly from ra...
Original · 2020-11-14 12:12:51 · 1461 views · 0 comments
RL(Chapter 6): Random Walk
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Random Walk; Code; Environment; TD(0); MC method; Visualization; Random walk under batch updating; Code. Random Walk: In this example we empirically compare the prediction abilitie...
Original · 2020-11-13 19:03:46 · 779 views · 1 comment
RL(Chapter 6): Windy Gridworld
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Windy Gridworld; Code; Environment; Sarsa; Visualization; Windy Gridworld with King’s Moves. Windy Gridworld: Shown below is a standard gridworld, with start and goal states, b...
Original · 2020-11-13 22:10:05 · 1789 views · 0 comments
RL(Chapter 6): Cliff Walking
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Cliff Walking; Code; Environment; Sarsa, Expected Sarsa; Q-learning; Visualization. Cliff Walking: This gridworld example compares Sarsa and Q-learning, highlighting the differ...
Original · 2020-11-14 10:35:36 · 1196 views · 0 comments
RL(Chapter 7): n-step Bootstrapping (n步自举法)
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: n-step Bootstrapping; n-step TD Prediction; n-step Sarsa; n-step Off-policy Learning. n-step Bootstrapping: In this chapter we present n-step TD methods that g...
Original · 2020-11-25 20:22:42 · 643 views · 0 comments
RL(Chapter 7): Random Walk
These are reinforcement learning notes, based mainly on the following: Reinforcement Learning: An Introduction. All code is from GitHub; for exercise solutions, see GitHub. Contents: Code; Environment; n-step TD method; Visualization. Consider using n-step TD methods on the 5-state random walk task. Suppose the first episode progressed directly f...
Original · 2020-11-17 14:50:41 · 253 views · 0 comments