Structuring a Reinforcement Learning Project
Ten months ago, I started working as an undergraduate researcher. One thing I can say for certain: working on a research project is hard, but working on a Reinforcement Learning (RL) research project is even harder!
What made such a project challenging was the lack of proper online resources on how to structure this type of project:
Structuring a Web Development project? Check!
Structuring a Mobile Development project? Check!
Structuring a Machine Learning project? Check!
Structuring a Reinforcement Learning project? Not really!
To help future novice researchers, beginner machine learning engineers, and amateur software developers get started with their RL projects, I put together this non-exhaustive, step-by-step guide to structuring an RL project. It is divided as follows:
Start the Journey: Frame your Problem as an RL Problem
Choose your Weapons: All the Tools You Need to Build a Working RL Environment
Face the Beast: Pick your RL (or Deep RL) Algorithm
Tame the Beast: Test the Performance of the Algorithm
Set it Free: Prepare your Project for Deployment/Publishing
In this post, we will discuss the first part of this series:
Start the Journey: Frame your Problem as an RL Problem
This step is the most crucial in the whole project. First, we need to make sure that Reinforcement Learning can actually be used to solve your problem.
1. Framing the Problem as a Markov Decision Process (MDP)
For a problem to be framed as an RL problem, it must first be modeled as a Markov Decision Process (MDP).
A Markov Decision Process (MDP) is a representation of the sequence of actions of an agent in an environment and their consequences on not only the immediate rewards but also future states and rewards.
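To make this definition a bit more concrete, here is a minimal Python sketch of a toy MDP. Everything in it is an assumption made up for illustration: the two states ("start" and "goal"), the two actions ("safe" and "risky"), the reward values, the discount factor, and the helper name rollout. The transitions are deterministic to keep the example short, whereas a general MDP allows probabilistic transitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Outcome of taking an action in a state (deterministic toy case)."""
    next_state: str
    reward: float


# Hypothetical two-state MDP: (state, action) -> outcome.
# "risky" gives no immediate reward in "start" but moves the agent to "goal",
# where "safe" pays much more than it does in "start".
MDP = {
    ("start", "safe"): Transition("start", 1.0),
    ("start", "risky"): Transition("goal", 0.0),
    ("goal", "safe"): Transition("goal", 5.0),
    ("goal", "risky"): Transition("start", 0.0),
}


def rollout(state, actions, gamma=0.9):
    """Follow a fixed action sequence; return (final_state, discounted_return)."""
    total = 0.0
    discount = 1.0
    for action in actions:
        outcome = MDP[(state, action)]
        total += discount * outcome.reward  # immediate reward, discounted
        discount *= gamma
        state = outcome.next_state          # the action also changes the future state
    return state, total


greedy = rollout("start", ["safe", "safe", "safe"])
patient = rollout("start", ["risky", "safe", "safe"])
print("greedy :", greedy)
print("patient:", patient)
```

In this toy setup, the "patient" sequence gives up the immediate reward in the first step but ends with a higher discounted return, because its first action leads to a better future state. That is the sense in which an action's consequences reach beyond the immediate reward.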