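An explanation of this question can be found at https://discuss.ray.io/t/meaning-of-episode-reward-mean/3839/5
Specifically: episode_reward is the return of a single episode computed with a discount factor of 1.0, i.e. the sum of the rewards obtained at every step of the episode.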
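The definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not RLlib's own code: the helper names `episode_return` and `mean_episode_reward` and the sample reward lists are made up for the example, and averaging over episodes is assumed from the metric name `episode_reward_mean` in the linked thread.

```python
from typing import Sequence


def episode_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return of one episode: sum of gamma**t * r_t over all steps t.

    With gamma = 1.0 (the case described above) this is the plain sum
    of the per-step rewards.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def mean_episode_reward(episodes: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: average undiscounted return over several episodes."""
    returns = [episode_return(rewards) for rewards in episodes]
    return sum(returns) / len(returns)


# Made-up per-step rewards for one episode.
rewards = [1.0, 0.0, 2.0, -1.0]
undiscounted = episode_return(rewards)            # same as sum(rewards)
discounted = episode_return(rewards, gamma=0.5)   # later rewards count less

# Two made-up episodes with undiscounted returns 2.0 and 4.0.
mean_reward = mean_episode_reward([[1.0, 1.0], [0.0, 3.0, 1.0]])
```

With `gamma=1.0` the result equals `sum(rewards)`, which is why episode_reward can be read directly as "total reward collected in the episode"; any `gamma` below 1.0 would weight later steps less and give a different number.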
Background:
In RL, an episode is the sequence of agent-environment interactions from an initial state to a final state, i.e. one trajectory.