莫烦Python (Morvan Python): Reinforcement Learning


卢容和


Tags: python, machine learning, deep learning

Article link: https://blog.csdn.net/qq_41329791/article/details/120473896



The Q-Learning decision process
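
The training loop in this post calls choose_action(S, q_table) but does not show its body. One common way to make that decision is epsilon-greedy selection, sketched below under that assumption. The name `choose_action_sketch`, the value `EPSILON = 0.9`, and the dict-based row of Q-values are all illustrative and are not taken from the post.

```python
import random

EPSILON = 0.9   # assumed: probability of taking the greedy action


def choose_action_sketch(q_row, epsilon=EPSILON, rng=random):
    """Pick an action from one state's Q-values (dict: action -> value)."""
    all_zero = all(value == 0 for value in q_row.values())
    # Explore when the random draw exceeds epsilon, or when nothing
    # has been learned yet for this state (every value is still 0).
    if rng.random() > epsilon or all_zero:
        return rng.choice(list(q_row))
    # Otherwise exploit: take the action with the largest Q-value.
    return max(q_row, key=q_row.get)


# With epsilon=1.0 the choice is always greedy once values are non-zero.
print(choose_action_sketch({'left': 0.0, 'right': 0.5}, epsilon=1.0))  # right
```

Forcing a random choice while every value is still zero avoids always picking the same action before any reward has been observed.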


A small Q-learning example

-o---T
# T is the treasure's position, o is the explorer's position
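
A minimal sketch of how such a one-dimensional world could be drawn as a string. The function name `render_world` and the default of six cells (read off the diagram above) are assumptions made for illustration.

```python
def render_world(position, n_states=6):
    """Return the 1-D world as text: '-' for empty cells, 'T' for the treasure."""
    cells = ['-'] * (n_states - 1) + ['T']   # treasure sits in the last cell
    cells[position] = 'o'                    # mark the explorer's current cell
    return ''.join(cells)


print(render_world(1))  # -o---T
```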

Environment feedback: how the state changes, and what reward is returned, after each move

def get_env_feedback(S, A):
    # This is how agent will interact with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = 0
        if S == 0:
            S_ = S  # reach the wall
        else:
            S_ = S - 1
    return S_, R
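
To see the feedback rule in action, the sketch below walks the explorer right from state 0 until the episode ends. It assumes `N_STATES = 6` (matching the six-cell diagram) and uses a condensed copy of the transition rule so that the block runs on its own.

```python
N_STATES = 6   # assumed: six cells, as in the "-o---T" diagram


def get_env_feedback(S, A):
    # Condensed copy of the transition rule shown above
    if A == 'right':
        if S == N_STATES - 2:        # the next cell is the treasure
            return 'terminal', 1
        return S + 1, 0
    return max(S - 1, 0), 0          # moving left stops at the wall


S, steps, total_reward = 0, 0, 0
while S != 'terminal':
    S, R = get_env_feedback(S, 'right')
    steps += 1
    total_reward += R

print(steps, total_reward)  # 5 1
```

Only the final move into the treasure yields a reward. Every other transition returns 0, so early in training the agent has nothing to guide it until it first reaches the goal.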

The RL algorithm: choosing an action and updating the Q-table

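# build_q_table, choose_action, update_env and the constants
# (N_STATES, ACTIONS, MAX_EPISODES, ALPHA, GAMMA) are not shown in this post.
def rl():
    q_table = build_q_table(N_STATES, ACTIONS)  # initial Q-table
    for episode in range(MAX_EPISODES):     # one loop iteration per episode
        step_counter = 0
        S = 0   # starting position for the episode
        is_terminated = False   # whether the episode has ended
        update_env(S, episode, step_counter)    # refresh the environment display
        while not is_terminated:

            A = choose_action(S, q_table)   # choose an action
            S_, R = get_env_feedback(S, A)  # take the action and get the environment's feedback
            q_predict = q_table.loc[S, A]    # estimated (state-action) value
            if S_ != 'terminal':
                q_target = R + GAMMA * q_table.iloc[S_, :].max()   # target (state-action) value (episode not over)
            else:
                q_target = R     # target (state-action) value (episode over)
                is_terminated = True    # terminate this episode

            q_table.loc[S, A] += ALPHA * (q_target - q_predict)  # update the Q-table
            S = S_  # the explorer moves to the next state

            update_env(S, episode, step_counter+1)  # refresh the environment display

            step_counter += 1
    return q_table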
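
The update line in rl() can be traced by hand for two steps. The sketch below isolates that arithmetic in a helper, `q_update`, which is a name introduced here. The values `ALPHA = 0.1` and `GAMMA = 0.9` are assumptions, since the post does not give the constants.

```python
ALPHA = 0.1   # assumed learning rate
GAMMA = 0.9   # assumed discount factor


def q_update(q_predict, reward, next_max, terminal):
    """Apply one Q-learning update and return the new Q(S, A)."""
    # Target is the reward alone at the end of an episode, otherwise
    # reward plus the discounted best value of the next state.
    q_target = reward if terminal else reward + GAMMA * next_max
    return q_predict + ALPHA * (q_target - q_predict)


# First time the treasure is reached: Q(last state, right) moves toward 1.
q_last = q_update(0.0, 1, 0.0, terminal=True)
# Later, one cell earlier: the reward is 0, but the next state now has value.
q_prev = q_update(0.0, 0, q_last, terminal=False)

print(round(q_last, 4), round(q_prev, 4))  # 0.1 0.009
```

This shows how value propagates backwards from the treasure one cell at a time, shrinking by a factor of roughly ALPHA * GAMMA per step on the first pass.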
