Studying Morvan's Q_Learning Explorer Code


cy冲鸭


Category: Reinforcement Learning

Article link: https://blog.csdn.net/weixin_41841797/article/details/84292983


Q-Learning algorithm:
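The heart of the program below is the tabular Q-learning update that `rl()` applies once per move. Written out, with α = `ALPHA`, γ = `GAMMA`, (s, a) the current state and action, r the reward returned by `get_env_feedback`, and s′ the next state, it reads as follows. The symbols y, s′ and a′ are only notation for this summary: y corresponds to `q_target` in the code, and Q(s, a) before the update corresponds to `q_predict`. When s′ is terminal there is no future value to bootstrap from, so the target is just the reward.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \bigl( y - Q(s,a) \bigr),
\qquad
y =
\begin{cases}
r, & \text{if } s' \text{ is terminal},\\[2pt]
r + \gamma \max_{a'} Q(s', a'), & \text{otherwise.}
\end{cases}
```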

After studying Morvan's code, I implemented a simple program: a small game in which an explorer searches for treasure.

-o---T    # T is the treasure's position, o is the explorer's position
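To make the layout concrete: with `N_STATES = 6` the world has cells 0 to 5, the treasure T is fixed on the last cell, and the explorer starts every episode on cell 0. The standalone sketch below mirrors `update_env` and `get_env_feedback` from the program that follows; `render`, `step`, and `steps_to_treasure` are hypothetical helper names used only here. It also checks that the shortest route, always moving right, takes N_STATES − 1 = 5 moves, which is the lowest `total_steps` any episode can report.

```python
N_STATES = 6            # same world length as the program below
TERMINAL = "terminal"   # end-of-episode marker, as in get_env_feedback


def render(position: int) -> str:
    """Draw the world: '-' empty cell, 'o' explorer, 'T' treasure on the last cell."""
    cells = ["-"] * (N_STATES - 1) + ["T"]
    cells[position] = "o"
    return "".join(cells)


def step(state: int, action: str):
    """Deterministic move and reward, mirroring get_env_feedback."""
    if action == "right":
        if state == N_STATES - 2:       # the next cell holds the treasure
            return TERMINAL, 1
        return state + 1, 0
    if action == "left":
        return max(state - 1, 0), 0     # the left wall keeps the explorer on cell 0
    raise ValueError(f"unknown action: {action!r}")


def steps_to_treasure(policy, start: int = 0, limit: int = 100):
    """Follow a fixed policy (state -> action); return the move count, or None if T is never reached."""
    state, moves = start, 0
    while state != TERMINAL and moves < limit:
        state, _ = step(state, policy(state))
        moves += 1
    return moves if state == TERMINAL else None


if __name__ == "__main__":
    print(render(1))                                # -o---T, the picture above
    print(step(0, "left"))                          # (0, 0): bumping into the wall
    print(step(4, "right"))                         # ('terminal', 1): the only rewarded move
    print(steps_to_treasure(lambda s: "right"))     # 5
    print(steps_to_treasure(lambda s: "left"))      # None
```

Note that the explorer only ever stands on cells 0 to 4: the move that would land on cell 5 returns `"terminal"` instead.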

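At every position the explorer can take one of two actions, left or right. Moving onto T earns a reward of 1 and ends the episode; every other move earns 0. The full code is as follows: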

"""
A simple example for Reinforcement Learning using table lookup Q-learning method.
An agent "o" is on the left of a 1 dimensional world, the treasure is on the rightmost location.
Run this program to see how the agent improves its strategy for finding the treasure.

View more on my tutorial page: https://morvanzhou.github.io/tutorials/
"""

import numpy as np
import pandas as pd
import time

np.random.seed(2)  # reproducible

N_STATES = 6   # the length of the 1 dimensional world
ACTIONS = ['left', 'right']     # available actions
EPSILON = 0.9   # greedy policy: probability of picking the best-known action
ALPHA = 0.1     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 13   # maximum episodes
FRESH_TIME = 0.3    # pause (seconds) between rendered moves

def build_q_table(n_states, actions):  # build the Q-table
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # one column per action name
    )
    # print(table)    # show table
    return table

def choose_action(state, q_table):   # pick an action for the given state
    # This is how to choose an action
    state_actions = q_table.iloc[state, :]
    # explore (random action) with probability 1 - EPSILON, or when this state has no learned values yet
    if (np.random.uniform() > EPSILON) or ((state_actions == 0).all()):  
        action_name = np.random.choice(ACTIONS)
    else:   # act greedy
        action_name = state_actions.idxmax()    
    return action_name

def get_env_feedback(S, A):  # environment feedback: next state and reward
    # This is how agent will interact with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = 0
        if S == 0:
            S_ = S  # reach the wall
        else:
            S_ = S - 1
    return S_, R

def update_env(S, episode, step_counter):  # redraw the environment
    # This is how the environment is updated
    env_list = ['-']*(N_STATES-1) + ['T']   # '-----T' our environment
    if S == 'terminal':
        interaction = 'Episode %s: total_steps = %s' % (episode+1, step_counter)
        print('\r{}'.format(interaction), end='')
        time.sleep(2)
        print('\r                                ', end='')
    else:
        env_list[S] = 'o'
        interaction = ''.join(env_list)
        print('\r{}'.format(interaction), end='')
        time.sleep(FRESH_TIME)

def rl():
    # main part of RL loop
    q_table = build_q_table(N_STATES, ACTIONS)
    for episode in range(MAX_EPISODES):
        step_counter = 0
        S = 0
        is_terminated = False
        update_env(S, episode, step_counter)
        while not is_terminated:

            A = choose_action(S, q_table)
            S_, R = get_env_feedback(S, A)  # take action & get next state and reward
            q_predict = q_table.loc[S, A]
            if S_ != 'terminal':  # next state is not terminal
                q_target = R + GAMMA * q_table.iloc[S_, :].max()   
            else:
                q_target = R     # next state is terminal
                is_terminated = True    # terminate this episode

            q_table.loc[S, A] += ALPHA * (q_target - q_predict)  # update
            S = S_  # move to next state

            update_env(S, episode, step_counter+1)
            step_counter += 1
    return q_table

if __name__ == "__main__":
    q_table = rl()
    print('\r\nQ-table:\n')
    print(q_table)
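Why the printed Q-table fills in from the treasure backwards: the reward is 1 only on the move from cell 4 onto T, and while every value is still 0 all other updates have a target of 0, so after the first episode the only nonzero entry is Q(4, right) = ALPHA · 1 = 0.1. In later episodes the discounted value spreads leftward. The sketch below replays the same update arithmetic as the loop body of `rl()` on a plain-dict Q-table, but with a fixed "always right" walk instead of epsilon-greedy choices so the numbers are reproducible; `q_update` and `run_right_episode` are hypothetical names. Because the target uses the max over next actions regardless of how the action was chosen, this replay is still ordinary Q-learning.

```python
ALPHA = 0.1             # learning rate, as in the program above
GAMMA = 0.9             # discount factor, as in the program above
N_STATES = 6
ACTIONS = ["left", "right"]


def q_update(q, s, a, r, s_next):
    """One tabular Q-learning update: the same arithmetic as the loop body of rl()."""
    q_predict = q[s][a]
    if s_next == "terminal":
        q_target = r                                    # nothing to bootstrap from
    else:
        q_target = r + GAMMA * max(q[s_next].values())  # best estimate from the next state
    q[s][a] += ALPHA * (q_target - q_predict)


def run_right_episode(q):
    """Hypothetical replay: walk straight from cell 0 to the treasure, updating Q on every move."""
    s = 0
    while s != "terminal":
        s_next, r = ("terminal", 1) if s == N_STATES - 2 else (s + 1, 0)
        q_update(q, s, "right", r, s_next)
        s = s_next


if __name__ == "__main__":
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    run_right_episode(q)
    print(round(q[4]["right"], 3))                              # 0.1
    run_right_episode(q)
    print(round(q[3]["right"], 3), round(q[4]["right"], 3))     # 0.009 0.19
```

The last row (cell 5) never changes, because the explorer never stands on the treasure cell; it stays at zero in the printed table as well. To read the learned policy off the real `q_table`, take the best column per row, for example with `q_table.idxmax(axis=1)` in pandas; rows that are still all zero tie, and `idxmax` then reports the first column, `left`.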

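`EPSILON = 0.9` in `choose_action` is the probability of acting greedily: `np.random.uniform() > EPSILON` is true about 10% of the time, and only then (or while a state's row is still all zero) does the explorer pick an action at random. The sketch below re-implements that rule with Python's `random` module instead of NumPy, using a dict as a stand-in for one row of the Q-table; `choose_action_sketch` is a hypothetical name. It measures how often `right` is picked once `right` has the larger value; expect roughly 0.9 + 0.1 × 0.5 = 0.95, since a random pick also lands on `right` half of the time.

```python
import random

ACTIONS = ["left", "right"]
EPSILON = 0.9   # probability of acting greedily, as in the program above


def choose_action_sketch(q_row, rng):
    """Epsilon-greedy choice; q_row maps action name -> Q-value for the current state."""
    all_zero = all(v == 0 for v in q_row.values())
    if rng.random() > EPSILON or all_zero:
        return rng.choice(ACTIONS)                  # explore, or nothing learned yet
    return max(ACTIONS, key=lambda a: q_row[a])     # exploit; ties go to the first action, like idxmax


if __name__ == "__main__":
    rng = random.Random(2)
    learned = {"left": 0.0, "right": 0.1}
    picks = [choose_action_sketch(learned, rng) for _ in range(10_000)]
    print(picks.count("right") / len(picks))        # close to 0.95
    fresh = {"left": 0.0, "right": 0.0}
    print(sorted({choose_action_sketch(fresh, rng) for _ in range(100)}))  # both actions appear
```

The all-zero check matters early on: without it, a fresh row would always resolve its tie to `left`, and the explorer would rarely wander far enough right to find the treasure.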