I. Q-learning: Core Idea and Framework
II. Q-learning: Program Implementation
1. Imports
import numpy as np
import pandas as pd
import time
2. Setting the initial values
N_STATES = 6  # length of the one-dimensional world
ACTIONS = ['left', 'right']  # available actions (the columns of the Q-table)
EPSILON = 0.9  # greediness of the policy (epsilon-greedy)
ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor applied to max Q(s') when forming the Q target
MAX_EPISODES = 6  # maximum number of episodes
FRESH_TIME = 0.3  # refresh time
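
To make the role of these constants concrete, here is a minimal sketch of how they typically enter tabular Q-learning. The helper names `build_q_table`, `choose_action`, and `q_update`, the `rng` parameter, and the reading of EPSILON as the probability of acting greedily are assumptions made for illustration, not necessarily the implementation this tutorial goes on to use.

```python
import numpy as np
import pandas as pd

N_STATES = 6
ACTIONS = ['left', 'right']
EPSILON = 0.9   # assumed here: probability of taking the greedy action
ALPHA = 0.1
GAMMA = 0.9


def build_q_table(n_states, actions):
    # Hypothetical helper: one row per state, one column per action, all zeros.
    return pd.DataFrame(np.zeros((n_states, len(actions))), columns=actions)


def choose_action(state, q_table, rng):
    # Hypothetical helper: epsilon-greedy selection.
    state_actions = q_table.iloc[state, :]
    # Explore with probability 1 - EPSILON, or when nothing is learned yet.
    if rng.uniform() > EPSILON or (state_actions == 0).all():
        return ACTIONS[rng.integers(len(ACTIONS))]
    # Otherwise exploit: pick the action with the highest Q-value.
    return state_actions.idxmax()


def q_update(q_table, s, a, r, s_next, terminal):
    # Hypothetical helper: one Q-learning update step.
    q_predict = q_table.loc[s, a]
    # The target uses GAMMA to discount the best value of the next state.
    q_target = r if terminal else r + GAMMA * q_table.iloc[s_next, :].max()
    # ALPHA controls how far the estimate moves toward the target.
    q_table.loc[s, a] += ALPHA * (q_target - q_predict)


q_table = build_q_table(N_STATES, ACTIONS)
rng = np.random.default_rng(0)
action = choose_action(0, q_table, rng)                    # table is all zeros, so this is random
q_update(q_table, 4, 'right', 1.0, 5, terminal=True)      # Q(4, right) becomes 0.1
```

With a reward of 1 on a terminal step, the target is 1 and the estimate moves from 0 to ALPHA * 1 = 0.1. A later non-terminal step into state 4 would then see a discounted target of GAMMA * 0.1 = 0.09.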