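Tic-Tac-Toe is played on a 3×3 grid, with two players taking turns to place a piece. The first player to get three pieces in a straight line (a row, a column or a diagonal) wins.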
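A minimal sketch of the winning rule follows. It assumes the board is a 3×3 list of lists using the same encoding as the reference code: 1 for the first player, -1 for the second player, and 0 for an empty cell. The helper name `winner_of` is hypothetical and is not part of the reference code. It sums each of the 8 lines (3 rows, 3 columns, 2 diagonals). A sum of 3 or -3 means one player owns that whole line.

```python
from typing import List, Optional


def winner_of(board: List[List[int]]) -> Optional[int]:
    """Return 1 or -1 if that player has three in a line, else None.

    Hypothetical helper. Cells hold 1 (first player), -1 (second player)
    or 0 (empty), so a full line owned by one player sums to +n or -n.
    """
    n = len(board)
    lines = [list(row) for row in board]                          # rows
    lines += [[board[r][c] for r in range(n)] for c in range(n)]  # columns
    lines.append([board[i][i] for i in range(n)])                 # main diagonal
    lines.append([board[i][n - 1 - i] for i in range(n)])         # anti-diagonal
    for line in lines:
        total = sum(line)
        if total == n:
            return 1
        if total == -n:
            return -1
    return None


board = [[1, -1, 0],
         [-1, 1, 0],
         [0, 0, 1]]
print(winner_of(board))  # 1: the main diagonal holds only first-player pieces
```

This check does not detect a tie. A full board with no winner still returns None, so a game loop would also need to test whether any empty cell remains.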
The reference code is below. I only removed a few parts that were not needed:
#######################################################################
# Copyright (C)
# 2016 - 2018 Shangtong Zhang(zhangshangtong.cpp@gmail.com)
# 2016 Jan Hakenberg(jan.hakenberg@gmail.com)
# 2016 Tian Jun(tianjun.cpp@gmail.com)
# 2016 Kenta Shimada(hyperkentakun@gmail.com)
# Permission given to modify the code as long as you keep this
# declaration at the top
#######################################################################
## https://www.cnblogs.com/pinard/p/9385570.html ##
## Reinforcement Learning (1): Model Basics ##
import numpy as np
import pickle
BOARD_ROWS = 3
BOARD_COLS = 3
BOARD_SIZE = BOARD_ROWS * BOARD_COLS
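State class

Brief description: each state is identified by a custom hash value. The main methods are get_all_states (run once to obtain every state) and next_state (place one piece and return the resulting new state).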
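The hashing and state-enumeration idea described above can be sketched without NumPy. This is a minimal, hypothetical re-implementation, not the reference code. Its assumptions are as follows:

- Boards are flat tuples of 9 cells.
- `encode` reproduces the base-3 hash, with -1 remapped to 2.
- `enumerate_states` walks every reachable position depth-first, in the role that get_all_states and next_state play in the reference code.

The names `encode`, `winner`, `next_board` and `enumerate_states` are all introduced for illustration only.

```python
from typing import Dict, Optional, Tuple

ROWS, COLS = 3, 3

# A board is a flat tuple of 9 cells: 1 (first player), -1 (second player), 0 (empty).
Board = Tuple[int, ...]

# The 8 winning lines as flat indices: 3 rows, 3 columns, 2 diagonals.
LINES = (
    [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
    + [[r * COLS + c for r in range(ROWS)] for c in range(COLS)]
    + [[i * COLS + i for i in range(ROWS)],
       [i * COLS + (COLS - 1 - i) for i in range(ROWS)]]
)


def encode(board: Board) -> int:
    """Base-3 hash: each cell is one digit, with -1 remapped to 2."""
    h = 0
    for cell in board:
        h = h * 3 + (2 if cell == -1 else cell)
    return h


def winner(board: Board) -> Optional[int]:
    """Return 1 or -1 for a win, 0 for a tie (full board), None if play continues."""
    for line in LINES:
        s = sum(board[i] for i in line)
        if s == 3:
            return 1
        if s == -3:
            return -1
    if all(cell != 0 for cell in board):
        return 0
    return None


def next_board(board: Board, pos: int, symbol: int) -> Board:
    """Place `symbol` at flat index `pos` and return a new board; the input is unchanged."""
    cells = list(board)
    cells[pos] = symbol
    return tuple(cells)


def enumerate_states() -> Dict[int, Tuple[Board, bool]]:
    """Depth-first search from the empty board, keyed by the base-3 hash."""
    start: Board = (0,) * (ROWS * COLS)
    states: Dict[int, Tuple[Board, bool]] = {
        encode(start): (start, winner(start) is not None)
    }

    def visit(board: Board, symbol: int) -> None:
        for pos in range(ROWS * COLS):
            if board[pos] != 0:
                continue
            child = next_board(board, pos, symbol)
            key = encode(child)
            if key in states:
                # Already explored: the player to move is fixed by the board itself.
                continue
            is_end = winner(child) is not None
            states[key] = (child, is_end)
            if not is_end:
                visit(child, -symbol)

    visit(start, 1)
    return states


if __name__ == "__main__":
    print(len(enumerate_states()))  # 5478
```

The search stops at wins and full boards. It finds 5478 distinct positions, including the empty board. Keying the dictionary by the hash means that a position reached through different move orders is stored and expanded only once.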
class State:
    def __init__(self):
        # the board is represented by an n * n array,
        # 1 represents a chessman of the player who moves first,
        # -1 represents a chessman of another player
        # 0 represents an empty position
        self.data = np.zeros((BOARD_ROWS, BOARD_COLS))
        self.winner = None
        self.hash_val = None
        self.end = None

    # compute the hash value for one state, it's unique
    def hash(self):
        if self.hash_val is None:
            self.hash_val = 0
            for i in self.data.reshape(BOARD_ROWS * BOARD_COLS):
                # cells originally take the values -1, 0, 1; map -1 to 2 so
                # each cell becomes a base-3 digit, which makes hashing easy
                if i == -1:
                    i = 2
                self.hash_val = self.hash_val * 3 + i
        return int(self.hash_val)

    # check whether a player has won the game, or it's a tie
    def is_end(self):
        if self.end is not None:
            return self.end
        results = []
        # check row
        for i in range(0, BOARD_ROWS):
            results.append(np.sum(self.data[i, :]))
        # check columns
        for i in range(0, BOARD_COLS):
            results.append(np.sum(self.data[:, i]))

        # check diagonals
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, i]
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, BOARD_ROWS - 1 - i]

        for result in results:
            if result == 3:
                self.winner = 1
                self.end = True
                return self.end
            if result == -3:
                self.winner = -1
                self.end = True
                re