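Applying Reinforcement Learning to the Game of Tic-Tac-Toe

Tic-Tac-Toe is played on a 3*3 grid. The two players take turns placing a piece, and the first to get three of their pieces in a straight line (row, column, or diagonal) wins.

The reference code is below; I only removed a few unused parts: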

#######################################################################
# Copyright (C)                                                       #
# 2016 - 2018 Shangtong Zhang(zhangshangtong.cpp@gmail.com)           #
# 2016 Jan Hakenberg(jan.hakenberg@gmail.com)                         #
# 2016 Tian Jun(tianjun.cpp@gmail.com)                                #
# 2016 Kenta Shimada(hyperkentakun@gmail.com)                         #
# Permission given to modify the code as long as you keep this        #
# declaration at the top                                              #
#######################################################################
## https://www.cnblogs.com/pinard/p/9385570.html ##
## Reinforcement Learning (1): Model Basics ##

import numpy as np
import pickle

BOARD_ROWS = 3
BOARD_COLS = 3
BOARD_SIZE = BOARD_ROWS * BOARD_COLS

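The State class

Brief description: each state is identified by a custom hash value. The main methods are get_all_states (run once to enumerate every state) and next_state (play one move and return the resulting new state).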
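To make the "custom hash value" concrete, here is a minimal standalone sketch of the idea: read the nine cells as the digits of a base-3 number, with -1 mapped to the digit 2. The function name `board_hash` and the use of plain nested lists instead of a NumPy array are assumptions made for this illustration; they are not part of the reference code.

```python
def board_hash(board):
    """Encode a 3x3 board with cells in {-1, 0, 1} as a base-3 integer.

    Hypothetical helper for illustration: -1 is mapped to the digit 2,
    and cells are read row by row, most significant digit first.
    """
    value = 0
    for row in board:
        for cell in row:
            digit = 2 if cell == -1 else cell
            value = value * 3 + digit
    return value


empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
first_move = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # first player takes the top-left cell
reply = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]       # second player takes the bottom-right cell

print(board_hash(empty))       # 0
print(board_hash(first_move))  # 6561, i.e. 1 * 3**8
print(board_hash(reply))       # 6563, i.e. 3**8 + 2
```

Because every cell contributes its own base-3 digit, two different boards can never produce the same value, which is what allows the hash to serve as a dictionary key for states.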

class State:
    def __init__(self):
        # the board is represented by an n * n array,
        # 1 represents a chessman of the player who moves first,
        # -1 represents a chessman of another player
        # 0 represents an empty position
        self.data = np.zeros((BOARD_ROWS, BOARD_COLS))
        self.winner = None
        self.hash_val = None
        self.end = None

    # compute the hash value for one state, it's unique
    def hash(self):
        if self.hash_val is None:
            self.hash_val = 0
            for i in self.data.reshape(BOARD_ROWS * BOARD_COLS):
                # cells originally take the values -1, 0, 1;
                # -1 is mapped to 2 here to make hashing convenient
                if i == -1:
                    i = 2
                self.hash_val = self.hash_val * 3 + i
        return int(self.hash_val)

    # check whether a player has won the game, or it's a tie

    def is_end(self):
        if self.end is not None:
            return self.end
        results = []
        # check row
        for i in range(0, BOARD_ROWS):
            results.append(np.sum(self.data[i, :]))
        # check columns
        for i in range(0, BOARD_COLS):
            results.append(np.sum(self.data[:, i]))
        # check diagonals
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, i]
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, BOARD_ROWS - 1 - i]

        for result in results:
            if result == 3:
                self.winner = 1
                self.end = True
                return self.end
            if result == -3:
                self.winner = -1
                self.end = True
                return self.end
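
The description of the State class names get_all_states and next_state, but neither appears in the listing shown here. As a rough illustration of what "run once to enumerate every state" can look like, the sketch below walks the game tree from the empty board. It is not the reference implementation: it uses plain tuples of nine cells instead of the State class, and the helper names `LINES`, `line_winner`, and `is_end` as free functions are assumptions made for this example.

```python
# All eight winning lines, as indices into a flat tuple of nine cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def line_winner(cells):
    """Return 1 or -1 if that player owns a full line, otherwise 0."""
    for a, b, c in LINES:
        total = cells[a] + cells[b] + cells[c]
        if total == 3:
            return 1
        if total == -3:
            return -1
    return 0


def is_end(cells):
    """A game is over when someone has won or the board is full."""
    return line_winner(cells) != 0 or 0 not in cells


def next_state(cells, index, symbol):
    """Return a new board with `symbol` (1 or -1) placed at `index`."""
    new_cells = list(cells)
    new_cells[index] = symbol
    return tuple(new_cells)


def get_all_states():
    """Map every board reachable in legal play to whether it is terminal."""
    start = (0,) * 9
    all_states = {start: False}
    frontier = [(start, 1)]          # (board, symbol of the player to move)
    while frontier:
        cells, symbol = frontier.pop()
        for index in range(9):
            if cells[index] != 0:
                continue
            child = next_state(cells, index, symbol)
            if child not in all_states:
                ended = is_end(child)
                all_states[child] = ended
                if not ended:        # finished games are not expanded further
                    frontier.append((child, -symbol))
    return all_states


states = get_all_states()
print(len(states), "reachable boards")
```

Enumerating the states once up front keeps the learning loop simple: looking up a successor position becomes a dictionary access rather than a fresh win check on every move.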
