Implementing chess with deep learning
by Erik Bernhardsson | February 2, 2017
About Erik: Dad and CTO (Chief Troll Officer) at a fintech startup in NYC. Ex-Spotify, co-organizing NYC ML meetup, open source sometimes (Luigi, Annoy), blogs random stuff
Deep learning for… chess
I’ve been meaning to learn Theano for a while and I’ve also wanted to build a chess AI at some point. So why not combine the two? That’s what I thought, and I ended up spending way too much time on it.
What’s the theory?
Chess is a game with a finite number of states, meaning if you had infinite computing capacity, you could actually solve chess. Every position in chess is either a win for white, a win for black, or a forced draw for both players. We can denote this by the function f(position). If we had an infinitely fast machine, we could compute this as follows:
- Assign every final position the value −1, 0, or 1, depending on who wins.
- Use the recursive rule
$$f(p) = \max_{p \rightarrow p'} -f(p')$$
where $p \rightarrow p'$ denotes all the legal moves from position p. The minus sign is because the players alternate between positions, so if position p is white’s turn, then position p′ is black’s turn (and vice versa). This is the same thing as minimax.
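As a rough illustration of this recursion, here is a minimal sketch on a toy game rather than chess. The game is an assumption made purely for the example: a pile of stones, each move removes 1, 2, or 3 of them, and the player who cannot move loses. The names `legal_moves` and `f` are illustrative and not from any chess library. The toy game has no draws, so only the values −1 and 1 show up.

```python
from functools import lru_cache


def legal_moves(pile):
    """Successor positions p' of p: remove 1, 2, or 3 stones (toy rules, not chess)."""
    return [pile - k for k in (1, 2, 3) if pile - k >= 0]


@lru_cache(maxsize=None)
def f(pile):
    """Value of the position for the player to move: 1 is a win, -1 is a loss."""
    successors = legal_moves(pile)
    if not successors:
        # Final position: the player to move has no legal move and has lost.
        return -1
    # f(p) = max over all legal moves p -> p' of -f(p').
    # The minus sign flips the value to the other player's point of view.
    return max(-f(nxt) for nxt in successors)


if __name__ == "__main__":
    for pile in range(9):
        print(pile, f(pile))
```

In this toy game the recursion marks every pile size that is a multiple of four as a loss for the side to move. The same rule applies to chess in principle, only with a state space far too large to enumerate.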
There are approximately $10^{43}$ positions, so there’s no way we can compute this. We need to resort to approximations to f(p).
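To get a feel for that scale, here is a back-of-the-envelope estimate. The rate of $10^9$ positions per second is an assumption chosen only for illustration, not a figure from any real engine:

$$\frac{10^{43} \text{ positions}}{10^{9} \text{ positions/second}} = 10^{34} \text{ seconds} \approx \frac{10^{34}}{3.2 \times 10^{7} \text{ seconds/year}} \approx 3 \times 10^{26} \text{ years}$$

Even visiting each position just once at that rate would take vastly longer than the age of the universe, which is on the order of $10^{10}$ years.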