TF/06_Neural_Networks/05 06 07 08

05 Implementing Different Layers

06 Using Multiple Layers

07 Improving Linear Regression

08 Learning to Play Tic-Tac-Toe

Goal

This example feeds optimal moves for many different board configurations into a neural network in order to train the model to play Tic-Tac-Toe.

The end of the script lets the user play against the trained model: it prompts for a move index and feeds the resulting board back into the model.

Data Format

All Tic-Tac-Toe boards can be reduced to a small set of base boards once we account for geometric transformations. These transformations are:

  • Rotate 90 deg.
  • Rotate 180 deg.
  • Rotate 270 deg.
  • Vertical reflection.
  • Horizontal reflection.

All possible boards can be generated from a base board with at most two transformations.
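The transformation bookkeeping can be sketched by reshaping the flat 9-element board into a 3×3 grid and using NumPy's rotation/flip helpers. The function name and transformation labels below are illustrative, not the script's exact API:

```python
import numpy as np

def transform_board(board, transformation):
    """Apply a named geometric transformation to a flat 9-element board.

    Illustrative sketch: names are assumptions, not the book's exact API.
    """
    ops = {
        'rotate90':  lambda g: np.rot90(g, 1),   # counter-clockwise
        'rotate180': lambda g: np.rot90(g, 2),
        'rotate270': lambda g: np.rot90(g, 3),
        'flip_v':    np.flipud,                  # vertical reflection (top <-> bottom)
        'flip_h':    np.fliplr,                  # horizontal reflection (left <-> right)
    }
    grid = np.asarray(board).reshape(3, 3)
    return ops[transformation](grid).flatten().tolist()

board = [-1, 0, 0, 1, -1, -1, 0, 0, 1]
print(transform_board(board, 'rotate180'))
# → [1, 0, 0, -1, -1, 1, 0, 0, -1]
```

Chaining two calls covers the "at most two transformations" claim, since e.g. a 270-degree rotation followed by a reflection reaches the remaining symmetries of the square.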

The file base_tic_tac_toe_moves.csv contains one row per unique board, together with the index of the desired best-play response.

We encode the board spaces as ‘X’ = 1, ‘O’ = -1, and 0 for an empty space. The last column of each row is the index of the best-play response. A board is indexed as follows:

 0 | 1 | 2
 ---------
 3 | 4 | 5
 ---------
 6 | 7 | 8

So for example, the board:

 O |   |
 ---------
 X | O | O
 ---------
   |   | X

is equivalent to the row: [-1, 0, 0, 1, -1, -1, 0, 0, 1].
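Decoding that row back into symbols, three indices at a time, recovers the board drawn above (a quick sanity check, not the script's print_board implementation):

```python
# Map the numeric encoding back to symbols to verify the example row.
row = [-1, 0, 0, 1, -1, -1, 0, 0, 1]
symbols = {1: 'X', -1: 'O', 0: ' '}
for i in range(0, 9, 3):          # rows start at indices 0, 3, 6
    print(' | '.join(symbols[v] for v in row[i:i+3]))
```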

(Figure: Tic-Tac-Toe board indexing)

Neural Network Architecture

We will keep it simple: a single fully connected hidden layer with 81 hidden nodes (if only because square numbers are appealing). See the diagram below for the network we will construct.

(Figure: Tic-Tac-Toe network architecture)
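The shapes involved can be sketched as a plain NumPy forward pass: 9 board inputs, 81 hidden units, and 9 output logits (one per board index). The book's script builds this in TensorFlow; the ReLU activation and random weights here are assumptions made purely to show the dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 9 board inputs -> 81 hidden units -> 9 move logits.
W1 = rng.normal(size=(9, 81))   # input -> hidden weights
b1 = np.zeros(81)
W2 = rng.normal(size=(81, 9))   # hidden -> output weights
b2 = np.zeros(9)

def forward(board):
    hidden = np.maximum(0.0, board @ W1 + b1)  # hidden activation (ReLU; an assumption)
    return hidden @ W2 + b2                    # 9 logits, one per board index

board = np.array([-1, 0, 0, 1, -1, -1, 0, 0, 1], dtype=float)
logits = forward(board)
print(logits.shape)              # (9,)
print(int(np.argmax(logits)))    # index of the predicted move (untrained here)
```

Training then amounts to pushing the logit at the best-response index up via a softmax cross-entropy loss over the 9 outputs.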

Important Functions

There are a few important functions at the beginning of the code:

  1. print_board(): takes a board vector and shows it as a tic-tac-toe board.
  2. get_symmetry(): takes a board, the preferred response index, and a transformation. It applies the transformation to both the board and the response index, producing a new (board, response) training pair.

At the end of the code, we loop through a live game, so the user can play against the model they just trained.
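One way to transform the response index consistently with the board is to embed it in a one-hot grid, transform that grid, and read off the new index. This is a hypothetical sketch of get_symmetry along those lines, not the script's actual implementation:

```python
import numpy as np

# Named transformations on a 3x3 grid (labels are illustrative).
OPS = {
    'rotate90':  lambda g: np.rot90(g, 1),
    'rotate180': lambda g: np.rot90(g, 2),
    'rotate270': lambda g: np.rot90(g, 3),
    'flip_v':    np.flipud,
    'flip_h':    np.fliplr,
}

def get_symmetry(board, response, transformation):
    """Apply `transformation` to a board and its best-response index together."""
    op = OPS[transformation]
    new_board = op(np.asarray(board).reshape(3, 3)).flatten().tolist()
    # Track where the response index lands by transforming a one-hot marker.
    marker = np.zeros(9)
    marker[response] = 1
    new_response = int(np.argmax(op(marker.reshape(3, 3)).flatten()))
    return new_board, new_response
```

Applying this over all transformations multiplies each base (board, response) row into the full set of equivalent training examples.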

See Code Here

Sample Game Output

Here is a sample of the output from playing against the trained model. The human plays X and the model plays O.

Input index of your move (0-8): 4
Model has moved
 O |   |
___________
   | X |
___________
   |   |

Input index of your move (0-8): 6
Model has moved
 O |   |
___________
   | X |
___________
 X | O |

Input index of your move (0-8): 2
Model has moved
 O |   | X
___________
   | X | O
___________
 X | O |
Game Over!

Results

iteration 0 Loss: 7.07958
iteration 500 Loss: 1.55408
iteration 1000 Loss: 1.44303
iteration 1500 Loss: 1.32695
iteration 2000 Loss: 1.26285
iteration 2500 Loss: 1.16081
iteration 3000 Loss: 1.19241
iteration 3500 Loss: 0.867869
iteration 4000 Loss: 1.12367
iteration 4500 Loss: 0.930153
iteration 5000 Loss: 0.853733
iteration 5500 Loss: 0.793472
iteration 6000 Loss: 1.08713
iteration 6500 Loss: 0.786458
iteration 7000 Loss: 0.825338
iteration 7500 Loss: 0.812032
iteration 8000 Loss: 0.785613
iteration 8500 Loss: 0.735918
iteration 9000 Loss: 0.737191
iteration 9500 Loss: 0.717223
Input index of your move (0-8): 5
Model has moved
   |   |  
___________
   |   | X
___________
   |   | O
Input index of your move (0-8): 4
Model has moved
 O |   |  
___________
   | X | X
___________
   |   | O
Input index of your move (0-8): 3
Model has moved
 O |   |  
___________
 X | X | X
___________
   | O | O
Game Over!

Loss Output

(Figure: Tic-Tac-Toe training loss)
