【Reinforcement Learning】MCTS (Monte Carlo Tree Search)


1 Basic Concepts of MCTS

1.1 Monte Carlo

Monte Carlo methods draw a large number of random samples from some distribution and use them to compute (estimate) a particular target value.
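As a minimal sketch of this idea, the Python function below averages a function over random draws. The helper name `mc_estimate` is hypothetical. The example estimates E[X²] for X uniform on [0, 1), whose exact value is 1/3.

```python
import random

def mc_estimate(sampler, f, n=100_000):
    """Monte Carlo estimate of E[f(X)]: average f over n draws of X from `sampler`."""
    return sum(f(sampler()) for _ in range(n)) / n

# Example: X ~ Uniform[0, 1); the exact value of E[X^2] is 1/3.
rng = random.Random(0)  # seeded so the run is reproducible
estimate = mc_estimate(rng.random, lambda x: x * x)
```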

【Example】

For example, consider a circle inscribed in a unit square. The circle has radius 1/2, so its area is π/4 and the ratio of the circle's area to the square's area is π/4. The value of π can therefore be approximated using a Monte Carlo method:

·Draw a square, then inscribe a circle within it

·Uniformly scatter objects of uniform size over the square

·Count the number of objects inside the circle and the total number of objects

·The ratio of the inside count to the total sample count is an estimate of the ratio of the two areas, which is π/4. Multiply the result by 4 to estimate π

Monte Carlo method applied to approximating the value of π. After placing 30,000 random points, the estimate for π is within 0.07% of the actual value.
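The procedure above can be sketched in Python as follows. The function name `estimate_pi` is illustrative. Points are sampled uniformly in the unit square and tested against the inscribed circle of radius 1/2 centred at (0.5, 0.5).

```python
import random

def estimate_pi(n_points, rng=random):
    inside = 0
    for _ in range(n_points):
        # Sample a point uniformly in the unit square [0, 1) x [0, 1).
        x, y = rng.random(), rng.random()
        # Inscribed circle: centre (0.5, 0.5), radius 0.5.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    # inside / n_points estimates the area ratio pi/4, so multiply by 4.
    return 4 * inside / n_points
```

The statistical error shrinks roughly in proportion to 1/√n. Quadrupling the number of points therefore only about halves the error.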

 

1.2 Monte Carlo Tree Search

Each round of Monte Carlo tree search consists of four steps:[4]

·Selection: start from the root R and select successive child nodes down to a leaf node L. Child nodes are chosen in a way that lets the game tree expand toward the most promising moves. This selection strategy is the essence of Monte Carlo tree search.

·Expansion: unless L ends the game with a win or loss for either player, create one (or more) child nodes and choose a node C from among them.

·Simulation: play a random playout from node C. This step is sometimes also called playout or rollout.

·Backpropagation: use the result of the playout to update the statistics stored in the nodes on the path from C back to R.

Sample steps from one round are shown in the figure below. Each tree node stores the number of won/played playouts.

 

Steps of Monte Carlo tree search
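The four steps can be sketched in Python as below. This is a minimal illustration that relies on several assumptions not found in the original text:

·The toy game is a hypothetical `NimState`. Players alternately take 1 to 3 stones, and whoever takes the last stone wins.

·Selection uses the UCT rule (UCB1 applied to trees) with exploration constant √2.

·The final move is the root's most-visited child.

Each node stores wins and visits, which match the won/played counts described above.

```python
import math
import random

# Hypothetical toy game: take 1-3 stones per turn; taking the last stone wins.
class NimState:
    def __init__(self, stones, player=0):
        self.stones = stones
        self.player = player  # player to move: 0 or 1

    def legal_moves(self):
        return [k for k in (1, 2, 3) if k <= self.stones]

    def play(self, k):
        return NimState(self.stones - k, 1 - self.player)

    def is_terminal(self):
        return self.stones == 0

    def winner(self):
        # The player who took the last stone is the one NOT to move now.
        return 1 - self.player


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = state.legal_moves()
        self.wins = 0.0  # won playouts for the player who moved into this node
        self.visits = 0  # playouts that passed through this node

    def uct_child(self, c=math.sqrt(2)):
        # Assumed selection policy: UCT (mean win rate + exploration bonus).
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts(root_state, iterations=2000, rng=random):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: unless the game is over, add one child node C.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.state.play(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from C to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(rng.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: update won/played counts on the path C -> R.
        while node is not None:
            node.visits += 1
            if 1 - node.state.player == winner:  # the mover into this node won
                node.wins += 1
            node = node.parent
    # Final choice: the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, `mcts(NimState(5), rng=random.Random(0))` is expected to return `1`. That move leaves 4 stones, a multiple of 4, which is the winning move in this toy game. Storing wins from the perspective of the player who moved into a node lets each parent maximise its own win rate during selection.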
