Mastering the game of Go without human knowledge: pseudocode

Latest recommended article published on 2024-07-24 14:05:30

popcandyy

Tags: machine learning, reinforcement learning, algorithms, alpha zero

Article link: https://blog.csdn.net/popcandyy/article/details/106329044

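Having finally worked my way through the paper, I wrote up some pseudocode so that it is easier to revisit next time. The paper is fairly clear overall, but a few parts confused me and took a long time to figure out. If you run into similar questions while reading it, you may find some hints here.
Original paper: AlphaGo-Zero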

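In terms of playing strength, AlphaGo-Zero > AlphaGo-Master > AlphaGo-Fan and the other Go programs named after the players they defeated. Zero performs best because it learns without human supervision from start to finish: all of its training data comes from the self-play buffer. It also validates while it trains, which effectively guards against overfitting. The benefit of learning without human supervision is that the machine can discover ways of playing that people have not previously written down in game records. AlphaGo-Master and AlphaGo-Fan both take human game records as supervision (Master improved the neural network architecture), so they do not perform as well as Zero.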
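To make the "self-play buffer plus validate while training" idea concrete, here is a minimal toy sketch of the outer loop. It is not the paper's implementation. I am assuming that "validate while training" refers to evaluating each newly trained network against the current best player and promoting it only if it wins often enough; the 55% gate over 400 games matches the values reported in the paper. The functions `self_play`, `train` and `evaluate` are hypothetical stand-ins: the "player" is a single float, there is no MCTS and no neural network, and the win model is a made-up logistic function.

```python
import math
import random
from collections import deque


def self_play(best, n_games, rng):
    """Hypothetical stand-in for MCTS self-play: returns (state, outcome) samples."""
    return [(rng.random(), rng.choice((-1, 1))) for _ in range(n_games)]


def train(best, batch, rng):
    """Hypothetical training step: a real one would minimise a loss on `batch`.
    Here the candidate is just a randomly perturbed copy of the best player."""
    return best + rng.gauss(0.0, 0.1)


def evaluate(candidate, best, n_games, rng):
    """Win rate of candidate against best under a toy logistic win model."""
    p_win = 1.0 / (1.0 + math.exp(best - candidate))
    wins = sum(rng.random() < p_win for _ in range(n_games))
    return wins / n_games


def training_loop(iterations=20, buffer_size=500, gate=0.55, seed=0):
    rng = random.Random(seed)
    best = rng.gauss(0.0, 1.0)          # randomly initialised "parameters"
    buffer = deque(maxlen=buffer_size)  # self-play buffer; oldest samples drop out
    promotions = 0
    for _ in range(iterations):         # stands in for "while time limit not reached"
        buffer.extend(self_play(best, 100, rng))       # data comes only from self-play
        batch = rng.sample(list(buffer), min(32, len(buffer)))
        candidate = train(best, batch, rng)
        if evaluate(candidate, best, 400, rng) > gate:  # validation gate
            best = candidate
            promotions += 1
    return best, promotions, len(buffer)


if __name__ == "__main__":
    best, promotions, buffer_len = training_loop()
    print(f"best={best:.3f} promotions={promotions} buffer={buffer_len}")
```

The point of the gate is that self-play data is always generated by a player that has actually beaten its predecessor, so a bad training step cannot pollute the buffer.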

Begin
    randomize the parameters of the best player's network
    while the time limit for learning is not reached do
        Procedure Self-play: