
Flappy Bird RL

Flappy Bird hack using Reinforcement Learning

View on GitHub

The Hack.

This is a hack for the popular game, Flappy Bird. Although the game is no longer available on Google Play or the App Store, that did not stop folks from creating very good replicas for the web. People have also created some interesting variants of the game - Flappy Bird Typing Tutor and Flappy Math Saga.

After playing the game a few times (read: a few hours), I saw the opportunity to practice my machine learning skills and try to get Flappy Bird to learn how to play the game by itself. The video above shows the results of a really well trained Flappy Bird that basically keeps dodging pipes forever.

The How.

Initially, I wanted to create this hack for the Android app, and I was planning to use monkeyrunner to get screenshots and send click commands. But it takes about 1-2 seconds to get a screenshot, and that was definitely not fast or responsive enough.

Then I found @mrspeaker's game engine, Omega500, and his version of Flappy Bird for typing practice. I ripped out the typing component and added some JavaScript Q Learning code to it.

Reinforcement Learning

Here's the basic principle: the agent, Flappy Bird in this case, performs a certain action in a state. It then finds itself in a new state and gets a reward based on that. There are many reinforcement learning algorithms suited to different situations: Policy Iteration, Value Iteration, Q Learning, etc.

Q Learning

I used Q Learning because it is a model-free form of reinforcement learning. That means I didn't have to model the dynamics of Flappy Bird: how it rises and falls, reacts to clicks, and other things of that nature.

Here is a nice, concise description of Q Learning. The following is the algorithm.
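In JavaScript, one iteration of that algorithm looks roughly like the sketch below. The helper names here (discretizeState, chooseAction, rewardFor, updateQ, takeAction, tick) are illustrative stand-ins rather than the actual hooks in the code; the first four are fleshed out in the sections that follow.

    // One learning iteration, run once per game "tick".
    function learnStep(bird) {
      var s = discretizeState(bird);      // observe the current state
      var a = chooseAction(s);            // greedy action from the Q array
      takeAction(bird, a);                // 0 = do nothing, 1 = click
      tick();                             // let the engine advance one frame
      var sNext = discretizeState(bird);  // observe the resulting state, s'
      updateQ(s, a, rewardFor(bird), sNext);
    }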

State Space

I discretized my space over the following parameters; a sketch of what that discretization might look like follows the list.

  • Vertical distance from lower pipe
  • Horizontal distance from next pair of pipes
  • Life: Dead or Living
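As a rough sketch, a discretization along these lines could look like the following. The 10 px bucket size and the nextPipePair lookup are assumptions for illustration, not the exact values in the code; the Life parameter is consumed by the reward function rather than the state key here.

    // Hypothetical discretization: bucket the pixel distances into coarse
    // cells so the Q array stays small (10 px buckets are an assumption).
    function discretizeState(bird) {
      var pipe = nextPipePair(bird);                    // made-up engine lookup
      var dx = Math.floor((pipe.x - bird.x) / 10);      // horizontal distance
      var dy = Math.floor((pipe.lowerY - bird.y) / 10); // vertical distance to lower pipe
      return dx + "," + dy;                             // string key into the Q array
    }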

Actions

For each state, I have two possible actions

  • Click
  • Do Nothing

Rewards

The reward structure is purely based on the "Life" parameter.

  • +1 if Flappy Bird is still alive
  • -1000 if Flappy Bird is dead
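In code this is a one-liner, assuming the engine exposes an alive flag on the bird:

    // Reward depends purely on the Life parameter.
    function rewardFor(bird) {
      return bird.alive ? 1 : -1000;
    }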

The Learning Loop

The array Q is initialized with zeros, and I always chose the best action: the action that maximizes expected reward. To break ties I chose "Do Nothing" because that is the more common action.
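A sketch of that selection rule; using a strict comparison means a tie falls through to "Do Nothing":

    var Q = {};   // state key -> [doNothing, click], lazily initialized to zeros

    function qValues(s) {
      return Q[s] || (Q[s] = [0, 0]);
    }

    // Greedy selection; the strict > means a tie picks action 0, "Do Nothing".
    function chooseAction(s) {
      var q = qValues(s);
      return q[1] > q[0] ? 1 : 0;
    }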

Step 1: Observe what state Flappy Bird is in and perform the action that maximizes expected reward.

Let the game engine perform its "tick". Now Flappy Bird is in a new state, s'.

Step 2: Observe new state, s', and the reward associated with it. +1 if the bird is still alive, -1000 otherwise.

Step 3: Update the Q array according to the Q Learning rule.

Q[s,a] ← Q[s,a] + α (r + γ·V(s') − Q[s,a]), where V(s') = max over a' of Q[s',a']

The alpha I chose is 0.7 because the game is deterministic and I wanted it to be pretty hard to un-learn something. Also, the discount factor, gamma, was 1.
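Putting the update rule and those constants together (qValues is the lazy initializer from the sketch above):

    var ALPHA = 0.7;   // high learning rate: the game is deterministic
    var GAMMA = 1.0;   // undiscounted: future survival counts fully

    // Step 3: the Q Learning update, with V(s') = max over both actions.
    function updateQ(s, a, r, sNext) {
      var q = qValues(s);
      var qNext = qValues(sNext);
      var vNext = Math.max(qNext[0], qNext[1]);   // V(s')
      q[a] += ALPHA * (r + GAMMA * vNext - q[a]);
    }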

Step 4: Set the current state to s' and start over.

The Next Steps.

  • It took about 6-7 hours to train Flappy Bird to be good enough (a score of 150). This could be improved by instantiating more than one bird in the beginning and having all of them contribute their "learnings" to the same Q array (see the sketch after this list).
  • Another way to make the learning faster would be to let users provide "good" input. Right now, you can click on the game to make Flappy Bird jump. But that input is not taken into account by the learner.
  • Get this to work on a mobile phone!! If anyone has any ideas, please let me know in the comments :)
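Here is a rough sketch of the multi-bird idea from the first bullet, assuming a made-up spawnBird hook. Every bird reads from and writes to the same shared Q array, so each bird's experience benefits all of them.

    // Hypothetical: several birds learning in parallel into one shared Q.
    var birds = [];
    for (var i = 0; i < 10; i++) {
      birds.push(spawnBird());          // spawnBird is a made-up engine hook
    }

    // Each frame: every living bird acts, the engine ticks once, and every
    // bird's transition is written into the shared Q array.
    function tickAll() {
      var steps = birds.filter(function (b) { return b.alive; })
                       .map(function (b) {
                         var s = discretizeState(b);
                         var a = chooseAction(s);
                         takeAction(b, a);
                         return { bird: b, s: s, a: a };
                       });
      tick();                           // advance the engine one frame
      steps.forEach(function (st) {
        updateQ(st.s, st.a, rewardFor(st.bird), discretizeState(st.bird));
      });
    }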

Credits.

I'd like to give a shout out to @mrspeaker for creating the Omega500 game engine and making it open source!

Comments.

Want to leave a comment? Visit this post's issue page on GitHub (you'll need a GitHub account. What? Like you already don't have one?!).

Flappy Bird RL is maintained by SarvagyaVaish