Unity (ML-Agents) for Imitation Learning


ML-Agents supports two types of learning:

1. RL (reinforcement learning): learns by getting rewards.

2. Imitation: learns by imitating what the player does.

Imitation learning is how you teach your AI directly how to behave in order to achieve a certain goal.

First, set up the scene with some randomness, so that the AI doesn't learn to solve only one specific set of positions.
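As a rough illustration of that randomization step, here is a minimal Python sketch that samples a fresh layout for every episode. In a real ML-Agents project this logic would live on the Unity side in C# (typically in the agent's `OnEpisodeBegin`). The function name, the floor size, and the minimum separation are all assumptions made up for this example.

```python
import math
import random


def sample_episode_layout(rng, half_extent=4.0, min_separation=1.0):
    """Pick random (x, z) positions for the agent and the target.

    half_extent and min_separation are hypothetical values: a square floor
    of side 2 * half_extent, and a minimum distance so the episode does
    not start already solved.
    """
    while True:
        agent = (rng.uniform(-half_extent, half_extent),
                 rng.uniform(-half_extent, half_extent))
        target = (rng.uniform(-half_extent, half_extent),
                  rng.uniform(-half_extent, half_extent))
        # Re-sample if the two positions are too close together.
        if math.dist(agent, target) >= min_separation:
            return agent, target


rng = random.Random(0)  # seeded only to make the sketch reproducible
for episode in range(3):
    agent_pos, target_pos = sample_episode_layout(rng)
    print(episode, agent_pos, target_pos)
```

Recording demonstrations across many such layouts gives the imitation learner examples of the behavior, not just one memorized path.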

There are two types of imitation learning you can use: GAIL and BC.

BC - Behavior Cloning. This is the simplest form of imitation learning.

GAIL - Generative Adversarial Imitation Learning. The goal of the discriminator is to figure out whether a certain action came from the agent or from the demo, so over time the agent learns to behave like the demo in order to trick the discriminator.
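To make the discriminator idea concrete, here is a toy sketch in plain Python. It assumes each state-action pair has been reduced to a single number `x` (a hypothetical feature), trains a logistic-regression discriminator to output 1 for demo samples and 0 for agent samples, and then uses one common form of the GAIL reward, `-log(1 - D(x))`. This is an illustration of the principle, not how ML-Agents implements it internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(demo_xs, agent_xs, lr=0.5, epochs=200):
    """Fit D(x) = sigmoid(w * x + b): label 1 for demo data, 0 for agent data."""
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in demo_xs] + [(x, 0.0) for x in agent_xs]
    for _ in range(epochs):
        for x, label in data:
            pred = sigmoid(w * x + b)
            grad = pred - label  # gradient of binary cross-entropy w.r.t. the logit
            w -= lr * grad * x
            b -= lr * grad
    return w, b


def gail_reward(x, w, b, eps=1e-8):
    """Reward is high when the discriminator believes x came from the demo."""
    d = sigmoid(w * x + b)
    return -math.log(1.0 - d + eps)


# Hypothetical 1-D features: demo behavior clusters near +1, the agent near -1.
demo_xs = [0.9, 1.0, 1.1]
agent_xs = [-1.1, -1.0, -0.9]
w, b = train_discriminator(demo_xs, agent_xs)

print("reward for demo-like action: ", gail_reward(1.0, w, b))
print("reward for agent-like action:", gail_reward(-1.0, w, b))
```

The agent is rewarded more for actions the discriminator mistakes for demo actions, which is what pushes its behavior toward the demonstrations.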

GAIL works by trying to trick a discriminator into believing that the actions came from the demo, whereas BC simply tries to copy exactly what you did. The limitation of BC is that it can never get better than the demos. So, to get the best results, we need to combine all three: BC, GAIL, and extrinsic rewards.
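Behavior cloning, in its simplest form, is supervised learning on recorded (state, action) pairs. The sketch below is a deliberately tiny tabular version: for every state seen in the demos, the policy repeats the demonstrator's most frequent action. The state and action names are invented for the example, and a real trainer would fit a neural network instead of a lookup table.

```python
from collections import Counter, defaultdict


def fit_behavior_cloning(demonstrations):
    """Map each demonstrated state to the demonstrator's most frequent action."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical demo data: (state, action) pairs recorded from a player.
demos = [
    ("goal_left", "move_left"),
    ("goal_left", "move_left"),
    ("goal_left", "move_right"),   # an occasional player mistake
    ("goal_right", "move_right"),
]

policy = fit_behavior_cloning(demos)
print(policy["goal_left"])
print(policy.get("goal_behind"))  # None: no demo covers this state
```

The lookup for an unseen state returns nothing useful, and for seen states the policy can only reproduce what the player did, which is the limitation described above.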

First, with BC, the agent learns to act exactly like you. When combined with GAIL, it learns to act similarly to you while achieving the same goal. When combined with extrinsic rewards, it continues improving upon those two. That is how we get superhuman performance.
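In ML-Agents this combination is set up in the trainer configuration, a YAML file passed to `mlagents-learn`. The sketch below mirrors the relevant part of that file as a Python dict so the structure is easy to see. The key names (`reward_signals`, `extrinsic`, `gail`, `behavioral_cloning`, `demo_path`, `strength`, `gamma`, `steps`) follow the trainer configuration format as I understand it, so check them against the docs for your ML-Agents version. The behavior name `MyAgent`, the demo path, and the numeric values are placeholders, not recommendations.

```python
import json

# Placeholder path to a recorded demonstration file (hypothetical).
DEMO_PATH = "Demos/MyExpert.demo"

trainer_config = {
    "behaviors": {
        "MyAgent": {  # hypothetical behavior name
            "trainer_type": "ppo",
            "reward_signals": {
                # Rewards defined by the environment itself.
                "extrinsic": {"strength": 1.0, "gamma": 0.99},
                # Discriminator-based reward for demo-like behavior.
                "gail": {"strength": 0.5, "gamma": 0.99, "demo_path": DEMO_PATH},
            },
            # Direct imitation of the demo actions during early training.
            "behavioral_cloning": {
                "demo_path": DEMO_PATH,
                "strength": 0.5,
                "steps": 150000,
            },
        }
    }
}


def active_learning_signals(behavior):
    """List which of the three learning signals a behavior config enables."""
    signals = sorted(behavior.get("reward_signals", {}))
    if "behavioral_cloning" in behavior:
        signals.append("behavioral_cloning")
    return signals


print(active_learning_signals(trainer_config["behaviors"]["MyAgent"]))
print(json.dumps(trainer_config, indent=2))
```

The `strength` values control how much each signal contributes, so lowering the GAIL and BC strengths relative to the extrinsic reward is one way to let the agent move past the quality of the demos.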

Visualization in TensorBoard

Personally speaking:

I have watched some basic projects built with ML-Agents. Personally speaking, it feels like

Reference

[Unity Tutorial] [Chinese subtitles] Teach your AI! Imitation Learning with Unity ML-Agents! (bilibili)

Imitation Learning Notes: Behavior Cloning (UQI-LIUWJ's blog, CSDN)
