Explainable RL: Reproduction Results

1. Reproduction target

1.1 Related links

Project link
Paper link
WeChat link

Introduction to causality
Survey of causality

1.2 Simulation environment: LunarLander

action:
a[0] do nothing
a[1] fire the left engine
a[2] fire the main engine
a[3] fire the right engine
state:
s[0] is the horizontal coordinate
s[1] is the vertical coordinate
s[2] is the horizontal speed
s[3] is the vertical speed
s[4] is the angle
s[5] is the angular speed
s[6] 1 if first leg has contact, else 0
s[7] 1 if second leg has contact, else 0
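
The action and state layout above can be given names in code, so that raw indices do not have to be remembered when reading the logs. The sketch below is illustrative only: the class names `Action` and `LanderState` and the field names `leg1_contact` / `leg2_contact` are hypothetical, and the other field names are borrowed from the variable names in the logs in section 2. How the project combines the two leg flags into the single `landed_legs` variable is not shown in the source, so it is not modelled here.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    """Discrete actions a[0]..a[3] as listed above (names are hypothetical)."""
    NOOP = 0          # do nothing
    LEFT_ENGINE = 1   # fire the left engine
    MAIN_ENGINE = 2   # fire the main engine
    RIGHT_ENGINE = 3  # fire the right engine


class LanderState(NamedTuple):
    """Named view of the 8-dimensional state s[0]..s[7]."""
    x: float             # s[0] horizontal coordinate
    y: float             # s[1] vertical coordinate
    vx: float            # s[2] horizontal speed
    vy: float            # s[3] vertical speed
    angle: float         # s[4] angle
    v_angle: float       # s[5] angular speed
    leg1_contact: float  # s[6] 1 if first leg has contact, else 0
    leg2_contact: float  # s[7] 1 if second leg has contact, else 0


# Made-up observation for illustration only (not taken from a real run).
obs = [0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 0.0]
state = LanderState(*obs)
print(Action(2).name, state.vy)
```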

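1.3 Simulation environment: CartPole

action:
push the cart to the left
push the cart to the right
state:
horizontal position of the cart
instantaneous velocity of the cart
angle between the pole and the vertical
angular velocity of the pole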

2. LunarLander reproduction results

The run output is as follows:

2.1 Step 9

starting causal discovery
	(angle') caused by (v_angle) with assurrance 0.99644 
	(angle') caused by (landed_legs) with assurrance 0.66270 
	(angle') caused by (engine) with assurrance 0.67312 
	(angle') caused by (angle) with assurrance 0.99992 
	(angle') caused by (vx) with assurrance 0.27607 
	(angle') caused by (vy) with assurrance 0.65637 
	(angle') caused by (x) with assurrance 0.83167 
	(angle') caused by (y) with assurrance 0.96726 
	(crash) caused by (v_angle) with assurrance 0.59318 
	(crash) caused by (landed_legs) with assurrance 0.99256 
	(crash) caused by (engine) with assurrance 0.08763 
	(crash) caused by (angle) with assurrance 0.85457 
	(crash) caused by (vy) with assurrance 0.91271 
	(crash) caused by (vx) with assurrance 0.58051 
	(crash) caused by (x) with assurrance 0.96912 
	(fuel_cost) caused by (angle) with assurrance 0.00000 
	(crash) caused by (y) with assurrance 0.66427 
	(fuel_cost) caused by (engine) with assurrance 1.00000 
	(fuel_cost) caused by (landed_legs) with assurrance 0.00000 
	(fuel_cost) caused by (vx) with assurrance 0.00000 
	(fuel_cost) caused by (v_angle) with assurrance 0.00000 
	(fuel_cost) caused by (vy) with assurrance 0.00000 
	(fuel_cost) caused by (x) with assurrance 0.00000 
	(fuel_cost) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (angle) with assurrance 0.69358 
	(landed_legs') caused by (engine) with assurrance 0.88777 
	(landed_legs') caused by (landed_legs) with assurrance 0.99966 
	(landed_legs') caused by (vx) with assurrance 0.85345 
	(landed_legs') caused by (v_angle) with assurrance 0.43689 
	(rest) caused by (angle) with assurrance 0.00000 
	(rest) caused by (engine) with assurrance 0.00000 
	(rest) caused by (landed_legs) with assurrance 0.00000 
	(rest) caused by (v_angle) with assurrance 0.00000 
	(rest) caused by (vx) with assurrance 0.00000 
	(rest) caused by (vy) with assurrance 0.00000 
	(landed_legs') caused by (vy) with assurrance 0.93282 
	(rest) caused by (x) with assurrance 0.00000 
	(rest) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (x) with assurrance 0.05350 
	(landed_legs') caused by (y) with assurrance 0.77181 
	(v_angle') caused by (angle) with assurrance 0.76077 
	(v_angle') caused by (engine) with assurrance 0.96332 
	(v_angle') caused by (landed_legs) with assurrance 0.99181 
	(v_angle') caused by (v_angle) with assurrance 0.99550 
	(v_angle') caused by (vx) with assurrance 0.16669 
	(v_angle') caused by (vy) with assurrance 0.85722 
	(v_angle') caused by (x) with assurrance 0.08066 
	(v_angle') caused by (y) with assurrance 0.08966 
	(vx') caused by (angle) with assurrance 0.92614 
	(vx') caused by (engine) with assurrance 0.36020 
	(vx') caused by (landed_legs) with assurrance 0.87006 
	(vx') caused by (v_angle) with assurrance 0.08313 
	(vx') caused by (vx) with assurrance 0.99999 
	(vx') caused by (vy) with assurrance 0.47437 
	(vx') caused by (y) with assurrance 0.05503 
	(vx') caused by (x) with assurrance 0.88342 
	(vy') caused by (angle) with assurrance 0.85471 
	(vy') caused by (engine) with assurrance 0.99994 
	(vy') caused by (landed_legs) with assurrance 0.99996 
	(vy') caused by (v_angle) with assurrance 0.04239 
	(vy') caused by (vx) with assurrance 0.26754 
	(vy') caused by (vy) with assurrance 1.00000 
	(vy') caused by (x) with assurrance 0.21860 
	(vy') caused by (y) with assurrance 0.92468 
	(x') caused by (angle) with assurrance 0.68152 
	(x') caused by (engine) with assurrance 0.00171 
	(x') caused by (v_angle) with assurrance 0.36346 
	(x') caused by (landed_legs) with assurrance 0.72334 
	(x') caused by (vx) with assurrance 1.00000 
	(x') caused by (vy) with assurrance 0.06414 
	(x') caused by (y) with assurrance 0.06903 
	(y') caused by (angle) with assurrance 0.21156 
	(x') caused by (x) with assurrance 1.00000 
	(y') caused by (engine) with assurrance 0.19189 
	(y') caused by (v_angle) with assurrance 0.92250 
	(y') caused by (vx) with assurrance 0.36926 
	(y') caused by (landed_legs) with assurrance 0.97075 
	(y') caused by (vy) with assurrance 0.99999 
	(y') caused by (x) with assurrance 0.15692 
	(y') caused by (y) with assurrance 1.00000 
-------------------discovered-causal-graph---------------------
(angle, y, x, v_angle) --> angle'
(angle, x, vy, landed_legs) --> crash
(engine) --> fuel_cost
(vy, vx, engine, landed_legs) --> landed_legs'
() --> rest
(v_angle, vy, engine, landed_legs) --> v_angle'
(angle, vx, x, landed_legs) --> vx'
(vy, angle, y, engine, landed_legs) --> vy'
(vx, x) --> x'
(v_angle, vy, y, landed_legs) --> y'
---------------------------------------------------------------

2.2 Step 99

About 6 h had elapsed by this point.

starting causal discovery
	(angle') caused by (v_angle) with assurrance 1.00000 
	(angle') caused by (engine) with assurrance 0.91401 
	(angle') caused by (landed_legs) with assurrance 0.91566 
	(angle') caused by (angle) with assurrance 1.00000 
	(angle') caused by (vx) with assurrance 0.84761 
	(angle') caused by (vy) with assurrance 0.79875 
	(angle') caused by (x) with assurrance 0.85814 
	(angle') caused by (y) with assurrance 0.81510 
	(crash) caused by (angle) with assurrance 0.09747 
	(crash) caused by (engine) with assurrance 0.63105 
	(crash) caused by (landed_legs) with assurrance 0.99990 
	(crash) caused by (v_angle) with assurrance 0.82667 
	(crash) caused by (vx) with assurrance 0.40407 
	(crash) caused by (vy) with assurrance 1.00000 
	(crash) caused by (x) with assurrance 0.99757 
	(crash) caused by (y) with assurrance 0.12045 
	(fuel_cost) caused by (angle) with assurrance 0.00000 
	(fuel_cost) caused by (engine) with assurrance 1.00000 
	(fuel_cost) caused by (v_angle) with assurrance 0.00000 
	(fuel_cost) caused by (landed_legs) with assurrance 0.00000 
	(fuel_cost) caused by (vx) with assurrance 0.00000 
	(fuel_cost) caused by (vy) with assurrance 0.00000 
	(fuel_cost) caused by (x) with assurrance 0.00000 
	(fuel_cost) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (angle) with assurrance 0.87897 
	(landed_legs') caused by (v_angle) with assurrance 0.25252 
	(landed_legs') caused by (vx) with assurrance 0.15359 
	(landed_legs') caused by (engine) with assurrance 0.77116 
	(landed_legs') caused by (landed_legs) with assurrance 1.00000 
	(rest) caused by (angle) with assurrance 0.00000 
	(rest) caused by (engine) with assurrance 0.00000 
	(landed_legs') caused by (x) with assurrance 0.30346 
	(landed_legs') caused by (vy) with assurrance 0.07573 
	(rest) caused by (landed_legs) with assurrance 0.96466 
	(rest) caused by (v_angle) with assurrance 0.00000 
	(rest) caused by (vx) with assurrance 0.90287 
	(rest) caused by (vy) with assurrance 0.00000 
	(rest) caused by (x) with assurrance 0.00000 
	(rest) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (y) with assurrance 0.99688 
	(v_angle') caused by (angle) with assurrance 0.86793 
	(v_angle') caused by (landed_legs) with assurrance 0.99553 
	(v_angle') caused by (engine) with assurrance 0.95110 
	(v_angle') caused by (v_angle) with assurrance 1.00000 
	(v_angle') caused by (vx) with assurrance 0.68554 
	(v_angle') caused by (vy) with assurrance 0.02769 
	(v_angle') caused by (x) with assurrance 0.79948 
	(v_angle') caused by (y) with assurrance 0.03491 
	(vx') caused by (angle) with assurrance 0.99994 
	(vx') caused by (engine) with assurrance 0.99999 
	(vx') caused by (v_angle) with assurrance 0.05110 
	(vx') caused by (landed_legs) with assurrance 0.26498 
	(vx') caused by (vx) with assurrance 0.99999 
	(vx') caused by (vy) with assurrance 0.32017 
	(vx') caused by (x) with assurrance 0.78903 
	(vx') caused by (y) with assurrance 0.02321 
	(vy') caused by (angle) with assurrance 0.72637 
	(vy') caused by (engine) with assurrance 0.99999 
	(vy') caused by (v_angle) with assurrance 0.29209 
	(vy') caused by (landed_legs) with assurrance 0.99994 
	(vy') caused by (vx) with assurrance 0.83871 
	(vy') caused by (x) with assurrance 0.05038 
	(vy') caused by (y) with assurrance 0.58507 
	(vy') caused by (vy) with assurrance 1.00000 
	(x') caused by (angle) with assurrance 0.00393 
	(x') caused by (engine) with assurrance 0.23336 
	(x') caused by (v_angle) with assurrance 0.00037 
	(x') caused by (landed_legs) with assurrance 0.50181 
	(x') caused by (vx) with assurrance 1.00000 
	(x') caused by (vy) with assurrance 0.00177 
	(x') caused by (y) with assurrance 0.00005 
	(x') caused by (x) with assurrance 1.00000 
	(y') caused by (angle) with assurrance 0.23687 
	(y') caused by (engine) with assurrance 0.98094 
	(y') caused by (v_angle) with assurrance 0.11466 
	(y') caused by (landed_legs) with assurrance 0.96841 
	(y') caused by (vx) with assurrance 0.02891 
	(y') caused by (vy) with assurrance 1.00000 
	(y') caused by (x) with assurrance 0.13967 
	(y') caused by (y) with assurrance 1.00000 
-------------------discovered-causal-graph---------------------
(x, landed_legs, angle, y, engine, vx, v_angle) --> angle'
(v_angle, vy, x, landed_legs) --> crash
(engine) --> fuel_cost
(angle, y, landed_legs) --> landed_legs'
(vx, landed_legs) --> rest
(v_angle, angle, engine, landed_legs) --> v_angle'
(angle, vx, engine) --> vx'
(vy, vx, engine, landed_legs) --> vy'
(vx, x) --> x'
(vy, y, engine, landed_legs) --> y'
---------------------------------------------------------------

2.3 Step 198

starting causal discovery
	(angle') caused by (v_angle) with assurrance 1.00000 
	(angle') caused by (landed_legs) with assurrance 0.60837 
	(angle') caused by (engine) with assurrance 0.08607 
	(angle') caused by (angle) with assurrance 1.00000 
	(angle') caused by (vx) with assurrance 0.43705 
	(angle') caused by (vy) with assurrance 0.13294 
	(angle') caused by (x) with assurrance 0.43582 
	(angle') caused by (y) with assurrance 0.68825 
	(crash) caused by (angle) with assurrance 0.31873 
	(crash) caused by (engine) with assurrance 0.27178 
	(crash) caused by (v_angle) with assurrance 0.01975 
	(crash) caused by (landed_legs) with assurrance 0.81415 
	(crash) caused by (vx) with assurrance 0.21756 
	(crash) caused by (vy) with assurrance 0.99726 
	(crash) caused by (x) with assurrance 0.79158 
	(crash) caused by (y) with assurrance 0.86942 
	(fuel_cost) caused by (angle) with assurrance 0.00000 
	(fuel_cost) caused by (engine) with assurrance 1.00000 
	(fuel_cost) caused by (v_angle) with assurrance 0.00000 
	(fuel_cost) caused by (landed_legs) with assurrance 0.00000 
	(fuel_cost) caused by (vx) with assurrance 0.00000 
	(fuel_cost) caused by (vy) with assurrance 0.00000 
	(fuel_cost) caused by (x) with assurrance 0.00000 
	(fuel_cost) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (angle) with assurrance 0.17666 
	(landed_legs') caused by (landed_legs) with assurrance 1.00000 
	(landed_legs') caused by (engine) with assurrance 0.95938 
	(landed_legs') caused by (v_angle) with assurrance 0.31031 
	(landed_legs') caused by (vx) with assurrance 0.66512 
	(rest) caused by (angle) with assurrance 0.00000 
	(rest) caused by (engine) with assurrance 0.43517 
	(rest) caused by (landed_legs) with assurrance 0.00000 
	(rest) caused by (v_angle) with assurrance 0.00000 
	(landed_legs') caused by (x) with assurrance 0.69929 
	(rest) caused by (vx) with assurrance 0.00000 
	(rest) caused by (vy) with assurrance 0.04040 
	(landed_legs') caused by (vy) with assurrance 0.43862 
	(rest) caused by (x) with assurrance 0.76712 
	(rest) caused by (y) with assurrance 0.00000 
	(landed_legs') caused by (y) with assurrance 0.99029 
	(v_angle') caused by (angle) with assurrance 0.28326 
	(v_angle') caused by (landed_legs) with assurrance 0.82534 
	(v_angle') caused by (engine) with assurrance 0.87614 
	(v_angle') caused by (v_angle) with assurrance 0.99998 
	(v_angle') caused by (vx) with assurrance 0.20477 
	(v_angle') caused by (vy) with assurrance 0.95739 
	(v_angle') caused by (x) with assurrance 0.85125 
	(v_angle') caused by (y) with assurrance 0.02198 
	(vx') caused by (angle) with assurrance 1.00000 
	(vx') caused by (v_angle) with assurrance 0.12842 
	(vx') caused by (engine) with assurrance 1.00000 
	(vx') caused by (landed_legs) with assurrance 0.85017 
	(vx') caused by (vx) with assurrance 1.00000 
	(vx') caused by (vy) with assurrance 0.00734 
	(vx') caused by (x) with assurrance 0.02833 
	(vx') caused by (y) with assurrance 0.07515 
	(vy') caused by (angle) with assurrance 0.02162 
	(vy') caused by (v_angle) with assurrance 0.85994 
	(vy') caused by (engine) with assurrance 1.00000 
	(vy') caused by (landed_legs) with assurrance 0.99979 
	(vy') caused by (vx) with assurrance 0.24905 
	(vy') caused by (x) with assurrance 0.54831 
	(vy') caused by (vy) with assurrance 1.00000 
	(vy') caused by (y) with assurrance 0.00610 
	(x') caused by (angle) with assurrance 0.03291 
	(x') caused by (landed_legs) with assurrance 0.55412 
	(x') caused by (engine) with assurrance 0.17837 
	(x') caused by (v_angle) with assurrance 0.03749 
	(x') caused by (vx) with assurrance 1.00000 
	(x') caused by (vy) with assurrance 0.00592 
	(x') caused by (y) with assurrance 0.26801 
	(x') caused by (x) with assurrance 1.00000 
	(y') caused by (angle) with assurrance 0.07336 
	(y') caused by (engine) with assurrance 0.99998 
	(y') caused by (v_angle) with assurrance 0.00088 
	(y') caused by (landed_legs) with assurrance 0.98922 
	(y') caused by (vx) with assurrance 0.00064 
	(y') caused by (x) with assurrance 0.06248 
	(y') caused by (vy) with assurrance 1.00000 
	(y') caused by (y) with assurrance 1.00000 
-------------------discovered-causal-graph---------------------
(angle, v_angle) --> angle'
(vy, y, landed_legs) --> crash
(engine) --> fuel_cost
(y, engine, landed_legs) --> landed_legs'
() --> rest
(vy, x, landed_legs, engine, v_angle) --> v_angle'
(angle, vx, engine, landed_legs) --> vx'
(v_angle, vy, engine, landed_legs) --> vy'
(vx, x) --> x'
(vy, y, engine, landed_legs) --> y'
---------------------------------------------------------------
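
To see how the discovered LunarLander graph drifts during training, the graph blocks from two checkpoints can be compared parent set by parent set. The sketch below copies the step 9 and step 198 graphs from the logs above and reports, for each target, which parents disappeared and which appeared. The helper names `parse_graph` and `diff_graphs` are hypothetical and are not part of the reproduced project.

```python
import re

# Graph blocks copied verbatim from the step 9 and step 198 logs above.
STEP_9 = """
(angle, y, x, v_angle) --> angle'
(angle, x, vy, landed_legs) --> crash
(engine) --> fuel_cost
(vy, vx, engine, landed_legs) --> landed_legs'
() --> rest
(v_angle, vy, engine, landed_legs) --> v_angle'
(angle, vx, x, landed_legs) --> vx'
(vy, angle, y, engine, landed_legs) --> vy'
(vx, x) --> x'
(v_angle, vy, y, landed_legs) --> y'
"""

STEP_198 = """
(angle, v_angle) --> angle'
(vy, y, landed_legs) --> crash
(engine) --> fuel_cost
(y, engine, landed_legs) --> landed_legs'
() --> rest
(vy, x, landed_legs, engine, v_angle) --> v_angle'
(angle, vx, engine, landed_legs) --> vx'
(v_angle, vy, engine, landed_legs) --> vy'
(vx, x) --> x'
(vy, y, engine, landed_legs) --> y'
"""

GRAPH_LINE = re.compile(r"^\((.*?)\)\s*-->\s*(\S+)$")


def parse_graph(text):
    """Map each target variable to its set of parents."""
    graph = {}
    for line in text.strip().splitlines():
        match = GRAPH_LINE.match(line.strip())
        if match:
            parents = {p.strip() for p in match.group(1).split(",") if p.strip()}
            graph[match.group(2)] = parents
    return graph


def diff_graphs(old, new):
    """Return {target: (removed_parents, added_parents)} for changed targets."""
    changes = {}
    for target in sorted(set(old) | set(new)):
        removed = old.get(target, set()) - new.get(target, set())
        added = new.get(target, set()) - old.get(target, set())
        if removed or added:
            changes[target] = (sorted(removed), sorted(added))
    return changes


changes = diff_graphs(parse_graph(STEP_9), parse_graph(STEP_198))
for target, (removed, added) in changes.items():
    print(f"{target}: removed={removed} added={added}")
```

In this comparison only `fuel_cost`, `rest` and `x'` keep the same parent set between step 9 and step 198; every other target gains or loses at least one parent.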

3. CartPole reproduction results

3.1 Step 0

Run started at 20240525-1505; the output is as follows:

---------------step 0 / 200----------------
episodic return:	9.621621621621621
mean reward:	0.9075 (truth)
perform causal disocery with threshold 0.2
starting causal discovery
	(angle') caused by (PUSH) with assurrance 0.00030 
	(angle') caused by (angle_velocity) with assurrance 0.99963 
	(angle') caused by (angle) with assurrance 1.00000 
	(angle') caused by (position) with assurrance 0.01584 
	(angle') caused by (velocity) with assurrance 0.08129 
	(angle_velocity') caused by (angle) with assurrance 0.83305 
	(angle_velocity') caused by (PUSH) with assurrance 0.99976 
	(angle_velocity') caused by (angle_velocity) with assurrance 0.99949 
	(angle_velocity') caused by (position) with assurrance 0.42480 
	(angle_velocity') caused by (velocity) with assurrance 0.00627 
	(position') caused by (PUSH) with assurrance 0.29983 
	(position') caused by (angle) with assurrance 0.24988 
	(position') caused by (angle_velocity) with assurrance 0.07966 
	(position') caused by (position) with assurrance 0.99998 
	(position') caused by (velocity) with assurrance 0.98985 
	(velocity') caused by (PUSH) with assurrance 0.99482 
	(velocity') caused by (angle) with assurrance 0.44962 
	(velocity') caused by (angle_velocity) with assurrance 0.00733 
	(velocity') caused by (position) with assurrance 0.72191 
	(velocity') caused by (velocity) with assurrance 0.99105 
-------------------discovered-causal-graph---------------------
(angle, angle_velocity) --> angle'
(angle, PUSH, angle_velocity) --> angle_velocity'
(position, velocity) --> position'
(PUSH, velocity) --> velocity'
---------------------------------------------------------------
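
The log reports "threshold 0.2", yet edges with assurance between 0.2 and 0.8 do not appear in the graph. Every graph in these notes is consistent with keeping an edge only when its assurance exceeds 1 - 0.2 = 0.8. This rule is inferred from the logs; the project's actual selection code is not shown here, so treat the sketch below as an assumption. The function name `graph_from_log` is hypothetical. The sample input is the `position'` and `velocity'` lines from the step 0 log above.

```python
import re

# Lines copied from the step 0 CartPole log above (spelling as printed).
LOG = """
(position') caused by (PUSH) with assurrance 0.29983
(position') caused by (angle) with assurrance 0.24988
(position') caused by (angle_velocity) with assurrance 0.07966
(position') caused by (position) with assurrance 0.99998
(position') caused by (velocity) with assurrance 0.98985
(velocity') caused by (PUSH) with assurrance 0.99482
(velocity') caused by (angle) with assurrance 0.44962
(velocity') caused by (angle_velocity) with assurrance 0.00733
(velocity') caused by (position) with assurrance 0.72191
(velocity') caused by (velocity) with assurrance 0.99105
"""

EDGE_LINE = re.compile(r"\((.+?)\) caused by \((.+?)\) with assurrance ([0-9.]+)")


def graph_from_log(text, threshold=0.2):
    """Keep parent -> target edges whose assurance exceeds 1 - threshold.

    The 1 - threshold cutoff is an assumption inferred from the logs.
    """
    cutoff = 1.0 - threshold
    graph = {}
    for match in EDGE_LINE.finditer(text):
        target, parent = match.group(1), match.group(2)
        score = float(match.group(3))
        graph.setdefault(target, set())
        if score > cutoff:
            graph[target].add(parent)
    return graph


graph = graph_from_log(LOG)
for target, parents in graph.items():
    print(f"({', '.join(sorted(parents))}) --> {target}")
```

With this cutoff, the two parent sets match the `position'` and `velocity'` rows of the step 0 graph above.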

3.2 Step 99

20240525-1635; the output is as follows:
Elapsed time: 1.5 h

---------------step 99 / 200----------------
episodic return:	200.0
mean reward:	1.0 (truth)
perform causal disocery with threshold 0.2
starting causal discovery
	(angle') caused by (position) with assurrance 0.00154 
	(angle') caused by (angle_velocity) with assurrance 1.00000 
	(angle') caused by (PUSH) with assurrance 0.21346 
	(angle') caused by (angle) with assurrance 1.00000 
	(angle') caused by (velocity) with assurrance 0.00000 
	(angle_velocity') caused by (angle) with assurrance 0.99996 
	(angle_velocity') caused by (angle_velocity) with assurrance 1.00000 
	(angle_velocity') caused by (PUSH) with assurrance 1.00000 
	(angle_velocity') caused by (position) with assurrance 0.00013 
	(angle_velocity') caused by (velocity) with assurrance 0.20396 
	(position') caused by (PUSH) with assurrance 0.00293 
	(position') caused by (angle) with assurrance 0.00000 
	(position') caused by (angle_velocity) with assurrance 0.00002 
	(position') caused by (position) with assurrance 1.00000 
	(position') caused by (velocity) with assurrance 1.00000 
	(velocity') caused by (PUSH) with assurrance 0.99923 
	(velocity') caused by (angle) with assurrance 0.92331 
	(velocity') caused by (angle_velocity) with assurrance 0.30441 
	(velocity') caused by (position) with assurrance 0.31953 
	(velocity') caused by (velocity) with assurrance 0.99826 
-------------------discovered-causal-graph---------------------
(angle, angle_velocity) --> angle'
(angle, PUSH, angle_velocity) --> angle_velocity'
(position, velocity) --> position'
(angle, PUSH, velocity) --> velocity'
---------------------------------------------------------------

3.3 Step 198

The run had finished by 20240525-1845; the output is as follows:

---------------step 198 / 200----------------
episodic return:	200.0
mean reward:	1.0 (truth)
perform causal disocery with threshold 0.2
starting causal discovery
	(angle') caused by (position) with assurrance 0.00001 
	(angle') caused by (angle_velocity) with assurrance 1.00000 
	(angle') caused by (angle) with assurrance 1.00000 
	(angle') caused by (PUSH) with assurrance 0.16289 
	(angle') caused by (velocity) with assurrance 0.00004 
	(angle_velocity') caused by (angle) with assurrance 1.00000 
	(angle_velocity') caused by (angle_velocity) with assurrance 1.00000 
	(angle_velocity') caused by (PUSH) with assurrance 1.00000 
	(angle_velocity') caused by (position) with assurrance 0.03369 
	(angle_velocity') caused by (velocity) with assurrance 0.68328 
	(position') caused by (PUSH) with assurrance 0.00045 
	(position') caused by (angle) with assurrance 0.00000 
	(position') caused by (angle_velocity) with assurrance 0.00000 
	(position') caused by (position) with assurrance 1.00000 
	(position') caused by (velocity) with assurrance 1.00000 
	(velocity') caused by (PUSH) with assurrance 0.99903 
	(velocity') caused by (angle) with assurrance 0.99428 
	(velocity') caused by (angle_velocity) with assurrance 0.85500 
	(velocity') caused by (position) with assurrance 0.53975 
	(velocity') caused by (velocity) with assurrance 0.99811 
-------------------discovered-causal-graph---------------------
(angle, angle_velocity) --> angle'
(angle, PUSH, angle_velocity) --> angle_velocity'
(position, velocity) --> position'
(angle, PUSH, angle_velocity, velocity) --> velocity'
---------------------------------------------------------------
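
A rough sanity check on the final CartPole graph is to derive the dependencies directly from the environment's equations of motion. The sketch below assumes the standard Gym CartPole dynamics (one Euler step, default constants); whether the reproduced project uses exactly this environment version is an assumption. It perturbs one input at a time at a single base state and records which next-state variables change. The names `step`, `true_parents` and `DISCOVERED_198` are hypothetical; `DISCOVERED_198` is copied from the step 198 graph above.

```python
import math

# Default constants of the classic Gym CartPole environment (assumed here).
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half of the pole length
POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG, TAU = 10.0, 0.02            # push force and time step

NAMES = ("position", "velocity", "angle", "angle_velocity", "PUSH")


def step(position, velocity, angle, angle_velocity, push):
    """One Euler step of the cart-pole dynamics; push is 0 (left) or 1 (right)."""
    force = FORCE_MAG if push == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    temp = (force + POLEMASS_LENGTH * angle_velocity ** 2 * sin_t) / TOTAL_MASS
    angle_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    acc = temp - POLEMASS_LENGTH * angle_acc * cos_t / TOTAL_MASS
    return {
        "position'": position + TAU * velocity,
        "velocity'": velocity + TAU * acc,
        "angle'": angle + TAU * angle_velocity,
        "angle_velocity'": angle_velocity + TAU * angle_acc,
    }


def true_parents(base=(0.1, 0.2, 0.05, 0.3, 1)):
    """Perturb each input once and collect the outputs that change."""
    reference = step(*base)
    parents = {target: set() for target in reference}
    for i, name in enumerate(NAMES):
        perturbed = list(base)
        # Flip the discrete action; nudge continuous variables slightly.
        perturbed[i] = 1 - base[i] if name == "PUSH" else base[i] + 0.01
        for target, value in step(*perturbed).items():
            if value != reference[target]:
                parents[target].add(name)
    return parents


# Copied from the step 198 discovered graph above.
DISCOVERED_198 = {
    "angle'": {"angle", "angle_velocity"},
    "angle_velocity'": {"angle", "PUSH", "angle_velocity"},
    "position'": {"position", "velocity"},
    "velocity'": {"angle", "PUSH", "angle_velocity", "velocity"},
}

print(true_parents() == DISCOVERED_198)
```

A single-point perturbation only detects dependence at the chosen base state, so this is a plausibility check, not a proof. Under these assumptions the derived parent sets coincide with the step 198 graph, while the step 0 and step 99 graphs are still missing one or two parents of `velocity'`.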