AI Planning


  The world is:

     1) dynamic

     2) stochastic

     3) partially observable

   Actions:

   1) take time

   2) have continuous effects

1 Classical planning

 1.1 Assumption: none of the above holds, i.e. the world is static, deterministic, and fully observable, and actions are instantaneous with discrete effects.

 1.2 Modeling:

• States: described by the set of propositions that are currently true

• Actions: general state transformations described by sets of pre- and post-conditions (see the sketch after this list)

• Together these represent a state-transition system, but more compactly
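
A minimal Python sketch of this representation, assuming states are frozensets of propositions and actions carry precondition, add, and delete sets; the move(A,B) action and its propositions are made up for illustration, not taken from the notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    precond: frozenset   # propositions that must currently be true
    add: frozenset       # post-conditions made true
    delete: frozenset    # post-conditions made false

def applicable(state, action):
    return action.precond <= state

def apply_action(state, action):
    # one transition of the (compactly represented) state-transition system
    return (state - action.delete) | action.add

# Illustrative action: move from A to B
move_a_b = Action("move(A,B)",
                  precond=frozenset({"at(A)", "path(A,B)"}),
                  add=frozenset({"at(B)"}),
                  delete=frozenset({"at(A)"}))

state = frozenset({"at(A)", "path(A,B)"})
if applicable(state, move_a_b):
    state = apply_action(state, move_a_b)   # now {"at(B)", "path(A,B)"}
```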

1.3 Planning: planning is searching

• Regression: search backwards from the goal state to the initial state

• Progression (forward search): from the initial state to the goal state

• Search methods: BFS and DFS (see the forward-search sketch below)
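
A minimal sketch of forward (progression) search with BFS, assuming the same set-of-propositions representation as above; the (name, precond, add, delete) tuple format is an assumption for illustration.

```python
from collections import deque

def forward_bfs(init, goal, actions):
    """init, goal: frozensets of propositions;
    actions: list of (name, precond, add, delete) tuples.
    Returns a list of action names, or None if no plan is found."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # all goal propositions hold
            return plan
        for name, precond, add, delete in actions:
            if precond <= state:                 # action applicable here
                nxt = (state - delete) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None
```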


1.4 STRIPS(s, g) algorithm

Returns: a sequence of actions that transforms s into g (a sketch follows the steps below).

1. Calculate the difference set d = g - s.

2. If d is empty, return an empty plan.

3. Choose an action a whose add-list has the most formulas contained in d.

4. p' = STRIPS(s, precondition of a)

5. Compute the new state s' by applying p' and a to s.

6. p = STRIPS(s', g)

7. Return p'; a; p
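
A hedged Python sketch of this recursive STRIPS procedure, reusing the set-of-propositions representation from above; the depth bound and the tie-breaking by add-list overlap are illustrative choices, not part of the original notes.

```python
from collections import namedtuple

Action = namedtuple("Action", "name precond add delete")   # fields are frozensets

def strips(s, g, actions, depth=20):
    """Return (plan, resulting_state), or (None, s) if no plan is found."""
    if depth == 0:
        return None, s
    d = g - s                                         # 1. difference set d = g - s
    if not d:
        return [], s                                  # 2. empty difference -> empty plan
    relevant = [a for a in actions if a.add & d]
    if not relevant:
        return None, s
    a = max(relevant, key=lambda a: len(a.add & d))   # 3. best add-list overlap with d
    p1, s1 = strips(s, a.precond, actions, depth - 1) # 4. achieve a's precondition
    if p1 is None or not a.precond <= s1:
        return None, s
    s2 = (s1 - a.delete) | a.add                      # 5. apply p' and a to get s'
    p2, s3 = strips(s2, g, actions, depth - 1)        # 6. plan for the remaining goals
    if p2 is None:
        return None, s
    return p1 + [a.name] + p2, s3                     # 7. return p'; a; p
```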

1.5 Refinement planning template

Refineplan(P: a plan set)

1. If P is empty, fail.

2. If a minimal candidate of P is a solution, return it. End.

3. Select a refinement strategy R.

4. Apply R to P to get a new plan set P'.

5. Call Refineplan(P').

Termination is ensured if R is complete and monotonic (a skeleton sketch follows below).
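
A generic Python skeleton of this template, assuming the caller supplies the domain-specific pieces (minimal-candidate extraction, solution test, and refinement strategies); all of the callback names here are illustrative.

```python
def refine_plan(plan_set, strategies, minimal_candidate, is_solution, select):
    """plan_set: current set of partial plans; strategies: available refinements R;
    select(strategies, plan_set) picks one strategy; each strategy maps P -> P'."""
    while True:
        if not plan_set:                       # 1. empty plan set -> fail
            return None
        cand = minimal_candidate(plan_set)     # 2. check a minimal candidate
        if cand is not None and is_solution(cand):
            return cand
        R = select(strategies, plan_set)       # 3. select a refinement strategy
        plan_set = R(plan_set)                 # 4. apply R: P -> P'
        # 5. loop = tail call Refineplan(P'); terminates if R is complete and monotonic
```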

Existing refinement strategies:

• State-space refinement: e.g. STRIPS

• Plan-space refinement: e.g. least-commitment planning

• Task refinement: e.g. HTN

2 Stochastic environment

  In a stochastic environment, we use an MDP (Markov decision process) to model and plan.

2.1 Issues with using conventional planning in a stochastic environment:

   1) The branching factor is too large.

   2) The tree is very deep.

   3) Many states are visited more than once.


2.2 The MDP in a stochastic environment

Usage: robot navigation, i.e. planning how to get from x to y in a stochastic environment.

s ∈ States; a ∈ Actions(s); state transition: T(s, a, s')

2.2.1 Modeling:

Fully observable: S, A

Stochastic state transitions: T(s, a, s') = Pr(s' | s, a)

Reward in state s: R(s). This is a short-term, primitive value, in contrast to the long-term value of a policy introduced below (see the MDP sketch after this subsection).
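
A minimal sketch of such an MDP model in Python, assuming a tiny made-up navigation domain; the concrete states, transition probabilities, and rewards below are illustrative, not from the notes.

```python
# States and per-state actions (fully observable S, A)
states = ["x", "mid", "y"]
actions = {"x": ["go", "stay"], "mid": ["go", "stay"], "y": ["stay"]}

# Stochastic transitions: T[s][a] maps s' -> Pr(s' | s, a); each row sums to 1
T = {
    "x":   {"go":   {"mid": 0.8, "x": 0.2},    # moving sometimes fails
            "stay": {"x": 1.0}},
    "mid": {"go":   {"y": 0.8, "mid": 0.2},
            "stay": {"mid": 1.0}},
    "y":   {"stay": {"y": 1.0}},
}

# Reward depends only on the current state, R(s)
R = {"x": -0.1, "mid": -0.1, "y": 1.0}
```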

2.3 Finding a plan

         How do we find a plan in an MDP, i.e. how do we solve the MDP?

We introduce some additional quantities to solve the MDP.

Policy: π(s) is the action taken in state s.

Value of a state: Vπ(s) is the expected total reward obtained from state s onwards when following policy π. This is a long-term evaluation function of the policy. Reinforcement learning aims at the best V(s), not R(s).

The objective:

Find a policy π(s) that maximizes the expected discounted return: E[ Σ_{t≥0} γ^t R(s_t) ] -> max

where γ is the discount factor (0 ≤ γ < 1).

Value function: V(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') V(s')


Planning = calculating value functions.

How do we compute this maximum? Use value iteration to obtain optimal policies.

         Value iteration: for every state s, update V_{k+1}(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') V_k(s'), i.e. each update chooses the action that maximizes the backed-up value, using the value function above (a sketch follows below).
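
A minimal sketch of value iteration in Python, assuming the states/actions/T/R dictionaries from the MDP sketch above; the discount factor and convergence threshold are illustrative choices.

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}                       # V_0(s) = 0
    while True:
        V_new = {}
        for s in states:
            # V_{k+1}(s) = R(s) + gamma * max_a sum_{s'} T(s,a,s') * V_k(s')
            best = max(sum(p * V[s2] for s2, p in T[s][a].items())
                       for a in actions[s])
            V_new[s] = R[s] + gamma * best
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new                               # converged
        V = V_new

def greedy_policy(states, actions, T, V):
    # pi(s) = argmax_a sum_{s'} T(s,a,s') * V(s')
    return {s: max(actions[s],
                   key=lambda a: sum(p * V[s2] for s2, p in T[s][a].items()))
            for s in states}

# Usage with the dictionaries defined in the MDP sketch above
V = value_iteration(states, actions, T, R)
pi = greedy_policy(states, actions, T, V)              # e.g. pi["x"] == "go"
```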

        

  

   

 

