I'm an international student and the exam is in English, so this article simply mixes the two languages; that keeps it easy to follow and stops me from stumbling over technical terms in the exam.
practical reasoning
I don't know what language the code from the lecture is in, so I'll rewrite it in syntax I know; it reads much better that way.
```
def deliberate(B, I=None):
    # generate new desires D
    D = options(B, I)
    # generate new intentions I
    I = filter(B, D, I)
    return I
```
```
def execute(pi):
    # stop when pi is empty, the intentions have succeeded,
    # or they have become impossible
    while not (empty(pi) or succeeded(I, B) or impossible(I, B)):
        # pi is a list of subplans; set alpha to the first subplan of pi
        alpha = hd(pi)
        execute(alpha)
        # update pi: the first subplan has been executed,
        # so pi now starts from the second element
        pi = tail(pi)
        # check the environment again
        get next percept p
        # update B based on the new environment
        B = brf(B, p)
        # reconsider() is here because not every deliberation is useful:
        # without it, the agent might keep pursuing its intentions even
        # after realising they have become impossible, while deliberating
        # on every iteration would cost far too much time
        if reconsider(I, B):
            I = deliberate(B, I)
        # check whether the plan still makes sense to achieve the intention
        # given the new beliefs: after acting, the agent re-checks whether
        # the plan still serves the original goal, and replans if not
        if not sound(pi, I, B):
            pi = plan(B, I)
```
Agent Control Loop
```
B = B0                       # initial beliefs
while True:
    # observe the world to get percept p
    get next percept p
    # apply the belief revision function to the current beliefs
    # and percept p to generate new beliefs
    B = brf(B, p)
    # apply the deliberate function to the current beliefs
    # to generate intentions
    I = deliberate(B)
    # apply the plan function to beliefs B and intentions I
    # to return a plan pi
    pi = plan(B, I)
    execute(pi)
```
- In short: the agent first gets a percept from the environment, then in turn updates B (Beliefs), D (Desires), I (Intentions), generates a plan, and executes it.
- When executing plans, it executes the first subplan, updates the plan list, then gets a new percept and updates B again.
- Updating D and I depends on the reconsider() check, because deliberation is very time-consuming, and without the check the agent might keep pursuing an intention even after realising the goal cannot be achieved.
- It then checks whether the executed plan still makes sense; if not, the plan is rebuilt.
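To make the loop concrete, here is a minimal runnable sketch. The function names (brf, deliberate, plan, execute) come from the pseudocode above, but the counter world, the goal of reaching 3, and all the toy function bodies are my own invented stand-ins, not the real BDI machinery:

```python
# A toy world: the agent wants a counter to reach 3.
def brf(B, p):                      # belief revision: record the latest percept
    return {**B, "counter": p}

def deliberate(B, I=None):          # intend the goal while it is unmet
    return "reach_3" if B["counter"] < 3 else None

def plan(B, I):                     # plan: increment until the goal holds
    return ["inc"] * (3 - B["counter"]) if I else []

def execute(pi, B):                 # execute each subplan in order
    for alpha in pi:
        if alpha == "inc":
            B["counter"] += 1
    return B

B = {"counter": 0}                  # initial beliefs B0
for _ in range(5):                  # bounded stand-in for "while true"
    p = B["counter"]                # observe the world
    B = brf(B, p)
    I = deliberate(B)
    pi = plan(B, I)
    B = execute(pi, B)

print(B["counter"])                 # -> 3
```

Note how the loop keeps observing and replanning: once the counter reaches 3, deliberate returns no intention and plan returns an empty list, so later iterations do nothing.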
Embedded Agent
- $2^A$ is the power set of $A$: the set of all subsets of $A$.
- Cartesian product of sets A and B: $A \times B = \{(x, y) \mid x \in A \text{ and } y \in B\}$
- function: associates a unique element with every element x (each input maps to exactly one output).
- behaviorally equivalent: iff $R(Ag_1, Env) = R(Ag_2, Env)$ (proofs often proceed by making assumptions and restricting the conditions).
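Both set constructions can be checked directly with itertools (the sets A and B below are my own toy examples):

```python
from itertools import combinations, product

def power_set(A):
    # 2^A: every subset of A, from the empty set up to A itself
    return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

A = {1, 2}
B = {"x", "y"}
print(len(power_set(A)))      # -> 4, i.e. 2 ** len(A)
print(sorted(product(A, B)))  # Cartesian product: all pairs (a, b)
```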
Agent architectures: 1. temporal logic
- Concurrent MetateM: commitments (carry temporal logic operators) vs. propositions (carry no temporal logic operators). ["Some rule fires" means that rule is activated.]
Agent-based modelling
Coordination
- formal representation of prescriptive norms: $\langle target, deontic, context, content, punishment \rangle$
- deontic has only three kinds: obligation, permission, prohibition
- FIRE trust model example:
- multi-agent planning:
expectations
- the key point is that expectations can be set through communication
norm
- is not a strict rule.
- once prescriptive norms are set and known, no communication may be required to coordinate on a particular goal.
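The five-field norm representation above maps naturally onto a record type; the concrete field values here (a speeding prohibition) are purely my own illustration:

```python
from typing import NamedTuple

class Norm(NamedTuple):
    target: str       # who the norm applies to
    deontic: str      # obligation | permission | prohibition
    context: str      # when the norm is active
    content: str      # what must (not) be done
    punishment: str   # sanction applied on violation

n = Norm(target="driver_agents",
         deontic="prohibition",
         context="school_zone",
         content="speed > 30",
         punishment="fine_100")
assert n.deontic in {"obligation", "permission", "prohibition"}
```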
Negotiation
- dominant strategy: Strategy S is dominant if no matter what the other participants do, you can't do any better than strategy S.
<Baidu Baike>: "Dominant strategy" is a technical term from game theory: a competitive strategy that is the firm's best choice no matter how its competitors respond. Clearly, in business competition, the side holding a dominant strategy has an obvious advantage and holds the initiative; sometimes the dominant strategy is plain to see.
- Nash equilibrium: if no participant can benefit from changing its strategy, assuming the strategies of the others are unchanged.
<Baidu Baike>: A Nash equilibrium, also called a non-cooperative game equilibrium, is an important game-theory term named after John Nash. If, during a game, one party will choose a certain fixed strategy no matter what the other party chooses, that strategy is called a dominant strategy; if the players' strategy combination consists of their respective dominant strategies, that combination is defined as a Nash equilibrium.
- Pareto optimal: an outcome is Pareto optimal if, in every other outcome where some agent is better off, at least one agent is worse off.
<Baidu Baike>: Wikipedia explains Pareto optimality as: "it is impossible to improve some people's situation without harming anyone else." Definition: Pareto optimality is a state of resource allocation in which it is impossible to make some people better off without making anyone worse off. Pareto optimality (also called Pareto efficiency or Pareto improvement) is an important concept in game theory, with wide applications in economics, engineering, and the social sciences.
- a deal is individually rational: the question is whether the deal makes every agent better off.
- Monotonic concession protocol is a one-to-one protocol: simply put, two agents each start by proposing their most preferred deal, then concede step by step; the computed "willingness to risk conflict" decides which agent has to concede.
An example:
Another example, this time involving deception:
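A sketch of how "willingness to risk conflict" can decide who concedes, using the Zeuthen-style formula risk_i = (U_i(own offer) - U_i(other's offer)) / U_i(own offer); the agent with the lower risk concedes. All utility numbers are invented:

```python
def risk(u_own_offer, u_their_offer):
    # willingness to risk conflict: 1 if the agent gains nothing from
    # the opponent's offer, 0 if both offers are equally good for it
    if u_own_offer == 0:
        return 1.0
    return (u_own_offer - u_their_offer) / u_own_offer

# made-up utilities: each agent values its own offer at 10
u1 = {"offer1": 10, "offer2": 4}   # agent 1's utility for each offer
u2 = {"offer1": 3,  "offer2": 10}  # agent 2's utility for each offer

r1 = risk(u1["offer1"], u1["offer2"])   # (10 - 4) / 10 = 0.6
r2 = risk(u2["offer2"], u2["offer1"])   # (10 - 3) / 10 = 0.7
conceder = "agent 1" if r1 < r2 else "agent 2"
print(conceder)   # agent 1 risks less by conceding, so agent 1 concedes
```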
Social Choice
- plurality voting: for each candidate, only ballots that rank it first are counted.
- Condorcet principle: candidates are compared pairwise; different comparison orders (agendas) give different results. Condorcet paradox: $w_1 > w_2 > w_3 > w_1$. Majority graph: one node per candidate, with an arrow from $w_3$ to $w_1$ because $w_3$ wins the pairwise contest.
- Instant run-off voting: compare everyone's first choices, eliminate the candidate with the fewest, let those ballots count for their second choice in the next round, and repeat until one candidate remains.
- Copeland rule: similar to the Condorcet principle, but every pairwise win counts as exactly 1 point, no matter how large the margin.
- Borda count: a first choice gets (m-1) points, where m is the number of candidates.
- median voting rule: a rather odd rule that selects according to the median of the votes.
- Independence of irrelevant alternatives (IIA): if $W_i > W_j$, then adding $W_k$ cannot flip the result to $W_i < W_j$. The Borda count does not satisfy IIA.
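A small runnable comparison of plurality and the Borda count (the seven ballots are made up, chosen so that the two rules disagree):

```python
from collections import Counter

# 7 made-up ballots, each a ranking from first to last choice
ballots = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2

# plurality: only first choices count
plurality = Counter(b[0] for b in ballots)

# Borda count: with m candidates, rank k (0-based) earns m - 1 - k points
m = 3
borda = Counter()
for b in ballots:
    for rank, cand in enumerate(b):
        borda[cand] += (m - 1) - rank

print(plurality.most_common(1))  # -> [('a', 3)]: a wins under plurality
print(borda.most_common(1))      # -> [('b', 9)]: b wins under Borda
```

The same ballot profile picks different winners under the two rules, which is exactly why properties like IIA are studied.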
Auction
- English auction: First price, open cry, ascending
- Dutch auction: First price, open cry, descending
- **Vickrey auction**: Second price, sealed bid, one shot.
- normalised: if agent i gets no value for the empty allocation.
- free disposal: if agent is never worse off having more goods.
- XOR connective:
- Vickrey-Clarke-Groves mechanism (VCG):
Explanation of the principle:
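The Vickrey (second-price, sealed-bid, one-shot) rule above can be sketched in a few lines; the bidder names and bid amounts are invented. Under this rule, bidding your true valuation is a dominant strategy:

```python
bids = {"alice": 120, "bob": 90, "carol": 100}   # sealed one-shot bids

winner = max(bids, key=bids.get)                 # highest bidder wins...
second_price = max(v for k, v in bids.items() if k != winner)
print(winner, second_price)   # -> alice 100: she pays the 2nd-highest bid
```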
Argumentation
I'm too lazy to write this up in full, so straight to an example problem:
- credulous semantics
- skeptical semantics
- grounded semantics
- preferred semantics
- undercut
- rebuttal
Some additional points that were left out earlier
emergent behaviour
- Emergent behaviour cannot be analysed by looking at just individual agents
- Emergence is due to the structure of a dynamic system not its size
- a simulation allows you to see the effect of agent characteristics and their connections
- Feedback loops mean that the final results are not a simple function of the initial conditions of each individual agent.
Cellular automata
- Cellular automata (CAs) are a simpler idea, similar to ABM.
- at each time step, each cell updates its state based on its neighbouring cells according to rules.
CA and ABM
- In CAs, all cells follow the same rules.
- In CAs, cells cannot move: each cell has a static location and neighbourhood.
- In CAs, there are no rich dynamic environments.
- CAs are reactive and have no memory.
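The best-known CA, Conway's Game of Life, illustrates every point above: identical rules everywhere, static cells, purely reactive updates. A minimal sketch of one update step (my own implementation, tracking only the live cells):

```python
from collections import Counter

def step(live):
    # live: the set of (x, y) cells that are currently alive
    neigh = Counter((x + dx, y + dy)
                    for (x, y) in live
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0))
    # the same rule for every cell: alive next step with exactly 3 live
    # neighbours, or with 2 if the cell is already alive
    return {c for c, n in neigh.items() if n == 3 or (n == 2 and c in live)}

blinker = {(0, 0), (1, 0), (2, 0)}     # a horizontal line of three cells
print(step(blinker))                    # -> the vertical line {(1,-1),(1,0),(1,1)}
print(step(step(blinker)) == blinker)   # -> True: a period-2 oscillator
```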
extensions
- grounded extension: the minimal complete extension
- preferred extensions: the maximal complete extensions
- credulous and skeptical describe properties of the acceptance of arguments.
- grounded acceptance is what you get without any case-based reasoning; preferred acceptance is the one that requires reasoning.
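The grounded extension can be computed as the least fixed point of the characteristic function F(S) = {arguments defended by S}, which matches the "no reasoning needed" intuition: start from nothing and add only what is forced. The three-argument framework below is my own toy example (a attacks b, b attacks c):

```python
attacks = {("a", "b"), ("b", "c")}
args = {"a", "b", "c"}

def defended(x, S):
    # x is acceptable w.r.t. S if S attacks every attacker of x
    attackers = {y for (y, z) in attacks if z == x}
    return all(any((s, y) in attacks for s in S) for y in attackers)

# iterate F(S) = {x | x is defended by S} from the empty set upward
S = set()
while True:
    nxt = {x for x in args if defended(x, S)}
    if nxt == S:
        break
    S = nxt
print(S)   # -> {'a', 'c'}: a is unattacked, and a defends c against b
```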
Untitled 1
- if we removed lines 9 and 12, the agent could repeatedly reconsider its intentions and be at risk of never actually achieving them.
- without lines 10 and 11, the agent could continue to pursue intentions it no longer had a reason to desire.
Untitled 2
- In a game, neither side may have a dominant strategy; a Nash equilibrium is sometimes not made up of the players' dominant strategies.
- In a Nash equilibrium, no unilateral change of strategy can improve a player's reward.
- Pareto optimality means no change can make an agent better off without harming another agent.
- willingness to risk conflict: computed from the agents' utilities.
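These points can be checked on the Prisoner's Dilemma with the standard textbook payoffs: (D, D) is the unique Nash equilibrium even though (C, C) would give both players more, so the equilibrium is not Pareto optimal:

```python
# payoffs[(row_move, col_move)] = (row_payoff, col_payoff)
payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
moves = ["C", "D"]

def is_nash(r, c):
    # neither player can gain by unilaterally deviating
    return (all(payoffs[(r2, c)][0] <= payoffs[(r, c)][0] for r2 in moves) and
            all(payoffs[(r, c2)][1] <= payoffs[(r, c)][1] for c2 in moves))

nash = [(r, c) for r in moves for c in moves if is_nash(r, c)]
print(nash)   # -> [('D', 'D')]: defect is dominant, yet (C, C) pays both more
```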
Untitled 3
- every utility function defined over states can be expressed by a utility function defined over runs.
- The expected utility of a run in a system is the product of the probability that the run occurs and the utility of that run.
- if the environment and the agent's decision making are deterministic, then the expected utility of a particular agent is equal to the utility of the one run of the system.
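As a quick arithmetic check of these bullets (the runs, probabilities, and utilities are made up): the agent's expected utility sums the per-run expected utilities, and with a single deterministic run of probability 1 it reduces to that run's utility:

```python
# made-up runs of a system: (probability, utility) per run
runs = {"r1": (0.5, 8), "r2": (0.25, 4), "r3": (0.25, 12)}

# expected utility of each run = probability * utility of that run
eu_run = {r: p * u for r, (p, u) in runs.items()}

# expected utility of the agent = sum over all runs
eu_agent = sum(eu_run.values())
print(eu_agent)   # -> 8.0, i.e. 0.5*8 + 0.25*4 + 0.25*12

# deterministic case: one run with probability 1, so EU = that run's utility
assert 1.0 * 9 == 9
```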
properties of social choice
- resolute: unique winner
- surjective: every candidate has a chance to win
- dictatorship: the winner is always some particular voter's first choice
- strategy proof: no one has an incentive to cheat
- weakly Pareto: w1 is never selected when everyone prefers w2 to w1

surjective + independence of irrelevant alternatives = dictatorship
resolute + surjective + strategy proof = dictatorship
- median voter rule: weakly Pareto and strategy proof
Two ways to use argumentation in a MAS
- abstract argumentation
- argument dialogues