Probability and Bayes' Nets

Probabilistic Inference

compute desired probabilities from other known probabilities
we usually compute conditional probabilities
each possible state of the world has its own probability

model:
joint distribution: a table that captures the probability of each outcome (complete assignment)

inference by enumeration (IBE):

  1. query variables: Q
  2. evidence variables: e
  3. hidden variables: h
    select the entries consistent with the evidence, then sum out the hidden variables, and finally normalize
    drawbacks:
    large storage cost, and the table is hard to estimate empirically
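A minimal sketch of the three-step procedure above, using a hypothetical two-variable joint distribution over temperature and weather (all numbers invented for illustration):

```python
# Hypothetical joint distribution P(T, W); assignments are tuples
# (temperature, weather). Numbers are invented for illustration.
joint = {
    ('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
    ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3,
}

def infer_by_enumeration(joint, query_index, evidence):
    """P(Q | e): select entries consistent with the evidence,
    sum out the hidden variables, then normalize."""
    totals = {}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            q = assignment[query_index]
            totals[q] = totals.get(q, 0.0) + p   # sum out hidden variables
    z = sum(totals.values())
    return {q: p / z for q, p in totals.items()}  # normalize

# P(W | T=cold): sun -> 0.4, rain -> 0.6
result = infer_by_enumeration(joint, 1, {0: 'cold'})
print(result)
```

Note that the full joint table must be held in memory, which is exactly the storage drawback mentioned above.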

General situation of Uncertainty

  1. observed variables (evidence)
    the agent knows certain things about the state of the world
  2. unobserved variables
    the agent needs to reason about other aspects
  3. model
    the agent knows something about how the known variables relate to the unknown variables
  4. a probabilistic model is a joint distribution over a set of random variables
    assignments are called outcomes
  5. events can be partial assignments or complete assignments
  6. conditional distribution: enumeration with normalization, i.e. select the joint entries matching the evidence and normalize
    e.g. $p(W=s \mid T=c)=\frac{p(W=s,T=c)}{p(T=c)}$
    dividing by $p(T=c)$ is the normalization
  7. sometimes we have a conditional distribution and want the joint distribution: Bayes' rule
  8. chain rule: $p(x_1,x_2,x_3)=p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2,x_1)$
  9. independence: a kind of structure, as in CSPs
    unconditional independence is very rare; conditional independence is our basic and robust assumption
    if and only if
    $p(x \mid y,z)=p(x \mid z)$
    $p(x,y \mid z)=p(x \mid z)\,p(y \mid z)$
    the storage cost goes from a product of table sizes to a sum
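The conditional-independence definition above can be checked numerically. A sketch, using a hypothetical joint built to satisfy $x \perp y \mid z$ by construction (all numbers invented):

```python
import itertools

# Build a hypothetical joint P(X, Y, Z) that satisfies X ⊥ Y | Z by
# construction: P(x, y, z) = P(z) P(x|z) P(y|z). Numbers are invented.
pz = {0: 0.3, 1: 0.7}
px_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # P(x | z)
py_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}   # P(y | z)
joint = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x, y, z in itertools.product([0, 1], repeat=3)}

def conditionally_independent(joint, tol=1e-9):
    """Check p(x, y | z) == p(x | z) p(y | z) for every value combination."""
    for z in (0, 1):
        p_z = sum(p for (_, _, zz), p in joint.items() if zz == z)
        for x in (0, 1):
            for y in (0, 1):
                p_xy_z = joint[(x, y, z)] / p_z
                p_x_z = sum(joint[(x, yy, z)] for yy in (0, 1)) / p_z
                p_y_z = sum(joint[(xx, y, z)] for xx in (0, 1)) / p_z
                if abs(p_xy_z - p_x_z * p_y_z) > tol:
                    return False
    return True

print(conditionally_independent(joint))  # True by construction
```

The storage saving is visible here too: the three small tables hold 2 + 4 + 4 entries instead of the 8-entry joint, and the gap widens rapidly with more variables.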

Bayes Nets (representation)

joint probability table: $O(d^n)$ entries; the storage cost is too large, and the table is hard to estimate

bayes' net (graphical model):
a directed acyclic graph with a local probability table at each node
each node stores a conditional probability table conditioned on its parents (one column per parent, plus one for the node's value and one for the probability)
node: can be assigned or unassigned
arc: interactions
each node is conditionally independent of all its ancestors in the graph, given its parents
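A sketch of how a node and its local CPT might be stored, using a hypothetical Rain → Traffic net (class layout and CPT numbers are invented for illustration):

```python
# A minimal Bayes-net node storing its local CPT conditioned on its
# parents. The Rain -> Traffic example and all numbers are hypothetical.
class Node:
    def __init__(self, name, parents, cpt):
        self.name = name
        self.parents = parents   # list of parent Node objects
        self.cpt = cpt           # {parent_value_tuple: {value: prob}}

    def p(self, value, parent_values):
        """P(self = value | parents = parent_values)."""
        return self.cpt[parent_values][value]

rain = Node('Rain', [], {(): {True: 0.1, False: 0.9}})
traffic = Node('Traffic', [rain],
               {(True,): {True: 0.8, False: 0.2},
                (False,): {True: 0.3, False: 0.7}})

# Joint probabilities follow from the chain rule over the graph:
# P(R=true, T=true) = P(R=true) * P(T=true | R=true) = 0.1 * 0.8
p = rain.p(True, ()) * traffic.p(True, (True,))
print(p)
```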

build a Bayes’ net

  1. number the nodes from 1 to N
  2. add the nodes to the graph in increasing order of their numbers
  3. add a directed link from existing nodes to the new one if there is an interaction between them (so no cycle can form)

causality

  1. a BN need not actually be causal: it only represents conditional independence and reflects correlation
  2. a causal BN is simpler and easier to build

complexity

space complexity: $O(N \cdot 2^{k+1})$ for binary variables with at most $k$ parents per node

Bayes Nets (inference)

case: evidence variables, query variables, hidden variables

eliminate variables one by one

to eliminate a variable x, we

  1. join all factors involving x
  2. sum out x
    factor: an unnormalized probability

inference by enumeration

steps:
  1. select the entries consistent with the evidence
  2. sum out the hidden variables to get the joint over Q and e
  3. normalize
drawback: large storage, and it is hard to estimate the probabilities empirically for many variables at a time (limited samples)
time complexity: $O(d^n)$ to sum out all the hidden variables
space complexity: $O(d^n)$ to store the joint distribution
inference: calculating some useful quantity from a joint distribution
enumeration
factor: an unnormalized probability
elimination

  1. join
  2. eliminate: marginalization
    interleave joining and elimination
  3. if there is evidence, start with factors that select that evidence
    eliminate variables one by one: pick a hidden variable H, join all factors mentioning H, eliminate H
  4. join all remaining factors together and normalize
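The join and sum-out operations above can be sketched with factors stored as tables, again on a hypothetical Rain → Traffic net (representation and numbers invented for illustration):

```python
from itertools import product

# A factor is (variables, table): table maps tuples of boolean values
# to unnormalized probabilities. The R -> T net and its numbers are
# hypothetical.
def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for vals in product([True, False], repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (vs, table)

def sum_out(var, factor):
    """Marginalize one variable out of a factor."""
    vs, t = factor
    i = vs.index(var)
    out = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return (vs[:i] + vs[i + 1:], out)

f_r = (['R'], {(True,): 0.1, (False,): 0.9})                    # P(R)
f_tr = (['T', 'R'], {(True, True): 0.8, (False, True): 0.2,
                     (True, False): 0.3, (False, False): 0.7})  # P(T | R)

# Eliminate R: join all factors mentioning R, then sum R out -> P(T).
p_t = sum_out('R', join(f_r, f_tr))
print(p_t)
```

Here eliminating R yields P(T=true) = 0.1·0.8 + 0.9·0.3 = 0.35; the intermediate joined factor over (R, T) is the "largest factor" that drives the cost of variable elimination.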

factor summary:

the time and space complexity of variable elimination are determined by the largest factor created during the process
no particular elimination order is required, but the order chosen determines how large the factors get

Bayes Nets (sampling)

generate samples from the distribution and compute approximate posterior probabilities; check for convergence
inference: exact computation costs more time than generating samples
learning: get samples from a distribution you don't know

every CPT participates principle

the distribution of the generated samples matches the joint probability distribution
prior sampling:
drawback: for unlikely events we must generate a huge number of samples, and most of them are wasted
the procedure is consistent
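A sketch of prior sampling on a hypothetical two-node net R → T (CPT numbers invented): sample every variable in topological order from its CPT.

```python
import random

# Prior sampling on a hypothetical net R -> T (numbers invented):
# sample each variable in topological order from P(X | parents).
def sample_prior(rng):
    r = rng.random() < 0.1                       # P(R=true) = 0.1
    t = rng.random() < (0.8 if r else 0.3)       # P(T=true | R)
    return r, t

rng = random.Random(0)
samples = [sample_prior(rng) for _ in range(100_000)]
# Counting is consistent: the estimate of P(T=true) approaches
# 0.1*0.8 + 0.9*0.3 = 0.35 as the number of samples grows.
est = sum(t for _, t in samples) / len(samples)
print(round(est, 2))
```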

rejection sampling:
if a partial sample is inconsistent with our evidence, we reject it and stop generating it early; this only saves the time spent generating samples
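Rejection sampling can be sketched as prior sampling plus a consistency check against the evidence (same hypothetical R → T net, numbers invented):

```python
import random

# Rejection sampling on a hypothetical net R -> T (numbers invented):
# generate prior samples, discard those inconsistent with the evidence
# T = true, and count among the survivors.
def rejection_sample(n, rng):
    kept = []
    for _ in range(n):
        r = rng.random() < 0.1                   # P(R=true)
        t = rng.random() < (0.8 if r else 0.3)   # P(T=true | R)
        if t:                                    # reject samples with T = false
            kept.append(r)
    return kept

rng = random.Random(1)
kept = rejection_sample(200_000, rng)
# Estimate P(R=true | T=true); the exact answer is 0.08 / 0.35 ≈ 0.23
est = sum(kept) / len(kept)
print(round(est, 2))
```

About 65% of the generated samples are thrown away here, which illustrates why rejection sampling wastes work when the evidence is unlikely.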

likelihood weighting: (most computationally efficient)
fix the variables to their evidence values instead of sampling them, but then the sampling distribution no longer matches the original distribution
our samples are only consistent with the product over the non-evidence variables
because this is equivalent to sampling from:
$P(Z_1 \ldots Z_p, E_1 \ldots E_m)=\prod_{i=1}^{p} P(Z_i \mid \mathrm{Parents}(Z_i))$

solution:
sample a value if the variable is not an evidence variable; otherwise, fix it to the evidence value and multiply the sample's weight by $P(e \mid \mathrm{Parents}(E))$
the evidence can only influence the sampling of variables that come after it; it has no effect on variables sampled before it
we would like every sampling step to take the evidence into account
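Likelihood weighting on the same hypothetical R → T net, with evidence T = true: the evidence variable is fixed rather than sampled, and each sample carries a weight (numbers invented):

```python
import random

# Likelihood weighting on a hypothetical net R -> T with evidence
# T = true (numbers invented): evidence variables are fixed, not
# sampled, and each sample is weighted by P(evidence | parents).
def weighted_sample(rng):
    r = rng.random() < 0.1       # sample the non-evidence variable R
    w = 0.8 if r else 0.3        # weight = P(T=true | R=r)
    return r, w

rng = random.Random(2)
pairs = [weighted_sample(rng) for _ in range(200_000)]
# P(R=true | T=true) = weighted fraction of samples with R = true;
# the exact answer is 0.08 / 0.35 ≈ 0.23
est = sum(w for r, w in pairs if r) / sum(w for _, w in pairs)
print(round(est, 2))
```

Unlike rejection sampling, no sample is discarded; the weights account for the evidence instead.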
Gibbs sampling:
start from a state consistent with the evidence
first assign all variables randomly (keeping the evidence fixed), then repeatedly pick one variable and resample it given all the other variables
this also converges
this way every variable is sampled with the evidence taken into account
when resampling, we only need the CPTs that mention the resampled variable
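A sketch of Gibbs sampling on a hypothetical chain R → T → L with evidence L = true, resampling each hidden variable from its conditional given all the others (CPT numbers invented):

```python
import random

# Gibbs sampling on a hypothetical chain R -> T -> L with evidence
# L = true (numbers invented). Start from an arbitrary assignment of
# the hidden variables, then repeatedly resample one hidden variable
# conditioned on all the others.
P_R = 0.1                        # P(R=true)
P_T = {True: 0.8, False: 0.3}    # P(T=true | R)
P_L = {True: 0.9, False: 0.2}    # P(L=true | T)

def gibbs(steps, rng):
    r, t = False, False          # arbitrary initial hidden values
    hits = 0
    for _ in range(steps):
        # Resample R given T: only the CPTs mentioning R matter.
        pr_true = P_R * (P_T[True] if t else 1 - P_T[True])
        pr_false = (1 - P_R) * (P_T[False] if t else 1 - P_T[False])
        r = rng.random() < pr_true / (pr_true + pr_false)
        # Resample T given R and the evidence L = true.
        pt_true = P_T[r] * P_L[True]
        pt_false = (1 - P_T[r]) * P_L[False]
        t = rng.random() < pt_true / (pt_true + pt_false)
        hits += t
    return hits / steps

est = gibbs(200_000, random.Random(3))  # estimates P(T=true | L=true)
print(round(est, 2))
```

Note that each resampling step conditions on the evidence, so even downstream evidence (L here) influences every hidden variable, unlike likelihood weighting.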

Bayes Nets (D-separation)

in BN, are two variables independent(given evidence)?

causal chains

$x \rightarrow y \rightarrow z$

  1. are $x, z$ independent? no
  2. $x \perp z \mid y$

common cause

$x \leftarrow y \rightarrow z$

  1. are $x, z$ independent? no
  2. $x \perp z \mid y$

common effect

$x \rightarrow y \leftarrow z$

  1. $x \perp z$
  2. $x \perp z \mid y$ does not hold
    observing a descendant of y has the same effect

general case and d-separation

d-separation: $z_1,\cdots,z_k$ d-separate $x$ and $y$, which means $x \perp y \mid z_1,\cdots,z_k$
Markov blanket: a node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents

D-seperation(另一种形式)

path: any consecutive sequence of edges, disregarding their directions
collider: a head-to-head node on a path (where information is exchanged once observed)
unblocked path: a path containing no collider
rule 1: x and y are d-connected if there is an unblocked path between them
rule 2: if every unblocked path passes through an observed z, then x and y are d-separated given z
rule 3: if a collider (or one of its descendants) is observed, the path becomes d-connected again

in other words: a path with no collider is active when none of its nodes is observed, and inactive once a node on it is observed; a path containing a collider is active only when the collider or one of its descendants is observed
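The three canonical triples can be sketched as a small activity rule; a path is active iff every triple along it is active (function and argument names are invented for illustration):

```python
# Illustrative sketch of the triple-activity rule behind d-separation
# (names are invented for illustration).
def triple_active(kind, mid_observed, mid_descendant_observed=False):
    """Is the triple A-B-C active, given what is observed?

    kind: 'chain' (A -> B -> C), 'cause' (A <- B -> C),
          'effect' (A -> B <- C, i.e. B is a collider).
    """
    if kind in ('chain', 'cause'):
        return not mid_observed          # blocked once B is observed
    if kind == 'effect':                 # collider: active only if B or
        return mid_observed or mid_descendant_observed  # a descendant is observed
    raise ValueError(kind)

# x and y are d-separated given Z iff no active path connects them.
assert triple_active('chain', False)
assert not triple_active('chain', True)
assert not triple_active('effect', False)
assert triple_active('effect', False, mid_descendant_observed=True)
```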

topology limits distributions

given some graph topology, only certain joint distributions can be encoded
the graph structure guarantees certain independences; the true distribution may contain more; a fully connected graph can represent any distribution

Exercises

note that a Bayes net has no cycles
