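When working through maximum likelihood estimation, I noticed that two notations are in use: p(x|theta) and p(x;theta).
Why the two notations exist:
p(x|theta) does not always denote a conditional probability; when it does not, it is equivalent to p(x;theta).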
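In general:
A vertical bar denotes a conditional probability, and the quantity after the bar (here theta) is a random variable;
A semicolon, as in p(x; theta), marks theta as a parameter to be estimated (fixed, just currently unknown). The expression can be read simply as p(x); the semicolon is there only to show that a parameter theta is involved. p(x; theta) means the probability that the random variable X takes the value x. In Bayesian theory it is also called the prior probability of X=x.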
For P(y|x;theta):
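Behind the two notations lies the disagreement between frequentists and Bayesians:
Frequentists treat the parameter as a fixed value: in the real world, the parameter is some definite constant.
Bayesians treat the parameter as a random variable: it takes any particular value with some probability.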
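The two viewpoints can be written side by side. What follows is the usual textbook formulation, not something stated in these notes: the prior density p(\theta) and the estimator symbol \hat{\theta}_{\mathrm{MLE}} are notation assumed here for illustration.

```latex
% Frequentist: \theta is a fixed unknown constant, estimated by maximizing the likelihood.
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \; p(x; \theta)

% Bayesian: \theta is a random variable with prior p(\theta); condition on the observed x.
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}
```

In the first line theta only indexes a family of distributions, so the semicolon fits; in the second, theta has a distribution of its own, so conditioning with a bar is meaningful.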
I believe the origin of this is the likelihood paradigm (though I have not checked the actual historical correctness of the below, it is a reasonable way of understanding how it came to be).
Let's say in a regression setting, you would have a distribution p(Y | x, beta), which means: the distribution of Y if you know (conditional on) the x and beta values.
If you want to estimate the betas, you maximize the likelihood: L(beta; y, x) = p(Y | x, beta). Essentially, you are now looking at the expression p(Y | x, beta) as a function of the betas, but apart from that there is no difference (for mathematically correct expressions that you can properly derive, this distinction is a necessity, although in practice no one bothers).
Then, in Bayesian settings, the difference between parameters and other variables soon fades, so people started to use both notations interchangeably.
So, in essence: there is no actual difference: they both indicate the conditional distribution of the thing on the left, conditional on the thing(s) on the right.
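To make the "same expression, viewed as a function of beta" point concrete, here is a minimal sketch. It assumes a model the quoted answer does not specify: a single slope beta with unit-variance Gaussian noise, y ~ Normal(beta * x, 1). The data values and the candidate grid are hypothetical.

```python
import math


def density(y: float, x: float, beta: float) -> float:
    """p(y | x, beta) under the assumed model y ~ Normal(beta * x, 1)."""
    return math.exp(-0.5 * (y - beta * x) ** 2) / math.sqrt(2 * math.pi)


def likelihood(beta: float, ys, xs) -> float:
    """L(beta; y, x): the same density, multiplied over observations,
    but read as a function of beta with the data held fixed."""
    return math.prod(density(y, x, beta) for y, x in zip(ys, xs))


# Hypothetical observations, roughly following y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# Evaluate the likelihood on a coarse grid of candidate slopes 0.0, 0.1, ..., 4.0.
candidates = [b / 10 for b in range(0, 41)]
beta_hat = max(candidates, key=lambda b: likelihood(b, ys, xs))
```

Nothing in `density` changes when it is reused inside `likelihood`; only the argument being varied changes, which is all the semicolon in L(beta; y, x) is meant to signal.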
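Examples:
P(y=1|x;θ) is the probability that y=1 given x and θ; the semicolon sets the parameter apart.
In other words, p(x;θ) is the distribution (or probability density) of x that depends on the parameter: the probability of X=x when the parameter Θ takes the value θ.
p(x;2) is the probability of X=x when the parameter equals 2.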
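For instance:
Take 10 balls, θ of them labeled 1 and the other 10-θ labeled 0.
Draw one ball from them. Then
p(X=x|θ) = x·θ/10 + (1-x)·(10-θ)/10
that is, for x=1, p = θ/10;
for x=0, p = (10-θ)/10.
As θ changes, the probability of the same value of x changes with it.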
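The ball example can be sketched in a few lines of Python, tying it back to maximum likelihood estimation. The function names, the sample of draws, and the assumption that draws are made with replacement (so they are independent) are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from typing import Sequence

N_BALLS = 10  # total number of balls, as in the example


def pmf(x: int, theta: int) -> Fraction:
    """p(x; theta): probability of drawing a ball labeled x
    when theta of the 10 balls are labeled 1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= theta <= N_BALLS:
        raise ValueError("theta must be between 0 and 10")
    return Fraction(x * theta + (1 - x) * (N_BALLS - theta), N_BALLS)


def likelihood(theta: int, draws: Sequence[int]) -> Fraction:
    """Likelihood of theta for independent draws (assumes drawing with replacement)."""
    result = Fraction(1)
    for x in draws:
        result *= pmf(x, theta)
    return result


def mle(draws: Sequence[int]) -> int:
    """Value of theta in 0..10 that maximizes the likelihood of the draws."""
    return max(range(N_BALLS + 1), key=lambda theta: likelihood(theta, draws))


# Hypothetical data: 7 ones and 3 zeros in 10 draws with replacement.
draws = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
theta_hat = mle(draws)
```

With this sample the likelihood is proportional to θ^7 (10-θ)^3, which peaks at θ = 7 among the integers 0 to 10, so `theta_hat` is 7. Here θ is treated as a fixed unknown count, which is the reading that the semicolon notation p(x;θ) is meant to convey.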