Probabilistic Programming in Python
Introduction
We often hear something like this on weather forecast programs: the chance of rain tomorrow is 80%. What does that mean? It is often hard to give meaning to this kind of statement, especially from a frequentist perspective: there is no reasonable way to repeat the rain/no-rain experiment an infinite (or very large) number of times.
The Bayesian approach provides a solution for this type of statement. The following sentence, taken from the book Probabilistic Programming & Bayesian Methods for Hackers, perfectly summarizes one of the key ideas of the Bayesian perspective.
The Bayesian world-view interprets probability as measure of believability in an event, that is, how confident we are in an event occurring.
In other words, in the Bayesian approach, we can never be absolutely sure about our *beliefs*, but can definitely say how confident we are about the relevant events. Furthermore, as more data is collected, we can become more confident about our beliefs.
As a scientist, I am trained to believe in the data and always be critical about almost everything. Naturally, I find Bayesian inference to be rather intuitive.
However, it is often computationally and conceptually challenging to work with Bayesian inference. Often, a lot of long and complicated mathematical computations are required to get things done. Even as a mathematician, I occasionally find these computations tedious; especially when I need a quick overview of the problem that I want to solve.
Luckily, my mentor Austin Rochford recently introduced me to a wonderful package called PyMC3 that allows us to do numerical Bayesian inference. In this article, I will give a quick introduction to PyMC3 through a concrete example.
A concrete example
Let’s assume that we have a coin. We flip it three times and the result is:
[0, 1, 1]
where 0 means that the coin lands tails and 1 means that the coin lands heads. Are we confident in saying that this is a fair coin? In other words, if we let θ be the probability that the coin comes up heads, is the evidence strong enough to support the statement that θ = 1/2?
Well, as we do not know anything about the coin other than the result of the above experiment, it is hard to say anything for sure. From the frequentist perspective, a point estimation for θ would be

θ̂ = 2/3,

the observed proportion of heads.
While this number makes sense, the frequentist approach does not really provide a certain level of confidence about it. In particular, if we do more trials, we are likely to get different point estimations for θ.
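To make this concrete, the frequentist point estimate is just the observed proportion of heads (a minimal sketch, assuming the 0/1 toss encoding above):

```python
import numpy as np

tosses = np.array([0, 1, 1])  # 0 = tails, 1 = heads
theta_hat = tosses.mean()     # observed proportion of heads
print(theta_hat)              # 0.6666666666666666, i.e. 2/3
```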
This is where the Bayesian approach could offer some improvement. The idea is simple: as we do not know anything about θ, we can assume that θ could be any value on [0,1]. Mathematically, our prior belief is that θ follows a Uniform(0,1) distribution. For those who need a refresher in maths, the pdf of Uniform(0,1) is given by

p(θ) = 1 for θ ∈ [0,1], and 0 otherwise.
We can then use evidence/our observations to update our belief about the distribution of θ.
Let us call D the evidence (in our case, it is the result of our coin tosses). By Bayes' rule, the posterior distribution is computed by

p(θ|D) = p(D|θ) p(θ) / p(D),
where p(D|θ) is the likelihood function and p(θ) is the prior distribution (Uniform(0,1) in this case). There are two ways to go from here.
The explicit approach
In this particular example, we can do everything by hand. More precisely, given θ, the probability that we get 2 heads out of three coin tosses is given by the binomial likelihood

p(D|θ) = C(3,2) θ²(1−θ) = 3θ²(1−θ).
By assumption, p(θ) = 1. Next, we evaluate the denominator

p(D) = ∫₀¹ 3θ²(1−θ) dθ.
By some simple algebra, we can see that the above integral is equal to 1/4 and hence

p(θ|D) = 3θ²(1−θ) / (1/4) = 12θ²(1−θ).
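As a quick numerical check of this claim (a sketch using scipy.integrate, not part of the original derivation), we can verify that the evidence integrates to 1/4 and that the resulting posterior density integrates to 1:

```python
from scipy import integrate

# likelihood of 2 heads out of 3 tosses, as a function of theta
likelihood = lambda t: 3 * t**2 * (1 - t)
evidence, _ = integrate.quad(likelihood, 0, 1)
print(evidence)  # 0.25

# the posterior density 12*theta^2*(1-theta) should integrate to 1
posterior = lambda t: 12 * t**2 * (1 - t)
total, _ = integrate.quad(posterior, 0, 1)
print(total)  # 1.0
```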
Remark: By the same computation, we can also see that if the prior distribution of θ is a Beta distribution with parameters α, β, i.e. p(θ) = B(α, β), and the sample size is N with k of them heads, then the posterior distribution of θ is given by B(α+k, β+N−k). In our case, α = β = 1, N = 3, k = 2.
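As a sanity check of this remark (a sketch using scipy.stats; Beta(3, 2) here is the claimed posterior B(α+k, β+N−k) with α = β = 1, N = 3, k = 2), the conjugate posterior density should match the hand-computed 12θ²(1−θ):

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 101)
by_hand = 12 * theta**2 * (1 - theta)    # explicit posterior computed above
conjugate = stats.beta(3, 2).pdf(theta)  # Beta(alpha + k, beta + N - k)
assert np.allclose(by_hand, conjugate)   # the two densities agree everywhere
```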
The numerical approach
In the explicit approach, we are able to explicitly compute the posterior distribution of θ by using conjugate priors. However, conjugate priors are sometimes chosen for computational simplicity, and they might not reflect reality. Furthermore, it is not always feasible to find conjugate priors.
We can overcome this problem by using the Markov Chain Monte Carlo (MCMC) method to approximate the posterior distributions. The math here is pretty beautiful but for the sole purpose of this article, we will not dive into it. Instead, we will explain how to implement this method using PyMC3.
To run our codes, we import the following packages.
import pymc3 as pm
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
from IPython.core.pylabtools import figsize
First, we need to specify the prior distribution for θ. In PyMC3, we can do so with the following lines of code.
with pm.Model() as model:
    theta = pm.Uniform('theta', lower=0, upper=1)
We then fit our model with the observed data. This can be done by the following lines of code.
occurrences = np.array([1, 1, 0])  # our observations
with model:
    obs = pm.Bernoulli("obs", theta, observed=occurrences)  # condition on the observations
    step = pm.Metropolis()
    trace = pm.sample(18000, step=step)
burned_trace = trace[1000:]
Internally, PyMC3 uses the Metropolis-Hastings algorithm to approximate the posterior distribution. The call to pm.sample determines the number of samples drawn from the posterior distribution. Finally, as the algorithm might be unstable at the beginning, it is useful to discard the samples from the first iterations (the burn-in period). That is the purpose of the last line in our code.
We can then plot the histogram of our samples obtained from the posterior distribution and compare it with the true density function.
from IPython.core.pylabtools import figsize
p_true=0.5
figsize(12.5, 4)
plt.title(r"Posterior distribution of $\theta$")
plt.vlines(p_true,0, 2, linestyle='--', label=r"true $\theta$ (unknown)", color='red')
plt.hist(burned_trace["theta"], bins=25, histtype='stepfilled', density=True, color='#348ABD')
x=np.arange(0,1.04,0.04)
plt.plot(x, 12*x*x*(1-x), color='black')
plt.legend()
plt.show()
As we can clearly see, the numerical approximation is pretty close to the true posterior distribution.
What happens if we increase the sample size?
As we mentioned earlier, the more data we get, the more confident we are about the true value of θ. Let us test our hypothesis by a simple simulation.
We will randomly toss a coin 1000 times. We then use PyMC3 to approximate the posterior distribution of θ. Finally, we plot the histogram of samples obtained from this distribution. All of these steps can be done with the following lines of code.
N = 1000  # the number of coin tosses
occurrences = np.random.binomial(1, p=0.5, size=N)
k = occurrences.sum()  # the number of heads

# fit the observed data
with pm.Model() as model1:
    theta = pm.Uniform('theta', lower=0, upper=1)

with model1:
    obs = pm.Bernoulli("obs", theta, observed=occurrences)
    step = pm.Metropolis()
    trace = pm.sample(18000, step=step)
burned_trace1 = trace[1000:]

# plot the posterior distribution of theta
p_true = 0.5
figsize(12.5, 4)
plt.title(r"Posterior distribution of $\theta$ for sample size $N=1000$")
plt.vlines(p_true, 0, 25, linestyle='--', label=r"true $\theta$ (unknown)", color='red')
plt.hist(burned_trace1["theta"], bins=25, histtype='stepfilled', density=True, color='#348ABD')
plt.legend()
plt.show()
Here is what we get.
As we can see, the posterior distribution is now centered around the true value of θ.
We can estimate θ by taking the mean of our samples.
burned_trace1['theta'].mean()
0.4997847718651745
We see that this is really close to the true answer.
Conclusion
As we can see, PyMC3 performs statistical inference tasks pretty well. Furthermore, it makes probabilistic programming rather painless.
References.
[1] https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers