#Python算法实现-贝叶斯模型-分层模型

贝叶斯模型的一个核心优势就是简单灵活,可以实现一个分层模型。这一节将实现和比较整体合并模型和局部融合模型。

import itertools

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import pymc3 as pm

import scipy

import scipy.stats as stats

import seaborn.apionly as sns

from IPython.display import Image

from sklearn import preprocessing

%matplotlib inline

plt.style.use('bmh')

colors = ['#348ABD', '#A60628', '#7A68A6', '#467821', '#D55E00', 

          '#CC79A7', '#56B4E9', '#009E73', '#F0E442', '#0072B2']

messages = pd.read_csv('data/hangout_chat_data.csv')

模型合并

让我们采取一种不同的方式来对我的 hangout 聊天回复时间进行建模。我的直觉告诉我回复的快慢与聊天的对象有关。我很可能回复女朋友比回复一个疏远的朋友更快。这样,我可以对每个对话独立建模,对每个对话i估计参数 μi 和 αi。μi=Uniform(0,100)

一个必须考虑的问题是,有些对话相比其他的含有的消息很少。这样,和含有大量消息的对话相比,我们对含有较少消息的对话的回复时间的估计,就具有较大的不确定度。下图表明了每个对话在样本容量上的差异。

ax = messages.groupby('prev_sender')['conversation_id'].size().plot(

    kind='bar', figsize=(12,3), title='Number of messages sent per recipient', color=colors[0])

_ = ax.set_xlabel('Previous Sender')

_ = ax.set_ylabel('Number of messages')

_ = plt.xticks(rotation=45)

640?wx_fmt=png

对于每个对话i中的每条消息j,模型可以表示为:

640?wx_fmt=png

indiv_traces = {}

# Convert categorical variables to integer

le = preprocessing.LabelEncoder()

participants_idx = le.fit_transform(messages['prev_sender'])

participants = le.classes_

n_participants = len(participants)

for p in participants:

    with pm.Model() as model:

        alpha = pm.Uniform('alpha', lower=0, upper=100)

        mu = pm.Uniform('mu', lower=0, upper=100)

        

        data = messages[messages['prev_sender']==p]['time_delay_seconds'].values

        y_est = pm.NegativeBinomial('y_est', mu=mu, alpha=alpha, observed=data)

        y_pred = pm.NegativeBinomial('y_pred', mu=mu, alpha=alpha)

        

        start = pm.find_MAP()

        step = pm.Metropolis()

        trace = pm.sample(20000, step, start=start, progressbar=True)

        

        indiv_traces[p] = trace

Applied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 9.7 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.4 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.0 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.0 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 10.3 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 16.4 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.1 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 17.4 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 19.9 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 15.2 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 16.1 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 10.1 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 11.1 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 11.9 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.8 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 13.0 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 12.4 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 10.9 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 22.6 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 18.7 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 13.1 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 13.9 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 14.0 secApplied interval-transform to alpha and added transformed alpha_interval to model.

Applied interval-transform to mu and added transformed mu_interval to model.

 [-----------------100%-----------------] 20000 of 20000 complete in 13.6 sec

fig, axs = plt.subplots(3,2, figsize=(12, 6))

axs = axs.ravel()

y_left_max = 2

y_right_max = 2000

x_lim = 60

ix = [3,4,6]

for i, j, p in zip([0,1,2], [0,2,4], participants[ix]):

    axs[j].set_title('Observed: %s' % p)

    axs[j].hist(messages[messages['prev_sender']==p]['time_delay_seconds'].values, range=[0, x_lim], bins=x_lim, histtype='stepfilled')

    axs[j].set_ylim([0, y_left_max])

for i, j, p in zip([0,1,2], [1,3,5], participants[ix]):

    axs[j].set_title('Posterior predictive distribution: %s' % p)

    axs[j].hist(indiv_traces[p].get_values('y_pred'), range=[0, x_lim], bins=x_lim, histtype='stepfilled', color=colors[1])

    axs[j].set_ylim([0, y_right_max])

axs[4].set_xlabel('Response time (seconds)')

axs[5].set_xlabel('Response time (seconds)')

plt.tight_layout()

640?wx_fmt=png

上图显示了3例对话的观测数据分布(左)和后验预测分布(右)。可以看出,不同对话的后验预测分布大不相同。这可能准确反映出对话的特点,也可能是样本容量太小造成的误差。

如果我们把后验预测分布联合起来,我们希望得到的分布和观测数据分布近似,让我们来进行后验预测检验。

combined_y_pred = np.concatenate([v.get_values('y_pred') for k, v in indiv_traces.items()])

x_lim = 60

y_pred = trace.get_values('y_pred')

fig = plt.figure(figsize=(12,6))

fig.add_subplot(211)

fig.add_subplot(211)

_ = plt.hist(combined_y_pred, range=[0, x_lim], bins=x_lim, histtype='stepfilled', color=colors[1])   

_ = plt.xlim(1, x_lim)

_ = plt.ylim(0, 20000)

_ = plt.ylabel('Frequency')

_ = plt.title('Posterior predictive distribution')

fig.add_subplot(212)

_ = plt.hist(messages['time_delay_seconds'].values, range=[0, x_lim], bins=x_lim, histtype='stepfilled')

_ = plt.xlim(0, x_lim)

_ = plt.xlabel('Response time in seconds')

_ = plt.ylim(0, 20)

_ = plt.ylabel('Frequency')

_ = plt.title('Distribution of observed data')

plt.tight_layout()

640?wx_fmt=png

是的,后验预测分布和观测数据的分布近似。但是,我关心的是数据较少的对话,对它们的估计也可能具有较大的方差。一个减小这种风险的方法是分享对话信息,但仍对每个对话单独估计 μi,我们称之为局部融合。

局部融合

就像整体合并模型,局部融合模型对每个对话i都有独自的参数估计。但是,这些参数通过融合参数联系在一起。这反映出,我的不同对话间的response_time有相似之处,就是我本性上倾向于回复快或慢。

640?wx_fmt=png

接着上面的例子,估计负二项分布的参数 μi 和 αi。相比于使用均匀分布作为先验分布,我使用带两个参数(μ,σ)的伽马分布,从而当能够预测 μ 和 σ 的值时,可以引入更多的先验信息到模型中。

首先,我们来看看伽马分布,下面你可以看到,它非常灵活。

mu = [5,25,50]

sd = [3,7,2]

plt.figure(figsize=(11,3))

_ = plt.title('Gamma distribution')

with pm.Model() as model:

    for i, (j, k) in enumerate(zip(mu, sd)):

        samples = pm.Gamma('gamma', mu=j, sd=k).random(size=10**6)

        plt.hist(samples, bins=100, range=(0,60), color=colors[i], alpha=1)

_ = plt.legend(['$mu$ = %s, $sigma$ = %s' % (mu[a], sd[a]) for a in [0,1,2]])

Applied log-transform to gamma and added transformed gamma_log to model.

Applied log-transform to gamma and added transformed gamma_log to model.

Applied log-transform to gamma and added transformed gamma_log to model.

640?wx_fmt=png

局部融合模型可以表示为:

640?wx_fmt=png

640?wx_fmt=png

with pm.Model() as model:

    hyper_alpha_sd = pm.Uniform('hyper_alpha_sd', lower=0, upper=50)

    hyper_alpha_mu = pm.Uniform('hyper_alpha_mu', lower=0, upper=10)

    

    hyper_mu_sd = pm.Uniform('hyper_mu_sd', lower=0, upper=50)

    hyper_mu_mu = pm.Uniform('hyper_mu_mu', lower=0, upper=60)

    

    alpha = pm.Gamma('alpha', mu=hyper_alpha_mu, sd=hyper_alpha_sd, shape=n_participants)

    mu = pm.Gamma('mu', mu=hyper_mu_mu, sd=hyper_mu_sd, shape=n_participants)

    

    y_est = pm.NegativeBinomial('y_est', 

                                mu=mu[participants_idx], 

                                alpha=alpha[participants_idx], 

                                observed=messages['time_delay_seconds'].values)

    

    y_pred = pm.NegativeBinomial('y_pred', 

                                 mu=mu[participants_idx], 

                                 alpha=alpha[participants_idx],

                                 shape=messages['prev_sender'].shape)

    

    start = pm.find_MAP()

    step = pm.Metropolis()

    hierarchical_trace = pm.sample(200000, step, progressbar=True)

Applied interval-transform to hyper_alpha_sd and added transformed hyper_alpha_sd_interval to model.

Applied interval-transform to hyper_alpha_mu and added transformed hyper_alpha_mu_interval to model.

Applied interval-transform to hyper_mu_sd and added transformed hyper_mu_sd_interval to model.

Applied interval-transform to hyper_mu_mu and added transformed hyper_mu_mu_interval to model.

Applied log-transform to alpha and added transformed alpha_log to model.

Applied log-transform to mu and added transformed mu_log to model.

 [-----------------100%-----------------] 200000 of 200000 complete in 593.0 sec

_ = pm.traceplot(hierarchical_trace[120000:], 

                 varnames=['mu','alpha','hyper_mu_mu',

                           'hyper_mu_sd','hyper_alpha_mu',

                           'hyper_alpha_sd'])

640?wx_fmt=png

对 μ 和 α 的估计有多条曲线,每个对话i对应一条。整体合并和局部融合模型的不同之处在于,局部融合模型的参数(μ和 α)拥有一个被所有对话共享的融合参数。这带来两个好处:

  1. 信息在对话间共享,所以对于含有有限样本容量的对话来说,在估计过程中从别的对话处“借”信息来减小估计的方差。

  2. 我们对每个对话单独做了估计,也对所有对话整体做了估计。

我们快速看一下后验预测分布

x_lim = 60

y_pred = hierarchical_trace.get_values('y_pred')[::1000].ravel()

fig = plt.figure(figsize=(12,6))

fig.add_subplot(211)

fig.add_subplot(211)

_ = plt.hist(y_pred, range=[0, x_lim], bins=x_lim, histtype='stepfilled', color=colors[1])   

_ = plt.xlim(1, x_lim)

_ = plt.ylabel('Frequency')

_ = plt.title('Posterior predictive distribution')

fig.add_subplot(212)

_ = plt.hist(messages['time_delay_seconds'].values, range=[0, x_lim], bins=x_lim, histtype='stepfilled')

_ = plt.xlabel('Response time in seconds')

_ = plt.ylabel('Frequency')

_ = plt.title('Distribution of observed data')

plt.tight_layout()

640?wx_fmt=png

收缩效果:合并模型 vs 分层模型

如讨论的那样,局部融合模型中 μ 和 α 享有一个融合参数,通过对话间信息共享,它使得参数的估计收缩得更紧密,尤其是对含有少量数据的对话。

下图显示了这种收缩效果,可以看出通过融合参数,参数μ 和 α 是怎样聚集在一起。

hier_mu = hierarchical_trace['mu'][500:].mean(axis=0)

hier_alpha = hierarchical_trace['alpha'][500:].mean(axis=0)

indv_mu = [indiv_traces[p]['mu'][500:].mean() for p in participants]

indv_alpha = [indiv_traces[p]['alpha'][500:].mean() for p in participants]

fig = plt.figure(figsize=(8, 6))

ax = fig.add_subplot(111, xlabel='mu', ylabel='alpha', 

                     title='Pooled vs. Partially Pooled Negative Binomial Model', 

                     xlim=(5, 45), ylim=(0, 10))

ax.scatter(indv_mu, indv_alpha, c=colors[5], s=50, label = 'Pooled', zorder=3)

ax.scatter(hier_mu, hier_alpha, c=colors[6], s=50, label = 'Partially Pooled', zorder=4)

for i in range(len(indv_mu)):  

    ax.arrow(indv_mu[i], indv_alpha[i], hier_mu[i] - indv_mu[i], hier_alpha[i] - indv_alpha[i], 

            fc="grey", ec="grey", length_includes_head=True, alpha=.5, head_width=0)

_ = ax.legend()

640?wx_fmt=png

对后验分布提问

我们开始利用贝叶斯统计最好的方面之一——后验分布。不像频率论方法,我们得到完整的后验分布而不是点估计。本质上,我们是得到许多可信的参数值,这使得我们可以按照比较自然直观的方式提问。

What are the chances I’ll respond to my friend in less than 10 seconds?

为了估计这个概率,我们可以看看 Timothy 和 Andrew 的回复时间的后验预测分布,检查有多少样本是小于 10 秒的。当我第一次听到这个方法时,我以为我理解错了,因为它太简单了。

def participant_y_pred(person):

    """Return posterior predictive for person"""

    ix = np.where(participants == person)[0][0]

    return hierarchical_trace['y_pred'][100000:, ix]

print("Here are some samples from Timothy's posterior predictive distribution: n %s" % participant_y_pred('Timothy'))

Here are some samples from Timothy's posterior predictive distribution: 

 [24 24 24 ..., 19 19 19]

def person_plotA(person_name):

    ix_check = participant_y_pred(person_name) > 10

    _ = plt.hist(participant_y_pred(person_name)[~ix_check], range=[0, x_lim], bins=x_lim, histtype='stepfilled', label='<10 seconds')

    _ = plt.hist(participant_y_pred(person_name)[ix_check], range=[0, x_lim], bins=x_lim, histtype='stepfilled', label='>10 seconds')

    _ = plt.title('Posterior predictive ndistribution for %s' % person_name)

    _ = plt.xlabel('Response time')

    _ = plt.ylabel('Frequency')

    _ = plt.legend()

def person_plotB(person_name):

    x = np.linspace(1, 60, num=60)

    num_samples = float(len(participant_y_pred(person_name)))

    prob_lt_x = [100*sum(participant_y_pred(person_name) < i)/num_samples for i in x]

    _ = plt.plot(x, prob_lt_x, color=colors[4])

    _ = plt.fill_between(x, prob_lt_x, color=colors[4], alpha=0.3)

    _ = plt.scatter(10, float(100*sum(participant_y_pred(person_name) < 10))/num_samples, s=180, c=colors[4])

    _ = plt.title('Probability of responding nto %s before time (t)' % person_name)

    _ = plt.xlabel('Response time (t)')

    _ = plt.ylabel('Cumulative probability t')

    _ = plt.ylim(ymin=0, ymax=100)

    _ = plt.xlim(xmin=0, xmax=60)

fig = plt.figure(figsize=(11,6))

_ = fig.add_subplot(221)

person_plotA('Timothy')

_ = fig.add_subplot(222)

person_plotB('Timothy')

_ = fig.add_subplot(223)

person_plotA('Andrew')

_ = fig.add_subplot(224)

person_plotB('Andrew')

plt.tight_layout()

640?wx_fmt=png

participant_y_pred('Andrew')

array([13, 13, 13, ..., 28, 28, 28])

我发现这个方法非常直观而且灵活。上图左边根据大于 10 秒或小于 10 秒把后验预测的样本分为两部分,通过计算小于 10 的样本比例来计算概率。右边的图对回复时间小于每个 0 到 60 间的值的概率进行了计算。所以,看起来,Timothy 和 Andrew 分别有 3% 和 4% 的可能性在 10 秒内得到回复。

我的朋友们两两之间比较如何呢?

def prob_persona_faster(persona, personb):

    return sum(participant_y_pred(persona) < participant_y_pred(personb))/len(participant_y_pred(persona))

print("Probability that Tom is responded to faster than Andrew: {:.2%}".format(prob_persona_faster('Tom', 'Andrew')))

640?wx_fmt=png

# Create an empty dataframe

ab_dist_df = pd.DataFrame(index=participants, columns=participants, dtype=np.float)

# populate each cell in dataframe with persona_less_personb()

for a, b in itertools.permutations(participants, 2):

    ab_dist_df.ix[a, b] = prob_persona_faster(a, b)

    

# populate the diagonal

for a in participants:

    ab_dist_df.ix[a, a] = 0.5

# Plot heatmap

f, ax = plt.subplots(figsize=(12, 9))

cmap = plt.get_cmap("Spectral")

_ = sns.heatmap(ab_dist_df, square=True, cmap=cmap)

_ = plt.title('Probability that Person A will be responded to faster than Person B')

_ = plt.ylabel('Person A')

_ = plt.xlabel('Person B')

640?wx_fmt=png

—End—

  • 1
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值