Bayesian linear regression
This example implements Gibbs sampling for Bayesian linear regression. Suppose we observe pairs $(y_i, x_i),\ i = 1, \dots, N$, with

$$y_i \sim N(\beta_0 + \beta_1 x_i,\ 1/\tau),$$

which, by the properties of the normal distribution, can equivalently be written as

$$y_i = \beta_0 + \beta_1 x_i + \epsilon, \qquad \epsilon \sim N(0,\ 1/\tau).$$

Assuming the observed pairs are generated independently from the same distribution, the likelihood is

$$L(y \mid x, \beta_0, \beta_1, \tau) = \prod_{i=1}^{N} N(y_i;\ \beta_0 + \beta_1 x_i,\ 1/\tau).$$

To keep the computation tractable, we place conjugate priors on the parameters (matching the hyperparameters used in the code below):

$$\beta_0 \sim N(\mu_0,\ 1/\tau_0), \qquad \beta_1 \sim N(\mu_1,\ 1/\tau_1), \qquad \tau \sim \mathrm{Gamma}(\alpha, \beta).$$
A detailed derivation of the full conditional distributions is linked in the original post.
Overview of the basic Gibbs method
We have two parameters $\theta_1, \theta_2$ and observed data $x$. To sample from the posterior, we proceed as follows:
1. Initialize $\theta_2^{(i)}$ by some method (e.g. randomly).
2. Sample $\theta_1^{(i+1)} \sim p(\theta_1 \mid \theta_2^{(i)}, x)$.
3. Sample $\theta_2^{(i+1)} \sim p(\theta_2 \mid \theta_1^{(i+1)}, x)$.
Note that the updates must alternate, each draw conditioning on the most recently sampled values.
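As a self-contained illustration (not from the original post), the three steps above can be sketched for a toy target: a standard bivariate normal with correlation `rho`, whose full conditionals are known in closed form. All names here are hypothetical.

```python
import numpy as np

def gibbs_bivariate_normal(rho, iters=5000, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: theta1 | theta2 ~ N(rho * theta2, 1 - rho**2),
    and symmetrically for theta2 | theta1."""
    rng = np.random.default_rng(seed)
    theta2 = 0.0                       # step 1: initialize theta2
    sd = np.sqrt(1 - rho ** 2)         # conditional standard deviation
    samples = np.empty((iters, 2))
    for i in range(iters):
        theta1 = rng.normal(rho * theta2, sd)  # step 2: theta1 | theta2
        theta2 = rng.normal(rho * theta1, sd)  # step 3: theta2 | newest theta1
        samples[i] = (theta1, theta2)
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
# After burn-in, the sample correlation should be close to the target 0.8
print(np.corrcoef(samples[1000:].T)[0, 1])
```

The key point mirrored from the algorithm above: step 3 conditions on the `theta1` just drawn in step 2, not on the previous iteration's value.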
Implementation
Import packages
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
Define the parameter-update functions from the derived full conditionals
def sample_beta_0(y, x, beta_1, tau, mu_0, tau_0):
    N = len(y)
    assert len(x) == N
    # Posterior precision and mean of beta_0's normal full conditional
    precision = tau_0 + tau * N
    mean = tau_0 * mu_0 + tau * np.sum(y - beta_1 * x)
    mean /= precision
    return np.random.normal(mean, 1 / np.sqrt(precision))

def sample_beta_1(y, x, beta_0, tau, mu_1, tau_1):
    N = len(y)
    assert len(x) == N
    precision = tau_1 + tau * np.sum(x * x)
    mean = tau_1 * mu_1 + tau * np.sum((y - beta_0) * x)
    mean /= precision
    return np.random.normal(mean, 1 / np.sqrt(precision))

def sample_tau(y, x, beta_0, beta_1, alpha, beta):
    N = len(y)
    alpha_new = alpha + N / 2
    resid = y - beta_0 - beta_1 * x
    beta_new = beta + np.sum(resid * resid) / 2
    # np.random.gamma takes shape and scale, so pass 1/rate as the scale
    return np.random.gamma(alpha_new, 1 / beta_new)
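For reference, these functions draw from the standard full conditionals of this conjugate model (the derivation the post links to); each line below matches the `precision`, `mean`, and shape/rate expressions in the code:

```latex
\beta_0 \mid \beta_1, \tau, y, x \sim
  N\!\left(\frac{\tau_0 \mu_0 + \tau \sum_i (y_i - \beta_1 x_i)}{\tau_0 + \tau N},\
           \frac{1}{\tau_0 + \tau N}\right)

\beta_1 \mid \beta_0, \tau, y, x \sim
  N\!\left(\frac{\tau_1 \mu_1 + \tau \sum_i (y_i - \beta_0)\, x_i}{\tau_1 + \tau \sum_i x_i^2},\
           \frac{1}{\tau_1 + \tau \sum_i x_i^2}\right)

\tau \mid \beta_0, \beta_1, y, x \sim
  \mathrm{Gamma}\!\left(\alpha + \frac{N}{2},\
                        \beta + \frac{1}{2}\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\right)
```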
Generate data
## Set the true parameter values
beta_0_true = -1
beta_1_true = 2
tau_true = 1
## Generate data from the true parameters
N = 50
x = np.random.uniform(low=0, high=4, size=N)
y = np.random.normal(beta_0_true + beta_1_true * x, 1 / np.sqrt(tau_true))
sns.stripplot(x=x, y=y)  # plot the data
## Specify initial values
init = {"beta_0": 0,
        "beta_1": 0,
        "tau": 2}

## Specify hyperparameters
hypers = {"mu_0": 0,
          "tau_0": 1,
          "mu_1": 0,
          "tau_1": 1,
          "alpha": 2,
          "beta": 1}

## Main Gibbs-sampling function: chain the update functions above into one sampling loop
def gibbs(y, x, iters, init, hypers):
    assert len(y) == len(x)
    beta_0 = init["beta_0"]
    beta_1 = init["beta_1"]
    tau = init["tau"]
    trace = np.zeros((iters, 3))  # trace to store values of beta_0, beta_1, tau
    # Update the parameters one at a time, each conditioned on the latest values
    for it in range(iters):
        beta_0 = sample_beta_0(y, x, beta_1, tau, hypers["mu_0"], hypers["tau_0"])
        beta_1 = sample_beta_1(y, x, beta_0, tau, hypers["mu_1"], hypers["tau_1"])
        tau = sample_tau(y, x, beta_0, beta_1, hypers["alpha"], hypers["beta"])
        trace[it, :] = np.array((beta_0, beta_1, tau))
    trace = pd.DataFrame(trace)
    trace.columns = ['beta_0', 'beta_1', 'tau']
    return trace
iters = 1000  # number of sampling iterations
trace = gibbs(y, x, iters, init, hypers)
trace.plot()  # visualize the sampled values
trace_burnt = trace[500:]  # keep the last 500 samples, discarding the first 500 as burn-in
hist_plot = trace_burnt.hist(bins=30, layout=(1, 3))
print(trace_burnt.median())
print(trace_burnt.std())
beta_0   -1.077110
beta_1    1.903947
tau       1.384544
dtype: float64
beta_0    0.229955
beta_1    0.093329
tau       0.288980
dtype: float64
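Beyond the posterior median and standard deviation above, the post-burn-in samples also give credible intervals directly via empirical quantiles. A minimal sketch; the DataFrame below is a synthetic stand-in drawn to roughly match the statistics reported above, not the actual trace from the sampler:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for trace_burnt; in the post this would be trace[500:]
rng = np.random.default_rng(1)
trace_burnt = pd.DataFrame({
    "beta_0": rng.normal(-1.08, 0.23, size=500),
    "beta_1": rng.normal(1.90, 0.09, size=500),
    "tau": rng.gamma(25.0, scale=1.38 / 25.0, size=500),
})

# 95% equal-tailed credible intervals from the marginal posterior samples
ci = trace_burnt.quantile([0.025, 0.975])
print(ci)
```

With the real trace, the true values (beta_0 = -1, beta_1 = 2, tau = 1) would typically fall inside these intervals.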
trace[1:10]
|   | beta_0 | beta_1 | tau |
|---|---|---|---|
| 1 | 1.317167 | 1.051728 | 0.445828 |
| 2 | 0.594940 | 1.256017 | 0.513077 |
| 3 | 0.134436 | 1.415097 | 0.769414 |
| 4 | -0.123197 | 1.519084 | 0.968624 |
| 5 | -0.073099 | 1.567300 | 0.633794 |
| 6 | -0.525161 | 1.757552 | 0.974104 |
| 7 | -0.843614 | 1.878984 | 1.751914 |
| 8 | -0.876359 | 1.770635 | 1.376143 |
| 9 | -0.890558 | 1.839191 | 1.723694 |
This is a translated excerpt; for the English original, see Gibbs sampling for Bayesian linear regression in Python.