给定概率密度，生成随机数 python实现

最新推荐文章于 2024-05-30 09:50:36 发布

normol

最新推荐文章于 2024-05-30 09:50:36 发布

阅读量1.4w

点赞数 11

分类专栏： Python 文章标签：概率密度随机数 python

本文链接：https://blog.csdn.net/normol/article/details/98968528

版权

Python 专栏收录该内容

17 篇文章 5 订阅

订阅专栏

实现的方法可以不止一种：

rejection sampling
invert the cdf
Metropolis Algorithm (MCMC)

本篇介绍根据累积概率分布函数的逆函数(2：invert the CDF)生成的方法。
原参考网站：https://blog.demofox.org/2017/08/05/generating-random-numbers-from-a-specific-distribution-by-inverting-the-cdf/
自己的理解不一定正确，有错误望指正。

目标：
已知 y=pdf(x），现想由给定的pdf, 生成对应分布的x
PDF是概率分布函数，对其积分或者求和可以得到CDF（累积概率分布函数），PDF积分或求和的结果始终为1

步骤（具体解释后面会说）：
1、根据pdf得到cdf
2、由cdf得到inverse of the cdf
3、对于给定的均匀分布[0,1),带入inverse cdf，得到的结果即是我们需要的x

求cdf逆函数的具体方法：
对于上面的第二步，可以分成两类：
1、当CDF的逆函数好求时，直接根据公式求取，
2、反之当CDF的逆函数不好求时，用数值模拟方法

自己的理解：为什么需要根据cdf的逆去获得x？
原因一：
因为cdf是单调函数因此一定存在逆函数（cdf是s型函数，而pdf则不一定，例如正态分布，不单调，对于给定的y，可能存在两个对应的x，就不可逆）
原因二：
这仅是我自己的直观理解，根据下图所示（左上为pdf，右上为cdf）
在这里插入图片描述
由步骤3可知，我们首先生成[0，1)的均匀随机数，此随机数作为cdf的y，去映射到cdf的x（若用cdf的逆函数表示则是由x映射到y），可以参考上图的右上，既然cdf的y是均匀随机的，那么对于cdf中同样范围的x，斜率大的部分将会有更大的机会被映射，因为对应的y范围更大（而y是随即均匀分布的），那么，cdf的斜率也就等同于pdf的值，这正好符合若x的pdf较大，那么有更大的概率出现（即重复很多次后，该x会出现的次数最多）

代码实现——方法一，公式法

import numpy as np
import math
import random
import matplotlib.pyplot as plt
import collections

count_dict = dict()
bin_count = 20

def inverseCDF():
    """
    return the x value in PDF
    """
    uniform_random = random.random()
    return inverse_cdf(uniform_random)
    

def pdf(x):
    return 2 * x
    
# cdf = x^2, 其逆函数很好求，因此直接用公式法
def inverse_cdf(x):
    return math.sqrt(x)


def draw_pdf(D):
	global bin_count
    D = collections.OrderedDict(sorted(D.items()))
    plt.bar(range(len(D)), list(D.values()), align='center')
    # 因为映射bin的时候采用的floor操作，因此加上0.5
    value_list = [(key + 0.5) / bin_count for key in D.keys()]
    plt.xticks(range(len(D)), value_list)
    plt.xlabel('x', fontsize=5)
    plt.ylabel('counts', fontsize=5)
    plt.title('counting bits')
    plt.show()

for i in range(90000):
    x = inverseCDF()
    # 用bin去映射，否则不好操作
    bin = math.floor(x * bin_count)  # type(bin): int
    count_dict[bin] = count_dict.get(bin, 0) + 1

draw_pdf(count_dict)

结果：
在这里插入图片描述
代码实现——方法二，数值法
数值模拟cdf的关键是创建lookup table，
table的size越大则结果越真实（即区间划分的个数）

import numpy as np
import math
import random
import matplotlib.pyplot as plt
import collections

lookup_table_size = 40
CDFlookup_table = np.zeros((lookup_table_size))

count_dict = dict()
bin_count = 20

def inverse_cdf_numerically(y):
    global lookup_table_size
    global CDFlookup_table
    value = 0.0
    for i in range(lookup_table_size):
        x = i * 1.0 / (lookup_table_size - 1)
        value += pdf2(x)
        CDFlookup_table[i] = value
    CDFlookup_table /= value  # normalize the cdf

    if y < CDFlookup_table[0]:  
        t = y / CDFlookup_table[0]
        return t / lookup_table_size
    index = -1
    for j in range(lookup_table_size):
        if CDFlookup_table[j] >= y:
            index = j
            break
    # linear interpolation
    t = (y - CDFlookup_table[index - 1]) / \
        (CDFlookup_table[index] - CDFlookup_table[index - 1])
    fractional_index = index + t  # 因为index从0开始,所以不是 (index-1)+t
    return fractional_index / lookup_table_size


def inverseCDF():
    """
    return the x value in PDF
    """
    uniform_random = random.random()
    return inverse_cdf_numerically(uniform_random)


def pdf2(x):
    return (x * x * x - 10.0 * x * x + 5.0 * x + 11.0) / (10.417)

def draw_pdf(D):
    global bin_count
    D = collections.OrderedDict(sorted(D.items()))
    plt.bar(range(len(D)), list(D.values()), align='center')
    value_list = [(key + 0.5) / bin_count for key in D.keys()]
    plt.xticks(range(len(D)), value_list)
    plt.xlabel('x', fontsize=5)
    plt.ylabel('counts', fontsize=5)
    plt.title('counting bits')
    plt.show()


for i in range(90000):
    x = inverseCDF()
    bin = math.floor(x * bin_count)  # type(bin): int
    count_dict[bin] = count_dict.get(bin, 0) + 1

draw_pdf(count_dict)

真实函数与模拟结果
在这里插入图片描述
扩展：生成伯努利、正太分布

import numpy as np
import matplotlib.pyplot as plt
"""
reference:
https://blog.demofox.org/2017/07/25/counting-bits-the-normal-distribution/
"""


def plot_bar_x():
    # this is for plotting purpose
    index = np.arange(counting.shape[0])
    plt.bar(index, counting)
    plt.xlabel('x', fontsize=5)
    plt.ylabel('counts', fontsize=5)
    plt.title('counting bits')
    plt.show()


# if dice_side=2, is binomial distribution
# if dice_side>2 , is multinomial distribution
dice_side = 2
# if N becomes larger, then multinomial distribution will more like normal distribution
N = 100

counting = np.zeros(((dice_side - 1) * N + 1))

for i in range(30000):
    sum = 0
    for j in range(N):
        dice_result = np.random.randint(0, dice_side)
        sum += dice_result

    counting[sum] += 1

# normalization
counting /= np.sum(counting)
plot_bar_x()

normol

关注

11
点赞
踩
70

收藏

觉得还不错? 一键收藏
3
评论
给定概率密度，生成随机数 python实现

实现的方法可以不止一种：rejection samplinginvert the cdfMetropolis Algorithm (MCMC)本篇介绍根据累积概率分布函数的逆函数(2：invert the CDF)生成的方法。原参考网站：https://blog.demofox.org/2017/08/05/generating-random-numbers-from-a-speci...
复制链接

扫一扫

专栏目录