python 两个数据框合并计算_概率计算：定义概率分布数据结构，Python实现概率分布计算-CSDN博客

使用Python实现马尔科夫随机场、蒙特卡洛采样等随机过程算法的前提，就是用Python实现概率的计算。并不只是数值计算，而是能够将随机模拟中常用的各种概率相关的操作，都能用计算机的数据结构来表达，其关键在于对【随机变量】的适当定义处理。

因此本文介绍一下概率分布在Python中定义的一种数据结构。

一个概率分布的组成要素包含：随机变量、变量的维度、变量不同取值状态的对应概率值。

在一个有向图中(贝叶斯网络)，常有如下所示的结构：

典型的贝叶斯网络

在这个贝叶斯网络中，有五个随机变量，各自有不同的维度，并且在列表中给出了对应变量取值的概率值。因此我们的目标可以拆解为两步，第一：

用Python精准地定义一个因子对象(记为Factor)，这个对象表达一个联合分布，其中包含随机变量、变量维度、对应概率值这三个信息。

第二：

实现概率相乘、求边缘概率、求条件概率运算。

这次先解决问题一。

将概率分布间的计算用python表达出来，包括对一个联合分布求某个变量的边缘概率分布，对一个联合分布给定一部分变量的取值，求条件概率分布。以及更基础的，求两个因子的乘积-状态维度和状态维度的相乘，所对应的概率值的数量也同样扩大。

最后，分别获取这个贝叶斯网络中每个因子的概率分布，我们要求出整个网络的联合分布：

求联合分布

联合分布概率为每个因子在父节点条件下条件概率的乘积。

具体实现如下：

首先实现因子对象，Factor，其中随机变量用属性var(variables)表示，为python_list结构，变量维度card(cardinality)，同样为list，概率取值val(value)列表list。对因子对象实现获取状态结构函数(get_all_assignments),正态化函数(将val列表中所有概率值正态化，使其和为1)。

注意，val概率值列表的顺序，对应着变量取值的某个特定状态。

实现打印功能，方便查看一个因子的具体情况。

class Factor:    def __init__(self, var=None, card=None, val=None):        if var is None:            var = np.array([], np.int64)        if card is None:            card = np.array([], np.int64)        if val is None:            val = np.array([], np.float64)        self.var = np.array(var)        self.card = np.array(card)        self.val = np.array(val)    def is_empty(self):        """Returns true if the factor is empty (i.e. not initialized)"""        return len(self.var) == 0    def get_all_assignments(self):        assignments = index_to_assignment(np.arange(int(np.prod(self.card))), self.card)        return assignments    def __repr__(self):        if self.is_empty():            str = 'Empty factor'        else:            str = 'Factor containing {} variables'.format(len(self.var))            num_states = int(np.prod(self.card))            assigments = index_to_assignment(np.arange(num_states), self.card)            # Print header row            header = '| ' + ' '.join(['X_{}'.format(i) for i in self.var]) +                      ' | Probability |'            col_width = len(header)            line = '-' * col_width + ''            str += line + header + '' + line            # Assignments            for i in range(assigments.shape[0]):                lhs = '   '.join(['{}'.format(a) for a in assigments[i]])                row = '|  ' + lhs + '  | ' + '{:>11g}'.format(self.val[i]) + ' |'                str = str + row            str += line + ''        return str    def normalize(self):        """Normalize the probablity such that it sums to 1.        Use this function with care since not all factor tables should be        normalized.        """        if not self.is_empty():            self.val /= np.sum(self.val)

比如，定义一个因子:

card=[2,3,2]phi = Factor(var=[1, 2,5],                 card=card,                 val=[0.6, 0.3, 0.2, 0.7, 0.2, 0.0,0.1,0.2,0.1,0.1,0.2,0.3])

使得该因子含3个随机变量，随机变量维度分别为2，3，2，概率值如val列表所示。

将其打印，结果如下：

print(phi)———————————————Factor containing 2 variables-------------------------| X_1 X_2 X_5 | Probability |-----------------------------|  0   0   0  |         0.6 ||  1   0   0  |         0.3 ||  0   1   0  |         0.2 ||  1   1   0  |         0.7 ||  0   2   0  |         0.2 ||  1   2   0  |           0 ||  0   0   1  |         0.1 ||  1   0   1  |         0.2 ||  0   1   1  |         0.1 ||  1   1   1  |         0.1 ||  0   2   1  |         0.2 ||  1   2   1  |         0.3 |-----------------------------

这样就完成了问题一，实现了概率分布这个数据结构。这样做的意义，在于将随机变量的信息和概率值绑定在一起，为后续的概率计算，乃至各种随机模拟算法的实现提供条件。

为了后续的实现，开发两个因子函数，用以方便地查看概率值，及其所对应的变量状态取值：

def index_to_assignment(index: Union[int, np.ndarray], card: np.ndarray):    """Convert index to variable assignment    Args:        index: Index to convert into assignment.          If index is a vector of numbers, the function will return          a matrix of assignments, one assignment per row.        card: Cardinality of the factor    """    if isinstance(index, int):        is_scalar = True        index = np.array([index])    else:        is_scalar = False    divisor = np.cumprod(np.concatenate([[1.], card[:-1]]))    assignment = np.mod(        np.floor(index[:, None] / divisor[None, :]),        card[None, :]    ).astype(np.int64)    if is_scalar:        assignment = assignment[0]      return assignmentdef assignment_to_index(assignment: np.ndarray, card: np.ndarray):    """Convert assignment to index.    Args:        assignment: Assignment to convert to index        card: Cardinality of the factor    """    multiplier = np.cumprod(np.concatenate([[1.], card[:-1]]))    index = np.sum(assignment * multiplier, axis=-1).astype(np.int64)    return index

以上是用Python定义概率分布数据结构的方法。