LinearPartition

Latest recommended article published on 2024-08-26 21:45:50

吊儿郎当的凡

Category: RNA Structure Prediction. Tags: machine learning, artificial intelligence, bioinformatics

Article link: https://blog.csdn.net/weixin_43269419/article/details/121474125

LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities

Year: 2020
Authors: He Zhang, Liang Zhang, David H. Mathews and Liang Huang
Journal Name: Bioinformatics

Motivation

Classical partition function algorithms scale cubically with sequence length.

Research Objective

Design a linear-time algorithm that approximates the partition function and the base-pairing probabilities.

Dataset

Let $\mathbf{x} = x_1, ..., x_n$ be an RNA sequence of length $n$, and let $\mathcal{Y}(\mathbf{x})$ denote the set of all possible secondary structures of $\mathbf{x}$.

Method

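The partition function is
$Q(\mathbf{x}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{x})} e^{-\frac{\Delta G^{\circ}(\mathbf{y})}{RT}}$
where $\Delta G^{\circ}(\mathbf{y})$ is the Gibbs free energy change of structure $\mathbf{y}$, $R$ is the gas constant, and $T$ is the thermodynamic temperature.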
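$\Delta G^{\circ}(\mathbf{y})$ is computed with a revised Nussinov–Jacobson energy model. The free energy change of an unpaired base at position $j$ is written $\delta(\mathbf{x}, j)$, and the free energy change of a base pair $(i, j)$ is written $\xi(\mathbf{x}, i, j)$.
Hence $\Delta G^{\circ}(\mathbf{y})$ is
$\Delta G^{\circ}(\mathbf{y}) = \sum_{j \in \mathrm{unpaired}(\mathbf{y})} \delta(\mathbf{x}, j) + \sum_{(i, j) \in \mathrm{pairs}(\mathbf{y})} \xi(\mathbf{x}, i, j)$
where $\mathrm{unpaired}(\mathbf{y})$ is the set of unpaired bases in $\mathbf{y}$ and $\mathrm{pairs}(\mathbf{y})$ is the set of base pairs in $\mathbf{y}$.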
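Substituting this into the definition, the partition function can be written as
$Q(\mathbf{x}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{x})} \left( \prod_{j \in \mathrm{unpaired}(\mathbf{y})} e^{-\frac{\delta(\mathbf{x}, j)}{RT}} \prod_{(i, j) \in \mathrm{pairs}(\mathbf{y})} e^{-\frac{\xi(\mathbf{x}, i, j)}{RT}} \right)$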
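
As a minimal sketch of the energy model above, the following Python computes $\Delta G^{\circ}$ and the Boltzmann weight of a single structure given in dot-bracket notation. The numeric values (1 kcal/mol per unpaired base, -3 kcal/mol for CG pairs, -2 kcal/mol for AU and GU pairs, 37 °C) and the function names are illustrative assumptions, not values taken from this post.

```python
import math

# Illustrative parameters (assumptions, not taken from the post).
RT = 0.0019872 * 310.15  # gas constant in kcal/(mol*K) times 37 degrees C in kelvin
UNPAIRED_ENERGY = 1.0    # delta(x, j), kcal/mol
PAIR_ENERGY = {          # xi(x, i, j), kcal/mol, keyed by the two paired bases
    "CG": -3.0, "GC": -3.0,
    "AU": -2.0, "UA": -2.0,
    "GU": -2.0, "UG": -2.0,
}


def free_energy(seq, structure):
    """Sum delta over unpaired bases and xi over base pairs of a dot-bracket structure."""
    stack, total = [], 0.0
    for j, ch in enumerate(structure):
        if ch == ".":
            total += UNPAIRED_ENERGY
        elif ch == "(":
            stack.append(j)          # remember the opening position
        else:                        # ")" closes the most recent "("
            i = stack.pop()
            total += PAIR_ENERGY[seq[i] + seq[j]]
    return total


def boltzmann_weight(seq, structure):
    """Contribution of one structure to Q(x): exp(-deltaG / RT)."""
    return math.exp(-free_energy(seq, structure) / RT)
```

Summing `boltzmann_weight` over every structure in $\mathcal{Y}(\mathbf{x})$ would give $Q(\mathbf{x})$ directly, which is only feasible for very short sequences.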
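The authors define $[i, j]$ as the subsequence $x_i, ..., x_j$ (so $[j, j - 1]$ is empty).
The partition function of $[i, j]$ is $Q_{i, j} = \sum_{\mathbf{y} \in \mathcal{Y}(x_i, ..., x_j)} e^{-\frac{\Delta G^{\circ}(\mathbf{y})}{RT}}$
Pseudocode for the LinearPartition algorithm is shown below.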

function LinearPartition( $\mathbf{x}$ , b)
    n = len( $\mathbf{x}$ )
    set all $Q_{j, j-1} = 1$
    for j in (1, …, n) do
        for each $[i, j - 1]$ in $Q$ do
            $Q_{i, j} += Q_{i, j-1} \cdot e^{-\frac{\delta(\mathbf{x}, j)}{RT}}$
            if $x_{i-1}x_j$ in {AU, UA, CG, GC, GU, UG} then
                for each $[k, i - 2]$ in $Q$ do
                    $Q_{k, j} += Q_{k, i-2} \cdot Q_{i, j-1} \cdot e^{-\frac{\xi(\mathbf{x}, i-1, j)}{RT}}$
        BeamPrune( $Q$ , j, b)
    return $Q$

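Lines 5-6 of the pseudocode (the loop over $[i, j - 1]$ and the first update) extend $[i, j - 1]$ to $[i, j]$ by appending $x_j$ as an unpaired base. Lines 8-9 (the loop over $[k, i - 2]$ and the second update) apply when $x_{i-1}$ and $x_j$ can form a base pair: $[i, j - 1]$ is enclosed by the pair $(i - 1, j)$ and combined with the span $[k, i - 2]$ to its left, and the result is added to $[k, j]$.
[Figure: combining $[k, i - 2]$ with $[i, j - 1]$ to update $[k, j]$]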
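
Read as a recurrence, the two updates in the pseudocode amount to the following. This is a restatement of the pseudocode with pruning ignored, not a formula quoted from the paper; the sum runs only over those $i$ for which $x_{i-1}x_j$ is one of the six allowed pairs.

$Q_{k, j} = Q_{k, j-1} \cdot e^{-\frac{\delta(\mathbf{x}, j)}{RT}} + \sum_{i = k+1}^{j} Q_{k, i-2} \cdot Q_{i, j-1} \cdot e^{-\frac{\xi(\mathbf{x}, i-1, j)}{RT}}$

There are $O(n^2)$ entries $Q_{k, j}$, and each one sums up to $O(n)$ terms.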

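Lines 1-9 alone run in $O(n^3)$ time, because the loops over $j$, $i$ and $k$ are nested. The authors therefore call BeamPrune( $Q$ , j, b) at the end of every iteration over $j$. Its pseudocode is shown below.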

function BeamPrune( $Q$ , j, b)
    for each $[i, j]$ in $Q$ do
        candidates[i] = $Q_{1, i-1} \cdot Q_{i, j}$
    candidates = SelectTopB(candidates, b) # keep the b largest values
    for each $[i, j]$ in $Q$ do
        if $i$ not in candidates then
            delete $Q_{i, j}$

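After pruning, at most $b$ spans survive for each $j$, so the loops over $i$ and $k$ each visit at most about $b$ entries. This lowers the complexity to $O(nb^2)$; $b$ is typically set to 100.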
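
The two pseudocode functions can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the energy values in `delta` and `xi`, the temperature, and the layout of `Q` as a list of dictionaries (`Q[j][i]` holds $Q_{i, j}$, with 1-based indices) are choices made for this example. One further assumption: the empty span $[j + 1, j]$ is added after pruning so that it is never discarded.

```python
import heapq
import math

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees C (assumed)
PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def delta(x, j):
    """Assumed energy of an unpaired base at 1-based position j (kcal/mol)."""
    return 1.0


def xi(x, i, j):
    """Assumed energy of base pair (i, j), 1-based (kcal/mol)."""
    return -3.0 if {x[i - 1], x[j - 1]} == {"C", "G"} else -2.0


def beam_prune(Q, j, b):
    """Keep the b spans [i, j] with the largest Q[1, i-1] * Q[i, j]."""
    if len(Q[j]) <= b:
        return
    scores = {i: Q[i - 1].get(1, 0.0) * q for i, q in Q[j].items()}
    keep = set(heapq.nlargest(b, scores, key=scores.get))
    Q[j] = {i: q for i, q in Q[j].items() if i in keep}


def linear_partition(x, b=100):
    """Return Q as a list of dicts; Q[len(x)][1] approximates Q(x)."""
    n = len(x)
    Q = [dict() for _ in range(n + 1)]
    Q[0][1] = 1.0  # empty span [1, 0]
    for j in range(1, n + 1):
        for i, q_inner in Q[j - 1].items():  # spans [i, j-1]
            # Skip: x_j stays unpaired, [i, j-1] grows to [i, j].
            Q[j][i] = Q[j].get(i, 0.0) + q_inner * math.exp(-delta(x, j) / RT)
            # Pop: x_{i-1} pairs with x_j and encloses [i, j-1].
            if i >= 2 and x[i - 2] + x[j - 1] in PAIRS:
                w = math.exp(-xi(x, i - 1, j) / RT)
                for k, q_left in Q[i - 2].items():  # spans [k, i-2]
                    Q[j][k] = Q[j].get(k, 0.0) + q_left * q_inner * w
        beam_prune(Q, j, b)
        if j < n:
            Q[j][j + 1] = 1.0  # empty span [j+1, j], added after pruning
    return Q
```

For example, `linear_partition("GGGAAACCC")[9][1]` returns the partition function of the whole sequence under these toy energies. With `b` larger than the sequence length nothing is pruned and the result is exact for this model; a smaller `b` can only drop positive terms, so the value can only shrink.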
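The probability that nucleotide $i$ pairs with nucleotide $j$ is
$p_{i, j} = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{x}),\ (i, j) \in \mathrm{pairs}(\mathbf{y})} p(\mathbf{y})$
where the probability of a single structure is
$p(\mathbf{y}) = \frac{e^{-\frac{\Delta G^{\circ}(\mathbf{y})}{RT}}}{Q(\mathbf{x})}$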
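
To make the definition concrete, the sketch below evaluates $p_{i, j}$ by brute force: it enumerates every nested structure of a short sequence, accumulates the Boltzmann weights, and normalises by $Q(\mathbf{x})$. It is exponential in the sequence length and is meant only as a check of the formula on toy inputs, not as the method used by LinearPartition. The energy values and function names are illustrative assumptions.

```python
import math
from collections import defaultdict

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees C (assumed)
UNPAIRED_ENERGY = 1.0    # assumed delta, kcal/mol
PAIR_ENERGY = {          # assumed xi, kcal/mol
    "CG": -3.0, "GC": -3.0,
    "AU": -2.0, "UA": -2.0,
    "GU": -2.0, "UG": -2.0,
}


def structures(seq, i, j):
    """Yield each nested structure of seq[i:j] once, as a tuple of 0-based pairs."""
    if i >= j:
        yield ()
        return
    # Case 1: position i is unpaired.
    for rest in structures(seq, i + 1, j):
        yield rest
    # Case 2: position i pairs with some k to its right.
    for k in range(i + 1, j):
        if seq[i] + seq[k] in PAIR_ENERGY:
            for inner in structures(seq, i + 1, k):
                for outer in structures(seq, k + 1, j):
                    yield ((i, k),) + inner + outer


def pair_probabilities(seq):
    """Return (Q, {(i, j): p_ij}) with 1-based positions, by full enumeration."""
    n = len(seq)
    Q = 0.0
    weighted = defaultdict(float)
    for pairs in structures(seq, 0, n):
        energy = UNPAIRED_ENERGY * (n - 2 * len(pairs))
        energy += sum(PAIR_ENERGY[seq[i] + seq[k]] for i, k in pairs)
        w = math.exp(-energy / RT)
        Q += w
        for i, k in pairs:
            weighted[(i + 1, k + 1)] += w  # numerator of p_ij
    return Q, {p: w / Q for p, w in weighted.items()}
```

For `"GAAAC"` only G1 and C5 can pair, so there are exactly two structures and `pair_probabilities("GAAAC")` returns a single probability for the pair `(1, 5)`.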

Limitations

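Because the beam size $b$ is fixed, the method gives no guarantee on approximation quality. Pruning by a threshold instead would give up the guarantee on running time.
LinearPartition performs poorly on random sequences, especially when there are many conflicting pairing choices.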
