When writing AI/machine learning papers or blog posts, I often need LaTeX versions of common formulas. As a habitual copy-paster, I searched the web for a ready-made collection and, surprisingly, couldn't find one.
So I have collected the formulas I encounter most often, focusing on NLP and some general-purpose metric functions. Feel free to take what you need, and corrections or additions are welcome. (I have also put it on GitHub ( https://github.com/blmoistawinde/ml_equations_latex ); issues and PRs are welcome, and of course stars~)
Classical ML Equations in LaTeX
A collection of classical ML equations in LaTeX. Some are provided with brief notes and a link to the original paper. Hopefully it helps with writing papers and blog posts.
Better viewed at https://blmoistawinde.github.io/ml_equations_latex/
Model
RNNs (LSTM, GRU)
The encoder hidden state $h_t$ at time step $t$:

$$h_t = RNN_{enc}(x_t, h_{t-1})$$

The decoder hidden state $s_t$ at time step $t$:

$$s_t = RNN_{dec}(y_t, s_{t-1})$$

```latex
h_t = RNN_{enc}(x_t, h_{t-1})
s_t = RNN_{dec}(y_t, s_{t-1})
```
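To make the recurrence concrete, here is a minimal PyTorch sketch of the two updates above, using `nn.GRUCell` as the RNN unit (the names `enc_cell`/`dec_cell` and all dimensions are illustrative assumptions, not fixed by the equations):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim = 16, 32
enc_cell = nn.GRUCell(input_dim, hidden_dim)  # RNN_enc
dec_cell = nn.GRUCell(input_dim, hidden_dim)  # RNN_dec

# Encoder: h_t = RNN_enc(x_t, h_{t-1})
xs = torch.randn(10, 1, input_dim)            # 10 time steps, batch size 1
h = torch.zeros(1, hidden_dim)                # h_0
enc_states = []
for x_t in xs:
    h = enc_cell(x_t, h)
    enc_states.append(h)

# Decoder: s_t = RNN_dec(y_t, s_{t-1}), seeded with the final encoder state
y_t = torch.randn(1, input_dim)               # embedding of the previous target token
s = h
s = dec_cell(y_t, s)
```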
The $RNN_{enc}$, $RNN_{dec}$ are usually either

- LSTM (paper: Long short-term memory)
- GRU (paper: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation).
Attentional Seq2seq
The attention weight $\alpha_{ij}$ of the $i$th decoder step over the $j$th encoder step, resulting in context vector $c_i$:

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij}h_j$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$$

$$e_{ij} = a(s_{i-1}, h_j)$$

```latex
c_i = \sum_{j=1}^{T_x} \alpha_{ij}h_j

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}

e_{ij} = a(s_{i-1}, h_j)
```
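The softmax and the weighted sum are easy to check numerically; here is a minimal NumPy sketch, where `scores` stands in for the $e_{ij}$ produced by some score function $a$ (all shapes are illustrative):

```python
import numpy as np

T_x, hidden_dim = 10, 32
enc_states = np.random.randn(T_x, hidden_dim)   # h_1 ... h_{T_x}
scores = np.random.randn(T_x)                   # e_{i1} ... e_{iT_x} for decoder step i

# alpha_{ij} = exp(e_{ij}) / sum_k exp(e_{ik})  (softmax over encoder steps)
alpha = np.exp(scores - scores.max())           # subtract max for numerical stability
alpha /= alpha.sum()

# c_i = sum_j alpha_{ij} h_j
c_i = alpha @ enc_states                        # shape: (hidden_dim,)
```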
$a$ is a specific attention function, which can be one of the following.
Bahdanau Attention
Paper: Neural Machine Translation by Jointly Learning to Align and Translate
$$e_{ij} = v^T \tanh(W[s_{i-1}; h_j])$$

```latex
e_{ij} = v^T \tanh(W[s_{i-1}; h_j])
```
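A NumPy sketch of this additive score under assumed shapes ($W$ maps the concatenated states to an intermediate attention size, and $v$ reduces that to a scalar):

```python
import numpy as np

hidden_dim, attn_dim = 32, 64
W = np.random.randn(attn_dim, 2 * hidden_dim)
v = np.random.randn(attn_dim)
s_prev = np.random.randn(hidden_dim)            # s_{i-1}
h_j = np.random.randn(hidden_dim)

# e_{ij} = v^T tanh(W [s_{i-1}; h_j])
e_ij = v @ np.tanh(W @ np.concatenate([s_prev, h_j]))
```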
Luong(Dot-Product) Attention
Paper: Effective Approaches to Attention-based Neural Machine Translation
If $s_{i-1}$ and $h_j$ have the same dimensionality:

$$e_{ij} = s_{i-1}^T h_j$$

otherwise:

$$e_{ij} = s_{i-1}^T W h_j$$

```latex
e_{ij} = s_{i-1}^T h_j

e_{ij} = s_{i-1}^T W h_j
```
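Both variants in a short NumPy sketch (all dimensions are illustrative assumptions; the `general` form inserts $W$ to bridge mismatched sizes):

```python
import numpy as np

dec_dim, enc_dim = 32, 48
s_prev = np.random.randn(dec_dim)               # s_{i-1}
h_j_same = np.random.randn(dec_dim)             # encoder state with matching size
h_j_diff = np.random.randn(enc_dim)             # encoder state with different size
W = np.random.randn(dec_dim, enc_dim)

e_dot = s_prev @ h_j_same                       # e_{ij} = s_{i-1}^T h_j
e_general = s_prev @ W @ h_j_diff               # e_{ij} = s_{i-1}^T W h_j
```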
Finally, the output o