When writing AI/machine learning papers or blog posts, I often need LaTeX versions of common formulas. As a habitual copy-paster, I searched the web for a ready-made collection and, surprisingly, couldn't find one.
So I have collected the formulas I encounter most often, focusing on NLP and some general-purpose metric functions. Feel free to take what you need, and corrections or additions are welcome. (I have also put it on GitHub ( https://github.com/blmoistawinde/ml_equations_latex ); issues and PRs are welcome, and of course stars~)
Classical ML Equations in LaTeX
A collection of classical ML equations in LaTeX. Some are provided with brief notes and a link to the original paper. Hopefully it helps with writing papers and blog posts.
Better viewed at https://blmoistawinde.github.io/ml_equations_latex/
Model
RNNs (LSTM, GRU)
The encoder hidden state $h_t$ at time step $t$:

$$h_t = RNN_{enc}(x_t, h_{t-1})$$

The decoder hidden state $s_t$ at time step $t$:

$$s_t = RNN_{dec}(y_t, s_{t-1})$$

```latex
h_t = RNN_{enc}(x_t, h_{t-1})
s_t = RNN_{dec}(y_t, s_{t-1})
```
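To make the recurrence concrete, here is a minimal PyTorch sketch of the two updates above, using `nn.GRUCell` as the RNN unit (the names `enc_cell`/`dec_cell` and all dimensions are illustrative assumptions, not fixed by the equations):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim = 16, 32
enc_cell = nn.GRUCell(input_dim, hidden_dim)  # RNN_enc
dec_cell = nn.GRUCell(input_dim, hidden_dim)  # RNN_dec

# Encoder: h_t = RNN_enc(x_t, h_{t-1})
xs = torch.randn(10, 1, input_dim)            # 10 time steps, batch size 1
h = torch.zeros(1, hidden_dim)                # h_0
enc_states = []
for x_t in xs:
    h = enc_cell(x_t, h)
    enc_states.append(h)

# Decoder: s_t = RNN_dec(y_t, s_{t-1}), seeded with the final encoder state
y_t = torch.randn(1, input_dim)               # embedding of the previous target token
s = h
s = dec_cell(y_t, s)
```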
The $RNN_{enc}$, $RNN_{dec}$ are usually either

- LSTM (paper: Long short-term memory)
- GRU (paper: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation).
Attentional Seq2seq
The attention weight $\alpha_{ij}$ of the $i$th decoder step over the $j$th encoder step, resulting in context vector $c_i$:

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij}h_j$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$$

$$e_{ij} = a(s_{i-1}, h_j)$$

```latex
c_i = \sum_{j=1}^{T_x} \alpha_{ij}h_j

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}

e_{ij} = a(s_{i-1}, h_j)
```
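The softmax and the weighted sum are easy to check numerically; here is a minimal NumPy sketch, where `scores` stands in for the $e_{ij}$ produced by some score function $a$ (all shapes are illustrative):

```python
import numpy as np

T_x, hidden_dim = 10, 32
enc_states = np.random.randn(T_x, hidden_dim)   # h_1 ... h_{T_x}
scores = np.random.randn(T_x)                   # e_{i1} ... e_{iT_x} for decoder step i

# alpha_{ij} = exp(e_{ij}) / sum_k exp(e_{ik})  (softmax over encoder steps)
alpha = np.exp(scores - scores.max())           # subtract max for numerical stability
alpha /= alpha.sum()

# c_i = sum_j alpha_{ij} h_j
c_i = alpha @ enc_states                        # shape: (hidden_dim,)
```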
$a$ is a specific attention function, which can be one of the following.
Bahdanau Attention
Paper: Neural Machine Translation by Jointly Learning to Align and Translate
$$e_{ij} = v^T \tanh(W[s_{i-1}; h_j])$$

```latex
e_{ij} = v^T \tanh(W[s_{i-1}; h_j])
```
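A NumPy sketch of this additive score under assumed shapes ($W$ maps the concatenated states to an intermediate attention size, and $v$ reduces that to a scalar):

```python
import numpy as np

hidden_dim, attn_dim = 32, 64
W = np.random.randn(attn_dim, 2 * hidden_dim)
v = np.random.randn(attn_dim)
s_prev = np.random.randn(hidden_dim)            # s_{i-1}
h_j = np.random.randn(hidden_dim)

# e_{ij} = v^T tanh(W [s_{i-1}; h_j])
e_ij = v @ np.tanh(W @ np.concatenate([s_prev, h_j]))
```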
Luong(Dot-Product) Attention
Paper: Effective Approaches to Attention-based Neural Machine Translation
If $s_{i-1}$ and $h_j$ have the same dimensionality:

$$e_{ij} = s_{i-1}^T h_j$$

otherwise:

$$e_{ij} = s_{i-1}^T W h_j$$

```latex
e_{ij} = s_{i-1}^T h_j

e_{ij} = s_{i-1}^T W h_j
```
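Both variants in a short NumPy sketch (all dimensions are illustrative assumptions; the `general` form inserts $W$ to bridge mismatched sizes):

```python
import numpy as np

dec_dim, enc_dim = 32, 48
s_prev = np.random.randn(dec_dim)               # s_{i-1}
h_j_same = np.random.randn(dec_dim)             # encoder state with matching size
h_j_diff = np.random.randn(enc_dim)             # encoder state with different size
W = np.random.randn(dec_dim, enc_dim)

e_dot = s_prev @ h_j_same                       # e_{ij} = s_{i-1}^T h_j
e_general = s_prev @ W @ h_j_diff               # e_{ij} = s_{i-1}^T W h_j
```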
Finally, the output o