$$X\in \mathbb{R}^{l_s\times d_m},\ Y\in \mathbb{R}^{l_t \times d_m},\ W\in \mathbb{R}^{d_m\times d_h}$$
$$E=\tanh(W_a\cdot X+b_1)\in\mathbb{R}^{l_s\times d_h}$$
$$Q=W_b \cdot Y\in \mathbb{R}^{l_t\times d_h}$$
$$A=E\cdot Q^T\in \mathbb{R}^{l_s\times l_t}$$
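The steps for $E$, $Q$ and $A$ can be checked for shape consistency with a small NumPy sketch. It is only an illustration under assumptions: the products are read as right-multiplications ($X W_a$ and $Y W_b$) so that the shapes agree with $W\in\mathbb{R}^{d_m\times d_h}$, and the function name `attention_scores`, the random inputs and the concrete sizes are hypothetical, not taken from the notes.

```python
import numpy as np

def attention_scores(X, Y, W_a, W_b, b_1):
    """Return A = E Q^T with E = tanh(X W_a + b_1) and Q = Y W_b."""
    E = np.tanh(X @ W_a + b_1)   # shape (l_s, d_h)
    Q = Y @ W_b                  # shape (l_t, d_h)
    return E @ Q.T               # shape (l_s, l_t)

# Hypothetical sizes, chosen only to make the shapes visible.
rng = np.random.default_rng(0)
l_s, l_t, d_m, d_h = 5, 3, 8, 4
X = rng.normal(size=(l_s, d_m))      # source sequence
Y = rng.normal(size=(l_t, d_m))      # target sequence
W_a = rng.normal(size=(d_m, d_h))
W_b = rng.normal(size=(d_m, d_h))
b_1 = np.zeros(d_h)                  # broadcast over the l_s rows

A = attention_scores(X, Y, W_a, W_b, b_1)
print(A.shape)  # (5, 3)
```

Each entry `A[i, j]` is the dot product between the encoded source position `i` and the projected target position `j`, which is why the result has one row per source position and one column per target position.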
$$Y'=\mathrm{LSTM}(A)\in \mathbb{R}^{l_t\times d_h}$$
$$\mathcal{L}=-Y\log Y'$$
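The loss has the form of a cross-entropy. A minimal sketch follows, assuming (the notes do not state this) that $Y$ holds one-hot targets, that $Y'$ holds row-wise probabilities of the same shape as $Y$, and that the loss is summed over all entries. The name `cross_entropy` and the `eps` guard are hypothetical additions for illustration.

```python
import numpy as np

def cross_entropy(Y, Y_prime, eps=1e-12):
    """Sum of -Y * log(Y') over all entries; eps guards against log(0)."""
    return -np.sum(Y * np.log(Y_prime + eps))

# Hypothetical example: two target positions, two classes.
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])            # one-hot targets (assumed)
Y_prime = np.array([[0.5, 0.5],
                    [0.25, 0.75]])    # predicted probabilities (assumed)

loss = cross_entropy(Y, Y_prime)
print(loss)
```

With one-hot targets only the probability assigned to the correct class contributes, so here the loss reduces to $-(\log 0.5 + \log 0.75)$.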
Attention Notes