Examples of Sequence Data
- Speech Recognition
- Music Generation
- Sentiment Classification
- DNA Sequence Analysis
- Machine Translation
- Video Activity Recognition
- Named Entity Recognition
Notation
| Symbol | Meaning |
| --- | --- |
| $x^{(i)\langle t \rangle}$ | The $t$-th element in the input sequence for training example $i$ |
| $y^{(i)\langle t \rangle}$ | The $t$-th element in the output sequence for training example $i$ |
| $T_x^{(i)}$ | Input sequence length for training example $i$ |
| $T_y^{(i)}$ | Output sequence length for training example $i$ |
Recurrent Neural Network Model
Why not standard network?
- Inputs, outputs can be different lengths in different examples.
- Doesn’t share features learned across different positions of text.
RNN Unit
$$a^{\langle t \rangle} = g(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a)$$

$$\hat{y}^{\langle t \rangle} = g(W_{ya}\, a^{\langle t \rangle} + b_y)$$
Let

$$W_a = \begin{pmatrix} W_{aa} & W_{ax} \end{pmatrix}, \qquad [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] = \begin{pmatrix} a^{\langle t-1 \rangle} \\ x^{\langle t \rangle} \end{pmatrix}, \qquad W_y = W_{ya},$$

then
$$a^{\langle t \rangle} = g(W_a[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_a)$$
$$\hat{y}^{\langle t \rangle} = g(W_y\, a^{\langle t \rangle} + b_y)$$
Forward Propagation
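The recurrence above can be sketched in NumPy. The choice of $\tanh$ for the hidden activation, softmax for the output, and the toy dimensions ($n_a$, $n_x$, $n_y$) are assumptions for illustration, not fixed by the notes:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over each column.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_forward(x_seq, a0, Wa, ba, Wy, by):
    """Forward propagation through time.

    x_seq: list of inputs x<t>, each of shape (n_x, 1)
    a0:    initial hidden state, shape (n_a, 1)
    Wa:    horizontally stacked (Waa | Wax), shape (n_a, n_a + n_x)
    Wy:    output weights, shape (n_y, n_a)
    """
    a, y_hats = a0, []
    for x_t in x_seq:
        concat = np.vstack([a, x_t])          # [a<t-1>, x<t>] stacked vertically
        a = np.tanh(Wa @ concat + ba)         # a<t> = g(Wa [a<t-1>, x<t>] + ba)
        y_hats.append(softmax(Wy @ a + by))   # y^<t> = g(Wy a<t> + by)
    return a, y_hats

# Usage with assumed toy sizes: n_a=4, n_x=3, n_y=2, T=5.
rng = np.random.default_rng(0)
n_a, n_x, n_y, T = 4, 3, 2, 5
x_seq = [rng.standard_normal((n_x, 1)) for _ in range(T)]
a_T, y_hats = rnn_forward(
    x_seq,
    np.zeros((n_a, 1)),
    0.1 * rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1)),
    0.1 * rng.standard_normal((n_y, n_a)), np.zeros((n_y, 1)),
)
```

Stacking $W_{aa}$ and $W_{ax}$ into a single matrix, as in the simplified notation above, turns the two matrix products into one.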
Different Types of RNNs
| Type | Example |
| --- | --- |
| Many-to-many, $T_x = T_y$ | Named entity recognition |
| Many-to-one | Sentiment classification |
| One-to-one | |
| One-to-many | Music generation |
| Many-to-many, $T_x \ne T_y$ | Machine translation |
Gated Recurrent Unit (GRU)
$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[\Gamma_r * c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$$
Update Gate:
$$\Gamma_u = \sigma(W_u[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$$
Relevance Gate:

$$\Gamma_r = \sigma(W_r[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_r)$$
Memory cell value:
$$c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle}$$
$$a^{\langle t \rangle} = c^{\langle t \rangle}$$
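A single GRU step can be sketched directly from these equations. This is a minimal NumPy sketch of the full GRU (with the relevance gate); the toy dimensions and random initialization are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, bc, Wu, bu, Wr, br):
    """One GRU step following the equations above.

    c_prev: c<t-1>, shape (n_c, 1);  x_t: x<t>, shape (n_x, 1)
    Wc, Wu, Wr: shape (n_c, n_c + n_x);  biases: shape (n_c, 1)
    """
    concat = np.vstack([c_prev, x_t])                          # [c<t-1>, x<t>]
    gamma_u = sigmoid(Wu @ concat + bu)                        # update gate
    gamma_r = sigmoid(Wr @ concat + br)                        # relevance gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev           # memory cell value
    return c                                                   # a<t> = c<t>

# Usage with assumed toy sizes n_c=4, n_x=3.
rng = np.random.default_rng(1)
n_c, n_x = 4, 3
W = lambda: 0.1 * rng.standard_normal((n_c, n_c + n_x))
b = np.zeros((n_c, 1))
c = gru_step(np.zeros((n_c, 1)), rng.standard_normal((n_x, 1)),
             W(), b, W(), b, W(), b)
```

Because $\Gamma_u$ interpolates between the candidate $\tilde{c}^{\langle t \rangle}$ and the old cell $c^{\langle t-1 \rangle}$, a gate value near 0 lets the cell carry information across many time steps, which helps with vanishing gradients.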
Long Short-Term Memory (LSTM)
$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$$
Update Gate:
$$\Gamma_u = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$$
Forget Gate:
$$\Gamma_f = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$$
Output Gate:
$$\Gamma_o = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$$
Memory Cell:
$$c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle}$$
$$a^{\langle t \rangle} = \Gamma_o * \tanh(c^{\langle t \rangle})$$
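The LSTM step differs from the GRU mainly in keeping separate hidden state $a^{\langle t \rangle}$ and cell $c^{\langle t \rangle}$, and in using independent update and forget gates. A minimal NumPy sketch of one step, with assumed toy dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, p):
    """One LSTM step; p maps names to weights Wc, Wu, Wf, Wo of shape
    (n_a, n_a + n_x) and biases bc, bu, bf, bo of shape (n_a, 1)."""
    concat = np.vstack([a_prev, x_t])                 # [a<t-1>, x<t>]
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])     # candidate c~<t>
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])     # update gate
    gamma_f = sigmoid(p["Wf"] @ concat + p["bf"])     # forget gate
    gamma_o = sigmoid(p["Wo"] @ concat + p["bo"])     # output gate
    c = gamma_u * c_tilde + gamma_f * c_prev          # memory cell c<t>
    a = gamma_o * np.tanh(c)                          # hidden state a<t>
    return a, c

# Usage with assumed toy sizes n_a=4, n_x=3.
rng = np.random.default_rng(2)
n_a, n_x = 4, 3
p = {f"W{g}": 0.1 * rng.standard_normal((n_a, n_a + n_x)) for g in "cufo"}
p.update({f"b{g}": np.zeros((n_a, 1)) for g in "cufo"})
a, c = lstm_step(np.zeros((n_a, 1)), np.zeros((n_a, 1)),
                 rng.standard_normal((n_x, 1)), p)
```

Unlike the GRU, $\Gamma_u$ and $\Gamma_f$ are independent here, so the cell can simultaneously retain old memory and add new information.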