# Examples of Sequence Data

- Speech Recognition
- Music Generation
- Sentiment Classification
- DNA Sequence Analysis
- Machine Translation
- Video Activity Recognition
- Named Entity Recognition

# Notation

| Symbol | Meaning |
|---|---|
| $X^{(i)\langle t \rangle}$ | The $t$-th element in the input sequence for training example $i$ |
| $Y^{(i)\langle t \rangle}$ | The $t$-th element in the output sequence for training example $i$ |
| $T_x^{(i)}$ | Input sequence length for training example $i$ |
| $T_y^{(i)}$ | Output sequence length for training example $i$ |

# Recurrent Neural Network Model

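## Why not a standard network?

1. Inputs and outputs can have different lengths in different examples.
2. It doesn't share features learned across different positions of the text: what it learns about a word at one position is not reused when the same word appears at another position.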

## RNN Unit

$$a^{\langle t \rangle} = g\left(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = g\left(W_{ya} a^{\langle t \rangle} + b_y\right)$$

Let $W_a = \begin{bmatrix} W_{aa} & W_{ax} \end{bmatrix}$, $\left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] = \begin{bmatrix} a^{\langle t-1 \rangle} \\ x^{\langle t \rangle} \end{bmatrix}$, and $W_y = W_{ya}$, so that $W_a \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] = W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle}$. Then

$$a^{\langle t \rangle} = g\left(W_a \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = g\left(W_y a^{\langle t \rangle} + b_y\right)$$

Here $g$ denotes an activation function, and the two uses need not be the same: tanh is common for $a^{\langle t \rangle}$, while sigmoid or softmax is common for $\hat{y}^{\langle t \rangle}$.
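The equations above can be sketched in NumPy as below. This is a minimal illustration, not a reference implementation: it assumes tanh for the hidden-state activation, sigmoid for the output, column-vector shapes, and small random weights; the function names `rnn_cell_forward` and `rnn_forward` are hypothetical. It also checks that the stacked form $W_a[a^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ gives the same hidden state as the two-matrix form.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def rnn_cell_forward(x_t, a_prev, W_a, b_a, W_y, b_y):
    """One time step in the stacked form: a<t> = tanh(W_a [a<t-1>, x<t>] + b_a)."""
    stacked = np.concatenate([a_prev, x_t], axis=0)  # [a<t-1>; x<t>], shape (n_a + n_x, 1)
    a_t = np.tanh(W_a @ stacked + b_a)                # hidden state, shape (n_a, 1)
    y_hat_t = sigmoid(W_y @ a_t + b_y)                # prediction, shape (n_y, 1)
    return a_t, y_hat_t

def rnn_forward(xs, a0, W_a, b_a, W_y, b_y):
    """Unroll the same cell (same weights) over every time step of xs."""
    a = a0
    y_hats = []
    for x_t in xs:
        a, y_hat_t = rnn_cell_forward(x_t, a, W_a, b_a, W_y, b_y)
        y_hats.append(y_hat_t)
    return a, y_hats

# Illustrative sizes and random weights (assumptions, not trained values)
rng = np.random.default_rng(0)
n_a, n_x, n_y, T_x = 4, 3, 1, 5
W_aa = rng.standard_normal((n_a, n_a)) * 0.5
W_ax = rng.standard_normal((n_a, n_x)) * 0.5
W_a = np.hstack([W_aa, W_ax])          # W_a = [W_aa | W_ax], shape (n_a, n_a + n_x)
W_y = rng.standard_normal((n_y, n_a))  # W_y = W_ya
b_a = np.zeros((n_a, 1))
b_y = np.zeros((n_y, 1))

xs = [rng.standard_normal((n_x, 1)) for _ in range(T_x)]
a0 = np.zeros((n_a, 1))

# The stacked form gives the same a<1> as the two-matrix form
a_1_stacked, _ = rnn_cell_forward(xs[0], a0, W_a, b_a, W_y, b_y)
a_1_separate = np.tanh(W_aa @ a0 + W_ax @ xs[0] + b_a)

a_T, y_hats = rnn_forward(xs, a0, W_a, b_a, W_y, b_y)
```

Because `rnn_forward` reuses one set of weights at every step, it accepts a sequence of any length $T_x$, which addresses both shortcomings of a standard network listed earlier.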

# Different Types of RNNs

| Type | Example |
|---|---|
| Many-to-many, $T_x = T_y$ | Named entity recognition |
| Many-to-one | Sentiment classification |
| One-to-one | Standard neural network (no sequence) |
| One-to-many | Music generation |
| Many-to-many, $T_x \neq T_y$ | Machine translation |
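The many-to-many ($T_x = T_y$) and many-to-one rows can share the same recurrent cell and differ only in where outputs are read off. The sketch below is a minimal NumPy illustration assuming tanh for the hidden state and raw (pre-activation) output scores; `run_rnn` and its `many_to_many` flag are hypothetical names. One-to-many generation and the $T_x \neq T_y$ encoder-decoder setup need extra machinery (feeding outputs back in, a separate decoder) and are not shown.

```python
import numpy as np

def rnn_step(x_t, a_prev, W_a, b_a):
    """a<t> = tanh(W_a [a<t-1>, x<t>] + b_a), using 1-D vectors."""
    return np.tanh(W_a @ np.concatenate([a_prev, x_t]) + b_a)

def run_rnn(xs, a0, W_a, b_a, W_y, b_y, many_to_many=True):
    """Return raw output scores: one per step (many-to-many) or one at the end (many-to-one)."""
    a = a0
    outputs = []
    for x_t in xs:
        a = rnn_step(x_t, a, W_a, b_a)
        if many_to_many:
            outputs.append(W_y @ a + b_y)  # e.g. one entity-tag score vector per word
    if not many_to_many:
        outputs.append(W_y @ a + b_y)      # e.g. one sentiment score vector per sentence
    return outputs

# Illustrative sizes and random weights (assumptions, not trained values)
rng = np.random.default_rng(1)
n_a, n_x, n_y, T_x = 5, 3, 2, 6
W_a = rng.standard_normal((n_a, n_a + n_x)) * 0.5
b_a = np.zeros(n_a)
W_y = rng.standard_normal((n_y, n_a))
b_y = np.zeros(n_y)
xs = rng.standard_normal((T_x, n_x))  # one input vector per time step
a0 = np.zeros(n_a)

per_step = run_rnn(xs, a0, W_a, b_a, W_y, b_y, many_to_many=True)
final_only = run_rnn(xs, a0, W_a, b_a, W_y, b_y, many_to_many=False)
```

With the same weights, the single many-to-one output equals the last of the many-to-many outputs, since both are computed from the final hidden state.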


# Gated Recurrent Unit (GRU)

Candidate value: $\tilde{c}^{\langle t \rangle} = \tanh\left(W_c \left[\Gamma_r * c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_c\right)$

Update gate: $\Gamma_u = \sigma\left(W_u \left[c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right)$

Relevance gate: $\Gamma_r = \sigma\left(W_r \left[c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_r\right)$

Memory cell value: $c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + \left(1 - \Gamma_u\right) * c^{\langle t-1 \rangle}$

Activation: $a^{\langle t \rangle} = c^{\langle t \rangle}$

Here $*$ denotes element-wise multiplication.
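A single GRU step following these equations can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: 1-D vectors, small random weights, and a parameter dictionary whose keys (`"W_u"`, `"b_u"`, and so on) and the function name `gru_cell_forward` are hypothetical. The last lines show why the update gate matters: when $\Gamma_u \approx 0$, the cell carries $c^{\langle t-1 \rangle}$ forward almost unchanged.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_forward(x_t, c_prev, p):
    """One GRU step; p holds W_u, W_r, W_c and b_u, b_r, b_c (hypothetical keys)."""
    concat = np.concatenate([c_prev, x_t])               # [c<t-1>, x<t>]
    gamma_u = sigmoid(p["W_u"] @ concat + p["b_u"])      # update gate
    gamma_r = sigmoid(p["W_r"] @ concat + p["b_r"])      # relevance gate
    gated = np.concatenate([gamma_r * c_prev, x_t])      # [Γ_r * c<t-1>, x<t>]
    c_tilde = np.tanh(p["W_c"] @ gated + p["b_c"])       # candidate value
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev  # memory cell update
    a_t = c_t                                            # in a GRU, a<t> = c<t>
    return a_t, c_t

# Illustrative sizes and random weights (assumptions, not trained values)
rng = np.random.default_rng(2)
n_c, n_x = 4, 3
params = {}
for g in ("u", "r", "c"):
    params["W_" + g] = rng.standard_normal((n_c, n_c + n_x)) * 0.5
    params["b_" + g] = np.zeros(n_c)

c_prev = rng.standard_normal(n_c)
x_t = rng.standard_normal(n_x)
a_t, c_t = gru_cell_forward(x_t, c_prev, params)

# A very negative update-gate bias drives Γ_u toward 0, so c<t> ≈ c<t-1>:
# the cell carries its old value forward.
closed_params = dict(params, b_u=np.full(n_c, -50.0))
_, c_kept = gru_cell_forward(x_t, c_prev, closed_params)
```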

# Long Short-Term Memory (LSTM)

Candidate value: $\tilde{c}^{\langle t \rangle} = \tanh\left(W_c \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_c\right)$

Update gate: $\Gamma_u = \sigma\left(W_u \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right)$

Forget gate: $\Gamma_f = \sigma\left(W_f \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_f\right)$

Output gate: $\Gamma_o = \sigma\left(W_o \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_o\right)$

Memory cell: $c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle}$

Activation: $a^{\langle t \rangle} = \Gamma_o * \tanh c^{\langle t \rangle}$

Unlike the GRU, the LSTM uses a separate forget gate $\Gamma_f$ in place of $1 - \Gamma_u$, adds an output gate $\Gamma_o$, and computes its gates from $a^{\langle t-1 \rangle}$ rather than $c^{\langle t-1 \rangle}$.
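A single LSTM step following these equations can be sketched in NumPy as below, then unrolled over a short sequence. This is a minimal illustration under stated assumptions: 1-D vectors, $a^{\langle t \rangle}$ and $c^{\langle t \rangle}$ of the same size, small random weights, and hypothetical names (`lstm_cell_forward`, the `"W_f"`/`"b_f"` style dictionary keys). The final lines show the effect of the separate gates: with $\Gamma_f \approx 1$ and $\Gamma_u \approx 0$, the memory cell is copied through unchanged.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, a_prev, c_prev, p):
    """One LSTM step; every gate sees [a<t-1>, x<t>]."""
    concat = np.concatenate([a_prev, x_t])            # [a<t-1>, x<t>]
    c_tilde = np.tanh(p["W_c"] @ concat + p["b_c"])   # candidate value
    gamma_u = sigmoid(p["W_u"] @ concat + p["b_u"])   # update gate
    gamma_f = sigmoid(p["W_f"] @ concat + p["b_f"])   # forget gate
    gamma_o = sigmoid(p["W_o"] @ concat + p["b_o"])   # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev        # separate update and forget gates
    a_t = gamma_o * np.tanh(c_t)                      # activation
    return a_t, c_t

# Illustrative sizes and random weights (assumptions, not trained values)
rng = np.random.default_rng(3)
n_a, n_x, T_x = 4, 3, 7
params = {}
for g in ("c", "u", "f", "o"):
    params["W_" + g] = rng.standard_normal((n_a, n_a + n_x)) * 0.5
    params["b_" + g] = np.zeros(n_a)

# Run the same cell over a sequence
a, c = np.zeros(n_a), np.zeros(n_a)
for x_t in rng.standard_normal((T_x, n_x)):
    a, c = lstm_cell_forward(x_t, a, c, params)

# Forget gate ≈ 1 and update gate ≈ 0: the memory cell is copied through
c_prev = rng.standard_normal(n_a)
keep = dict(params, b_f=np.full(n_a, 50.0), b_u=np.full(n_a, -50.0))
_, c_copied = lstm_cell_forward(rng.standard_normal(n_x), a, c_prev, keep)
```

Since $\Gamma_o \in (0, 1)$ and $\tanh$ is bounded by 1 in magnitude, every entry of $a^{\langle t \rangle}$ stays strictly between -1 and 1, while $c^{\langle t \rangle}$ itself is not bounded in the same way.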