ELMo
- Published in 2018; the name stands for Embeddings from Language Models.
- Deep contextualized word representations that model both the complex characteristics of word use and how those uses vary across linguistic contexts.
- It enables models to better disambiguate between the different senses of a given word (e.g. river "bank" vs. money "bank").
- ELMo determines word embeddings dynamically in the downstream task: the same word receives a different vector depending on its context.
- ELMo generates three embeddings per token: (1) the context-independent word embedding, (2) the 1st LSTM layer's output, and (3) the 2nd LSTM layer's output.
- Pre-training -> yields three embeddings (v1, v2, v3) per word (big-data environment).
- Fine-tuning -> freeze the embeddings and train weights (w1, w2, w3) for (v1, v2, v3) (local environment).
- The final embedding is w1*v1 + w2*v2 + w3*v3, as sketched below.
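
A minimal PyTorch sketch of this freeze-and-reweight step, with illustrative names and dimensions: v1, v2, v3 stand in for the frozen pre-trained layer outputs, and only the scalar mix weights are trained by the downstream task. The softmax normalization follows the ELMo paper's scalar-mix formulation; the notes' plain w1*v1 + w2*v2 + w3*v3 is the unnormalized version.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Trainable weighted sum of frozen layer representations."""
    def __init__(self, num_layers=3):
        super().__init__()
        # One trainable scalar per frozen layer (w1, w2, w3).
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layers):
        # Softmax keeps the combination a convex mix of the layers;
        # the frozen vectors themselves receive no gradient updates.
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * vi for wi, vi in zip(w, layers))

# Three frozen per-token "embeddings": batch of 2 sentences, 7 tokens, dim 1024.
v1, v2, v3 = (torch.randn(2, 7, 1024) for _ in range(3))
final = ScalarMix()([v1, v2, v3])  # ~ w1*v1 + w2*v2 + w3*v3, shape (2, 7, 1024)
```

The paper additionally learns a task-specific scale gamma that multiplies the whole sum; it is omitted here for brevity.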
Two-layer bidirectional LSTM backbone
Two-layer - so the layers can learn different aspects of word use (the lower layer tends to capture syntax, the upper layer semantics).
Bidirectional - to learn from context on both sides (the context before and the context after a word); see the sketch below.
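
A simplified sketch of where the three embeddings come from, under two loud assumptions: real ELMo builds its token embedding from a character CNN (not a lookup table), and it trains the forward and backward LSTMs as two separate language models rather than as a single jointly trained bidirectional layer. The class and variable names are made up for illustration.

```python
import torch
import torch.nn as nn

class TwoLayerBiLSTM(nn.Module):
    def __init__(self, vocab_size, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # v1: context-free token embedding
        # dim // 2 per direction so each biLSTM layer outputs `dim` features.
        self.lstm1 = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        v1 = self.embed(token_ids)  # (batch, seq, dim)
        v2, _ = self.lstm1(v1)      # 1st layer: tends to capture syntactic cues
        v3, _ = self.lstm2(v2)      # 2nd layer: tends to capture semantic cues
        return v1, v2, v3

ids = torch.randint(0, 1000, (2, 7))            # 2 sentences, 7 token ids each
v1, v2, v3 = TwoLayerBiLSTM(vocab_size=1000)(ids)
print(v1.shape, v2.shape, v3.shape)             # all torch.Size([2, 7, 512])
```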
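
For an end-to-end check of the disambiguation claim above, here is a hedged usage sketch with the AllenNLP Elmo module; the S3 URLs (pointing at the small published ELMo weights) and the exact keyword arguments are assumptions based on that library's published interface, not something these notes specify.

```python
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# Assumed locations of the small pre-trained ELMo files (may have moved).
OPTIONS = ("https://allennlp.s3.amazonaws.com/elmo/2x1024_128_2048cnn_1xhighway/"
           "elmo_2x1024_128_2048cnn_1xhighway_options.json")
WEIGHTS = ("https://allennlp.s3.amazonaws.com/elmo/2x1024_128_2048cnn_1xhighway/"
           "elmo_2x1024_128_2048cnn_1xhighway_weights.hdf5")

elmo = Elmo(OPTIONS, WEIGHTS, num_output_representations=1, dropout=0.0)

sentences = [["He", "sat", "by", "the", "river", "bank"],
             ["She", "deposited", "cash", "at", "the", "bank"]]
char_ids = batch_to_ids(sentences)                 # character ids per token
out = elmo(char_ids)["elmo_representations"][0]    # (batch, seq_len, dim)

# "bank" is token index 5 in both sentences; its vectors differ by context.
sim = torch.cosine_similarity(out[0, 5], out[1, 5], dim=0)
print(f"cosine similarity of the two 'bank' vectors: {sim.item():.3f}")
```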