CS224n NLP with Deep Learning | Winter 2019
Lecture 1
word2vec
Each word can be turned into a vector, and similar words have similar vectors.
$word = \begin{bmatrix} 0.23 \\ 0.52 \\ -0.41 \\ -0.31 \\ 0.27 \\ 0.48 \end{bmatrix}$
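As a quick illustration of "similar words have similar vectors", here is a minimal sketch (my own toy example, not from the lecture; the vector values are made up) that measures similarity with cosine similarity:

```python
import numpy as np

# Hypothetical word vectors (values invented purely for illustration).
v_king  = np.array([0.23, 0.52, -0.41, -0.31, 0.27, 0.48])
v_queen = np.array([0.20, 0.55, -0.38, -0.29, 0.30, 0.45])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 means similar."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(v_king, v_queen))  # near 1.0 for similar words
```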
$J(\theta) = -\frac{1}{T}\sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \neq 0}} \log p(w_{t+j} \mid w_t; \theta)$
Here $m$ is the radius of the sliding window: the number of context words taken on each side of the center word $w_t$ ($j$ is the offset within the window).
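A small sketch (my own, assuming a whitespace-tokenized toy corpus) of the (center, context) pairs that the double sum in $J(\theta)$ iterates over:

```python
# Enumerate (center, context) pairs with a window of radius m.
corpus = "the quick brown fox jumps over the lazy dog".split()
m = 2  # window radius: up to m context words on each side of the center word

pairs = []
for t, center in enumerate(corpus):
    for j in range(-m, m + 1):
        if j == 0 or not (0 <= t + j < len(corpus)):
            continue  # skip the center word itself and out-of-range positions
        pairs.append((center, corpus[t + j]))

print(pairs[:5])  # ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...
```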
$\theta = \begin{bmatrix} v_{apple} \\ v_{banana} \\ \vdots \\ v_{zebra} \\ u_{apple} \\ u_{banana} \\ \vdots \\ u_{zebra} \end{bmatrix} \in \mathbb{R}^{2dV}$
Every word appears twice in $\theta$ because each word has two representations: a center-word vector $v$ and a context-word vector $u$ (so $\theta$ has length $2dV$ for vocabulary size $V$ and embedding dimension $d$).
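A minimal sketch of this parameterization (toy sizes, my own example): two separate embedding matrices, one for center vectors and one for context vectors, flattened into a single parameter vector $\theta$:

```python
import numpy as np

# Toy vocabulary size and embedding dimension (placeholders).
V, d = 3, 6
rng = np.random.default_rng(0)

center_vecs  = rng.normal(scale=0.1, size=(V, d))  # v_w, used when w is the center word
context_vecs = rng.normal(scale=0.1, size=(V, d))  # u_w, used when w is a context word

theta = np.concatenate([center_vecs.ravel(), context_vecs.ravel()])
print(theta.shape)  # (2 * d * V,) = (36,)
```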
Derivative of $\log p(o \mid c)$ with respect to the center vector $v_c$:

$\frac{\partial}{\partial v_c} \log \frac{\exp(u_o^T v_c)}{\sum_{w=1}^{V} \exp(u_w^T v_c)} = u_o - \frac{\partial}{\partial v_c} \log \sum_{w=1}^{V} \exp(u_w^T v_c)$

For the second term:

$\frac{\partial}{\partial v_c} \log \sum_{w=1}^{V} \exp(u_w^T v_c) = \frac{1}{\sum_{w=1}^{V} \exp(u_w^T v_c)} \sum_{x=1}^{V} \exp(u_x^T v_c)\, \frac{\partial}{\partial v_c} u_x^T v_c = \sum_{x=1}^{V} \frac{\exp(u_x^T v_c)}{\sum_{w=1}^{V} \exp(u_w^T v_c)}\, u_x = \sum_{x=1}^{V} p(x \mid c)\, u_x$

Putting the two together:

$\frac{\partial}{\partial v_c} \log p(o \mid c) = u_o - \sum_{x=1}^{V} p(x \mid c)\, u_x$
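A quick numerical check of this result (my own sketch with random toy values): compare the analytic gradient $u_o - \sum_x p(x \mid c)\,u_x$ against a finite-difference estimate of $\partial \log p(o \mid c) / \partial v_c$:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, o = 5, 4, 2                     # toy sizes; o is the index of the observed outside word
U = rng.normal(size=(V, d))           # rows are context vectors u_w
v_c = rng.normal(size=d)              # center vector

def log_prob(v):
    """log p(o|c) for center vector v under the softmax model."""
    scores = U @ v                    # u_w^T v for every w
    return scores[o] - np.log(np.sum(np.exp(scores)))

p = np.exp(U @ v_c); p /= p.sum()     # softmax p(x|c)
analytic = U[o] - p @ U               # u_o - sum_x p(x|c) u_x

eps = 1e-6
numeric = np.array([(log_prob(v_c + eps * e) - log_prob(v_c - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```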
https://medium.com/@zafaralibagh6/a-simple-word2vec-tutorial-61e64e38a6a1
Question: in that tutorial, does $W_I$ refer to $u_w$ and $W_O$ to $v_w$? (In the lecture's notation, $v_w$ is the center-word vector and $u_w$ the context/outside-word vector.)
Lecture 2
Gradient descent
$\theta^{new} = \theta^{old} - \alpha \nabla_\theta J(\theta)$, where $\alpha$ is the learning rate.
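A minimal sketch of this update rule on a toy objective (my own example; $J(\theta) = \lVert\theta\rVert^2$ instead of the word2vec loss):

```python
import numpy as np

alpha = 0.1                       # learning rate
theta = np.array([3.0, -2.0])

def grad_J(th):
    return 2 * th                 # gradient of J(theta) = ||theta||^2

for _ in range(100):
    theta = theta - alpha * grad_J(theta)   # theta_new = theta_old - alpha * grad

print(theta)                      # approaches the minimizer [0, 0]
```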
Stochastic Gradient Descent and negative sampling
$P(o \mid c) = \frac{\exp(u_o^T v_c)}{\sum_{w \in V} \exp(u_w^T v_c)}$
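A sketch of computing $P(o \mid c)$ with the full softmax (toy random vectors, my own example). The denominator sums over the entire vocabulary, which is exactly the cost that negative sampling is introduced to avoid:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10000, 100                        # toy vocabulary size and embedding dimension
U = rng.normal(scale=0.1, size=(V, d))   # context vectors u_w
v_c = rng.normal(scale=0.1, size=d)      # center vector
o = 42                                   # index of the outside word

scores = U @ v_c                         # u_w^T v_c for all w: O(V * d) work per prediction
p_o_given_c = np.exp(scores[o]) / np.sum(np.exp(scores))
print(p_o_given_c)
```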