# CS224d-Lecture8

### Language Model

##### probability of a sequence of words
• P(w1, w2, …, wT)
##### Useful for machine translation:
###### word ordering
• p(the cat is small) > p(small the is cat)
###### word choice
• p(walking home after school) > p(walking house after school)

##### assumption

$P(w_1, w_2, \dots, w_T) = \prod_{i=1}^{T} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{T} P(w_i \mid w_{i-n}, \dots, w_{i-1})$

##### n-gram
• bigram $p(w_2|w_1) = \frac{count(w_1,w_2)}{count(w_1)}$
• trigram $p(w_3|w_1, w_2) = \frac{count(w_1,w_2,w_3)}{count(w_1, w_2)}$
n-gram models consume a lot of memory: a count must be stored for every n-gram observed in the corpus.
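The count-based estimates above can be sketched in a few lines; the toy corpus below is a hypothetical stand-in for real training data:

```python
from collections import Counter

# Bigram MLE estimation from a toy corpus (hypothetical data).
corpus = "the cat is small . the cat is cute .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w1, w2):
    """p(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("the", "cat"))   # 1.0: "the" is always followed by "cat" here
print(p_bigram("is", "small"))  # 0.5: "is" precedes "small" in 1 of 2 cases
```

Note that the tables `unigrams` and `bigrams` are exactly the memory cost mentioned above: they grow with the number of distinct n-grams in the corpus.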

### RNN

• Weights are tied (shared) across all time steps
• The prediction is conditioned on all previous words
• RAM requirements scale only with the number of words, not the sequence length

$h_t = \sigma(W^{hh} h_{t-1} + W^{hx} x_t)$
$\hat y_t = \mathrm{softmax}(W^{s} h_t)$
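One forward step of these two equations can be sketched as follows; the sizes and the randomly initialized matrices `W_hh`, `W_hx`, `W_s` are hypothetical stand-ins:

```python
import numpy as np

# Minimal sketch of one RNN step: h_t = sigma(W_hh h_{t-1} + W_hx x_t),
# y_hat = softmax(W_s h_t). Hidden size 4, input size 3, vocab size 10.
rng = np.random.default_rng(0)
H, D, V = 4, 3, 10
W_hh = rng.standard_normal((H, H)) * 0.1
W_hx = rng.standard_normal((H, D)) * 0.1
W_s  = rng.standard_normal((V, H)) * 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(h_prev, x_t):
    h_t = sigmoid(W_hh @ h_prev + W_hx @ x_t)
    scores = W_s @ h_t
    y_hat = np.exp(scores - scores.max())  # numerically stable softmax
    y_hat /= y_hat.sum()
    return h_t, y_hat

h, x = np.zeros(H), rng.standard_normal(D)
h, y_hat = rnn_step(h, x)
print(y_hat.sum())  # probabilities over the vocabulary sum to 1
```

The same three weight matrices are reused at every time step, which is the weight tying noted in the bullets above.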

##### vanishing / exploding gradient problem

total error

$\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W}$

$\frac{\partial E_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial y_t} \frac{\partial y_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W}$

$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}$

With $h_t = W f(h_{t-1}) + W^{hx} x_{[t]}$:

$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} = \prod_{j=k+1}^{t} W^{\top} \mathrm{diag}(f'(h_{j-1}))$

$\left\lVert \frac{\partial h_j}{\partial h_{j-1}} \right\rVert \le \lVert W^{\top} \rVert \, \lVert \mathrm{diag}(f'(h_{j-1})) \rVert \le \beta_W \beta_h$

$\left\lVert \frac{\partial h_t}{\partial h_k} \right\rVert = \left\lVert \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right\rVert \le (\beta_W \beta_h)^{t-k}$

If $\beta_W \beta_h < 1$, this bound shrinks exponentially in the gap $t-k$ (vanishing gradient); if it exceeds 1, the gradient can grow exponentially (exploding gradient).
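The geometric behavior of the bound can be checked numerically; the values of $\beta_W \beta_h$ and the time gaps below are hypothetical:

```python
# Behavior of the bound (beta_W * beta_h)**(t - k):
# below 1 it vanishes, above 1 it explodes as the gap t - k grows.
for beta in (0.9, 1.1):  # hypothetical values of beta_W * beta_h
    bounds = [beta ** gap for gap in (1, 10, 50)]
    print(beta, bounds)
```

At a gap of 50 steps the two regimes already differ by several orders of magnitude, which is why long-range dependencies are hard to learn with a plain RNN.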

##### softmax is huge and slow
• class-based trick: first predict a word class, then the word within that class, reducing the cost of the full softmax
##### Bidirectional RNN
• Words both before and after the current position contribute to the prediction at each time step
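A bidirectional layer can be sketched as a forward pass and a backward pass whose states are concatenated per position; the function name `bidir_states` and all matrices here are hypothetical stand-ins:

```python
import numpy as np

# Sketch of a bidirectional RNN layer: each position's state concatenates a
# left-to-right hidden state with a right-to-left one, so it sees both contexts.
def bidir_states(xs, W_f, W_b, U_f, U_b):
    H = W_f.shape[0]
    h_f = [np.zeros(H)]
    for x in xs:                    # forward pass, left to right
        h_f.append(np.tanh(W_f @ h_f[-1] + U_f @ x))
    h_b = [np.zeros(H)]
    for x in reversed(xs):          # backward pass, right to left
        h_b.append(np.tanh(W_b @ h_b[-1] + U_b @ x))
    h_b = h_b[:0:-1]                # realign backward states to positions 0..T-1
    return [np.concatenate([f, b]) for f, b in zip(h_f[1:], h_b)]

rng = np.random.default_rng(0)
H, D, T = 4, 3, 5
mats = [rng.standard_normal((H, H)), rng.standard_normal((H, H)),
        rng.standard_normal((H, D)), rng.standard_normal((H, D))]
xs = [rng.standard_normal(D) for _ in range(T)]
states = bidir_states(xs, *mats)
print(len(states), states[0].shape)  # T states, each of size 2H
```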
##### F1 metric

precision = tp / (tp + fp)
recall = tp / (tp + fn)
F1 = 2 · precision · recall / (precision + recall)
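The three formulas translate directly to code; the confusion counts in the usage line are hypothetical:

```python
def f1_score(tp, fp, fn):
    """F1 from confusion counts, per the formulas above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=8, fp=2, fn=4))  # precision 0.8, recall 2/3 -> F1 = 8/11
```

F1 is the harmonic mean of precision and recall, so it is dragged down by whichever of the two is worse.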

