CS224N NLP

See also this expert's notes:
github.com/LooperXX/LooperXX.github.io.git


Abbreviation

- [ToL] To learn
- [ToLM] To learn more
- [ToLO] To learn optionally
- (0501) 05 min 01 s into the lecture
- (h0501) 1 hour 05 min 01 s
- (hh0501) 2 hours 05 min 01 s

Lecture 1 - Introduction and Word Vectors

image-20220214151948950

NLP

Convert one-hot encodings to distributed representations.

A one-hot vector cannot represent relations between words, and its dimensionality (one per vocabulary word) is far too large.

Word2vec

The position of words within the context window is ignored.

image-20220214135823259 image-20220214135951707 image-20220214140036077

Each word gets two vectors: a center-word vector and a context-word vector.

softmax function

image-20220214140209594

image-20220214141232602
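
A minimal numpy sketch of the softmax (not from the lecture code, just for intuition):

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into a probability distribution."""
    exp_scores = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # the largest score gets the largest probability
```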

Train the model: gradient descent

image-20220214141455212

The lecture derives the gradient of this objective with respect to the center word vector (3950-5640).

The result is: image-20220214143920015
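
For reference, the standard result of that derivation (the gradient of the log-probability with respect to the center vector $v_c$), in "observed minus expected" form:

$$
\frac{\partial}{\partial v_c}\log p(o \mid c) \;=\; u_o \;-\; \sum_{x \in V} p(x \mid c)\, u_x
$$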

[ToL]

Review the derivation again, especially the following:

image-20220214142712551

Demonstrating results with code (5640-h0516)

  • We can do vector addition, subtraction, multiplication and division, etc.
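
A rough sketch of that kind of demo, assuming gensim's downloader and the pretrained "glove-wiki-gigaword-100" vectors (the model name and calls are my assumption, not from the notes):

```python
import gensim.downloader as api

# load pretrained GloVe vectors (assumed model name from gensim-data)
wv = api.load("glove-wiki-gigaword-100")

# analogy by vector arithmetic: king - man + woman ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# plain cosine similarity between two words
print(wv.similarity("coffee", "tea"))
```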

QA

Why are there separate center-word and context-word vectors? (h0650)

To avoid a vector being dotted with itself in some situations??? (i.e. when a word appears in its own context window)

Even synonyms can be merged into a single vector (h1215)

This differs from Lee's course, where he says synonyms should use different vectors.

Lecture 2 - Word Vectors, Word Senses, and Neural Classifiers

image-20220214152314870

image-20220214152611205

Bag-of-words model (0245)

The model makes the same predictions at each position.

Gradient descent (0600)

Full-batch gradient descent is not usually used because the computation over the whole corpus is too expensive.

Step size (learning rate): neither too big nor too small.

image-20220214153736035

Stochastic gradient descent (SGD) [ToLM] (0920)

Take only a small sample (mini-batch) of the corpus to estimate the gradient.

Billions of times faster.

It may even give better results.

But the sampled gradients are sparse, so either you need sparse-matrix update operations that only update certain rows of the full embedding matrices U and V, or you need to keep a hash of word vectors. (1344) [ToL]

More details of word2vec (1400)

image-20220214160400315

Skip-gram (SG): use the center word to predict the context words.

Skip-gram with negative sampling (SGNS) [ToLO]

Use the logistic (sigmoid) function instead of softmax, and sample negative words from the corpus.

CBOW is the opposite: use the context words to predict the center word.

image-20220214162201460

Why use two vectors? (1500)

Because otherwise a vector sometimes has to be dotted with itself (when a word occurs in its own context).

image-20220214165957190

[ToL]

The first term is for the positive (observed) word and the last terms are for the negative (sampled) words (2800).

Only a few negative words are sampled each time because the center word will come up again on other occasions; when it does, other negatives will be sampled, so the model learns step by step.
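
A toy numpy sketch of the negative-sampling loss for one (center, outside) pair with K sampled negatives; all vectors are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, K = 100, 5                           # embedding size, number of negative samples
v_c = np.random.randn(d) * 0.01         # center word vector
u_o = np.random.randn(d) * 0.01         # true outside (positive) word vector
u_neg = np.random.randn(K, d) * 0.01    # K sampled negative word vectors

# J = -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c):
# push the observed pair together, push the sampled pairs apart
loss = -np.log(sigmoid(u_o @ v_c)) - np.sum(np.log(sigmoid(-u_neg @ v_c)))
print(loss)
```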

Why not capture co-occurrence counts directly?(2337)

image-20220214171624671

SVD(3230) [ToL]

https://zhuanlan.zhihu.com/p/29846048

Use SVD on the co-occurrence matrix to get lower-dimensional representations for words.

image-20220214172338354(3451)
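
A toy sketch of the SVD idea with numpy, using a made-up co-occurrence matrix:

```python
import numpy as np

# toy word-word co-occurrence counts (rows/columns = vocabulary), values made up
X = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)

U, S, Vt = np.linalg.svd(X)        # full SVD of the co-occurrence matrix
k = 2                              # keep only the top-k singular dimensions
word_vectors = U[:, :k] * S[:k]    # low-dimensional word representations
print(word_vectors)
```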

Count based vs direct prediction

image-20220214173136681(3900)

Encoding meaning components in vector differences (3948)

This is what makes addition and subtraction (analogies) meaningful for word vectors.

image-20220214173907221

GloVe (4313)

image-20220214174416350

Make the dot product of the two word vectors approximate the log of their co-occurrence count.
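
A rough sketch of one term of the GloVe objective (the weighting function f and the toy values are just illustrative):

```python
import numpy as np

def glove_term(w_i, w_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """One summand of the GloVe loss: f(X_ij) * (w_i·w_j + b_i + b_j - log X_ij)^2."""
    f = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0   # weighting that caps very frequent pairs
    return f * (w_i @ w_j + b_i + b_j - np.log(x_ij)) ** 2

w_i, w_j = np.random.randn(50), np.random.randn(50)
print(glove_term(w_i, w_j, 0.0, 0.0, x_ij=12.0))
```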

How to evaluate word vectors: intrinsic vs. extrinsic (4756)

image-20220214175746085

Analogy evaluation and hyperparameters (intrinsic)(5515)

Word vector distances and their correlation with human judgements(5640)

Data shows that around 300-dimensional word vectors work well (5536).

The objective function for the GloVe model, and what "log-bilinear" means (5739)

Word senses and word sense ambiguity(h0353)

Each sense of a word gets its own vector.

A word's overall vector can then be a (weighted) sum of its sense vectors.

image-20220214184234513

This works surprisingly well (h1200).

Because the vectors live in a high-dimensional, sparse space, you can even separate the different senses back out of the sum (h1402).

Lecture 3 - Gradients by Hand (Matrix Calculus) and Algorithmically (the Backpropagation Algorithm): all the math details of doing neural net learning

image-20220214191638029

Needs to be learned again; it is not fully understood yet.

Named Entity Recognition(0530)

image-20220214185926393

Simple NER (0636)

image-20220214190032048

How the simple model runs (0836)

image-20220214190306082

update equation(1220)

image-20220214191531863

Jacobian (1811)

image-20220214192319871

Chain Rule(2015)

image-20220214192526698

image-20220214193151609

do one example step (2650)

image-20220214193417520

Hadamard product (element-wise product) [ToL]

Reusing Computation(3402)

image-20220215112833279

∂s/∂W

image-20220215113433454 image-20220215113255573

Forward and backward propagation(5000)

image-20220215115109857 image-20220215115507912

An example(5507)

a = x+y

b = max(y,z)

f = ab

image-20220215120119537
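
Working that example through by hand in a few lines (the values x=1, y=2, z=0 are only for illustration):

```python
# forward and backward passes for a = x + y, b = max(y, z), f = a * b
x, y, z = 1.0, 2.0, 0.0

# forward pass
a = x + y          # 3
b = max(y, z)      # 2
f = a * b          # 6

# backward pass (chain rule, reusing the local gradients)
df_da = b                                                 # ∂f/∂a = b
df_db = a                                                 # ∂f/∂b = a
df_dx = df_da * 1.0                                       # ∂a/∂x = 1
df_dy = df_da * 1.0 + df_db * (1.0 if y > z else 0.0)     # y feeds both a and b
df_dz = df_db * (1.0 if z > y else 0.0)                   # max routes the gradient to the larger input
print(df_dx, df_dy, df_dz)                                # 2.0 5.0 0.0
```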

Compute all gradients at once (h0005)

image-20220215145351805

Back-prop in general computation graph(h0800)[ToL]

image-20220215145612746

Automatic Differentiation(h1346)

Many tools can compute gradients automatically.

image-20220215151328471

Manual gradient checking: numeric gradient (h1900)

image-20220215152039987
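
A minimal sketch of a central-difference numeric gradient check (slow, but useful for verifying hand-derived gradients):

```python
import numpy as np

def numeric_gradient(f, x, h=1e-4):
    """Central-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += h
        x_minus[i] -= h
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * h)
    return grad

f = lambda x: (x ** 2).sum()             # analytic gradient is 2x
x = np.array([1.0, -2.0, 3.0])
print(numeric_gradient(f, x), 2 * x)     # the two should closely match
```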

Lecture 4 Dependency Parsing

image-20220215152912089

Two views of linguistic structure

Constituency = phrase structure grammar = context-free grammars(CFGs)(0331)

Phrase structure organizes words into nested constituents

image-20220215155446438

Dependency structure(1449)

Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

image-20220215155924838

Why do we need sentence structure? (2205)

Meaning cannot be expressed by single words alone; we need structure to compose words into larger meanings.

image-20220215160252254

Prepositional phrase attachment ambiguity (2422)

Some example sentences that show it:

San Jose cops kill man with knife

Scientists count whales from space

The board approved [its acquisition] [by Royal Trustco Ltd.] [of Toronto] [for $27 a share] [at its monthly meeting].

Coordination scope ambiguity(3614)

**Shuttle veteran and longtime NASA executive Fred Gregory appointed to board**

Doctor: No heart, cognitive issues

Adjectival/Adverbial Modifier Ambiguity(3755)

Students get [first hand job] experience

Students get first [hand job] experience

Verb Phrase(VP) attachment ambiguity(4404)

Mutilated body washes up on Rio beach to be used for Olympics beach volleyball.

image-20220215163226892

Dependency Grammar and Dependency structure(4355)

image-20220215163439157

A fake ROOT node is added to make processing handy.

Dependency Grammar history(4742)

image-20220215163821573

The rise of annotated data: Universal Dependencies treebanks (5100)

image-20220215164213166

Treebank (5400)

Building a treebank is slower than writing a grammar by hand, but it is still worth it, because a treebank can be reused in many places, not just for one NLP system.

How to build a parser from dependencies (5738)

image-20220215165030760

Dependency Parsing

image-20220215165444250

Projectivity(h0416)

image-20220215165801145

Methods of Dependency Parsing(h0521)

image-20220215170003800

Greedy transition-based parsing(h0621)

Basic transition-based dependency parser (h0808)

image-20220215170303720

[root] | I ate fish (start)

[root I] | ate fish (shift)

[root I ate] | fish (shift)

[root ate] | fish (left-arc: I <- ate)

[root ate fish] | (shift)

[root ate] | (right-arc: ate -> fish)

[root] | (right-arc: root -> ate)
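
A toy replay of that trace, assuming arc-standard transitions; the transition sequence is supplied by hand here rather than predicted by a classifier:

```python
def parse(words, transitions):
    """Replay a sequence of arc-standard transitions and return the dependency arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in transitions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":            # second-from-top depends on top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":           # top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

print(parse(["I", "ate", "fish"],
            ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]))
# [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```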

MaltParser(h1351)[ToL]

image-20220215171511327

Evaluation of Dependency Parsing (h1845)[ToL]

image-20220215172606079

Lecture 5 - Language Models and Recurrent Neural Networks (RNNs)

image-20220215173841609

A neural dependency parser(0624)

image-20220215175916431

Distributed Representations(0945)

image-20220215180234046

Deep learning classifiers are non-linear classifiers (1210)

image-20220215180544369

The non-linear decision boundaries of deep learning classifiers:

image-20220215180703045

Simple feed-forward neural network multi-class classifier (1621)

image-20220215181359982

Neural Dependency Parser Model Architecture(1730)

image-20220215182714531

Graph-based dependency parsers (2044)

image-20220215182932684

Regularization && Overfitting (2529)

image-20220215183327050

Dropout (3100)[ToL]

image-20220215184016985

Vectorization(3333)

image-20220215184453079

Non-linearities (4000)

image-20220215185618924

Parameter Initialization (4357)

image-20220215185707615

Optimizers(4617)

image-20220215185920518

Learning Rates(4810)

The learning rate can be decreased as training goes on (learning-rate decay).

image-20220215190108626

Language Modeling (5036)

image-20220215190413343

n-gram Language Models(5356)

image-20220215190718037 image-20220215190841180

Sparsity Problems (5922)

Many n-grams never occur in the corpus, so their estimated probability is zero.

image-20220215191735246
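
A tiny count-based bigram sketch that shows where the zeros come from (toy corpus, not from the lecture):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w_prev, w):
    """P(w | w_prev) = count(w_prev, w) / count(w_prev); zero if never seen (sparsity)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

print(bigram_prob("the", "cat"))   # seen bigram -> non-zero probability
print(bigram_prob("the", "dog"))   # unseen bigram -> probability 0 (the sparsity problem)
```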

Storage Problems(h0117)

How to build a neural language model(h0609)

image-20220215192255066

A fixed-window neural Language Model(h1100)

image-20220216103904942

Recurrent Neural Network (RNN)(h1250)

x1 -> y1, then the hidden state carrying information from x1 is combined (through W) with x2 -> y2: the same weights W are reused at every time step.

image-20220216105731982

A Simple RNN Language Model(h1430)

image-20220216110248289

image-20220216110444328
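
A minimal numpy sketch of one forward step of such an RNN language model (shapes and names are my own, biases omitted for brevity):

```python
import numpy as np

V, d, H = 1000, 64, 128                 # vocab size, embedding size, hidden size (toy values)
E  = np.random.randn(V, d) * 0.01       # word embedding matrix
Wh = np.random.randn(H, H) * 0.01       # hidden-to-hidden weights (shared across time steps)
We = np.random.randn(H, d) * 0.01       # embedding-to-hidden weights
U  = np.random.randn(V, H) * 0.01       # hidden-to-vocab output weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_lm_step(h_prev, word_id):
    """One time step: consume a word, update the hidden state, predict the next word."""
    e = E[word_id]                               # look up the word embedding
    h = np.tanh(Wh @ h_prev + We @ e)            # new hidden state
    y_hat = softmax(U @ h)                       # distribution over the next word
    return h, y_hat

h = np.zeros(H)
for word_id in [17, 42, 7]:                      # arbitrary token ids
    h, y_hat = rnn_lm_step(h, word_id)
print(y_hat.shape, y_hat.sum())                  # (1000,) 1.0
```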

Lecture 6 - Simple and LSTM Recurrent Neural Networks

image-20220216110620895

image-20220216111222942

The Simple RNN Language Model (0310)

image-20220216112005817

Training an RNN Language Model (0818)

RNNs take more time to train.

Teacher Forcing

At each step the model is fed the gold previous words, and it is penalized when its prediction does not match the actual next word.

image-20220216112357329

image-20220216112814935

image-20220216113456552

But how do we get the answer?

image-20220216113810612

image-20220216114843011

Evaluating Language Models (2447)[ToL]

image-20220216115442761
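
The usual metric is perplexity, the exponential of the average negative log-likelihood; a tiny sketch with made-up per-word probabilities:

```python
import numpy as np

# probabilities the model assigned to each actual next word in a (toy) evaluation text
p_next = np.array([0.2, 0.05, 0.4, 0.1])

avg_neg_log_likelihood = -np.mean(np.log(p_next))
perplexity = np.exp(avg_neg_log_likelihood)      # lower is better
print(perplexity)
```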

A language model is a system that predicts the next word (3130).

image-20220216120043119

Other uses of RNNs (3229)

Sequence tagging: predicting a tag for each word.

image-20220216120154220

Used for classification(3420)

image-20220216120331039

Used as a language encoder module (3500)

image-20220216120515954

Used to generate text (3600)

image-20220216120602654

Problems with Vanishing and Exploding Gradients(3750)[IMPORTANT]

image-20220216120728010

[ToL]

image-20220216120836593

Why This is a problem (4400)

image-20220216121352667

image-20220216121537213

image-20220216121801767

We can put a limit on it: clip the gradient when its norm gets too large.

image-20220216121845504
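
A minimal sketch of gradient clipping by norm (the threshold value is arbitrary):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """If the gradient norm exceeds max_norm, rescale it to have norm max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

g = np.array([30.0, 40.0])            # norm 50, far above the threshold
print(clip_gradient(g))               # [3. 4.], same direction, norm 5
```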

Long Short-Term Memory RNNs (LSTMs) (5000) [ToL]

image-20220216142509947

image-20220216143131901
