Natural Language Processing (NLP) Lecture Note for CS181 ShanghaiTech

Note: these notes are meant to organize the basic concepts of our CS181 course. Since the course is taught and examined in English, English appears throughout.

目录

1 Formal grammars

2 Regular grammars

3 Dependency grammar

4 Parsing

5 Learning grammars

5.1 Supervised methods

5.2 Unsupervised methods

Reference

Appendix


1 Formal grammars

0.syntax: knowledge of the structural relationships between words

1.constituents: groups of words within sentences can be shown to act as single units.

2. grammar: the set of constituents and the rules that govern how they combine

3. CFGs (context-free grammars): also known as phrase structure grammars

4. four components of a context-free grammar: a set Σ of terminals; a set N of nonterminals (phrases); a start symbol S ∈ N; a set R of production rules (each specifying how a nonterminal can be rewritten as a string of terminals and/or nonterminals)

5. grammatical structure (parse tree) of a string: ① start from a string containing only the start symbol S, ② recursively apply rules to rewrite the string, ③ until the string contains only terminals

6.parsing: parsing is the process of taking a string and a grammar and returning one or more parse tree(s) for that string

7. probabilistic grammars: each rule is associated with a probability. The probability of a parse tree is the product of the probabilities of all the rules used in that tree (a small worked sketch appears at the end of this section).

8. a sentence is ambiguous if it has more than one possible parse tree.
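
To make item 7 concrete, here is a minimal Python sketch of a toy probabilistic CFG; the grammar, its probabilities, the example sentence, and the helper names (PCFG, rule_prob, tree_prob) are all invented for illustration and are not taken from the lecture.

```python
# A hypothetical toy PCFG: each nonterminal maps to a list of
# (right-hand side, probability) pairs.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.6), (("N",), 0.4)],
    "VP": [(("V", "NP"), 0.7), (("V",), 0.3)],
    "N":  [(("dog",), 0.5), (("cat",), 0.5)],
    "V":  [(("saw",), 1.0)],
}

# A parse tree as nested tuples: (nonterminal, children...); leaves are terminals.
tree = ("S",
        ("NP", "the", ("N", "dog")),
        ("VP", ("V", "saw"), ("NP", ("N", "cat"))))

def rule_prob(lhs, rhs):
    """Look up the probability of the rule lhs -> rhs in the toy grammar."""
    for cand_rhs, p in PCFG[lhs]:
        if cand_rhs == rhs:
            return p
    raise KeyError(f"no rule {lhs} -> {rhs}")

def tree_prob(node):
    """Probability of a parse tree = product of the probabilities of its rules."""
    if isinstance(node, str):          # a terminal leaf contributes no rule
        return 1.0
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob(lhs, rhs)
    for c in children:
        p *= tree_prob(c)
    return p

print(tree_prob(tree))   # 1.0 * 0.6 * 0.5 * 0.7 * 1.0 * 0.4 * 0.5 = 0.042
```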

2 Regular grammars

1. What is a regular grammar? It contains only rules of the form A -> a B (a terminal word plus a nonterminal to parse further) or A -> a (a terminal word only). A probabilistic regular grammar (RG) is equivalent to a hidden Markov model (HMM); see the sketch below.
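
A minimal sketch of that correspondence, assuming a made-up toy grammar (the states and words are hypothetical): a rule A -> a B with probability p is read as "in state A, emit word a and move to state B with probability p", and A -> a ends the derivation.

```python
prg = {
    # nonterminal: list of (terminal, next nonterminal or None, probability)
    "S": [("the", "N", 1.0)],
    "N": [("dog", "V", 0.5), ("cat", "V", 0.5)],
    "V": [("barks", None, 0.6), ("sleeps", None, 0.4)],
}

def sentence_prob(words, state="S"):
    """Probability that the grammar generates exactly this word sequence,
    which equals the probability that the equivalent HMM emits it."""
    if not words:
        return 0.0
    first, rest = words[0], words[1:]
    total = 0.0
    for terminal, nxt, p in prg[state]:
        if terminal != first:
            continue
        if nxt is None:
            total += p if not rest else 0.0   # the derivation must end here
        else:
            total += p * sentence_prob(rest, nxt)
    return total

print(sentence_prob(["the", "dog", "barks"]))   # 1.0 * 0.5 * 0.6 = 0.3
```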

3 Dependency grammar

1. A dependency grammar focuses on binary relations among the words in a sentence, while a CFG focuses on constituents.

2. A dependency parse is a tree where: ① the nodes are the words in a sentence, ② the links between the words represent their dependency relations (see the representation sketch at the end of this section).

3. advantages: dependency grammars deal well with free-word-order languages, where the constituent structure is quite fluid; dependency parsing is much faster than CFG-based parsing; the dependency structure often directly captures the syntactic relations needed by later applications (with a CFG these relations must be extracted from the parse tree).

4. two modern approaches to dependency parsing: ① optimization-based approaches, ② shift-reduce approaches

5. DG vs. CFG
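
As referenced in item 2, here is a small sketch of one common way to store a dependency parse; the example sentence and relation labels are invented for illustration. Each word records the 1-based index of its head, with 0 standing for an artificial ROOT.

```python
# "She saw the cat": saw is the root; She and cat depend on saw; the depends on cat.
words = ["She", "saw", "the", "cat"]
heads = [2, 0, 4, 2]                      # 1-based index of each word's head (0 = ROOT)
labels = ["nsubj", "root", "det", "obj"]  # illustrative relation labels

def dependents(i):
    """Return the 1-based indices of the words whose head is word i."""
    return [j + 1 for j, h in enumerate(heads) if h == i]

for i, (w, h, lab) in enumerate(zip(words, heads, labels), start=1):
    head_word = "ROOT" if h == 0 else words[h - 1]
    print(f"{w} --{lab}--> {head_word}   dependents: {dependents(i)}")
```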

4 Parsing

1.parsing with CFGs is the task of assigning proper parse trees to input strings

2. brute-force approach: enumerate all possible trees(high time cost)

3. DP (dynamic programming): a better way. CYK (Cocke-Younger-Kasami algorithm) is a bottom-up dynamic programming algorithm that parses a sentence with a grammar in Chomsky Normal Form (CNF).

4. CNF: only two types of rules are allowed: A -> B C (two nonterminals) and A -> w (a single terminal)

5. conversion to CNF: unit chains such as A -> B, B -> C are collapsed into A -> C; long rules such as S -> A B C are binarized into S -> X C and X -> A B (introducing a new nonterminal X)

6. CYK algorithm

after filling the table, recursively retrieve the constituents from the top (the start symbol) down; see the CYK sketch at the end of this section.

7. to break ties (ambiguity when retrieving the constituents), we introduce a probabilistic grammar, e.g. a PCFG, and find the parse tree with the highest probability.

8. HMMs are a special case of PCFGs; running CYK on an HMM is equivalent to the Viterbi algorithm.
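
As referenced in item 6, here is a minimal sketch of probabilistic CYK for a grammar already in CNF; the toy grammar, sentence, and table names (best, back) are invented for illustration and are not the course's example.

```python
# CNF rules: either A -> B C (binary, nonterminals) or A -> w (terminal).
binary = {               # (B, C) -> list of (A, prob)
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 0.8)],
    ("V", "NP"):  [("VP", 1.0)],
}
lexical = {              # word -> list of (A, prob)
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5), ("NP", 0.2)],
    "cat": [("N", 0.5)],
    "saw": [("V", 1.0)],
}

def cyk(words):
    """Fill best[i][j][A] = best probability of deriving words[i:j] from A,
    with back-pointers so the best tree can be read off afterwards."""
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Length-1 spans: lexical rules A -> w.
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[i][i + 1][A] = p
    # Longer spans: try every split point and every binary rule A -> B C.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for B, pb in best[i][k].items():
                    for C, pc in best[k][j].items():
                        for A, pr in binary.get((B, C), []):
                            p = pr * pb * pc
                            if p > best[i][j].get(A, 0.0):
                                best[i][j][A] = p
                                back[i][j][A] = (k, B, C)
    return best, back

def build_tree(words, back, A, i, j):
    """Recursively retrieve the best constituents from the top down."""
    if j - i == 1:
        return (A, words[i])
    k, B, C = back[i][j][A]
    return (A, build_tree(words, back, B, i, k),
               build_tree(words, back, C, k, j))

words = ["the", "dog", "saw", "the", "cat"]
best, back = cyk(words)
print(best[0][len(words)].get("S"))                # best full-sentence S parse (about 0.16 here)
print(build_tree(words, back, "S", 0, len(words)))
```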

5 Learning grammars

5.1 Supervised methods


1. treebank: a corpus (a complete collection of texts) in which each sentence has been (manually) paired with a parse tree.

2. key idea: maximize P(sentence, parse) over the treebank by maximum likelihood estimation (MLE); see the counting sketch at the end of this subsection.

3. result: poor performance, possibly because the standard treebank nonterminals are not sufficiently informative

4. further approaches

latent variable grammars: each nonterminal is split into a finite number of subtypes, and each rule over subtypes has its own probability.

discriminative parsing: we assume a weighted context-free grammar and maximize conditional likelihood P(parse | sentence).
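
As referenced in item 2, here is a minimal sketch of the supervised MLE estimate, using a two-tree toy treebank invented for illustration: P(A -> rhs) is estimated as count(A -> rhs) / count(A), counted over all parse trees in the treebank.

```python
from collections import Counter

# Parse trees as nested tuples: (nonterminal, children...); leaves are terminal words.
treebank = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"))),
]

rule_counts = Counter()
lhs_counts = Counter()

def count_rules(node):
    """Accumulate counts of every rule occurrence in one parse tree."""
    if isinstance(node, str):
        return
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(lhs, rhs)] += 1
    lhs_counts[lhs] += 1
    for c in children:
        count_rules(c)

for tree in treebank:
    count_rules(tree)

for (lhs, rhs), c in sorted(rule_counts.items()):
    print(f"P({lhs} -> {' '.join(rhs)}) = {c / lhs_counts[lhs]:.2f}")
```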

5.2 Unsupervised methods

two tasks: structure search and parameter learning

key idea: maximum (marginal) likelihood estimation: maximize P(sentence) with the parse tree marginalized out, typically via expectation-maximization (EM); a small sketch of the marginal likelihood follows.
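
A minimal sketch of that marginal likelihood on an invented toy CNF grammar: P(sentence) sums the probabilities of all parse trees, i.e. the CYK recursion with max replaced by sum (the inside probability). EM would use such quantities to re-estimate rule probabilities; the full inside-outside updates are omitted here.

```python
from functools import lru_cache

binary = {("NP", "VP"): [("S", 1.0)], ("Det", "N"): [("NP", 1.0)]}
lexical = {"the": [("Det", 1.0)], "dog": [("N", 0.5)], "barks": [("VP", 1.0)]}

words = ("the", "dog", "barks")

@lru_cache(maxsize=None)
def inside(A, i, j):
    """Total probability that nonterminal A derives words[i:j] (sum over parses)."""
    if j - i == 1:
        return dict(lexical.get(words[i], [])).get(A, 0.0)
    total = 0.0
    for k in range(i + 1, j):
        for (B, C), prods in binary.items():
            for A2, p in prods:
                if A2 == A:
                    total += p * inside(B, i, k) * inside(C, k, j)
    return total

print(inside("S", 0, len(words)))   # P(sentence) = sum over all parse trees = 0.5
```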

 

Reference


1. ShanghaiTech CS181 NLP.

Appendix

example of terminals, nonterminals and rules

1.nonterminals->nonterminals

2.nonterminals->terminals
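
For instance (an illustrative toy grammar, not the one used in lecture): nonterminals N = {S, NP, VP, Det, Noun, Verb}; terminals Σ = {the, dog, cat, chased}; start symbol S; rules of type 1: S -> NP VP, NP -> Det Noun, VP -> Verb NP; rules of type 2: Det -> the, Noun -> dog, Noun -> cat, Verb -> chased.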
