Context-Free Grammar
Basics of Context-Free Grammars
- Symbols:
  - Terminal: a word such as `book`
  - Non-terminal: a syntactic label such as `NP` or `VP`
  - Convention:
    - lowercase for terminals
    - uppercase for non-terminals
- Productions: `W -> X Y Z`
  - Exactly one non-terminal on the LHS
  - An ordered list of symbols on the RHS, each of which can be a terminal or a non-terminal
- Start symbol: `S`
- Context-free:
  - Production rules depend only on the LHS, not on ancestors or neighbors
  - Analogous to a Markov chain
    - Behaviour at each step depends only on the current state
  - Context-free languages are more general than regular languages because they allow recursive nesting
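The definitions above can be made concrete with a small Python sketch. The names `Production`, `parse_rule`, and `is_nonterminal` are hypothetical, chosen only for illustration, and the uppercase/lowercase test simply encodes the naming convention from these notes.

```python
from dataclasses import dataclass


def is_nonterminal(symbol: str) -> bool:
    # Convention from the notes: uppercase = non-terminal, lowercase = terminal.
    return symbol.isupper()


@dataclass(frozen=True)
class Production:
    lhs: str     # exactly one non-terminal
    rhs: tuple   # ordered symbols, terminals or non-terminals


def parse_rule(text: str) -> Production:
    """Parse a rule written as 'W -> X Y Z' (hypothetical helper)."""
    left, right = text.split("->")
    lhs_symbols = left.split()
    # Context-free: the LHS is a single non-terminal with no surrounding context.
    if len(lhs_symbols) != 1 or not is_nonterminal(lhs_symbols[0]):
        raise ValueError(f"LHS must be exactly one non-terminal: {text!r}")
    return Production(lhs_symbols[0], tuple(right.split()))


rule = parse_rule("W -> X Y Z")
print(rule.lhs, rule.rhs)
```

Rejecting any rule whose LHS is not a single non-terminal is what enforces the "context-free" restriction in this sketch.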
CFG Parsing
- Given production rules, e.g. `S -> a S b` and `S -> a b`
- And a string: `aaabbb`
- Produce a valid parse tree
- If English can be represented with a CFG:
  - First develop the production rules
  - Then build a parser to automatically judge whether a sentence is grammatical
- CFGs strike a good balance:
  - CFGs cover most syntactic patterns
  - CFG parsing is computationally efficient
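For the two example rules `S -> a S b` and `S -> a b`, a parse tree for `aaabbb` can be found with a few lines of recursion. This is a minimal sketch written specifically for this one grammar, not a general CFG parser. The function name `parse_s` and the nested-tuple tree format are assumptions made for illustration.

```python
def parse_s(s: str):
    """Return a nested-tuple parse tree for s under S -> a S b | a b, or None."""
    if s == "ab":
        return ("S", "a", "b")              # rule S -> a b
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        inner = parse_s(s[1:-1])            # rule S -> a S b
        if inner is not None:
            return ("S", "a", inner, "b")
    return None                             # no rule applies: not in the language


tree = parse_s("aaabbb")
print(tree)
print(parse_s("aabbb"))
```

Each nested `("S", ...)` tuple is one application of a production, so the recursive nesting of the tree mirrors the recursive rule `S -> a S b`.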
Constituents
Syntactic Constituents
- Sentences are broken into constituents
  - A constituent is a word sequence that functions as a coherent unit for linguistic analysis
  - Constituents help build CFG production rules
- Constituents have certain key properties:
  - Movement: constituents can be moved around within a sentence:
    - `Abigail gave [her brother] [a fish]`
    - `Abigail gave [a fish] to [her brother]`
    - Contrast: `[gave her]` and `[brother a]`, which are not constituents
  - Substitution: constituents can be substituted by other phrases of the same type:
    - `Max thanked [his older sister]`
    - `Max thanked [her]`
    - Contrast: `[Max thanked]` and `[thanked his]`, which are not constituents
  - Coordination: constituents can be conjoined with coordinators such as `and` and `or`:
    - `[Abigail] and [her young brother] brought a fish`
    - `Abigail [bought a fish] and [gave it to Max]`
    - `Abigail [bought] and [greedily ate] a fish`
Constituents and Phrases
- Once constituents are identified, use phrases to describe them
- Phrases are determined by their head word:
  - Noun phrase: `her younger brother`
  - Verb phrase: `greedily ate it`
Example: A Simple CFG for English and Generating Sentences
- Terminal symbols: `rat`, `the`, `ate`, `cheese`
- Non-terminal symbols: `S`, `NP`, `VP`, `DT`, `VBD`, `NN`
- Productions:
  - `S -> NP VP`
  - `NP -> DT NN`
  - `VP -> VBD NP`
  - `DT -> the`
  - `NN -> rat`
  - `NN -> cheese`
  - `VBD -> ate`
- Generating sentences with CFGs:
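Generation from this toy grammar can be sketched as repeated expansion: start from `S` and keep replacing each non-terminal with the RHS of one of its productions until only terminals remain. The dictionary layout `GRAMMAR` and the function `generate` are assumptions made for illustration. Here a symbol counts as a terminal simply because it has no entry in the dictionary.

```python
import random

# The toy grammar from the notes: non-terminal -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "NN"]],
    "VP": [["VBD", "NP"]],
    "DT": [["the"]],
    "NN": [["rat"], ["cheese"]],
    "VBD": [["ate"]],
}


def generate(symbol="S", rng=random):
    """Expand `symbol` top-down and return the generated words as a list."""
    if symbol not in GRAMMAR:          # terminal: nothing left to expand
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])  # pick one production for this non-terminal
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words


sentence = " ".join(generate("S", random.Random(0)))
print(sentence)
```

Because `NN` is the only non-terminal with a choice, this grammar can produce exactly four sentences, including semantically odd ones such as "the cheese ate the rat". The grammar constrains syntax only.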
CFG Trees
- Generation corresponds to a syntactic tree
- Non-terminals are internal nodes
- Terminals are leaves
- CFG parsing is the reverse process
-
E.g.:
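As an illustration, assuming the toy grammar from the previous section and the sentence "the rat ate the cheese", the tree can be written as nested tuples. The tuple layout and the helper names `leaves` and `bracketed` are hypothetical choices for this sketch.

```python
# Each node is (label, child, child, ...); a bare string is a terminal leaf.
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "rat")),
        ("VP", ("VBD", "ate"),
               ("NP", ("DT", "the"), ("NN", "cheese"))))


def leaves(node):
    """Collect the terminals (leaves) from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render the tree in bracket notation, internal nodes first."""
    if isinstance(node, str):
        return node
    return "[" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + "]"


print(bracketed(tree))
print(" ".join(leaves(tree)))
```

Reading the leaves left to right recovers the generated sentence, while parsing goes the other way: from the word sequence back to the internal nodes.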
CYK Algorithm
CYK Algorithm
- Bottom-up parsing
- Tests whether a string is valid given a CFG, without enumerating all possible parses
- Core idea: form small constituents first, then merge them into larger constituents
- Requirement: the CFG must be in Chomsky Normal Form
Convert to Chomsky Normal Form
- Change the grammar so that every rule has one of these forms:
  - `A -> B C`: a non-terminal LHS with two non-terminals on the RHS
  - `A -> a`: a non-terminal LHS with one terminal on the RHS
- To meet these requirements:
  - Convert rules of the form `A -> B c` into `A -> B X` and `X -> c`
  - Convert rules of the form `A -> B C D` into `A -> B Y` and `Y -> C D`
- CNF disallows unary rules such as `A -> B`, to avoid infinite loops
  - Replace the RHS non-terminal with its productions: `A -> B, B -> cat, B -> dog` becomes `A -> cat, A -> dog`
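The three rewrites above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a complete CNF converter: it assumes there are no empty right-hand sides and no unary cycles, that lowercase symbols are terminals, and that the generated names `X1, X2, ...` and `Y1, Y2, ...` do not already occur in the grammar. The function names are hypothetical.

```python
def is_terminal(symbol):
    # Convention from the notes: lowercase symbols are terminals.
    return symbol.islower()


def to_cnf(rules):
    """Convert a list of (lhs, rhs_tuple) rules to Chomsky Normal Form.

    Assumes no empty right-hand sides, no unary cycles (A -> B, B -> A),
    and that the names X1, X2, ... and Y1, Y2, ... are unused in the input.
    """
    rules = list(rules)

    # Step 1: remove unary rules A -> B by copying B's productions up to A.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if len(rhs) == 1 and not is_terminal(rhs[0]):
                rules.remove((lhs, rhs))
                for other_lhs, other_rhs in list(rules):
                    if other_lhs == rhs[0] and (lhs, other_rhs) not in rules:
                        rules.append((lhs, other_rhs))
                changed = True
                break  # the list changed, so restart the scan

    out, term_symbol, fresh = [], {}, 0
    for lhs, rhs in rules:
        # Step 2: A -> B c becomes A -> B X and X -> c.
        if len(rhs) >= 2:
            new_rhs = []
            for sym in rhs:
                if is_terminal(sym):
                    if sym not in term_symbol:
                        term_symbol[sym] = f"X{len(term_symbol) + 1}"
                        out.append((term_symbol[sym], (sym,)))
                    sym = term_symbol[sym]
                new_rhs.append(sym)
            rhs = tuple(new_rhs)
        # Step 3: A -> B C D becomes A -> B Y and Y -> C D.
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"Y{fresh}"
            out.append((lhs, (rhs[0], new_symbol)))
            lhs, rhs = new_symbol, rhs[1:]
        out.append((lhs, rhs))
    return out


print(to_cnf([("A", ("B",)), ("B", ("cat",)), ("B", ("dog",))]))
```

In this sketch the productions of `B` are kept after the unary rule is removed, since `B` may still be used by other rules.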
The CYK Parsing Algorithm
- Convert the grammar to Chomsky Normal Form
- Fill in a parse table, left to right, bottom to top
- Use the table to derive the parse
- `S` in the top-right corner of the table means success
- Convert the result back to the original grammar
Retrieving the Parses
- `S` in the top-right corner of the parse table indicates success
- To get the parses, follow the pointers back for each match
- If multiple solutions are available, all of the trees are valid:
  ![figure](https://img-blog.csdnimg.cn/d85ac20498a64472868f762f4fed3edc.png)
- Pseudo code:

      function CYK-Parse(words, grammar) returns table
        for j <- from 1 to LENGTH(words) do
          for all {A | A -> words[j] ∈ grammar}
            table[j-1, j] <- table[j-1, j] ∪ A
          for i <- from j-2 down to 0 do
            for k <- i + 1 to j - 1 do
              for all {A | A -> B C ∈ grammar and B ∈ table[i, k] and C ∈ table[k, j]}
                table[i, j] <- table[i, j] ∪ A
        return table
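The pseudocode translates fairly directly into Python. The sketch below also stores backpointers so that parses can be retrieved afterwards. The split of the grammar into a `lexical` dictionary and a `binary` rule list, and the function names `cyk_parse` and `trees`, are assumptions made for illustration. Note that Python lists are 0-indexed, so `words[j]` in the pseudocode becomes `words[j - 1]`.

```python
from collections import defaultdict


def cyk_parse(words, lexical, binary):
    """lexical: {word: set of A with A -> word}; binary: list of (A, B, C) for A -> B C."""
    # table[(i, j)] maps each non-terminal spanning words[i:j] to its backpointers.
    table = defaultdict(dict)
    for j in range(1, len(words) + 1):
        for a in lexical.get(words[j - 1], ()):
            table[(j - 1, j)].setdefault(a, []).append(words[j - 1])
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in binary:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].setdefault(a, []).append((k, b, c))
    return table


def trees(table, i, j, label):
    """Yield every parse tree for `label` over words[i:j] by following backpointers."""
    for back in table[(i, j)].get(label, []):
        if isinstance(back, str):          # lexical rule: label -> word
            yield (label, back)
        else:                              # binary rule: label -> b c, split at k
            k, b, c = back
            for left in trees(table, i, k, b):
                for right in trees(table, k, j, c):
                    yield (label, left, right)


# The toy grammar from the earlier example is already in CNF.
LEXICAL = {"the": {"DT"}, "rat": {"NN"}, "cheese": {"NN"}, "ate": {"VBD"}}
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("VP", "VBD", "NP")]

words = "the rat ate the cheese".split()
table = cyk_parse(words, LEXICAL, BINARY)
print("S" in table[(0, len(words))])
for tree in trees(table, 0, len(words), "S"):
    print(tree)
```

The cell `table[(0, len(words))]` corresponds to the top-right corner of the parse table. When a cell holds several backpointers for the same non-terminal, `trees` yields one tree per combination, which is how multiple valid parses arise.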
Representing English with CFGs
From Toy Grammars to Real Grammars
- Toy grammars with a handful of productions are good for demonstration or for extremely limited domains
- For real texts, we need real grammars
- Real grammars have many thousands of production rules
Key Constituents in Penn Treebank
- Sentence: `S`
- Noun phrase: `NP`
- Verb phrase: `VP`
- Prepositional phrase: `PP`
- Adjective phrase: `AdjP`
- Adverbial phrase: `AdvP`
- Subordinate clause: `SBAR`
Basic English Sentence Structures
- Declarative sentences: `S -> NP VP`
  - `The rat ate the cheese`
- Imperative sentences: `S -> VP`
  - `Eat the cheese`
- Yes/no questions: `S -> VB NP VP`
  - `Did the rat eat the cheese?`
- Wh-subject questions: `S -> WH VP`
  - `Who ate the cheese?`
- Wh-object questions: `S -> WH VB NP VP`
  - `What did the rat eat?`
English Noun Phrases
- Pre-modifiers:
  - DT, CD, AdjP, NNP, NN
  - E.g.: `the two very best Philly cheese steaks`
- Post-modifiers:
  - PP, VP, SBAR
  - E.g.: `A delivery from Bob coming today that I don't want to miss`
English Verb Phrases
- Auxiliaries:
  - MD, AdvP, VB, TO
  - E.g.: `should really have tried to wait`
- Arguments and adjuncts:
  - NP, PP, SBAR, VP, AdvP
  - E.g.: `told him yesterday that I was ready`
Other Constituents
- Prepositional phrase: `PP -> IN NP`
  - E.g.: `in the house`
- Adjective phrase: `AdjP -> (AdvP) JJ`
  - E.g.: `really nice`
- Adverb phrase: `AdvP -> (AdvP) RB`
  - E.g.: `not too well`
- Subordinate clause: `SBAR -> (IN) S`
  - E.g.: `since I came here`
- Coordination: `NP -> NP CC NP; VP -> VP CC VP`
  - E.g.: `Jack and Jill`
- Complex sentences: `S -> S SBAR; S -> SBAR S`
  - E.g.: `if he goes, I'll go`
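Rules such as `AdjP -> (AdvP) JJ` use parentheses, read here as marking an optional symbol (an assumption about the notation, though a common one). A plain CFG has no optional symbols, so each such rule stands for several ordinary productions. The helper `expand_optional` below is a hypothetical sketch of that expansion.

```python
from itertools import product


def expand_optional(rule: str):
    """Expand 'A -> (B) C' into the plain productions A -> B C and A -> C."""
    lhs, rhs = [part.strip() for part in rule.split("->")]
    choices = []
    for sym in rhs.split():
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([[sym[1:-1]], []])   # optional: present or absent
        else:
            choices.append([[sym]])             # required symbol
    return [
        (lhs, tuple(s for part in combo for s in part))
        for combo in product(*choices)
    ]


for production in expand_optional("AdjP -> (AdvP) JJ"):
    print(production)
```

A rule with n optional symbols expands into 2^n plain productions, which is one reason realistic grammars grow to thousands of rules.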