Lecture 14 Context-Free Grammar

Context-Free Grammar

Basics of Context-Free Grammars

  • Symbols:

    • Terminal: a word such as book
    • Non-terminal: a syntactic label such as NP or VP
    • Convention:
      • lowercase for terminals
      • uppercase for non-terminals
  • Productions:

    • W -> X Y Z
    • Exactly one non-terminal on the LHS
    • An ordered list of symbols on the RHS, each of which can be a terminal or a non-terminal
  • Start symbol: S

  • Context-free:

    • Which production rules can apply depends only on the LHS, not on ancestors or neighbours
      • Analogous to a Markov chain
      • Behaviour at each step depends only on the current state
    • Context-free languages are more general than regular languages: they allow recursive nesting
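One possible way to hold these definitions in code is sketched below. The `Production` class and the helper names are hypothetical, chosen only for illustration, and the check relies on the uppercase/lowercase convention stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """Hypothetical container for one rule such as W -> X Y Z."""
    lhs: str    # exactly one non-terminal
    rhs: tuple  # ordered symbols, terminals or non-terminals


def is_nonterminal(symbol: str) -> bool:
    # Convention from the notes: uppercase = non-terminal, lowercase = terminal.
    return symbol.isupper()


def is_context_free(rule: Production) -> bool:
    # A context-free rule has a single non-terminal on its left-hand side.
    return is_nonterminal(rule.lhs)


rule = Production("W", ("X", "Y", "Z"))
```

Because the LHS is a single symbol, a rule can be applied wherever that symbol occurs, regardless of what surrounds it.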

CFG Parsing

  • Given production rules, e.g.:

    • S -> a S b
    • S -> a b
  • And a string: aaabbb

  • Produce a valid parse tree:

    (Figure: parse tree for aaabbb, built by applying S -> a S b twice and then S -> a b.)

  • If English can be represented with a CFG:

    • First develop the production rules
    • Then build a parser that automatically judges whether a sentence is grammatical
  • CFGs strike a good balance:

    • CFGs cover most syntactic patterns
    • CFG parsing is computationally efficient
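As a rough illustration of parsing aaabbb with the two rules S -> a S b and S -> a b, the sketch below builds the tree as nested tuples. It is a hand-written recogniser for this one grammar only, not a general CFG parser, and the function name `parse` is an assumption.

```python
def parse(s):
    """Return a nested-tuple parse tree for s under S -> a S b | a b, or None."""
    if s == "ab":
        # Base rule: S -> a b
        return ("S", "a", "b")
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        # Recursive rule: S -> a S b, so the middle must itself be an S.
        inner = parse(s[1:-1])
        if inner is not None:
            return ("S", "a", inner, "b")
    return None


tree = parse("aaabbb")
```

The recursion mirrors the nesting in the grammar: each level of the tree corresponds to one application of S -> a S b, and a string such as aabbb is rejected because no rule sequence produces it.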

Constituents

Syntactic Constituents

  • Sentences are broken into constituents

    • A constituent is a word sequence that functions as a coherent unit for linguistic analysis
    • Constituents help build CFG production rules
  • Constituents have certain key properties:

    • Movement: constituents can be moved around within a sentence:
      • Abigail gave [her brother] [a fish]
      • Abigail gave [a fish] to [her brother]
      • Contrast: [gave her] and [brother a] are not constituents
    • Substitution: constituents can be substituted by other phrases of the same type:
      • Max thanked [his older sister]
      • Max thanked [her]
      • Contrast: [Max thanked] and [thanked his] are not constituents
    • Coordination: constituents can be conjoined with coordinators such as "and" and "or":
      • [Abigail] and [her young brother] brought a fish
      • Abigail [bought a fish] and [gave it to Max]
      • Abigail [bought] and [greedily ate] a fish

Constituents and Phrases

  • Once constituents are identified, use phrases to describe them

  • Phrases are determined by their head word:

    • Noun phrase: her younger brother
    • Verb phrase: greedily ate it

Example: A Simple CFG for English and generating sentences

  • Terminal symbols: rat, the, ate, cheese

  • Non-terminal symbols: S, NP, VP, DT, VBD, NN

  • Productions:

    • S -> NP VP
    • NP -> DT NN
    • VP -> VBD NP
    • DT -> the
    • NN -> rat
    • NN -> cheese
    • VBD -> ate
  • Generating sentences with CFGs:

    (Figure: step-by-step derivation of a sentence from the start symbol S.)
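Generation from this toy grammar can be sketched as a recursive expansion starting at S. The dictionary layout and the `generate` function are illustrative assumptions; the rules themselves are the seven productions listed above.

```python
import random

# The toy grammar above: each non-terminal maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "NN"]],
    "VP": [["VBD", "NP"]],
    "DT": [["the"]],
    "NN": [["rat"], ["cheese"]],
    "VBD": [["ate"]],
}


def generate(symbol="S", rng=random):
    """Expand symbol left to right until only terminals remain."""
    if symbol not in GRAMMAR:
        # Terminals have no productions, so they are emitted as words.
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words


# A seeded generator makes the run repeatable.
sentence = generate(rng=random.Random(0))
print(" ".join(sentence))
```

Since NN is the only non-terminal with more than one production, this grammar can produce just four sentences, including the ungrammatical-sounding "the cheese ate the rat".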

CFG Trees

  • Generation corresponds to a syntactic tree

  • Non-terminals are internal nodes

  • Terminals are leaves

  • CFG parsing is the reverse process: recovering the tree from the sentence

  • E.g.:

    (Figure: example syntactic tree.)
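The relationship between a tree and its sentence can be sketched with nested tuples. The tree below is an assumed example for "the rat ate the cheese" under the toy grammar from the previous section, and the helper names are hypothetical.

```python
# Internal nodes are (label, child, ...) tuples; leaves are plain strings.
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "rat")),
        ("VP", ("VBD", "ate"),
               ("NP", ("DT", "the"), ("NN", "cheese"))))


def leaves(node):
    """Read the terminals off the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def productions(node):
    """List the rules used in the tree, top-down and left to right."""
    if isinstance(node, str):
        return []
    label, *children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        rules.extend(productions(child))
    return rules
```

Here `leaves(tree)` gives the sentence that generation would produce, and `productions(tree)` gives the rules a parser would have to recover.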

CYK Algorithm

CYK Algorithm

  • Bottom-up parsing

  • Tests whether a string is valid given a CFG, without enumerating all possible parses

  • Core idea: form small constituents first, then merge them into larger constituents

  • Requirement: the CFG must be in Chomsky Normal Form

Convert to Chomsky Normal Form

  • Change the grammar so that all rules have one of these forms:

    • A -> B C: a non-terminal LHS rewritten as two non-terminals
    • A -> a: a non-terminal LHS rewritten as one terminal
  • To meet these requirements:

    • Convert rules of the form A -> B c into:

      • A -> B X and X -> c, where X is a new non-terminal
    • Convert rules of the form A -> B C D into:

      • A -> B Y and Y -> C D, where Y is a new non-terminal
  • CNF disallows unary rules like A -> B, to avoid infinite loops

    • Replace the RHS non-terminal with its productions:
      • A -> B, B -> cat, B -> dog becomes A -> cat, A -> dog
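The three rewrites above can be sketched as follows. This is a simplified illustration, not a complete CNF converter: it assumes there are no empty productions, and the generated names (`X_c`, `Y1`, ...) and function names are hypothetical.

```python
def remove_unary(rules, nts):
    """Replace unary rules A -> B with A -> rhs for every non-unary B -> rhs."""
    def is_unary(rhs):
        return len(rhs) == 1 and rhs[0] in nts

    # Pairs (A, B) where A reaches B through unary rules only.
    pairs = {(lhs, rhs[0]) for lhs, rhs in rules if is_unary(rhs)}
    changed = True
    while changed:
        changed = False
        for a, b in sorted(pairs):
            for c, d in sorted(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    result = [(lhs, rhs) for lhs, rhs in rules if not is_unary(rhs)]
    for a, b in sorted(pairs):
        for lhs, rhs in rules:
            if lhs == b and not is_unary(rhs) and (a, rhs) not in result:
                result.append((a, rhs))
    return result


def to_cnf(rules, nonterminals):
    """Convert (lhs, rhs_tuple) rules to CNF. Sketch only: no empty rules."""
    nts = set(nonterminals)
    out = []
    counter = 0
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            # A -> B c  becomes  A -> B X_c  and  X_c -> c
            lifted = []
            for sym in rhs:
                if sym in nts:
                    lifted.append(sym)
                else:
                    proxy = "X_" + sym
                    if (proxy, (sym,)) not in out:
                        out.append((proxy, (sym,)))
                    lifted.append(proxy)
            rhs = tuple(lifted)
        # A -> B C D  becomes  A -> B Y1  and  Y1 -> C D
        while len(rhs) > 2:
            counter += 1
            fresh = "Y" + str(counter)
            out.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]
        out.append((lhs, tuple(rhs)))
    return remove_unary(out, nts)
```

For example, `to_cnf([("A", ("B", "C", "D"))], {"A", "B", "C", "D"})` yields A -> B Y1 and Y1 -> C D. In the unary case the rules for B are kept alongside the new rules for A, since B may still be used elsewhere in the grammar.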

The CYK Parsing Algorithm

  • Convert the grammar to Chomsky Normal Form

  • Fill in a parse table, left to right, bottom to top

  • Use the table to derive the parse

  • S in the top-right corner of the table means success

  • Convert the result back to the original grammar

  • E.g.:

    (Figure: worked example of a filled CYK parse table.)

  • Retrieving the parses

    • S in the top-right corner of the parse table indicates success

    • To get the parses, follow the pointers back for each match:

      (Figure: parse table with back-pointers.)

    • If multiple solutions are available, all of the trees are valid:

      ![Multiple valid parse trees](https://img-blog.csdnimg.cn/d85ac20498a64472868f762f4fed3edc.png)
  • Pseudocode:
function CYK-Parse(words, grammar) returns table
    for j <- from 1 to LENGTH(words) do
        for all {A | A -> words[j] ∈ grammar}
            table[j-1, j] <- table[j-1, j] ∪ A
        for i <- from j-2 down to 0 do
            for k <- i + 1 to j - 1 do
                for all {A | A -> B C ∈ grammar and B ∈ table[i, k] and C ∈ table[k, j]}
                    table[i, j] <- table[i, j] ∪ A
    return table
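The pseudocode can be turned into a small Python sketch along the following lines. The grammar encoding (`lexical` and `binary`), the back-pointer layout, and the function names are assumptions made for this example; the grammar used is the toy English CFG from earlier, which is already in CNF.

```python
from collections import defaultdict


def cyk_parse(words, lexical, binary):
    """Fill the CYK table.

    lexical: {word: set of A such that A -> word}
    binary:  list of (A, B, C) for each rule A -> B C
    table[(i, j)][A] lists the ways A can span words[i:j] (back-pointers).
    """
    table = defaultdict(lambda: defaultdict(list))
    for j in range(1, len(words) + 1):
        # Width-one spans come straight from the rules A -> word.
        for a in lexical.get(words[j - 1], ()):
            table[(j - 1, j)][a].append(words[j - 1])
        # Wider spans ending at j, trying every split point k.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in binary:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)][a].append((k, b, c))
    return table


def first_tree(table, i, j, symbol):
    """Follow the first back-pointer of each cell to build one parse tree."""
    entry = table[(i, j)][symbol][0]
    if isinstance(entry, str):
        return (symbol, entry)
    k, b, c = entry
    return (symbol, first_tree(table, i, k, b), first_tree(table, k, j, c))


lexical = {"the": {"DT"}, "rat": {"NN"}, "cheese": {"NN"}, "ate": {"VBD"}}
binary = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("VP", "VBD", "NP")]

words = "the rat ate the cheese".split()
table = cyk_parse(words, lexical, binary)
accepted = "S" in table[(0, len(words))]
tree = first_tree(table, 0, len(words), "S") if accepted else None
```

The cell `(0, len(words))` plays the role of the top-right corner of the table. When a cell holds several back-pointers for the same symbol, each one leads to a different valid tree; `first_tree` follows only the first.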

Representing English with CFGs

From Toy Grammars to Real Grammars

  • Toy grammars with a handful of productions are good for demonstration or extremely limited domains

  • For real texts, we need real grammars

  • Real grammars have many thousands of production rules

Key Constituents in Penn Treebank

  • Sentence S
  • Noun phrase NP
  • Verb phrase VP
  • Prepositional phrase PP
  • Adjective phrase AdjP
  • Adverbial phrase AdvP
  • Subordinate clause SBAR
  • E.g.:

    (Figure: example sentence annotated with these constituent labels.)

Basic English Sentence Structures

  • Declarative sentences S -> NP VP

    • The rat ate the cheese
  • Imperative sentences S -> VP

    • Eat the cheese
  • Yes/no questions S -> VB NP VP

    • Did the rat eat the cheese?
  • Wh-subject questions S -> WH VP

    • Who ate the cheese?
  • Wh-object questions S -> WH VB NP VP

    • What did the rat eat?

English Noun Phrases

  • Pre-modifiers:

    • DT, CD, ADJP, NNP, NN
    • E.g.: the two very best Philly cheese steaks
  • Post-modifiers:

    • PP, VP, SBAR
    • E.g.: A delivery from Bob coming today that I don't want to miss

English Verb Phrases

  • Auxiliaries

    • MD, AdvP, VB, TO
    • E.g.: should really have tried to wait
  • Arguments and adjuncts

    • NP, PP, SBAR, VP, AdvP
    • E.g.: told him yesterday that I was ready

Other Constituents

  • Prepositional phrase:

    • PP -> IN NP
    • E.g.: in the house
  • Adjective phrase:

    • AdjP -> (AdvP) JJ
    • E.g.: really nice
  • Adverb phrase:

    • AdvP -> (AdvP) RB
    • E.g.: not too well
  • Subordinate clause

    • SBAR -> (IN) S
    • E.g.: since I came here
  • Coordination

    • NP -> NP CC NP; VP -> VP CC VP
    • E.g.: Jack and Jill
  • Complex sentences

    • S -> S SBAR; S -> SBAR S
    • E.g.: if he goes, I'll go
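Rules such as AdjP -> (AdvP) JJ use parentheses, which this sketch assumes mark an optional symbol. Under that assumption, each such rule is shorthand for several plain productions, and the expansion can be illustrated as below. The function name `expand_optional` is hypothetical.

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Expand a rule with optional '(X)' symbols into plain productions."""
    choices = []
    for sym in rhs.split():
        if sym.startswith("(") and sym.endswith(")"):
            # Optional symbol: one variant keeps it, one drops it.
            choices.append([[sym[1:-1]], []])
        else:
            choices.append([[sym]])
    return [
        (lhs, tuple(s for part in combo for s in part))
        for combo in product(*choices)
    ]


adjp_rules = expand_optional("AdjP", "(AdvP) JJ")
sbar_rules = expand_optional("SBAR", "(IN) S")
```

Here `adjp_rules` holds AdjP -> AdvP JJ and AdjP -> JJ. Note that expansions like SBAR -> S are unary rules, so they would need to be removed again before running CYK.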