Context-Free Grammars and Regular Grammars: What Is a Context-Free Grammar?

Have you ever noticed that when you write code in a text editor like VS Code, it recognizes things like unmatched braces? And that it sometimes warns you, with an irritating red highlight, about incorrect syntax you have written?

If not, then think about it. That is after all a piece of code. How can you write code for such a task? What would be the underlying logic behind it?

These are the kinds of questions you will face if you have to write a compiler for a programming language. Writing a compiler is not an easy task. It is a bulky job that demands a significant amount of time and effort.

In this article, we are not going to talk about how to build compilers. But we will talk about a concept that is a core component of the compiler: Context Free Grammars.

Introduction

All the questions we asked earlier represent a problem that is significant to compiler design called Syntax Analysis. As the name suggests, the challenge is to analyze the syntax and see if it is correct or not. This is where we use Context Free Grammars. A Context Free Grammar is a set of rules that define a language.

Here, I would like to draw a distinction between Context Free Grammars and grammars for natural languages like English.

Context Free Grammars, or CFGs, define a formal language. Formal languages work strictly under the defined rules, and their sentences are not influenced by context. That is where the name context free comes from: a rule for a non-terminal can be applied wherever that non-terminal appears, regardless of the symbols around it.

Natural languages such as English, on the other hand, are affected by context. They have many other features that a CFG cannot describe.

Even though CFGs cannot describe context in natural languages, they can still define the syntax and structure of sentences in these languages. In fact, that is the reason CFGs were introduced in the first place.

In this article we will attempt to generate English sentences using CFGs. We will learn how to describe the sentence structure and write rules for it. To do this, we will use a JavaScript library called Tracery, which will generate sentences on the basis of the rules we define for our grammar.

Before we dive into the code and start writing the rules for the grammar, let's just discuss some basic terms that we will use in our CFG.

Terminals: These are the symbols that make up the actual content of the final sentence. They can be words or letters, depending on which is used as the basic building block of a sentence.

In our case we will use words as the basic building blocks of our sentences. So our terminals will include words such as "to", "from", "the", "car", "spaceship", "kittens" and so on.

Non-Terminals: These are also called variables. They act as a sub-language within the language defined by the grammar. Non-terminals are placeholders that are eventually replaced by terminals. We can use non-terminals to generate different patterns of terminal symbols.

In our case we will use these non-terminals to generate noun phrases, verb phrases, different nouns, adjectives, verbs and so on.

Start Symbol: A start symbol is a special non-terminal that represents the initial string from which every sentence of the grammar is derived.

Now that we know the terminology, let's start learning about the grammatical rules.

While writing grammar rules, we will start by defining the set of terminals and a start symbol. As we learned before, the start symbol is a non-terminal. This means it will belong to the set of non-terminals.

T: ("Monkey", "banana", "ate", "the")
S: Start symbol.

And the rules are:

S --> nounPhrase verbPhrase
nounPhrase --> adj nounPhrase | adj noun
verbPhrase --> verb nounPhrase
adj --> the
noun --> Monkey | banana
verb --> ate

The above grammatical rules may seem somewhat cryptic at first. But if we look carefully, we can see a pattern that is being generated out of these rules.

A better way to think about the above rules is to visualise them in the form of a tree structure. In that tree we can put S in the root and nounPhrase and verbPhrase can be added as children of the root. We can proceed in the same way with nounPhrase and verbPhrase too. The tree will have terminals as its leaf nodes because that is where we end these derivations.

In the above image we can see that S (a non-terminal) derives two non-terminals, NP (nounPhrase) and VP (verbPhrase). NP, in turn, derives two non-terminals, Adj and Noun.

If you look at the grammar, NP could also have chosen Adj and nounPhrase. While generating text, these choices are made randomly.
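
The random choice described above can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions, not how Tracery works internally: the names `grammar`, `expand`, and `randomPick` are made up for this example, and the adjective rule is written as `adj` to match the way the other rules refer to it.

```javascript
// Toy grammar from this article: each non-terminal maps to a list of
// alternatives, and each alternative is a sequence of symbols.
const grammar = {
  S: [["nounPhrase", "verbPhrase"]],
  nounPhrase: [["adj", "nounPhrase"], ["adj", "noun"]],
  verbPhrase: [["verb", "nounPhrase"]],
  adj: [["the"]],
  noun: [["Monkey"], ["banana"]],
  verb: [["ate"]],
};

// Hypothetical helper: choose one alternative uniformly at random.
function randomPick(options) {
  return options[Math.floor(Math.random() * options.length)];
}

// Expand a symbol into a flat list of terminals.
// Anything without a rule in `grammar` is treated as a terminal.
function expand(symbol, pick = randomPick) {
  const alternatives = grammar[symbol];
  if (alternatives === undefined) return [symbol];
  const chosen = pick(alternatives);
  return chosen.flatMap((s) => expand(s, pick));
}

const sentence = expand("S").join(" ");
console.log(sentence); // e.g. "the Monkey ate the banana"
```

Because nounPhrase can pick `adj nounPhrase` again and again, some runs produce sentences such as "the the Monkey ate the banana". That is a direct consequence of the recursive rule, not a bug in the sketch.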

And finally, the leaf nodes hold the terminals, which are written in bold text. So if you move from left to right, you can see that a sentence is formed.

The term often used for this tree is a Parse Tree. We can create another parse tree for a different sentence generated by this grammar in a similar way.
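
To make the idea concrete, a parse tree can be written down as nested objects and its leaves read from left to right. The sketch below assumes the sentence is "the Monkey ate the banana"; the `leaf`, `np`, and `leaves` helpers and the object layout are hypothetical choices for this example.

```javascript
// Hypothetical node shape: { symbol, children }, where a child is either
// another node or a plain string (a terminal at a leaf).
const leaf = (symbol, word) => ({ symbol, children: [word] });

// Build a noun phrase of the form Adj Noun.
const np = (nounWord) => ({
  symbol: "NP",
  children: [leaf("Adj", "the"), leaf("Noun", nounWord)],
});

const tree = {
  symbol: "S",
  children: [
    np("Monkey"),
    { symbol: "VP", children: [leaf("Verb", "ate"), np("banana")] },
  ],
};

// Collect the terminals by walking the tree from left to right.
function leaves(node) {
  if (typeof node === "string") return [node];
  return node.children.flatMap(leaves);
}

console.log(leaves(tree).join(" ")); // the Monkey ate the banana
```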

Now let's proceed further to the code. As I mentioned earlier, we will use a JavaScript library called Tracery for text generation using CFGs. We will also write some code in HTML and CSS for the front-end part.

The Code

Let's start by getting the Tracery library. You can clone the library from GitHub here. I have also left the link to the GitHub repository by galaxykate at the end of the article.

Before we use the library we will have to import it. We can do this simply in an HTML file like this.

<html>
    <head>
        <script src="tracery-master/js/vendor/jquery-1.11.2.min.js"></script>
        <script src="tracery-master/tracery.js"></script>
        <script src="tracery-master/js/grammars.js"></script>
        <script src='app.js'></script>
    </head>
    
</html>

I have added the cloned Tracery file as a script in my HTML code. We will also have to add jQuery to our code because Tracery depends on jQuery. Finally, I have added app.js, which is the file where I will add the rules for the grammar.

Once that is done, create a JavaScript file where we will define our grammar rules.

var rules = {
    "start": ["#NP# #VP#."],
    "NP": ["#Det# #N#", "#Det# #N# that #VP#", "#Det# #Adj# #N#"],
    "VP": ["#Vtrans# #NP#", "#Vintr#"],
    "Det": ["The", "This", "That"],
    "N": ["John Keating", "Bob Harris", "Bruce Wayne", "John Constantine", "Tony Stark", "John Wick", "Sherlock Holmes", "King Leonidas"],
    "Adj": ["cool", "lazy", "amazed", "sweet"],
    "Vtrans": ["computes", "examines", "helps", "prefers", "sends", "plays with", "messes up with"],
    "Vintr": ["coughs", "daydreams", "whines", "slobbers", "appears", "disappears", "exists", "cries", "laughs"]
};

Here you will notice that the syntax for defining rules is not much different from how we defined our grammar earlier. The differences are minor. Non-terminals are written between hash symbols. The different derivations of a rule are not separated by the "|" symbol; instead, each derivation is a separate element of an array. Finally, a colon takes the place of the arrow between a non-terminal and its derivations.
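
The mapping between the two notations is mechanical enough to sketch in code. The function below, `toTraceryRules`, is a hypothetical helper written for this article and is not part of Tracery. It assumes one rule per line in the `lhs --> alt1 | alt2` form used earlier, and it treats every symbol that appears on a left-hand side as a non-terminal.

```javascript
// Convert rules written as "lhs --> alt1 | alt2" into a Tracery-style object.
function toTraceryRules(text) {
  const lines = text.split("\n").map((l) => l.trim()).filter(Boolean);

  // Parse each line into [lhs, [[symbol, ...], ...]].
  const parsed = lines.map((line) => {
    const [lhs, rhs] = line.split("-->").map((part) => part.trim());
    return [lhs, rhs.split("|").map((alt) => alt.trim().split(/\s+/))];
  });

  // Every left-hand side is a non-terminal; everything else is a terminal.
  const nonTerminals = new Set(parsed.map(([lhs]) => lhs));

  const rules = {};
  for (const [lhs, alternatives] of parsed) {
    // Wrap non-terminals in hashes; each alternative becomes one array element.
    rules[lhs] = alternatives.map((symbols) =>
      symbols.map((s) => (nonTerminals.has(s) ? `#${s}#` : s)).join(" ")
    );
  }
  return rules;
}

const rules = toTraceryRules(`
  S --> nounPhrase verbPhrase
  nounPhrase --> adj nounPhrase | adj noun
  verbPhrase --> verb nounPhrase
  adj --> the
  noun --> Monkey | banana
  verb --> ate
`);
console.log(JSON.stringify(rules, null, 2));
```

The resulting object uses `S` as its start symbol, so it would be expanded from `#S#` rather than from `#start#`.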

This new grammar is a little more complicated than the one we defined earlier. This one includes many other things such as Determiners, Transitive Verbs and Intransitive Verbs. We do this to make the generated text look more natural.

Let's now call the tracery function "createGrammar" to create the grammar we just defined.

let grammar = tracery.createGrammar(rules);

This function will take the rules object and generate a grammar on the basis of these rules. After creating the grammar, we now want to generate some end result from it. To do that we will use a function called "flatten".

let expansion = grammar.flatten('#start#');

It will generate a random sentence based on the rules that we defined earlier. But let's not stop there. Let's also build a user interface for it. There's not much we will have to do for that part – we just need a button and some basic styles for the interface.
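
If you are curious what expanding `#start#` involves, here is a simplified stand-in written in plain JavaScript. It is not Tracery's implementation and it ignores Tracery's other features; `flattenSketch` and its `random` parameter are hypothetical names, and the rule set is an abbreviated version of the one above.

```javascript
// Abbreviated rule set in the same format as the article's grammar.
const rules = {
  start: ["#NP# #VP#."],
  NP: ["#Det# #N#", "#Det# #Adj# #N#"],
  VP: ["#Vtrans# #NP#", "#Vintr#"],
  Det: ["The", "This"],
  N: ["John Wick", "Sherlock Holmes"],
  Adj: ["cool", "lazy"],
  Vtrans: ["helps", "examines"],
  Vintr: ["coughs", "laughs"],
};

// Replace every #tag# with a randomly chosen expansion, recursively,
// until only terminals remain. `random` can be swapped out for testing.
function flattenSketch(text, rules, random = Math.random) {
  return text.replace(/#(\w+)#/g, (match, name) => {
    const options = rules[name];
    if (!options) return match; // unknown tag: leave it unchanged
    const choice = options[Math.floor(random() * options.length)];
    return flattenSketch(choice, rules, random);
  });
}

console.log(flattenSketch("#start#", rules));
```

Passing a fixed function such as `() => 0` as `random` always selects the first alternative, which makes the output repeatable when you want to check a grammar by hand.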

In the same HTML file where we added the libraries we will add some elements.

<html>
    <head>
        <title>Weird Sentences</title>
        <link rel="stylesheet" href="style.css"/>
        <link href="https://fonts.googleapis.com/css?family=UnifrakturMaguntia&display=swap" rel="stylesheet">
        <link href="https://fonts.googleapis.com/css?family=Harmattan&display=swap" rel="stylesheet">
        
        <script src="tracery-master/js/vendor/jquery-1.11.2.min.js"></script>
        <script src="tracery-master/tracery.js"></script>
        <script src="tracery-master/js/grammars.js"></script>
        <script src='app.js'></script>
    </head>
    <body>
        <h1 id="h1">Weird Sentences</h1>
        <button id="generate" onclick="generate()">Give me a Sentence!</button>
        <div id="sentences">

        </div>
    </body>
</html>

And finally we will add some styles to it.

body {
    text-align: center;
    margin: 0;
    font-family: 'Harmattan', sans-serif;
}

#h1 {
    font-family: 'UnifrakturMaguntia', cursive;
    font-size: 4em;
    background-color: rgb(37, 146, 235);
    color: white;
    padding: .5em;
    box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
}

#generate {
    font-family: 'Harmattan', sans-serif;
    font-size: 2em;
    font-weight: bold;
    padding: .5em;
    margin: .5em;
    box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
    background-color: rgb(255, 0, 64);
    color: white;
    border: none;
    border-radius: 2px;
    outline: none;
}

#sentences p {
    box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
    margin: 2em;
    margin-left: 15em;
    margin-right: 15em;
    padding: 2em;
    border-radius: 2px;
    font-size: 1.5em;
}

We will also have to add some more JavaScript to manipulate the interface.

// All sentences generated so far, oldest first.
let sentences = [];

// Called by the button's onclick handler: generate one sentence and redraw.
function generate() {
    // Grammar rules: each key is a non-terminal, each array lists its derivations.
    var data = {
        "start": ["#NP# #VP#."],
        "NP": ["#Det# #N#", "#Det# #N# that #VP#", "#Det# #Adj# #N#"],
        "VP": ["#Vtrans# #NP#", "#Vintr#"],
        "Det": ["The", "This", "That"],
        "N": ["John Keating", "Bob Harris", "Bruce Wayne", "John Constantine", "Tony Stark", "John Wick", "Sherlock Holmes", "King Leonidas"],
        "Adj": ["cool", "lazy", "amazed", "sweet"],
        "Vtrans": ["computes", "examines", "helps", "prefers", "sends", "plays with", "messes up with"],
        "Vintr": ["coughs", "daydreams", "whines", "slobbers", "appears", "disappears", "exists", "cries", "laughs"]
    };

    let grammar = tracery.createGrammar(data);
    let expansion = grammar.flatten('#start#');

    sentences.push(expansion);

    printSentences(sentences);
}

// Render the sentences into the #sentences div, newest first.
function printSentences(sentences) {
    let textBox = document.getElementById("sentences");
    textBox.innerHTML = "";
    for (let i = sentences.length - 1; i >= 0; i--) {
        textBox.innerHTML += "<p>" + sentences[i] + "</p>";
    }
}
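
The newest-first rendering can also be written as a pure function that builds the markup as a string, which makes it easy to try outside the browser. This is an optional variant, not part of the original app: `renderSentences` and `escapeHtml` are hypothetical names, and the escaping step is an extra precaution that the grammar above does not strictly need.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one <p> per sentence, newest first, without touching the DOM.
function renderSentences(sentences) {
  return [...sentences]
    .reverse()
    .map((s) => `<p>${escapeHtml(s)}</p>`)
    .join("");
}

console.log(renderSentences(["first.", "second."])); // <p>second.</p><p>first.</p>
```

In the page itself, the returned string could be assigned to the `innerHTML` of the `sentences` div in a single step.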

Once you have finished writing the code, run your HTML file. It should look something like this.

Every time you click the red button it will generate a sentence. Some of these sentences might not make any sense. This is because, as I said earlier, CFGs cannot describe the context and some other features that natural languages possess. A CFG only defines the syntax and structure of the sentences.

You can check out the live version of this here.

Conclusion

If you have made it this far, I highly appreciate your resilience. It might be a new concept for some of you, and others might have learnt about it in their college courses. But still, Context Free Grammars have interesting applications that range widely from Computer Science to Linguistics.

I have tried my best to present the main ideas of CFGs here, but there is a lot more that you can learn about them. Here I have left links to some great resources:

Translated from: https://www.freecodecamp.org/news/context-free-grammar/
