[Interpreter] Building a Simple Interpreter (Part 3)

唐茂

This is a quick translation for easier reading. My skills are limited, so if you like it, please go read the original!

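I woke up this morning and thought to myself: "Why do we find it so difficult to learn a new skill?"

I don't think it's only because learning a new skill takes hard work. I think one of the reasons might be that we spend a lot of time and effort acquiring knowledge by reading and watching, and not enough time turning that knowledge into a skill by practicing it. Take swimming, for example. You can spend a lot of time reading books about swimming, talk for hours with experienced swimmers and coaches, and watch every training video you can find, and you will still sink like a rock the first time you jump into the pool.

The bottom line: no matter how well you think you know a subject, you have to put that knowledge into practice to turn it into a skill. To help you with the practice part, I put exercises into Part 1 and Part 2 of the series. And yes, I promise you will see more exercises in today's article and in future ones :)

Okay, let's get started with today's material, shall we?

So far, you have learned how to interpret arithmetic expressions that add or subtract two integers, such as "7 + 3" or "12 - 9". Today I'm going to talk about how to parse (recognize) and interpret arithmetic expressions that contain any number of plus or minus operators, for example "7 - 3 + 2 - 1".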

Graphically, the arithmetic expressions in this article can be represented with the following syntax diagram:

[Figure: lsbasi_part3_syntax_diagram]

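What is a syntax diagram? A syntax diagram is a graphical representation of a programming language's syntax rules. Basically, a syntax diagram visually shows you which statements are allowed in your programming language and which are not.

Syntax diagrams are pretty easy to read: just follow the paths indicated by the arrows. Some paths indicate choices. Some paths indicate loops.

You can read the syntax diagram above as follows: a term, optionally followed by a plus or minus sign and another term, which in turn is optionally followed by a plus or minus sign and another term, and so on. You've got the picture, literally. You might wonder what a "term" is. For the purposes of this article, a "term" is just an integer.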

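Syntax diagrams serve two main purposes:

They graphically represent the specification (grammar) of a programming language.
They can be used to help you write your parser: you can map a diagram to code by following simple rules.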

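You've learned that the process of recognizing a phrase in the stream of tokens is called parsing. The part of an interpreter or compiler that performs that job is called a parser. Parsing is also called syntax analysis, and the parser is also called a syntax analyzer.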
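To make "stream of tokens" concrete, here is a minimal sketch of what a parser receives for an input like "7 - 3 + 2 - 1". The names tokenize and TOKEN_PATTERN are my own assumptions for illustration, and the regular-expression approach is a shortcut, not the character-by-character lexer used in this series.

```python
import re

# Hypothetical helper, not part of the article's code: split the input into
# (type, value) pairs so the token stream a parser consumes is visible.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([+-]))")


def tokenize(text):
    """Return a list of (type, value) pairs ending with an EOF marker."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input near position %d" % pos)
        number, operator = match.groups()
        if number is not None:
            tokens.append(("INTEGER", int(number)))
        elif operator == "+":
            tokens.append(("PLUS", "+"))
        else:
            tokens.append(("MINUS", "-"))
        pos = match.end()
    tokens.append(("EOF", None))
    return tokens


print(tokenize("7 - 3 + 2 - 1"))
```

The parser's job starts where this list ends: it decides whether the sequence of token types forms a valid phrase.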

According to the syntax diagram above, all of the following arithmetic expressions are valid:

3
3 + 4
7 - 3 + 2 - 1
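One way to check your reading of the diagram is to write it down as a pattern: one term, then zero or more (operator, term) pairs. The sketch below does this with a regular expression. EXPR_PATTERN and matches_diagram are hypothetical names, and this is only a quick validity check under the assumption that a term is an unsigned integer. It is not how the interpreter in this series parses.

```python
import re

# term (('+' | '-') term)*, where a term is an unsigned integer.
EXPR_PATTERN = re.compile(r"\s*\d+(?:\s*[+-]\s*\d+)*\s*")


def matches_diagram(text):
    """Return True if text is a term followed by zero or more (operator, term) pairs."""
    return EXPR_PATTERN.fullmatch(text) is not None


for sample in ("3", "3 + 4", "7 - 3 + 2 - 1", "3 +"):
    print(repr(sample), matches_diagram(sample))
```

The last sample has an operator with no term after it, so it does not follow any complete path through the diagram and the check rejects it.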

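Because the syntax rules for arithmetic expressions are very similar across programming languages, we can use a Python shell to "test" our syntax diagram. Launch your Python shell and see for yourself: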

>>> 3
3
>>> 3 + 4
7
>>> 7 - 3 + 2 - 1
5

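No surprises here.

The expression "3 +" is not a valid arithmetic expression because, according to the syntax diagram, the plus sign must be followed by a term (an integer); otherwise it's a syntax error. Try it in a Python shell and see for yourself: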

>>> 3 +
  File "<stdin>", line 1
    3 +
      ^
SyntaxError: invalid syntax

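It's great to be able to use a Python shell to do some testing, but let's map the syntax diagram above to code and use our own interpreter for testing.

You know from the previous articles (Part 1 and Part 2) that the expr method is where both our parser and our interpreter live. The parser only recognizes the structure, making sure that it conforms to the specification, and the interpreter evaluates the expression once the parser has successfully recognized (parsed) it.

The following code snippet shows the parser code corresponding to the diagram. The rectangular box in the syntax diagram (term) becomes a term method that parses an integer, and the expr method just follows the overall flow of the syntax diagram: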

def term(self):
    self.eat(INTEGER)

def expr(self):
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            self.term()

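You can see that expr first calls the term method. Then the expr method has a while loop that can execute zero or more times. Inside the loop, the parser makes a choice based on the token (whether it's a plus or a minus sign). Spend some time proving to yourself that the code above does indeed follow the syntax diagram flow for arithmetic expressions.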
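If you want to experiment with the parsing logic on its own, here is a standalone sketch of the same flow. It assumes the input is already a list of (type, value) pairs ending with an EOF entry, and the name recognize is hypothetical. Unlike the expr method above, it also insists that nothing is left over after the last term; that final check is my addition.

```python
# Hypothetical standalone recognizer: it walks a prepared list of
# (type, value) pairs instead of calling a lexer.
def recognize(tokens):
    """Raise SyntaxError unless tokens form: term ((PLUS | MINUS) term)* EOF."""
    pos = 0

    def eat(expected_type):
        nonlocal pos
        if tokens[pos][0] != expected_type:
            raise SyntaxError(
                "expected %s, got %s" % (expected_type, tokens[pos][0])
            )
        pos += 1

    eat("INTEGER")                       # the first term
    while tokens[pos][0] in ("PLUS", "MINUS"):
        eat(tokens[pos][0])              # the operator
        eat("INTEGER")                   # the term that must follow it
    eat("EOF")                           # nothing may be left over


# A valid stream passes silently.
recognize([("INTEGER", 7), ("MINUS", "-"), ("INTEGER", 3), ("EOF", None)])

# "3 +" fails because EOF arrives where a term is required.
try:
    recognize([("INTEGER", 3), ("PLUS", "+"), ("EOF", None)])
except SyntaxError as exc:
    print("rejected:", exc)
```

The loop body mirrors the loop in the diagram: each pass consumes exactly one operator and one term.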

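The parser itself does not interpret anything, though: if it recognizes an expression it simply carries on, and if it doesn't, it raises a syntax error. Let's modify the expr method and add the interpreter code: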

def term(self):
    """Return an INTEGER token value"""
    token = self.current_token
    self.eat(INTEGER)
    return token.value

def expr(self):
    """Parser / Interpreter """
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    result = self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            result = result + self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            result = result - self.term()

    return result

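Because the interpreter needs to evaluate an expression, the term method was modified to return an integer value, and the expr method was modified to perform addition and subtraction at the appropriate places and to return the result of interpretation. Even though the code is pretty straightforward, I recommend spending some time studying it.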
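To see how the running result changes as each operator is applied, here is a small sketch that records the intermediate values. It assumes a well-formed flat list such as [7, "-", 3, "+", 2, "-", 1], with numbers at even positions and operators at odd positions. The function name evaluate_with_trace is hypothetical and does no parsing of its own.

```python
def evaluate_with_trace(tokens):
    """Evaluate a flat token list left to right, recording each running result."""
    result = tokens[0]
    trace = [result]
    # Operators sit at odd indexes; the term after each one is at the next index.
    for i in range(1, len(tokens), 2):
        operator, operand = tokens[i], tokens[i + 1]
        if operator == "+":
            result = result + operand
        elif operator == "-":
            result = result - operand
        else:
            raise ValueError("unexpected operator: %r" % (operator,))
        trace.append(result)
    return result, trace


result, trace = evaluate_with_trace([7, "-", 3, "+", 2, "-", 1])
print(result, trace)  # 5 [7, 4, 6, 5]
```

The trace shows that the operators are applied strictly left to right, which is the same order in which the while loop in expr updates result.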

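Let's get moving and look at the complete code of the interpreter now.

Here is the source code for the new version of the calculator, which can handle valid arithmetic expressions containing integers and any number of addition and subtraction operators: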

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5 + 3", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    ##########################################################
    # Lexer code                                             #
    ##########################################################
    def error(self):
        raise Exception('Invalid syntax')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    ##########################################################
    # Parser / Interpreter code                              #
    ##########################################################
    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def term(self):
        """Return an INTEGER token value."""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter."""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        result = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # Python 3 uses 'input'; to run under Python 2
            # replace the call with 'raw_input'
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

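Save the above code into the calc3.py file or download it directly from GitHub. Try it out and see for yourself that it can handle the arithmetic expressions you can derive from the syntax diagram shown earlier.

Here is a sample session that I ran on my laptop: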

$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
  File "calc3.py", line 147, in <module>
    main()
  File "calc3.py", line 142, in main
    result = interpreter.expr()
  File "calc3.py", line 123, in expr
    result = result + self.term()
  File "calc3.py", line 110, in term
    self.eat(INTEGER)
  File "calc3.py", line 105, in eat
    self.error()
  File "calc3.py", line 45, in error
    raise Exception('Invalid syntax')
Exception: Invalid syntax

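Remember those exercises I mentioned at the beginning of the article? Here they are, as promised :)

[Figure: lsbasi_part3_exercises]

Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3". Try to actually draw one.
Modify the source code of the calculator above to interpret arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3".
Write an interpreter that handles arithmetic expressions like "7 - 3 + 2 - 1". Use any programming language you're comfortable with, and write it without looking at the examples. As you do so, think about the key components involved: a lexer that takes an input and converts it into a stream of tokens, a parser that feeds off the stream of tokens provided by the lexer and tries to recognize a structure in that stream, and an interpreter that generates results after the parser has successfully parsed (recognized) a valid arithmetic expression. String those pieces together, and spend some time turning them into a working interpreter that evaluates arithmetic expressions.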

Check your understanding

What is a syntax diagram?
What is syntax analysis?
What is a syntax analyzer?

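You've made it to the end. Thanks for reading, and don't forget to do the exercises :)

I'll be back next time with a new article. Stay tuned!

Here is a list of books I recommend that will help you in your study of interpreters and compilers: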

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)
Writing Compilers and Interpreters: A Software Engineering Approach
Modern Compiler Implementation in Java
Modern Compiler Design
Compilers: Principles, Techniques, and Tools (2nd Edition)

Original article: Let's Build A Simple Interpreter. Part 3.

Author's blog: Ruslan's Blog

——2019-01-08——