Let’s Build A Simple Interpreter 4

jinchengwu3344

Article link: https://blog.csdn.net/longlongqin/article/details/105111547

Original article: https://ruslanspivak.com/lsbasi-part4/

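In the previous article you learned how to recognize and interpret arithmetic expressions containing any number of plus and minus operators, for example "7 - 3 + 2 - 1". You also learned about syntax diagrams and how they can be used to specify the syntax of a programming language.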

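Today you will learn how to parse and interpret arithmetic expressions containing any number of multiplication and division operators, for example "7 * 4 / 2 * 3". Division in this article is integer division, so for the expression "9 / 4" the result is an integer: 2.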
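As a quick check of that arithmetic, here is a small Python sketch (not part of the original article). It assumes Python's floor-division operator `//`, which for the non-negative integers used in this article simply drops the fractional part, and it applies the operators strictly left to right.

```python
# Integer (floor) division: for non-negative operands the fraction is dropped.
quotient = 9 // 4
print(quotient)  # 2

# "7 * 4 / 2 * 3" evaluated left to right with integer division
result = 7
result = result * 4   # 28
result = result // 2  # 14
result = result * 3   # 42
print(result)  # 42
```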

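Today I will also talk quite a bit about another widely used notation for specifying the syntax of a programming language. It is called context-free grammars (grammars, for short) or BNF (Backus-Naur Form). For the purposes of this article I will not use pure BNF notation, but something closer to a modified EBNF notation.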

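Grammar: the formal rules that describe the syntactic structure of a language.

A context-free grammar is one in which the left-hand side of every production consists of exactly one nonterminal, for example: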

S -> aSb

S -> ab
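This grammar has two productions, and the left-hand side of each is the single nonterminal S. That makes it a context-free grammar: whenever you find a string that matches the right-hand side of a production, you can reduce it to the corresponding nonterminal, regardless of what surrounds it.

Compare that with: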

aSb -> aaSbb

S -> ab
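This is a context-sensitive grammar, because the left-hand side of its first production contains more than one symbol. When you match the S in that production you must make sure it appears in the right "context", namely an a on its left and a b on its right. Hence the name context-sensitive grammar.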

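Author: 徐辰
Link: https://www.zhihu.com/question/21833944/answer/40689967
Source: Zhihu
Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, please credit the source.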
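To make the context-free example concrete, here is a minimal Python sketch for the grammar S -> aSb, S -> ab. The helper names `derive` and `matches` are hypothetical and exist only for illustration. `derive` applies the productions step by step, and `matches` checks whether a string can be produced by the grammar.

```python
def derive(n):
    """Apply S -> aSb (n - 1) times, then S -> ab; return every step (n >= 1)."""
    steps = ["S"]
    current = "S"
    for _ in range(n - 1):
        current = current.replace("S", "aSb")
        steps.append(current)
    current = current.replace("S", "ab")
    steps.append(current)
    return steps


def matches(s):
    """Return True if s can be derived from S using S -> aSb | ab."""
    if s == "ab":
        return True
    # S -> aSb: strip the outer a...b and check what is left in the middle
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        return matches(s[1:-1])
    return False


print(" => ".join(derive(3)))  # S => aSb => aaSbb => aaabbb
print(matches("aaabbb"))       # True
print(matches("aabbb"))        # False
```

Note that each step replaces S without looking at its neighbours, which is exactly what "context-free" means.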

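Here are a few reasons to use grammars:

A grammar specifies the syntax of a programming language concisely. Unlike syntax diagrams, grammars are very compact. You will see me using grammars more and more in future articles.
A grammar can serve as documentation.
A grammar is a good starting point even if you write your parser by hand from scratch. Quite often you can turn the grammar into code just by following a set of simple rules.
There is a set of tools, called parser generators, that take a grammar as input and automatically generate a parser for you from it. I will talk about those tools later in the series.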


def factor(self):   # a factor here is simply an INTEGER value
    self.eat(INTEGER)

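The expr rule becomes the expr method (guideline 1 again). The reference to factor at the start of the rule body becomes a call to the factor() method. The optional grouping (...)* becomes a while loop, and the alternatives (MUL | DIV) become an if-elif-else statement. Putting these pieces together gives the following expr method: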

def expr(self):
    self.factor()  # the first factor in the grammar rule

    while self.current_token.type in (MUL, DIV):
        token = self.current_token
        if token.type == MUL:
            self.eat(MUL)
            self.factor()
        elif token.type == DIV:
            self.eat(DIV)
            self.factor()

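The original author put this code in the file parser.py, which contains a lexer and a parser but no interpreter. You can download it directly from GitHub and try it out. It has an interactive prompt where you can enter expressions to see whether they are valid, that is, whether the parser built from the grammar can recognize them.

Here is a sample session from my laptop: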

$ python parser.py
calc> 3
calc> 3 * 7
calc> 3 * 7 / 2
calc> 3 *
Traceback (most recent call last):
  File "parser.py", line 155, in <module>
    main()
  File "parser.py", line 151, in main
    parser.parse()
  File "parser.py", line 136, in parse
    self.expr()
  File "parser.py", line 130, in expr
    self.factor()
  File "parser.py", line 114, in factor
    self.eat(INTEGER)
  File "parser.py", line 107, in eat
    self.error()
  File "parser.py", line 97, in error
    raise Exception('Invalid syntax')
Exception: Invalid syntax
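The session above can be reproduced in miniature. The following is a self-contained Python 3 sketch, not the author's parser.py: the helpers `tokenize` and `is_valid` are hypothetical names, and the recognizer returns True or False instead of raising a traceback. It also checks that the whole input was consumed (the final EOF check), which is an assumption of this sketch rather than something the article states.

```python
INTEGER, MUL, DIV, EOF = 'INTEGER', 'MUL', 'DIV', 'EOF'


def tokenize(text):
    """Hypothetical helper: turn the input text into a list of token types."""
    types = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            # consume a (multidigit) integer
            while i < len(text) and text[i].isdigit():
                i += 1
            types.append(INTEGER)
        elif ch == '*':
            types.append(MUL)
            i += 1
        elif ch == '/':
            types.append(DIV)
            i += 1
        else:
            raise ValueError('Invalid character')
    types.append(EOF)
    return types


def is_valid(text):
    """Return True if text matches expr : factor ((MUL | DIV) factor)*."""
    pos = 0

    def eat(token_type):
        nonlocal pos
        if tokens[pos] != token_type:
            raise ValueError('Invalid syntax')
        pos += 1

    try:
        tokens = tokenize(text)
        eat(INTEGER)                        # factor
        while tokens[pos] in (MUL, DIV):    # ((MUL | DIV) factor)*
            eat(tokens[pos])
            eat(INTEGER)
        eat(EOF)  # assumption of this sketch: nothing may follow the expression
    except ValueError:
        return False
    return True


for text in ('3', '3 * 7', '3 * 7 / 2', '3 *'):
    print(repr(text), is_valid(text))
```

Only the last input, "3 *", is rejected, because the grammar requires a factor after every operator.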

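Let's come back to syntax diagrams for a moment. This is the syntax diagram for the same expr rule (also called a production):

[Figure: syntax diagram of the expr rule]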

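Below is the original author's source code for this article: a calculator that handles valid arithmetic expressions containing any number of integer multiplication and division (integer division) operators. The lexical analyzer has been refactored into a separate Lexer class, and the Interpreter class now takes a Lexer instance as a parameter: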

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, MUL, DIV, EOF = 'INTEGER', 'MUL', 'DIV', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, MUL, DIV, or EOF
        self.type = type
        # token value: non-negative integer value, '*', '/', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(MUL, '*')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()

# Lexical analysis
class Lexer(object):
    def __init__(self, text):
        # client string input, e.g. "3 * 5", "12 / 3 * 4", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Invalid character')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)
	
    # A more descriptive name might be: get_token_and_next
    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '*':
                self.advance()
                return Token(MUL, '*')

            if self.current_char == '/':
                self.advance()
                return Token(DIV, '/')

            self.error()

        return Token(EOF, None)


class Interpreter(object):
    def __init__(self, lexer):
        self.lexer = lexer
        # set current token to the first token taken from the input
        self.current_token = self.lexer.get_next_token()

    def error(self):
        raise Exception('Invalid syntax')

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            self.error()

    def factor(self):
        """Return an INTEGER token value.

        factor : INTEGER
        """
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter.

        expr   : factor ((MUL | DIV) factor)*
        factor : INTEGER
        """
        result = self.factor()

        while self.current_token.type in (MUL, DIV):
            token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
                result = result * self.factor()
            elif token.type == DIV:
                self.eat(DIV)
                # '//' keeps the division integral under both Python 2 and 3
                result = result // self.factor()

        return result


def main():
    while True:
        try:
            # Written for Python 3; to run under Python 2
            # replace the 'input' call with 'raw_input'
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        lexer = Lexer(text)
        interpreter = Interpreter(lexer)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

Save the code above to a file named calc4.py, or download it directly from GitHub. As usual, try it yourself and confirm that it works.