How to Write a Simple Interpreter - 5


Runtime Error


Categories: compiler, interpreter

Article link: https://blog.csdn.net/weixin_38151747/article/details/85786095


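When you try to understand a complex system such as an interpreter or a compiler, it first looks like a tangled ball of yarn. You need to pull the threads apart and then wind them back into a smooth ball.

You can do this one thread at a time: pick up a single thread and tie a single knot. Sometimes you will feel there is a lot you don't understand. Don't worry about it, just keep winding. After you have stuck with it for a long time, one day it will click and you will suddenly understand the whole system.

My advice is to reread the previous parts carefully: the text, the illustrations, and the code. Understanding will come naturally, little by little. I would even suggest writing the code yourself from scratch. Trust me, the process is slow, but it pays off in the end.

When it's all done, you will have a smooth ball of yarn. Even if it isn't perfectly smooth, that's fine: it will always be much better than what you started with, and at the very least you won't forget it after a few days.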

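Back to the topic. In this article you will use everything covered in the previous articles to write a parser and an interpreter for arithmetic expressions that can contain any mix of addition, subtraction, multiplication, and division. The finished interpreter will be able to evaluate an expression like 14 + 2 * 3 - 6 / 2.

Before looking at the code, you need to understand two concepts: operator associativity and operator precedence.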

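As you know, 7 + 3 + 1 is equivalent to (7 + 3) + 1, and 7 - 3 - 1 is equivalent to (7 - 3) - 1. No surprises there. But 7 - 3 - 1 is not equivalent to 7 - (3 - 1): the first evaluates to 3, the second to 5.

Addition, subtraction, multiplication, and division are all left-associative: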

7 + 3 + 1 is equivalent to (7 + 3) + 1
7 - 3 - 1 is equivalent to (7 - 3) - 1
8 * 4 * 2 is equivalent to (8 * 4) * 2
8 / 4 / 2 is equivalent to (8 / 4) / 2
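Evaluating a chain of left-associative operators is the same as doing a left fold over the operands. Below is a minimal sketch of that idea using Python's `functools.reduce`. The helper name `eval_left_assoc` and the `(operator, operand)` pair format are assumptions made for this illustration only. Note that it uses true division, so `/` returns a float here.

```python
from functools import reduce
import operator

# Map each operator symbol to the corresponding binary function.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}


def eval_left_assoc(first, rest):
    """Evaluate `first op1 x1 op2 x2 ...` strictly from left to right.

    `rest` is a list of (operator, operand) pairs (hypothetical format).
    reduce() folds from the left, so a - b - c is computed as (a - b) - c.
    """
    return reduce(lambda acc, pair: OPS[pair[0]](acc, pair[1]), rest, first)


print(eval_left_assoc(7, [('-', 3), ('-', 1)]))  # (7 - 3) - 1 -> 3
print(7 - (3 - 1))                               # right grouping -> 5
print(eval_left_assoc(8, [('/', 4), ('/', 2)]))  # (8 / 4) / 2 -> 1.0
```

For + and * the grouping does not change the result. For - and / it does, and that is why a convention is needed.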

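What does left-associative mean?

In 7 + 3 + 1, the operand 3 has a plus sign on each side, so we need a convention that decides which operator it belongs to. Because + is left-associative, the operand 3 belongs to the operator on its left. That is why 7 + 3 + 1 is equivalent to (7 + 3) + 1.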
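Below is a small sketch of this "the operand goes with the operator on its left" rule for a chain of same-precedence operators. Instead of computing a value, it builds a fully parenthesized string so the grouping is visible. The function name `group_left` and the alternating operand/operator token list are assumptions made for this illustration.

```python
def group_left(tokens):
    """Fully parenthesize a chain of same-precedence operators.

    `tokens` alternates operand, operator, operand, ... (hypothetical format),
    e.g. ['7', '+', '3', '+', '1']. Each new operand attaches to the operator
    on its left, so the running result always becomes the left child.
    """
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], tokens[i + 1]
        result = '({} {} {})'.format(result, op, operand)
    return result


print(group_left(['7', '+', '3', '+', '1']))  # ((7 + 3) + 1)
print(group_left(['7', '-', '3', '-', '1']))  # ((7 - 3) - 1)
```

The same "keep a running result and combine it with the next operand" loop appears later in the `term` and `expr` methods of the interpreter.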

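OK, but what about an expression with two different kinds of operators, such as 7 + 5 * 2? Is it 7 + (5 * 2) or (7 + 5) * 2? How do we resolve this ambiguity?

Associativity alone can't settle this, because it only applies to operators of the same kind: either the additive ones (+, -) or the multiplicative ones (*, /). When + or - is mixed with * or /, we need a second rule, operator precedence: multiplication and division have higher precedence than addition and subtraction. So 7 + 5 * 2 means 7 + (5 * 2), which is 17.

This should look familiar; every schoolchild knows it.

For operators with the same precedence, we simply apply associativity: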

7 + 3 - 1 is equivalent to (7 + 3) - 1
8 / 4 * 2 is equivalent to (8 / 4) * 2
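The two rules together fully determine the value of a flat expression. The sketch below shows this with a deliberately simple technique. It makes one left-to-right pass per precedence level, first collapsing * and /, then + and -. This is not the recursive-descent approach the interpreter in this article uses. It is only meant to show precedence and left associativity working together. `eval_flat`, `LEVELS`, and the token-list format are hypothetical, and `/` is true division here.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

# Operator groups ordered from highest to lowest precedence.
LEVELS = [('*', '/'), ('+', '-')]


def eval_flat(tokens):
    """Evaluate a flat list such as [7, '+', 5, '*', 2].

    One left-to-right pass per precedence level: each pass collapses only
    the operators of that level and leaves the rest for a later pass.
    """
    for level in LEVELS:
        out = [tokens[0]]
        for i in range(1, len(tokens), 2):
            op, rhs = tokens[i], tokens[i + 1]
            if op in level:
                # Left associativity: combine with the value already on the left.
                out[-1] = OPS[op](out[-1], rhs)
            else:
                out.extend([op, rhs])
        tokens = out
    return tokens[0]


print(eval_flat([7, '+', 5, '*', 2]))  # 7 + (5 * 2) -> 17
print(eval_flat([7, '+', 3, '-', 1]))  # (7 + 3) - 1 -> 9
print(eval_flat([8, '/', 4, '*', 2]))  # (8 / 4) * 2 -> 4.0
```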

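Summarized as a precedence table:

Precedence level | Associativity | Operators
2                | left          | +, -
1                | left          | *, /

Both levels are left-associative. The operators at level 1 (*, /) bind more tightly than those at level 2 (+, -).

From the table we can construct a grammar by following two rules: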

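1. For each precedence level, define one non-terminal. The body of its production contains the arithmetic operators of that level and the non-terminal for the next higher precedence level.
2. Define one additional non-terminal, factor, for the basic units of an expression (here, integers). In general, if you have N levels of precedence, you need N + 1 non-terminals: one for each level, plus one for the basic units.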

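Following rule 1, we define the non-terminal expr for level 2 and the non-terminal term for level 1.

Following rule 2, we define the non-terminal factor for integers.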
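The two rules are mechanical enough to write as code. Below is a minimal sketch that derives the productions from a precedence table listed from lowest to highest precedence. The function `build_grammar` and its parameters are hypothetical. The token names (PLUS, MINUS, MUL, DIV, INTEGER) are the ones the interpreter's lexer uses.

```python
def build_grammar(levels, unit_name='factor', unit_body='INTEGER'):
    """Derive grammar productions from a precedence table.

    `levels` lists (non_terminal, operator_tokens) from lowest to highest
    precedence. Rule 1: each level gets one non-terminal whose body uses that
    level's operators and the next-higher level's non-terminal. Rule 2: one
    extra non-terminal for the basic unit, so N levels -> N + 1 non-terminals.
    """
    names = [name for name, _ in levels] + [unit_name]
    rules = []
    for i, (name, ops) in enumerate(levels):
        nxt = names[i + 1]  # non-terminal of the next higher precedence level
        rules.append('{} : {} (({}) {})*'.format(name, nxt, ' | '.join(ops), nxt))
    rules.append('{} : {}'.format(unit_name, unit_body))
    return rules


for rule in build_grammar([('expr', ['PLUS', 'MINUS']), ('term', ['MUL', 'DIV'])]):
    print(rule)
# expr : term ((PLUS | MINUS) term)*
# term : factor ((MUL | DIV) factor)*
# factor : INTEGER
```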

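Putting it together gives our grammar:

expr   : term ((PLUS | MINUS) term)*
term   : factor ((MUL | DIV) factor)*
factor : INTEGER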

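[Figure not preserved: syntax diagram for the expr, term, and factor rules]

[Figure not preserved: the same grammar drawn as nested rectangles]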
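To see which grouping the grammar produces, here is a sketch that has the same expr/term/factor structure as the grammar. Each method returns a fully parenthesized string instead of a number. The class name `Parenthesizer` is hypothetical. Its regex tokenizer is crude: it silently skips any character that is not a digit or one of the four operators, and it does not check for leftover tokens.

```python
import re


class Parenthesizer(object):
    """Hypothetical helper that mirrors the grammar rule by rule but builds
    strings, so the grouping chosen by the grammar becomes visible."""

    def __init__(self, text):
        # Crude tokenizer for this sketch: integers and the four operators.
        self.tokens = re.findall(r'\d+|[-+*/]', text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def factor(self):
        # factor : INTEGER
        tok = self.advance()
        if tok is None or not tok.isdigit():
            raise SyntaxError('expected an integer, got {!r}'.format(tok))
        return tok

    def term(self):
        # term : factor ((MUL | DIV) factor)*
        result = self.factor()
        while self.peek() in ('*', '/'):
            op = self.advance()
            result = '({} {} {})'.format(result, op, self.factor())
        return result

    def expr(self):
        # expr : term ((PLUS | MINUS) term)*
        result = self.term()
        while self.peek() in ('+', '-'):
            op = self.advance()
            result = '({} {} {})'.format(result, op, self.term())
        return result


print(Parenthesizer('14 + 2 * 3 - 6 / 2').expr())  # ((14 + (2 * 3)) - (6 / 2))
print(Parenthesizer('7 - 3 - 1').expr())           # ((7 - 3) - 1)
```

Because term sits one level below expr, every * and / is grouped before the surrounding + and -. The while loops in each method give left associativity within a level.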

Compared with the previous part, the code has two changes:

1. The Lexer class can now tokenize +, -, *, and /. Nothing new here: we just combined the code from the previous articles into one class that supports all of these tokens.
2. Recall that each rule (production) R defined in the grammar becomes a method with the same name, and each reference to that rule becomes a method call: R(). As a result, the Interpreter class now has three methods that correspond to the non-terminals in the grammar: expr, term, and factor.

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, MUL, DIV, EOF = (
    'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'EOF'
)


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, MUL, DIV, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', '*', '/', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
            Token(MUL, '*')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Lexer(object):
    def __init__(self, text):
        # client string input, e.g. "3 * 5", "12 / 3 * 4", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Invalid character')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            if self.current_char == '*':
                self.advance()
                return Token(MUL, '*')

            if self.current_char == '/':
                self.advance()
                return Token(DIV, '/')

            self.error()

        return Token(EOF, None)


class Interpreter(object):
    def __init__(self, lexer):
        self.lexer = lexer
        # set current token to the first token taken from the input
        self.current_token = self.lexer.get_next_token()

    def error(self):
        raise Exception('Invalid syntax')

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            self.error()

    def factor(self):
        """factor : INTEGER"""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def term(self):
        """term : factor ((MUL | DIV) factor)*"""
        result = self.factor()

        while self.current_token.type in (MUL, DIV):
            token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
                result = result * self.factor()
            elif token.type == DIV:
                self.eat(DIV)
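                # Floor (integer) division, matching Python 2's '/' on ints;
                # under Python 3, '/' would return a float (8 / 4 -> 2.0).
                result = result // self.factor()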

        return result

    def expr(self):
        """Arithmetic expression parser / interpreter.

        calc>  14 + 2 * 3 - 6 / 2
        17

        expr   : term ((PLUS | MINUS) term)*
        term   : factor ((MUL | DIV) factor)*
        factor : INTEGER
        """
        result = self.term()

        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # Python 3: input() reads one line (Python 2 used raw_input)
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        lexer = Lexer(text)
        interpreter = Interpreter(lexer)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

Let's run it:

$ python calc5.py
calc> 3
3
calc> 2 + 7 * 4
30
calc> 7 - 8 / 4
5
calc> 14 + 2 * 3 - 6 / 2
17
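Because every precedence level follows the same pattern ("parse the next higher level, then loop over this level's operators"), the expr and term methods can be collapsed into one function driven by a precedence table. Below is a sketch of that generalization, assuming a regex tokenizer and a table of Python `operator` functions. `evaluate`, `parse_level`, and `LEVELS` are hypothetical names, and this is a refactoring idea, not the article's code. Like the calculator above, it uses floor division for `/`. The tokenizer is crude and silently ignores characters it does not recognize.

```python
import operator
import re

# Hypothetical precedence table, lowest precedence first (expr, then term).
LEVELS = [
    {'+': operator.add, '-': operator.sub},
    {'*': operator.mul, '/': operator.floordiv},  # floor division, like the calculator
]


def evaluate(text):
    tokens = re.findall(r'\d+|[-+*/]', text)
    pos = 0

    def parse_level(i):
        """Parse one precedence level; index len(LEVELS) is the factor rule."""
        nonlocal pos
        if i == len(LEVELS):
            # factor : INTEGER
            if pos >= len(tokens) or not tokens[pos].isdigit():
                raise SyntaxError('expected an integer at token {}'.format(pos))
            pos += 1
            return int(tokens[pos - 1])
        result = parse_level(i + 1)
        # Left associativity: fold each operator of this level into result.
        while pos < len(tokens) and tokens[pos] in LEVELS[i]:
            op = LEVELS[i][tokens[pos]]
            pos += 1
            result = op(result, parse_level(i + 1))
        return result

    result = parse_level(0)
    if pos != len(tokens):
        raise SyntaxError('unexpected token {!r}'.format(tokens[pos]))
    return result


print(evaluate('14 + 2 * 3 - 6 / 2'))  # 17
print(evaluate('7 - 8 / 4'))           # 5
```

With this design, a new left-associative level could be added by inserting one dict into `LEVELS` instead of writing a new method. This sketch also rejects trailing tokens such as the second `3` in `3 3`.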
