How to Write a Simple Interpreter, Part 3

Runtime Error

Article link: https://blog.csdn.net/weixin_38151747/article/details/85762246

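No matter how many books on swimming you read or how many swimming coaches you talk to, you will still sink the first time you get in the water. So practice, a lot.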

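In the previous two parts we covered adding and subtracting two integers, such as “7 + 3” and “12 - 9”. Today we look at expressions with an arbitrary number of additions and subtractions, such as “7 - 3 + 2 - 1”.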

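Graphically, expressions of this kind can be represented by the following syntax diagram:

[Figure: syntax diagram of a term followed by zero or more repetitions of a plus or minus sign and another term]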

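What is a syntax diagram? A syntax diagram is a graphical representation of a programming language's syntax rules. Basically, it shows which statements are valid in the language and which are not.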

Syntax diagrams are easy to read: just follow the paths indicated by the arrows. Some paths represent choices, and some represent loops.

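You can read the diagram above like this: a term is optionally followed by a plus or minus sign and another term, which in turn is optionally followed by a plus or minus sign and yet another term, and so on. You may wonder what a “term” is. For the purposes of this article, a term is simply an integer.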
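
The reading of the diagram described above can be sketched as a small recognizer. This is only an illustration under stated assumptions: the helper name `follows_diagram` and the choice to represent the input as a list of strings are hypothetical and are not part of the article's calculator.

```python
def follows_diagram(symbols):
    """Return True if the symbols follow: term ((+ | -) term)*.

    `symbols` is a list of strings such as ['7', '-', '3'];
    a term is any string made only of digits.
    (Hypothetical helper, used only for illustration.)
    """
    pos = 0

    def is_term(i):
        return i < len(symbols) and symbols[i].isdigit()

    # The diagram always starts with one term.
    if not is_term(pos):
        return False
    pos += 1

    # Loop: a sign must always be followed by another term.
    while pos < len(symbols) and symbols[pos] in ('+', '-'):
        if not is_term(pos + 1):
            return False
        pos += 2

    # Valid only if the whole input was consumed.
    return pos == len(symbols)


print(follows_diagram('7 - 3 + 2 - 1'.split()))
print(follows_diagram('3 +'.split()))
```

The first call prints True and the second prints False, because the trailing plus sign has no term after it.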

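What are syntax diagrams good for? Besides giving you a graphical way to understand the specification (grammar) of a programming language, they can help you write your own parser.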

The second point deserves a closer look.

According to the syntax diagram, all of the following expressions are valid:

3
3 + 4
7 - 3 + 2 - 1

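“3 + ” is not a valid arithmetic expression, because according to the syntax diagram a plus sign must be followed by a term (an integer); otherwise it is a syntax error. The code snippet below shows how the syntax diagram maps to code. The rectangle in the diagram becomes a method called term, which parses an integer, and the expr method simply follows the diagram.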

def term(self):
    self.eat(INTEGER)

def expr(self):
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            self.term()


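As the code shows, expr first calls the term method and then enters a while loop. Inside the loop the parser decides, based on the current token, whether to consume a plus sign or a minus sign, and then expects another term. Note that this parser does not interpret anything yet: when it recognizes an expression it stays silent, and when it does not, it raises a syntax error.

So next we add some interpreter code to the expr method. Because the interpreter needs to evaluate the expression, term now returns the integer value of the token it consumes, and expr performs the addition or subtraction at the corresponding point and returns the result:

def term(self):
    """Return an INTEGER token value"""
    token = self.current_token
    self.eat(INTEGER)
    return token.value

def expr(self):
    """Parser / Interpreter """
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    result = self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            result = result + self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            result = result - self.term()

    return result

Here is the complete code of the calculator. It handles expressions with any number of integers joined by addition and subtraction.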

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5 + 3", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    ##########################################################
    # Lexer code                                             #
    ##########################################################
    def error(self):
        raise Exception('Invalid syntax')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    ##########################################################
    # Parser / Interpreter code                              #
    ##########################################################
    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def term(self):
        """Return an INTEGER token value."""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter."""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        result = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # Python 3; to run under Python 2 replace 'input'
            # with 'raw_input'
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

Try it out:

$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
  File "calc3.py", line 147, in <module>
    main()
  File "calc3.py", line 142, in main
    result = interpreter.expr()
  File "calc3.py", line 123, in expr
    result = result + self.term()
  File "calc3.py", line 110, in term
    self.eat(INTEGER)
  File "calc3.py", line 105, in eat
    self.error()
  File "calc3.py", line 45, in error
    raise Exception('Invalid syntax')
Exception: Invalid syntax
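
The session above ends with a full traceback because nothing catches the exception raised for “3 +”. One possible way to keep the prompt alive is to catch the exception around the evaluation call. The sketch below is an assumption, not the article's code: `evaluate` is a hypothetical stand-in for `Interpreter(text).expr()` that only accepts whitespace-separated input, and `run_line` is a hypothetical helper name.

```python
def evaluate(text):
    """Hypothetical stand-in for Interpreter(text).expr().

    Only handles whitespace-separated input such as '7 - 3 + 2'.
    """
    parts = text.split()
    # A valid input has an odd number of parts: term (sign term)*
    if len(parts) % 2 == 0 or not parts[0].isdigit():
        raise Exception('Invalid syntax')
    result = int(parts[0])
    for sign, term in zip(parts[1::2], parts[2::2]):
        if sign not in ('+', '-') or not term.isdigit():
            raise Exception('Invalid syntax')
        result = result + int(term) if sign == '+' else result - int(term)
    return result


def run_line(text):
    """Evaluate one line and return the text a REPL loop would print."""
    try:
        return str(evaluate(text))
    except Exception as exc:
        # The article raises a plain Exception, so that is what is caught here.
        return 'Error: {}'.format(exc)


print(run_line('7 - 3 + 2 - 1'))
print(run_line('3 +'))
```

With this wrapper the first line prints 5 and the second prints "Error: Invalid syntax" instead of a traceback. Catching a plain Exception is broad; a dedicated exception class would be a tighter choice, but it would mean changing the error method as well.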

Now think it over again, and put pen to paper:

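Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example “7 * 4 / 2 * 3”. Seriously, grab a pen and draw it.
Modify the interpreter code so that it evaluates expressions containing only multiplication and division.
Write the interpreter from scratch. Keep these concepts in mind: the lexer turns the input string into a stream of tokens, the parser takes that stream and recognizes its structure, and the interpreter produces the result once the parser has successfully recognized the expression.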
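
The three roles named in the last exercise can be sketched as separate functions. In the article's calculator they are interleaved inside a single class, so the split below is only one possible arrangement, shown for addition and subtraction. The function names `tokenize`, `parse`, and `interpret` and the tuple-based token format are assumptions made for this illustration.

```python
import re

INTEGER, PLUS, MINUS = 'INTEGER', 'PLUS', 'MINUS'


def tokenize(text):
    """Lexer: turn the input string into a list of (type, value) tokens."""
    tokens = []
    # Each lexeme is either a run of ASCII digits or one non-space character.
    for lexeme in re.findall(r'[0-9]+|\S', text):
        if re.fullmatch(r'[0-9]+', lexeme):
            tokens.append((INTEGER, int(lexeme)))
        elif lexeme == '+':
            tokens.append((PLUS, lexeme))
        elif lexeme == '-':
            tokens.append((MINUS, lexeme))
        else:
            raise Exception('Invalid character: ' + lexeme)
    return tokens


def parse(tokens):
    """Parser: check the structure term ((PLUS | MINUS) term)*.

    Returns the first integer and a list of (operator type, integer) pairs.
    """
    if not tokens or tokens[0][0] != INTEGER:
        raise Exception('Invalid syntax')
    first = tokens[0][1]
    rest = []
    pos = 1
    while pos < len(tokens):
        op_type = tokens[pos][0]
        if op_type not in (PLUS, MINUS):
            raise Exception('Invalid syntax')
        # An operator must be followed by an integer.
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != INTEGER:
            raise Exception('Invalid syntax')
        rest.append((op_type, tokens[pos + 1][1]))
        pos += 2
    return first, rest


def interpret(parsed):
    """Interpreter: compute the result from the recognized structure."""
    result, rest = parsed
    for op_type, value in rest:
        if op_type == PLUS:
            result += value
        else:
            result -= value
    return result


print(interpret(parse(tokenize('10 + 1 + 2 - 3 + 4 + 6 - 15'))))
```

This prints 5, the same result as the session shown earlier. Keeping the stages apart makes each one easy to test on its own, at the cost of building an intermediate token list that the article's single-pass design avoids.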
