1.编译原理

s.feng

已于 2022-05-27 19:31:39 修改

阅读量108

点赞数

分类专栏：编译原理文章标签： c++

于 2022-05-27 18:02:31 首次发布

本文链接：https://blog.csdn.net/feng__shuai/article/details/125008726

版权

编译原理专栏收录该内容

1 篇文章 0 订阅

订阅专栏

背景

平时写代码，总感觉飘在空中，不知道自己的这行代码底层怎么和硬件打交道，此外软件层面很多框架都是开源的，有问题看看源码就行，也不算有特别的门槛，不过编译原理发现周围人了解的不多，所以打算学一下，这里通过记录博客的方式来督促自己进行学习。借用Ralph Waldo Emerson的一句话:

“If you learn only methods, you’ll be tied to your methods. But if you learn principles, you can devise your own methods.”

代码示例

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, or EOF
        self.type = type
        # token value: 0, 1, 2. 3, 4, 5, 6, 7, 8, 9, '+', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3+5"
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None

    def error(self):
        raise Exception('Error parsing input')

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        text = self.text

        # is self.pos index past the end of the self.text ?
        # if so, then return EOF token because there is no more
        # input left to convert into tokens
        if self.pos > len(text) - 1:
            return Token(EOF, None)

        # get a character at the position self.pos and decide
        # what token to create based on the single character
        current_char = text[self.pos]

        # if the character is a digit then convert it to
        # integer, create an INTEGER token, increment self.pos
        # index to point to the next character after the digit,
        # and return the INTEGER token
        if current_char.isdigit():
            token = Token(INTEGER, int(current_char))
            self.pos += 1
            return token

        if current_char == '+':
            token = Token(PLUS, current_char)
            self.pos += 1
            return token

        self.error()

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def expr(self):
        """expr -> INTEGER PLUS INTEGER"""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        # we expect the current token to be a single-digit integer
        left = self.current_token
        self.eat(INTEGER)

        # we expect the current token to be a '+' token
        op = self.current_token
        self.eat(PLUS)

        # we expect the current token to be a single-digit integer
        right = self.current_token
        self.eat(INTEGER)
        # after the above call the self.current_token is set to
        # EOF token

        # at this point INTEGER PLUS INTEGER sequence of tokens
        # has been successfully found and the method can just
        # return the result of adding two integers, thus
        # effectively interpreting client input
        result = left.value + right.value
        return result


def main():
    while True:
        try:
            # To run under Python3 replace 'raw_input' call
            # with 'input'
            text = raw_input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

运行上述代码：

calc> 1+2
5

整个流程：首先是获取这个字符串"1+2", 然后把字符串拆开三个token, token我觉得简单理解为一个个的对象就行，token是有类型和值的，这里1和2就是整数类型，'+'就是加法类型，这里还有一个结尾类型，不过这个token没用到。
名词解释：其中把这个字符串拆成token的过程叫做语法分析，在解释器中这一部分叫做语法分析器(lexical analyzer), 也有很多人叫他：scanner、tokenizer

从上面的流程可以知道，字符串被拆为：INTEGER -> PLUS -> INTEGER，接下来解释器要要去处理这个结构，这里用了expr这个函数，这个函数目的就是负责判断解析出来的结构是不是这个结构（INTEGER -> PLUS -> INTEGER），最后给出计算结果，到此位置，这个解释器成功的处理你给的一串字符串"1+2"。

s.feng

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
1.编译原理

背景平时写代码，总感觉飘在空中，不知道自己的这行代码底层怎么和硬件打交道，此外软件层面很多框架都是开源的，有问题看看源码就行，也不算有特别的门槛，不过编译原理发现周围人了解的不多，所以打算学一下，这里通过记录博客的方式来督促自己进行学习。...
复制链接

扫一扫