(1) Lexer

黯止依蓝

Tags: c++

Article link: https://blog.csdn.net/Kongxiangyunltj/article/details/134884211

Put simply, a lexer is a function that takes a character stream as input and splits it into tokens, labeling each one with its kind.

For example, a language might define the following token kinds:

enum Token {
  tok_eof = -1,

  // commands
  tok_def = -2,
  tok_extern = -3,

  // primary
  tok_identifier = -4,
  tok_number = -5,
};

The lexer then acts as a function that skips whitespace, inspects the characters it scans, and returns the matching token:

#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <string>

static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal;             // Filled in if tok_number

/// gettok - Return the next token from standard input.
static int gettok() {
  // One character of lookahead, kept between calls.
  static int LastChar = ' ';

  // Skip any whitespace.
  while (isspace(LastChar))
    LastChar = getchar();

  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
    IdentifierStr = LastChar;
    while (isalnum((LastChar = getchar())))
      IdentifierStr += LastChar;

    if (IdentifierStr == "def")
      return tok_def;
    if (IdentifierStr == "extern")
      return tok_extern;
    return tok_identifier;
  }

  if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
    std::string NumStr;
    do {
      NumStr += LastChar;
      LastChar = getchar();
    } while (isdigit(LastChar) || LastChar == '.');

    NumVal = strtod(NumStr.c_str(), nullptr);
    return tok_number;
  }

  // Skip comments.
  if (LastChar == '#') {
    // Comment until end of line.
    do
      LastChar = getchar();
    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');

    if (LastChar != EOF)
      return gettok();
  }

  // Check for end of file. Don't eat the EOF.
  if (LastChar == EOF)
    return tok_eof;

  // Otherwise, just return the character as its ascii value.
  int ThisChar = LastChar;
  LastChar = getchar();
  return ThisChar;
}

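This code implements a lexer, which breaks the input into tokens. It defines two global variables: IdentifierStr holds the name of an identifier, and NumVal holds the value of a numeric literal.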

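The gettok function reads characters from standard input and returns the token they form. It first skips any whitespace, then branches on the type of the current character.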

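If the character is a letter, it starts an identifier. gettok reads the following run of letters and digits into IdentifierStr, then checks the value of IdentifierStr to decide whether it is a keyword (such as "def" or "extern") or an ordinary identifier.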
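The identifier rule can be tried in isolation. The following is a minimal sketch, assuming the input is a std::string rather than standard input; scanIdentifier, Text, Pos and Out are hypothetical names and do not appear in the original code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Token codes copied from the enum above; only the ones needed here.
enum Token {
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
};

// Hypothetical helper (not part of the original code): scans an identifier
// that starts at Text[Pos], stores it in Out, advances Pos past it, and
// returns its token code. Assumes Text[Pos] is a letter.
static int scanIdentifier(const std::string &Text, std::size_t &Pos,
                          std::string &Out) {
  Out.clear();
  // [a-zA-Z][a-zA-Z0-9]*: keep consuming while the character is alphanumeric.
  while (Pos < Text.size() &&
         std::isalnum(static_cast<unsigned char>(Text[Pos])))
    Out += Text[Pos++];

  // Keywords are identifiers with reserved spellings.
  if (Out == "def")
    return tok_def;
  if (Out == "extern")
    return tok_extern;
  return tok_identifier;
}
```

Because the whole run of letters and digits is collected before the comparison, "define" comes back as an ordinary identifier rather than as the keyword "def" followed by "ine".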

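If the character is a digit or a decimal point, it starts a numeric literal. gettok reads the following run of digits and decimal points into NumStr, then converts it to a double with strtod and stores the result in NumVal.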
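One consequence of the [0-9.]+ rule is that malformed input such as "1.5.25" is still collected as a single number, and strtod simply stops converting at the second decimal point. The sketch below shows one way this could be detected with strtod's end pointer; parseNumber and FullyConsumed are hypothetical names, and the original code performs no such check.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the original code): converts NumStr with
// strtod and reports through FullyConsumed whether every character of NumStr
// took part in the conversion.
static double parseNumber(const std::string &NumStr, bool &FullyConsumed) {
  char *End = nullptr;
  double Value = std::strtod(NumStr.c_str(), &End);
  // strtod leaves End at the first character it could not use. If no
  // conversion happened at all, End is left at the start of the string.
  FullyConsumed = (End != NumStr.c_str()) && (*End == '\0');
  return Value;
}
```

With this helper, "4.5" converts completely, while "1.5.25" yields 1.5 with FullyConsumed set to false, which a stricter lexer could report as an error.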

If the character is a hash sign (#), it starts a comment. gettok skips the rest of the line and then calls itself recursively to return the next token.
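The comment rule on its own might look like the sketch below, which again assumes a std::string input with an index instead of getchar. skipComment, Text and Pos are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original code): Pos points at '#'.
// Advances Pos to the line break that ends the comment, or to Text.size()
// if the comment runs to the end of the input. The line break itself is
// left for the whitespace-skipping step to consume.
static void skipComment(const std::string &Text, std::size_t &Pos) {
  do
    ++Pos; // Step over '#' first, then over each comment character.
  while (Pos < Text.size() && Text[Pos] != '\n' && Text[Pos] != '\r');
}
```

The do/while shape matters: the body has to advance at least once so that the '#' itself is consumed before the end-of-line test runs.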

If the character is end of file (EOF), gettok returns tok_eof to signal that the input is exhausted.

Otherwise, gettok returns the character itself as its ASCII value.
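This is why the named tokens in the enum are all negative: a character read by getchar is returned as a non-negative int, so the two ranges cannot collide. A caller can tell them apart as in the sketch below; describeToken is a hypothetical helper that is not part of the original code.

```cpp
#include <string>

// Token codes copied from the enum above.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
};

// Hypothetical helper (not part of the original code): returns a readable
// name for a value produced by gettok.
static std::string describeToken(int Tok) {
  switch (Tok) {
  case tok_eof:
    return "eof";
  case tok_def:
    return "def";
  case tok_extern:
    return "extern";
  case tok_identifier:
    return "identifier";
  case tok_number:
    return "number";
  }
  // Anything else is a single character such as '+' or '('.
  return std::string("char '") + static_cast<char>(Tok) + "'";
}
```

For instance, on an ASCII system gettok returns 43 for '+', and describeToken maps that value to the text char '+'.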

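In short, this code implements a simple lexer: it splits the input into tokens and reports the type of each one, so that later stages such as a parser can handle each token according to its type.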
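To see all the rules working together without typing into standard input, the same logic can be wrapped around a string. The following is a sketch under stated assumptions, not the tutorial's implementation: StringLexer, next, at and tokenize are hypothetical names, the input comes from a std::string, and a loop with continue replaces the recursive call after a comment.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Token codes copied from the enum above.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
};

// Hypothetical class: applies the same rules as gettok to a string.
class StringLexer {
public:
  std::string IdentifierStr; // Filled in if tok_identifier
  double NumVal = 0.0;       // Filled in if tok_number

  explicit StringLexer(std::string Input) : Text(std::move(Input)) {}

  // Returns the next token code, or the character itself for other input.
  int next() {
    for (;;) {
      // Skip any whitespace.
      while (Pos < Text.size() && std::isspace(at(Pos)))
        ++Pos;
      if (Pos >= Text.size())
        return tok_eof;

      if (std::isalpha(at(Pos))) { // identifier: [a-zA-Z][a-zA-Z0-9]*
        IdentifierStr.clear();
        while (Pos < Text.size() && std::isalnum(at(Pos)))
          IdentifierStr += Text[Pos++];
        if (IdentifierStr == "def")
          return tok_def;
        if (IdentifierStr == "extern")
          return tok_extern;
        return tok_identifier;
      }

      if (std::isdigit(at(Pos)) || Text[Pos] == '.') { // Number: [0-9.]+
        std::string NumStr;
        while (Pos < Text.size() &&
               (std::isdigit(at(Pos)) || Text[Pos] == '.'))
          NumStr += Text[Pos++];
        NumVal = std::strtod(NumStr.c_str(), nullptr);
        return tok_number;
      }

      if (Text[Pos] == '#') {
        // Comment: skip to the end of the line, then lex what follows.
        while (Pos < Text.size() && Text[Pos] != '\n' && Text[Pos] != '\r')
          ++Pos;
        continue;
      }

      // Otherwise, return the character as its non-negative value.
      return at(Pos++);
    }
  }

private:
  // Reads one character as a non-negative int, as <cctype> functions expect.
  int at(std::size_t I) const { return static_cast<unsigned char>(Text[I]); }

  std::string Text;
  std::size_t Pos = 0;
};

// Collects every token code up to and including tok_eof.
static std::vector<int> tokenize(const std::string &Input) {
  StringLexer Lex(Input);
  std::vector<int> Tokens;
  int Tok;
  do {
    Tok = Lex.next();
    Tokens.push_back(Tok);
  } while (Tok != tok_eof);
  return Tokens;
}
```

Calling tokenize on the two-line input "def f(x) x + 4.5 # note" followed by "extern" gives, in order: tok_def, tok_identifier, '(', tok_identifier, ')', tok_identifier, '+', tok_number, tok_extern, tok_eof. The comment produces no token of its own.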

Source: "1. Kaleidoscope: Kaleidoscope Introduction and the Lexer", LLVM 18.0.0git documentation
