Shandong University Compiler Principles Lab (New)

Latest recommended article published on 2024-10-14 17:54:02

咕噜咕噜咕噜128

Article tags: c++

Article link: https://blog.csdn.net/qq_62819443/article/details/140772926

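You are to build a simple compiler for a C++-style language. The lab is split into several parts. In this part, you implement the compiler's lexical analyzer (lexer).

Your compiler must not make assumptions about the user's input program (for example, that the program is no longer than some number of bytes, that no line exceeds some number of characters, or that the program contains no more than some number of symbols).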

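Given a C++ source program, convert it from a character stream into a token stream. Specifically, filter out the blank content of the source (spaces, comments, tabs, carriage returns, newlines, and so on) and recognize keywords, identifiers, numbers, and operators.

The input is a source program in C++ style. The output is the type of each token.

If the program has no lexical errors, output the token stream together with the type of each token, one token and its type per line.

If the program has a lexical error, output only the corresponding error message. If the source contains several lexical errors, output only the error type that appears earliest.

The output must end with a newline.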

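There are four kinds of errors:

Error description	Output
More than one decimal point in a floating-point number	Malformed number: More than one decimal point in a floating point number.
Decimal point at the beginning or end of a floating-point number	Malformed number: Decimal point at the beginning or end of a floating point number.
Leading zeros in an integer or in the integer part of a decimal	Malformed number: Leading zeros in an integer.
Unrecognizable character outside comments, including a lone & or | character	Unrecognizable characters.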

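For the first three (numeric) error types, the output priority is 1 > 2 > 3.

For example, 1.11. contains both a second decimal point and a trailing decimal point, so it is reported as the first error type.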
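
The priority rule can be sketched as a sequence of checks applied in order to one run of digits and dots. This is only an illustration: the name `classifyNumber` is hypothetical, and it assumes that a zero followed by another digit (as in `00.5`) counts as a leading zero.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: classify a non-empty run of digits and '.' characters.
// Returns "INT" or "DOUBLE" for a well-formed number, otherwise the error
// message, testing the three numeric errors in priority order 1 > 2 > 3.
std::string classifyNumber(const std::string& text) {
    const auto dots = std::count(text.begin(), text.end(), '.');
    if (dots >= 2)
        return "Malformed number: More than one decimal point in a floating point number.";
    if (text.front() == '.' || text.back() == '.')
        return "Malformed number: Decimal point at the beginning or end of a floating point number.";
    // Assumption: a leading zero is legal only when it is the entire
    // integer part, as in "0" or "0.5".
    if (text.size() > 1 && text[0] == '0' && text[1] != '.')
        return "Malformed number: Leading zeros in an integer.";
    return dots == 0 ? "INT" : "DOUBLE";
}
```

Because the checks run in order, an input such as `01.` is reported as the second error type even though it also has a leading zero.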

The keyword table is as follows:

Keyword	Type
int	INTSYM
double	DOUBLESYM
scanf	SCANFSYM
printf	PRINTFSYM
if	IFSYM
then	THENSYM
while	WHILESYM
do	DOSYM

The symbol table is as follows:

Symbol	Type
=	AO
==	RO
>	RO
>=	RO
<	RO
<=	RO
||	LO
&&	LO
!	LO
!=	RO
+	PLUS
-	MINUS
*	TIMES
/	DIVISION
,	COMMA
(	BRACE
)	BRACE
{	BRACE
}	BRACE
;	SEMICOLON
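
Several symbols in this table are prefixes of others (`=` and `==`, `>` and `>=`, `!` and `!=`), so a lexer has to try the two-character form first. A minimal sketch of that longest-match lookup follows; the function name `matchOperator` and its out-parameter are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: return the type of the operator starting at text[pos],
// preferring a two-character match over a one-character one. `length`
// receives the number of characters matched. An empty result with length 0
// means no symbol from the table starts here (for example a lone '&' or '|').
std::string matchOperator(const std::string& text, std::size_t pos,
                          std::size_t& length) {
    static const std::map<std::string, std::string> table{
        {"=", "AO"},    {"==", "RO"},      {">", "RO"},    {">=", "RO"},
        {"<", "RO"},    {"<=", "RO"},      {"||", "LO"},   {"&&", "LO"},
        {"!", "LO"},    {"!=", "RO"},      {"+", "PLUS"},  {"-", "MINUS"},
        {"*", "TIMES"}, {"/", "DIVISION"}, {",", "COMMA"}, {"(", "BRACE"},
        {")", "BRACE"}, {"{", "BRACE"},    {"}", "BRACE"}, {";", "SEMICOLON"}};
    for (std::size_t len = 2; len >= 1; --len) {
        if (pos + len > text.size()) continue;  // not enough characters left
        auto it = table.find(text.substr(pos, len));
        if (it != table.end()) {
            length = len;
            return it->second;
        }
    }
    length = 0;
    return "";
}
```

With this ordering, `>=` is reported once as RO rather than as `>` followed by `=`.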

Identifiers and numbers

Identifiers must conform to the following grammar:

<identifier> → <letter>{<letter>|<digit>}

{}: zero or more occurrences

Identifier or number	Type
Identifier (e.g. x)	IDENT
Integer (e.g. 3)	INT
Decimal (e.g. 2.1)	DOUBLE
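
The identifier grammar and the keyword table interact: every keyword also matches the grammar, so the keyword lookup has to come first. A small sketch under that assumption is shown here; `isIdentifier` and `classifyWord` are hypothetical names, and letters are taken to mean ASCII letters in the default "C" locale.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: true if `word` matches <letter>{<letter>|<digit>}.
bool isIdentifier(const std::string& word) {
    if (word.empty() || !std::isalpha(static_cast<unsigned char>(word[0])))
        return false;
    for (char ch : word) {
        if (!std::isalnum(static_cast<unsigned char>(ch))) return false;
    }
    return true;
}

// Hypothetical helper: return the keyword type, "IDENT", or "" when the
// word is neither a keyword nor a valid identifier.
std::string classifyWord(const std::string& word) {
    static const std::map<std::string, std::string> keywords{
        {"int", "INTSYM"},       {"double", "DOUBLESYM"}, {"scanf", "SCANFSYM"},
        {"printf", "PRINTFSYM"}, {"if", "IFSYM"},         {"then", "THENSYM"},
        {"while", "WHILESYM"},   {"do", "DOSYM"}};
    auto it = keywords.find(word);
    if (it != keywords.end()) return it->second;  // keywords win over IDENT
    return isIdentifier(word) ? "IDENT" : "";
}
```

Note that a word such as `int1` is an ordinary identifier, since only an exact match is looked up in the keyword table.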

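Comments work as in C++ and come in two forms, as shown in the table below:

//	Single-line comment
/* */	Multi-line comment

If there is a /* with no matching */, everything from the /* onward is treated as a comment.

During lexical analysis, the contents of comments must not be output; that is, they must be filtered out.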

Example 1

Input:

int a, b;
double c=1.2; // This is a comment
scanf(a);
scanf(b);
printf(c);

Output:

int INTSYM
a IDENT
, COMMA
b IDENT
; SEMICOLON
double DOUBLESYM
c IDENT
= AO
1.2 DOUBLE
; SEMICOLON
scanf SCANFSYM
( BRACE
a IDENT
) BRACE
; SEMICOLON
scanf SCANFSYM
( BRACE
b IDENT
) BRACE
; SEMICOLON
printf PRINTFSYM
( BRACE
c IDENT
) BRACE
; SEMICOLON

Example 2:

Input:

int a = 6037210
int b = 06356417

Output:

Malformed number: Leading zeros in an integer.

Example 3:

Input:

12910
1223.219
27912.120921
2181123.
2810.
12123

Output:

Malformed number: Decimal point at the beginning or end of a floating point number.

Example 4:

Input:

int a,b;
c=2;
d=123.21;
?
*
~

Output:

Unrecognizable characters.

Example 5:

Input:

12738
0.2919
.0199
1210
1.111
1.201
10291.1290

Output:

Malformed number: Decimal point at the beginning or end of a floating point number.

Example 6:

Input:

1234
2222222
1928301
1.87273
0.9218
3.12919
1.0291
1.1.112182
21211

Output:

Malformed number: More than one decimal point in a floating point number.

/*
    written by @lyz
    2024.4.26
    Compiler Principles Lab 1: lexical analyzer
*/
#include <bits/stdc++.h>

#define LOCAL 0

std::map<std::string, std::string> mp{
    {"int", "INTSYM"},       {"double", "DOUBLESYM"}, {"scanf", "SCANFSYM"},
    {"printf", "PRINTFSYM"}, {"if", "IFSYM"},         {"then", "THENSYM"},
    {"while", "WHILESYM"},   {"do", "DOSYM"}};
std::map<std::string, std::string> mp1{
    {"=", "AO"},    {"==", "RO"},      {">", "RO"},    {">=", "RO"},
    {"<", "RO"},    {"<=", "RO"},      {"||", "LO"},   {"&&", "LO"},
    {"!", "LO"},    {"!=", "RO"},      {"+", "PLUS"},  {"-", "MINUS"},
    {"*", "TIMES"}, {"/", "DIVISION"}, {",", "COMMA"}, {"(", "BRACE"},
    {")", "BRACE"}, {"{", "BRACE"},    {"}", "BRACE"}, {";", "SEMICOLON"}};

std::vector<std::string> lexer(std::string str) {
    std::vector<std::string> ans;
    std::string s;

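    // Strip comments in a single left-to-right pass, so that a "/*" inside a
    // "//" comment (or the reverse) is not misread. Each comment is replaced
    // by a space so the tokens on either side of it stay separate.
    std::string stripped;
    for (std::size_t i = 0; i < str.size();) {
        if (str.compare(i, 2, "/*") == 0) {
            std::size_t end = str.find("*/", i + 2);
            // An unterminated "/*" comments out the rest of the input.
            i = (end == std::string::npos) ? str.size() : end + 2;
            stripped.push_back(' ');
        } else if (str.compare(i, 2, "//") == 0) {
            std::size_t end = str.find('\n', i);
            // Stop at the newline itself so the line break is preserved.
            i = (end == std::string::npos) ? str.size() : end;
            stripped.push_back(' ');
        } else {
            stripped.push_back(str[i++]);
        }
    }
    str = stripped;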

    // Pad every operator with spaces so it splits off as its own chunk
    int n = str.size();
    for (int i = 0; i < n; i++) {
        if (i + 2 <= n && mp1.count(str.substr(i, 2))) {
            s.push_back(' ');
            s.push_back(str[i]);
            s.push_back(str[i + 1]);
            s.push_back(' ');
            i += 1;
        } else if (i + 1 <= n && mp1.count(str.substr(i, 1))) {
            s.push_back(' ');
            s.push_back(str[i]);
            s.push_back(' ');
        } else {
            s.push_back(str[i]);
        }
    }
    str = s;
    n = str.size();
    // Tokenize: split on whitespace, then scan each chunk left to right.
    std::string token;
    for (int i = 0; i <= n; i++) {
        // The end of input also acts as a separator so the last chunk is flushed.
        if (i < n && !isspace(static_cast<unsigned char>(str[i]))) {
            token += str[i];
            continue;
        }
        size_t idx = 0;
        while (idx < token.size()) {
            unsigned char c = token[idx];
            std::string h;
            if (isalpha(c)) {
                // Keyword or identifier: a letter followed by letters or digits.
                while (idx < token.size() &&
                       isalnum(static_cast<unsigned char>(token[idx]))) {
                    h += token[idx++];
                }
                ans.push_back(h + ' ' +
                              (mp.count(h) ? mp[h] : std::string("IDENT")) + '\n');
            } else if (isdigit(c) || c == '.') {
                // Number: collect digits and dots, then validate in the
                // required priority order (error types 1, 2, 3).
                while (idx < token.size() &&
                       (isdigit(static_cast<unsigned char>(token[idx])) ||
                        token[idx] == '.')) {
                    h += token[idx++];
                }
                int dots = std::count(h.begin(), h.end(), '.');
                if (dots >= 2) {
                    std::cout << "Malformed number: More than one decimal point in a "
                              "floating point number.\n";
                    return {};
                }
                if (h[0] == '.' || h.back() == '.') {
                    std::cout << "Malformed number: Decimal point at the beginning "
                              "or end of a floating point number.\n";
                    return {};
                }
                // A leading zero is legal only when it is the whole integer part.
                if (h.size() > 1 && h[0] == '0' && h[1] != '.') {
                    std::cout << "Malformed number: Leading zeros in an integer.\n";
                    return {};
                }
                ans.push_back(h + (dots == 0 ? " INT\n" : " DOUBLE\n"));
            } else if (idx + 2 <= token.size() && mp1.count(token.substr(idx, 2))) {
                ans.push_back(token.substr(idx, 2) + " " +
                              mp1[token.substr(idx, 2)] + "\n");
                idx += 2;
            } else if (mp1.count(token.substr(idx, 1))) {
                ans.push_back(token.substr(idx, 1) + " " +
                              mp1[token.substr(idx, 1)] + "\n");
                idx += 1;
            } else {
                // Anything else, including a lone '&' or '|', is an error.
                std::cout << "Unrecognizable characters.\n";
                return {};
            }
        }
        token.clear();
    }

    return ans;
}

void solve() {
    std::string s, str;
    std::string last;

    // Input stream
    std::ifstream input("C:\\Users\\LYZ\\Desktop\\untitled1\\in2.txt");
    std::ofstream fout("C:\\Users\\LYZ\\Desktop\\untitled1\\out.txt");
    if (LOCAL) {
        if (input.is_open()) {
            while (getline(input, s)) {
                str += s;
                str += '\n';
            }
            input.close();
        } else {
            std::cout << "Failed to open input file." << std::endl;
            return;
        }
    } else
        while (getline(std::cin, s)) {
            str += s;
            str += '\n';
        }

    auto ans = lexer(str);
    if (ans.empty()) return;
    // Output stream
    if (LOCAL)
        for (auto s : ans) {
            fout << s;
        }
    else
        for (auto s : ans) {
            std::cout << s;
        }
}


int main() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
    std::cout.tie(nullptr);

    int t = 1;

    while (t--) {
        solve();
    }
    return 0;
}