Compiler Principles Experiment 1: Lexical Analysis

FreanJa

Article link: https://blog.csdn.net/qq_45769399/article/details/125744135

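This document records a lexical analysis experiment carried out with flex. The goal was to design and implement a lexical analyzer for arithmetic expressions that recognizes integers, decimals, variables, and operators. After constructing a state transition diagram and writing the lex code, the analyzer successfully processed an arithmetic expression containing integers, decimals, operators, and variables. The one problem left open was negative numbers: no way was found to recognize them without interfering with recognition of the minus sign. The report also summarizes the advantages and disadvantages of flex and stresses the importance of basic Linux skills and independent problem-solving.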

Compiler Principles Lab Report

2019 Computer Science (all-English class) 2019329621182

Experiment 1: Lexical Analysis

I. Objectives

Based on the requirements of arithmetic expressions, design and implement a lexical analyzer (scanner) for such expressions using a third-party automation tool. The scanner reads the source program (an expression) and, following the lexical rules, outputs a sequence of tokens defined by regular expressions.

Determine the state transition diagram (automaton) for the arithmetic expressions to be handled, and implement lexical analysis by the state-transition-diagram method.
Use an automatic generator to implement the lexical analysis program (lex is one option, but others are allowed).
Pay attention to the design of the program structure: modularize the whole system and consider the interfaces between modules.

II. Evaluation

Investigation ability: investigate a complex problem using research-based knowledge and methods. Design a complete experiment for the requirements, showing that the design is scientific and complete (20 points); analyze and interpret the experimental data (20 points); and synthesize the information into valid conclusions (30 points).
Ability to use third-party tools: summarize the advantages and disadvantages of using third-party tools, indicate the abilities needed to use them, and evaluate your own performance in this experiment (30 points).

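III. Experimental Procedure

1. The task is to write a lexical analyzer for arithmetic expressions that reads an expression and splits it into tokens

2. The first step is to define what an arithmetic expression consists of. I divided it into four parts:

Integers
Decimals
Variables
Symbols (operators)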

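3. For each of these four parts, write a regular expression:

Integer => 0|[1-9][0-9]*
Decimal => (0|[1-9][0-9]*)\.[0-9]+
Variable => [a-zA-Z][a-zA-Z0-9_]*
Operator => \+|-|\*|/|\\|\^|\(|\)|=

4. These regular expressions recognize positive integers, positive decimals, variables (upper- and lower-case letters, digits, and underscores; a variable must start with a letter, so a leading digit is not allowed), and operators (the four arithmetic operations, backslash, exponentiation, parentheses, and the equals sign)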
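
The four patterns can be checked against sample lexemes before any scanner is generated. The following is a minimal sketch using the POSIX `regcomp`/`regexec` interface from `<regex.h>`; the constant names and the `matches` helper are hypothetical, and the decimal pattern is written with the integer part grouped, `(0|[1-9][0-9]*)`, which is an assumption about the intended meaning.

```c
#include <regex.h>
#include <stddef.h>

/* The four token patterns in POSIX extended syntax, anchored so that the
   whole lexeme must match. The grouping in DECIMAL_RE and the bracket form
   of OPERATOR_RE are assumptions about the intended meaning. */
static const char *INTEGER_RE  = "^(0|[1-9][0-9]*)$";
static const char *DECIMAL_RE  = "^(0|[1-9][0-9]*)\\.[0-9]+$";
static const char *VARIABLE_RE = "^[a-zA-Z][a-zA-Z0-9_]*$";
static const char *OPERATOR_RE = "^[-+*/\\^()=]$";

/* Return 1 if text matches pattern in full, 0 if it does not,
   and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, `matches(DECIMAL_RE, "23.56")` accepts the lexeme, while `matches(VARIABLE_RE, "_af")` rejects it because the variable pattern requires a leading letter.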

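5. Draw the state transition diagram from the regular expressions

digit = [0-9]

letter = [a-zA-Z]

operator = + | - | * | / | \ | ^ | = | ( | )

underline = _

dot = .

In the diagram, state B is an integer, state C is a decimal, state D is a variable, and state E is an operator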
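
A state transition diagram of this kind translates directly into a small table-free automaton. The sketch below is one possible hand-coded version, not the code that flex generates. It reports the accepting states with the letters B, C, D, and E used above; the internal states `S_ZERO` (a lone 0) and `S_DOT` (a dot seen, fraction digits still required) are assumptions added so that the automaton is deterministic, and the names `step` and `classify` are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* S_ZERO and S_DOT are helper states assumed for this sketch. */
enum State { S_START, S_ZERO, S_INT, S_DOT, S_DEC, S_VAR, S_OP, S_ERR };

/* One transition of the automaton: current state + input character. */
static enum State step(enum State s, char c) {
    unsigned char u = (unsigned char)c;
    switch (s) {
    case S_START:
        if (c == '0') return S_ZERO;
        if (isdigit(u)) return S_INT;
        if (isalpha(u)) return S_VAR;
        if (c != '\0' && strchr("+-*/\\^=()", c)) return S_OP;
        return S_ERR;
    case S_ZERO:                      /* "0" may only continue as "0." */
        return c == '.' ? S_DOT : S_ERR;
    case S_INT:
        if (isdigit(u)) return S_INT;
        return c == '.' ? S_DOT : S_ERR;
    case S_DOT:                       /* at least one fraction digit */
    case S_DEC:
        return isdigit(u) ? S_DEC : S_ERR;
    case S_VAR:
        return (isalnum(u) || c == '_') ? S_VAR : S_ERR;
    default:                          /* S_OP and S_ERR have no exits */
        return S_ERR;
    }
}

/* Run the automaton over a whole lexeme and report the accepting state:
   'B' integer, 'C' decimal, 'D' variable, 'E' operator, '?' rejected. */
char classify(const char *lexeme) {
    enum State s = S_START;
    for (const char *p = lexeme; *p != '\0'; p++) {
        s = step(s, *p);
    }
    switch (s) {
    case S_ZERO:
    case S_INT: return 'B';
    case S_DEC: return 'C';
    case S_VAR: return 'D';
    case S_OP:  return 'E';
    default:    return '?';
    }
}
```

Calling `classify` on a single lexeme shows which accepting state it ends in; a lexeme such as `23.` stops in the non-accepting dot state and is rejected.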

6. Write the lex code

%{
/* A Lex program that splits an arithmetic expression into tokens
   (integers, decimals, variables, operators), counts each kind,
   and collects the matched lexemes for a final report. */
#include <stdio.h>
#include <string.h>   /* strcat, strlen */

#define BUF_SIZE 256

int varNum = 0;
int numNum = 0;
int operatorNum = 0;
int floatNum = 0;
char variables[BUF_SIZE] = "variables:\t";
char integer[BUF_SIZE] = "integer:\t";
char floatType[BUF_SIZE] = "float:\t";
char operator[BUF_SIZE] = "operator:\t";

/* Append a lexeme and a tab to buf; skip it if the buffer would overflow. */
static void append(char *buf, const char *text) {
  if (strlen(buf) + strlen(text) + 2 <= BUF_SIZE) {
    strcat(buf, text);
    strcat(buf, "\t");
  }
}
%}

num 0|[1-9][0-9]*
varname [a-zA-Z][a-zA-Z0-9_]*
float (0|[1-9][0-9]*)"."[0-9]+
operator [-+*/\\^()=]

%%
{varname} {
  varNum++;
  append(variables, yytext);
}

{num}  {
  numNum++;
  append(integer, yytext);
}

{float} {
  floatNum++;
  append(floatType, yytext);
}

{operator} {
  operatorNum++;
  append(operator, yytext);
}

[ \t\r\n]+ { /* skip whitespace instead of echoing it to the output */ }

%%

int main(void){
  yylex();
  printf("varNum = %d\nnumNum = %d\noperatorNum = %d\nfloatNum = %d\n", varNum,numNum,operatorNum,floatNum);
  printf("%s\n%s\n%s\n%s\n",variables,integer,floatType,operator);
  return 0;
}

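7. Experimental data

Dasf ^ 124 + ( vaRsd - 23.56 ) * _af / vdv_23 = masdm

The arithmetic expression contains integers (2), decimals (2), operators used in programming (10), and variables in several forms (5)

8. Results

The output has two parts

The first part gives the number of times each kind of token appears in the input expression
The second part lists the tokens recognized for each kind

The output matches expectations: 5 variables, 2 integers, 10 operators, and 2 decimals were recognized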

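9. Problem

At first I planned to cover negative numbers in the integer and decimal patterns, but in practice I did not know how to define the regular expressions so that negative numbers could be extracted without affecting recognition of the minus sign.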
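
The difficulty is that a regular expression alone cannot tell whether `-` in `a - 3` is a subtraction or the sign of `-3`; that depends on context. A common way around this is to always emit `-` as an operator and let the parser handle unary minus. Another is to let the scanner remember the previous token and treat `-` as a sign only where an operand is expected. The sketch below illustrates the second idea with a hand-written scanner rather than flex. All names (`Token`, `Kind`, `tokenize`) are hypothetical, the number rule is simplified (it does not reject leading zeros), and this is not the approach used in the experiment.

```c
#include <ctype.h>
#include <string.h>

enum Kind { K_NONE, K_NUM, K_VAR, K_OP };

typedef struct {
    enum Kind kind;
    char text[32];
} Token;

/* Split src into tokens and return how many were written to out.
   A '-' directly followed by a digit is folded into the number only when
   an operand is expected: at the start of input, or after an operator
   other than ')'. Otherwise it is emitted as the minus operator. */
int tokenize(const char *src, Token *out, int max_tokens) {
    int n = 0;
    enum Kind prev = K_NONE;
    char prev_op = '\0';
    const char *p = src;

    while (*p != '\0' && n < max_tokens) {
        const char *start = p;
        enum Kind kind;
        unsigned char c = (unsigned char)*p;

        if (isspace(c)) { p++; continue; }

        int operand_expected =
            (prev == K_NONE) || (prev == K_OP && prev_op != ')');

        if (isdigit(c) ||
            (c == '-' && operand_expected && isdigit((unsigned char)p[1]))) {
            kind = K_NUM;
            p++;                                   /* sign or first digit */
            while (isdigit((unsigned char)*p)) p++;
            if (*p == '.' && isdigit((unsigned char)p[1])) {
                p++;                               /* fraction part */
                while (isdigit((unsigned char)*p)) p++;
            }
        } else if (isalpha(c)) {
            kind = K_VAR;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (strchr("+-*/\\^=()", c)) {
            kind = K_OP;
            p++;
        } else {
            p++;                 /* skip characters outside the token set */
            continue;
        }

        size_t len = (size_t)(p - start);
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';
        out[n].kind = kind;
        prev = kind;
        prev_op = (kind == K_OP) ? *start : '\0';
        n++;
    }
    return n;
}
```

With this rule, `a - 3` yields a variable, the operator `-`, and the number `3`, while `a * -3.5` yields the number `-3.5` as a single token. In flex, a similar effect could be obtained by recording the last token kind in a global variable and checking it in the action for `-`, though that variant is not shown or tested here.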

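10. Tools and Summary

Tools
- For this experiment I used the flex that ships with the macOS terminal
- The system describes flex as: Generates programs that perform pattern-matching on text.
- flex version: flex 2.6.4 Apple(flex-34)
- Usage:
  - flex program.l
  - cc lex.yy.c -ll
  - ./a.out < input.txt
- Advantages and disadvantages:
  - Advantages: it is preinstalled and convenient, with no setup required; compile errors are friendly and point to the location of the offending code, which makes debugging easier
  - Disadvantages: as a command-line tool it requires some Linux basics; writing longer code directly in vim on the command line is laborious and hard to format, so another IDE is still needed while writing the code
- Required abilities:
  - Basic familiarity with Linux
  - The ability to look up information and solve problems independently
Self-evaluation
- Before the experiment I did some preparation: I looked up how to use the Lex tool on macOS (Unix) in advance and tried it hands-on, following my preparation report
- During the experiment I tried a variety of inputs on the textbook examples and made small adjustments to the tokens to deepen my understanding
- However, in the final experiment I could not solve every problem I ran into (negative numbers are not recognized)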