University Compiler Lab: Lexical Analyzer (Java Implementation)

Latest recommended article published on 2023-04-27 16:02:59

小巫技术博客

Category column: [University Courses: Compiler Principles]

Article link: https://blog.csdn.net/wwj_748/article/details/8315161

SIMPLE Language Definition

I. Character Set Definition

1. <character set> → <letter>│<digit>│<single delimiter>

2. <letter> → A│B│…│Z│a│b│…│z

3. <digit> → 0│1│2│…│9

4. <single delimiter> → +│-│*│/│=│<│>│(│)│[│]│:│.│;│,│'

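II. Token Set Definition

5. <token set> → <reserved word>│<double delimiter>│<identifier>│<constant>│<single delimiter>

6. <reserved word> → and│array│begin│bool│call│case│char│constant│dim│do│else│end│false│for│if│input│integer│not│of│or│output│procedure│program│read│real│repeat│set│stop│then│to│true│until│var│while│write

7. <double delimiter> → <>│<=│>=│:=│/*│*/│..

8. <identifier> → <letter>│<identifier> <digit>│<identifier> <letter>

9. <constant> → <integer>│<boolean constant>│<character constant>

10. <integer> → <digit>│<integer> <digit>

11. <boolean constant> → true│false

12. <character constant> → ' any string of characters other than {'} '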
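Rules 4 and 7 overlap: `<` alone is a single delimiter, while `<=` and `<>` are double delimiters, so a scanner has to prefer the longer match. The following is only a minimal sketch of that longest-match idea; the class name `DelimiterScanner` and the method `matchDelimiter` are hypothetical and are not part of the lab handout.

```java
import java.util.Arrays;
import java.util.List;

class DelimiterScanner {
    // Double delimiters from rule 7 (the comment markers /* and */ included)
    static final List<String> DOUBLE = Arrays.asList("<>", "<=", ">=", ":=", "/*", "*/", "..");
    // Single delimiters from rule 4
    static final String SINGLE = "+-*/=<>()[]:.;,'";

    // Returns the longest delimiter that starts at position pos, or "" if none starts there
    static String matchDelimiter(String line, int pos) {
        // Try the two-character form first so that "<=" is not split into "<" and "="
        if (pos + 2 <= line.length() && DOUBLE.contains(line.substring(pos, pos + 2))) {
            return line.substring(pos, pos + 2);
        }
        if (pos < line.length() && SINGLE.indexOf(line.charAt(pos)) >= 0) {
            return line.substring(pos, pos + 1);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(matchDelimiter("A<=B", 1)); // <=
        System.out.println(matchDelimiter("A<B", 1));  // <
    }
}
```

Checking the two-character candidates before the one-character ones keeps the lookahead to a single character, which is all this token set needs.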

III. Data Type Definition

13. <type> → integer│bool│char

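IV. Expression Definition

14. <expression> → <arithmetic expression>│<boolean expression>│<character expression>

15. <arithmetic expression> → <arithmetic expression> + <term>│<arithmetic expression> - <term>│<term>

16. <term> → <term> * <factor>│<term> / <factor>│<factor>

17. <factor> → <arithmetic operand>│- <factor>

18. <arithmetic operand> → <integer>│<identifier>│( <arithmetic expression> )

19. <boolean expression> → <boolean expression> or <boolean term>│<boolean term>

20. <boolean term> → <boolean term> and <boolean factor>│<boolean factor>

21. <boolean factor> → <boolean operand>│not <boolean factor>

22. <boolean operand> → <boolean constant>│<identifier>│( <boolean expression> )│<identifier> <relational operator> <identifier>│<arithmetic expression> <relational operator> <arithmetic expression>

23. <relational operator> → <│<>│<=│>=│>│=

24. <character expression> → <character constant>│<identifier>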
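Rules 15 and 16 are left-recursive, which a hand-written recursive-descent parser cannot follow literally; a common workaround is to turn each left-recursive rule into a loop. The sketch below does that for rules 15 to 18 and evaluates integer expressions. It is an illustration only: the class name `ArithmeticEvaluator` is hypothetical, identifiers are not supported, and the lab itself asks for quadruples rather than evaluation.

```java
class ArithmeticEvaluator {
    private final String src;
    private int pos = 0;

    ArithmeticEvaluator(String src) {
        this.src = src.replace(" ", ""); // blanks carry no meaning here
    }

    // Evaluates a complete expression and rejects trailing input
    static int eval(String text) {
        ArithmeticEvaluator p = new ArithmeticEvaluator(text);
        int value = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    // Rule 15, with the left recursion rewritten as a loop over + and -
    private int expression() {
        int value = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // Rule 16, with the left recursion rewritten as a loop over * and /
    private int term() {
        int value = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // Rule 17: an operand, or a minus sign followed by a factor
    private int factor() {
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++;
            return -factor();
        }
        return operand();
    }

    // Rule 18, restricted to integers and parenthesized expressions
    private int operand() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = expression();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ) at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("integer expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

For example, `ArithmeticEvaluator.eval("2+3*4")` yields 14 and `ArithmeticEvaluator.eval("10-4-3")` yields 3, so the loops preserve the precedence and left associativity that the left-recursive rules express.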

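V. Statement Definition

25. <statement> → <assignment statement>│<if statement>│<while statement>│<repeat statement>│<compound statement>

26. <assignment statement> → <identifier> := <arithmetic expression>

27. <if statement> → if <boolean expression> then <statement>│if <boolean expression> then <statement> else <statement>

28. <while statement> → while <boolean expression> do <statement>

29. <repeat statement> → repeat <statement> until <boolean expression>

30. <compound statement> → begin <statement list> end

31. <statement list> → <statement> ; <statement list>│<statement>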

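VI. Program Definition

32. <program> → program <identifier> ; <variable declaration> <compound statement> .

33. <variable declaration> → var <variable definition>│ε

34. <variable definition> → <identifier list> : <type> ; <variable definition>│<identifier list> : <type> ;

35. <identifier list> → <identifier> , <identifier list>│<identifier>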

VII. SIMPLE Language Token Codes

Token	Kind code	Token	Kind code	Token	Kind code
and	1	output	21	*	41
array	2	procedure	22	*/	42
begin	3	program	23	+	43
bool	4	read	24	,	44
call	5	real	25	-	45
case	6	repeat	26	.	46
char	7	set	27	..	47
constant	8	stop	28	/	48
dim	9	then	29	/*	49
do	10	to	30	:	50
else	11	true	31	:=	51
end	12	until	32	;	52
false	13	var	33	<	53
for	14	while	34	<=	54
if	15	write	35	<>	55
input	16	identifier	36	=	56
integer	17	integer constant	37	>	57
not	18	character constant	38	>=	58
of	19	(	39	[	59
or	20	)	40	]	60
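Because the reserved words occupy codes 1 to 35 in alphabetical order and the delimiters occupy 39 to 60 in the order listed, the whole table can be built from two arrays. The sketch below shows one possible lookup; the class name `KindCodes` and the method `codeOf` are hypothetical, and returning 0 for an unknown lexeme is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class KindCodes {
    static final int IDENTIFIER = 36;
    static final int INTEGER = 37;
    static final int CHAR_CONSTANT = 38;

    // Reserved words in table order: index + 1 is the kind code
    private static final String[] KEYWORDS = { "and", "array", "begin", "bool",
            "call", "case", "char", "constant", "dim", "do", "else", "end",
            "false", "for", "if", "input", "integer", "not", "of", "or",
            "output", "procedure", "program", "read", "real", "repeat", "set",
            "stop", "then", "to", "true", "until", "var", "while", "write" };

    // Delimiters in table order: index + 39 is the kind code
    private static final String[] DELIMITERS = { "(", ")", "*", "*/", "+", ",",
            "-", ".", "..", "/", "/*", ":", ":=", ";", "<", "<=", "<>", "=",
            ">", ">=", "[", "]" };

    private static final Map<String, Integer> CODES = new HashMap<>();

    static {
        for (int i = 0; i < KEYWORDS.length; i++) {
            CODES.put(KEYWORDS[i], i + 1);
        }
        for (int i = 0; i < DELIMITERS.length; i++) {
            CODES.put(DELIMITERS[i], i + 39);
        }
    }

    // Returns the kind code of a reserved word or delimiter, or 0 if it is neither
    static int codeOf(String lexeme) {
        Integer code = CODES.get(lexeme);
        return code == null ? 0 : code;
    }
}
```

With this table, `KindCodes.codeOf("begin")` gives 3 and `KindCodes.codeOf(":=")` gives 51; a lexeme that maps to 0 and starts with a letter would then be classified as an identifier (code 36).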

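VIII. Lab 1: Design a Lexical Analyzer for the SAMPLE Language

Requirements:

a) On startup, the program first prints the author's name, class, and student ID (Chinese, English, or pinyin are all acceptable);

b) It then prompts for the name of the test program; once the name is entered, lexical analysis starts automatically and the result is printed;

c) The output is a sequence of token pairs (see sample outputs 1 and 2 for the format);

d) The analyzer must detect the following lexical errors and report the kind and position of each:

Illegal characters, i.e. symbols that are not in the SAMPLE character set;

A character constant missing its closing single quote (a character constant must be delimited by single quotes on both sides and may not span lines);

A comment missing its closing delimiter */ (a comment must be delimited by /* on the left and */ on the right and may not span lines);

After finding an error, the analyzer must keep going; it may not stop after reporting only one error.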
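The three error kinds and the "keep going" requirement can be sketched separately from tokenization. Because neither comments nor character constants may span lines, a simple recovery strategy is to abandon the rest of the current line after an unterminated construct and resume at the next line. The class `LexErrorScanner`, its method names, and the message wording below are all hypothetical; treating blanks and tabs as legal separators is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class LexErrorScanner {
    // Single delimiters of the character set
    static final String DELIMS = "+-*/=<>()[]:.;,'";

    static boolean isLegal(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || DELIMS.indexOf(c) >= 0
                || c == ' ' || c == '\t';
    }

    // Collects every lexical error instead of stopping at the first one
    static List<String> findErrors(List<String> lines) {
        List<String> errors = new ArrayList<>();
        for (int row = 0; row < lines.size(); row++) {
            String line = lines.get(row);
            int i = 0;
            while (i < line.length()) {
                char c = line.charAt(i);
                if (line.startsWith("/*", i)) {
                    int close = line.indexOf("*/", i + 2);
                    if (close < 0) {
                        errors.add(message("unterminated comment", row, i));
                        break; // the rest of the line belongs to the broken comment
                    }
                    i = close + 2;
                } else if (c == '\'') {
                    int close = line.indexOf('\'', i + 1);
                    if (close < 0) {
                        errors.add(message("unterminated character constant", row, i));
                        break; // resume scanning at the next line
                    }
                    i = close + 1;
                } else {
                    if (!isLegal(c)) {
                        errors.add(message("illegal character " + c, row, i));
                    }
                    i++;
                }
            }
        }
        return errors;
    }

    // Positions are reported 1-based, as a user would count them
    private static String message(String what, int row, int col) {
        return what + " at line " + (row + 1) + ", column " + (col + 1);
    }
}
```

Collecting the messages in a list rather than printing them immediately makes it easy to show all errors together after the token pairs.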

IX. Lab 1 Test Programs and Sample Outputs

Test program 1: program name TEST1

and array begin bool call

case char constant dim do

else end false for if

input integer not of or

output procedure program read real

repeat set stop then to

true until var while write

abc 123 'EFG' ( ) * + , - . .. /

: := ; < <= <> = > >= [ ]

Sample output 1 (to be displayed on screen). Note: each pair is (kind code, token).

( 1 , and) (2 , array ) ( 3 , begin ) ( 4 ,bool) ( 5 , call )

( 6 , case) ( 7 , char) ( 8 , constant) ( 9 , dim) (10, do )

(11 , else) (12, end) (13 ,false) (14 ,for) (15 ,if)

(16 ,input) (17,integer) (18 ,not) (19 ,of) (20 ,or)

(21 , output) (22 ,procedure) (23 ,program) (24 ,read) (25,real)

(26 ,repeat) (27 ,set) (28 ,stop) (29 ,then) (30,to)

(31 ,true) (32,until) (33 ,var) (34,while) (35 ,write)

(36 ,abc) (37,123) (38 ,EFG) (39 , ( ) (40 , ) )

(41 , * ) (43, + ) (44, , ) (45, - ) (46, . )

(47 , .. ) (48, / ) (50, : ) (51, := ) (52, ; )

(53 , < ) (54 , <= ) (55 , <> ) (56, = ) (57, > )

(58 , >= ) (59 , [ ) (60, ])

Test program 2: program name TEST2

program example2;

var A,B,C:integer;

X,Y:bool;

begin /* this is an example */

A:=B*C+37;

X:= 'ABC';

end.

Sample output 2 (to be displayed on screen):

(23 , program) (36 , example2 ) (52 , ; ) (33, var ) (36 , A )

(44 , , ) (36, B ) (44 , , ) (36 , C) (50 , : )

(17 , integer ) (52 , ; ) (36 , X ) (44, , ) (36, Y )

(50 , : ) ( 4 , bool ) (52 , ; ) ( 3 , begin ) (36 , A )

(51 , :=) (36, B ) (41, * ) (36, C ) (43, + )

(37 , 37 ) (52 , ; ) (36 , X ) (51 , := ) (38 , ABC )

(52, ; ) (12, end ) (46 , . )

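X. Lab 2: Design a Syntax and Semantic Analyzer for the SAMPLE Language That Outputs Quadruples as Intermediate Code

Requirements:

a) On startup, the program first prints the author's name, class, and student ID (Chinese, English, or pinyin are all acceptable).

b) It then prompts for the name of the test program; once the name is entered, compilation starts automatically.

c) It outputs the quadruple intermediate code (see sample outputs 3 and 4 for the format).

d) It detects syntax errors in the program and prints error messages.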
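As an illustration of the quadruple form (operator, operand 1, operand 2, result), the sketch below emits the code for the assignment `A:=B*C+B/D` from test program 4, allocating a fresh temporary for each intermediate value. The class `QuadEmitter` and its methods are hypothetical helpers, not part of the handout; a real analyzer would call them from its parsing routines instead of from `main`.

```java
import java.util.ArrayList;
import java.util.List;

class QuadEmitter {
    private final List<String> quads = new ArrayList<>();
    private int tempCount = 0;

    // Allocates the next temporary name: T1, T2, ...
    String newTemp() {
        return "T" + (++tempCount);
    }

    // Appends one quadruple and returns its index
    int emit(String op, String arg1, String arg2, String result) {
        quads.add("(" + op + "," + arg1 + "," + arg2 + "," + result + ")");
        return quads.size() - 1;
    }

    // Emits "left op right" into a fresh temporary and returns the temporary's name
    String binary(String op, String left, String right) {
        String t = newTemp();
        emit(op, left, right, t);
        return t;
    }

    List<String> quads() {
        return quads;
    }

    public static void main(String[] args) {
        QuadEmitter e = new QuadEmitter();
        String t1 = e.binary("*", "B", "C");
        String t2 = e.binary("/", "B", "D");
        String t3 = e.binary("+", t1, t2);
        e.emit(":=", t3, "-", "A");
        for (String q : e.quads()) {
            System.out.println(q);
        }
    }
}
```

Apart from spacing, the four lines printed by `main` correspond to quadruples (8) to (11) of sample output 4.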

XI. Test Programs and Sample Outputs

Test program 3: program name TEST4

program example4;

var A,B,C,D:integer;

begin

A:=1; B:=5; C:=3; D:=4;

while A<C and B>D do

if A=1 then C:=C+1 else

while A<=D do A:=A*2

end.

Test program 4: program name TEST5

program example5;

var A,B,C,D,W:integer;

begin

A:=5; B:=4; C:=3; D:=2; W:=1;

if W>=1 then A:=B*C+B/D

else repeat A:=A+1 until A<0

end.

Sample output 3 (to be displayed on screen):

( 0) (program, example4, -, -)

( 1) (:=, 1, -, A)

( 2) (:=, 5, -, B)

( 3) (:=, 3, -, C)

( 4) (:=, 4, -, D)

( 5) (j<, A, C, 7)

( 6) (j, -, -, 20)

( 7) (j>, B, D, 9)

( 8) (j, -, -, 20)

( 9) (j=, A, 1, 11)

(10) (j, -, -, 14)

(11) (+, C, 1, T1)

(12) (:=, T1, -, C)

(13) (j, -, -, 5)

(14) (j<=, A, D, 16)

(15) (j, -, -, 5)

(16) (*, A, 2, T2)

(17) (:=, T2, -, A)

(18) (j, -, -, 14)

(19) (j, -, -, 5)

(20) (sys, -, -, -)

Sample output 4 (to be displayed on screen):

( 0) (program, example5, -, -)

( 1) (:=, 5, -, A)

( 2) (:=, 4, -, B)

( 3) (:=, 3, -, C)

( 4) (:=, 2, -, D)

( 5) (:=, 1, -, W)

( 6) (j>=, W, 1, 8)

( 7) (j, -, -, 13)

( 8) (*, B, C, T1)

( 9) (/, B, D, T2)

(10) (+, T1, T2, T3)

(11) (:=, T3, -, A)

(12) (j, -, -, 17)

(13) (-, A, 1, T4)

(14) (:=, T4, -, A)

(15) (j<, A, 0, 17)

(16) (j, -, -, 13)

(17) (sys, -, -, -)
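The jump quadruples in the sample outputs carry targets that are not yet known when the jump is generated: the target of a conditional jump depends on code that comes later. One common way to handle this is backpatching, where the jump is emitted with a placeholder and filled in once the destination index is known. The sketch below applies it to an if-then-else with stand-in branch bodies; the class `JumpBackpatch`, its methods, and the `?` placeholder are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class JumpBackpatch {
    // Each quadruple is a mutable array: {op, arg1, arg2, result}
    private final List<String[]> quads = new ArrayList<>();

    // Index that the next emitted quadruple will receive
    int nextIndex() {
        return quads.size();
    }

    int emit(String op, String arg1, String arg2, String result) {
        quads.add(new String[] { op, arg1, arg2, result });
        return quads.size() - 1;
    }

    // Fills in the target of a jump that was emitted before its destination was known
    void backpatch(int quadIndex, int target) {
        quads.get(quadIndex)[3] = String.valueOf(target);
    }

    String show(int index) {
        return "(" + String.join(",", quads.get(index)) + ")";
    }

    public static void main(String[] args) {
        JumpBackpatch g = new JumpBackpatch();
        int trueJump = g.emit("j>=", "W", "1", "?");  // taken when the condition holds
        int falseJump = g.emit("j", "-", "-", "?");    // taken otherwise
        g.backpatch(trueJump, g.nextIndex());          // the then-branch starts here
        g.emit(":=", "1", "-", "A");                   // stand-in for the then-branch
        int skipElse = g.emit("j", "-", "-", "?");     // jump over the else-branch
        g.backpatch(falseJump, g.nextIndex());         // the else-branch starts here
        g.emit(":=", "2", "-", "A");                   // stand-in for the else-branch
        g.backpatch(skipElse, g.nextIndex());          // first quadruple after the if
        for (int i = 0; i < g.nextIndex(); i++) {
            System.out.println(i + " " + g.show(i));
        }
    }
}
```

Quadruples (6), (7), and (12) of sample output 4 follow the same pattern: a conditional jump into the then-branch, an unconditional jump to the else-branch, and a jump past the else-branch at the end of the then-branch.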

Lab 1 code:

package firstExam;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Test3 {
	private static String dyhStr = "'";
	// Reserved words; a word's index plus one is its kind code
	private static String[] keyWord = { "and", "array", "begin", "bool",
			"call", "case", "char", "constant", "dim", "do", "else", "end",
			"false", "for", "if", "input", "integer", "not", "of", "or",
			"output", "procedure", "program", "read", "real", "repeat", "set",
			"stop", "then", "to", "true", "until", "var", "while", "write" };

	private static char[] sigleDelimiter = { '+', '-', '*', '/', '=', '<', '>',
			'(', ')', '[', ']', ':', '.', ';', ',', dyhStr.charAt(0) };

	// Returns true if the given string is a reserved word
	public static boolean isKeyWord(String str) {
		for (int i = 0; i < keyWord.length; i++) {
			if (keyWord[i].equals(str)) {
				return true;
			}
		}
		return false;
	}

	// Returns true if the character is a digit 0-9
	public static boolean isDigit(char ch) {
		return ch >= '0' && ch <= '9';
	}

	// Returns true if the character is a letter A-Z or a-z
	public static boolean isLetter(char ch) {
		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
	}

	// Returns true if the character is a single delimiter
	public static boolean isSingleDlimeter(char ch) {
		for (int i = 0; i < sigleDelimiter.length; i++) {
			if (ch == sigleDelimiter[i]) {
				return true;
			}
		}
		return false;
	}

	// Returns the kind code of a reserved word, or 0 if the string is not one
	public static int getKeywordKindCode(String str) {
		int keyWordIndex = 0;
		for (int i = 0; i < keyWord.length; i++) {
			if (str.equals(keyWord[i])) {
				keyWordIndex = i + 1;
			}
		}
		return keyWordIndex;
	}

	// Returns the kind code of a single delimiter, or 0 if the character has none
	// '+','-','*', '/', '=', '<', '>', '(',')', '[', ']',':', '.', ';',','
	public static int getSingleKindCode(char ch) {
		int sCode = 0;
		switch (ch) {
		case '+':
			sCode = 43;
			break;
		case '-':
			sCode = 45;
			break;
		case '*':
			sCode = 41;
			break;
		case '/':
			sCode = 48;
			break;
		case '=':
			sCode = 56;
			break;
		case '<':
			sCode = 53;
			break;
		case '>':
			sCode = 57;
			break;
		case '(':
			sCode = 39;
			break;
		case ')':
			sCode = 40;
			break;
		case '[':
			sCode = 59;
			break;
		case ']':
			sCode = 60;
			break;
		case ':':
			sCode = 50;
			break;
		case '.':
			sCode = 46;
			break;
		case ';':
			sCode = 52;
			break;
		case ',':
			sCode = 44;
			break;
		}
		return sCode;
	}

	public static int getDoubleKindCode(String str) {
		int code = 0;
		// Double delimiters recognized here: :=, >=, <=, .. and <>
		if (str.equals(":=")) {
			code = 51;
		} else if (str.equals(">=")) {
			code = 58;
		} else if (str.equals("<=")) {
			code = 54;
		} else if (str.equals("..")) {
			code = 47;
		} else if (str.equals("<>")) {
			code = 55;
		}
		return code;
	}

	/**
	 * Reads the whole file at the given path into a string.
	 * 
	 * @param path
	 *            path of the text file
	 * @return the file contents
	 * @throws IOException
	 *             if the file does not exist or cannot be read
	 */
	public static String FileInputStreamMethod(String path) throws IOException {
		File file = new File(path);
		if (!file.exists() || file.isDirectory()) {
			throw new FileNotFoundException();
		}
		java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
		// try-with-resources closes the stream even if reading fails
		try (FileInputStream fis = new FileInputStream(file)) {
			byte[] buffer = new byte[1024];
			int n;
			// Keep only the bytes actually read, not the whole buffer
			while ((n = fis.read(buffer)) != -1) {
				out.write(buffer, 0, n);
			}
		}
		return out.toString();
	}

	/**
	 * Core routine of the lexical analyzer
	 */
	public static void tokenAnysis() {
		String lineStr;
		int row = 0;

		String filePath = "F://file2.txt";
		try {
			String fileTxt = FileInputStreamMethod(filePath).trim();
			System.out.println("Source program: ");
			System.out.println(fileTxt);
			System.out.println("Starting lexical analysis");
		} catch (IOException e1) {
			// Without the source file there is nothing to analyze
			e1.printStackTrace();
			return;
		}

		File file = new File(filePath);
		char ch; // current character
		int count = 0; // number of pairs printed so far
		// try-with-resources closes the reader when the analysis ends
		try (BufferedReader br = new BufferedReader(new FileReader(file))) {
			// Analyze the file line by line
			while ((lineStr = br.readLine()) != null) {
				int i = 0;
				row++; // line number
				int col = 1; // column number
				while (i <= lineStr.length() - 1) {
					ch = lineStr.charAt(i);
					// An identifier or reserved word starts with a letter
					if (isLetter(ch)) {
						StringBuffer sb = new StringBuffer();
						// Consume letters and digits without reading past the end of the line
						while (i < lineStr.length()
								&& (isLetter(lineStr.charAt(i)) || isDigit(lineStr.charAt(i)))) {
							sb.append(lineStr.charAt(i));
							i++;
							col++;
						}

						// Reserved word
						if (isKeyWord(sb.toString())) {
							// Look up its kind code
							int kindCode = getKeywordKindCode(sb.toString());
							// Print the pair for the reserved word
							System.out.print("(" + kindCode + ","
									+ sb.toString() + ")" + " ");
							// One more pair printed
							count++;
						} else { // otherwise it is an identifier
							// Print the pair for the identifier
							System.out.print("(" + 36 + "," + sb.toString()
									+ ")" + " ");
							count++;
						}
						if (count % 5 == 0) {
							System.out.println();
						}
						// Single delimiters
					} else if (isSingleDlimeter(ch)) {
						StringBuffer sb = new StringBuffer();
						String dyh = "'";
						// Delimiters that never start a two-character token are printed directly
						if ((ch == ',') || (ch == ';') || (ch == '=') || (ch == '(')
								|| (ch == ')') || (ch == '[') || (ch == ']')
								|| (ch == '+') || (ch == '-') || (ch == '*')) {
							System.out.print("(" + getSingleKindCode(ch) + ","
									+ ch + ")" + " ");
							i++;
							col++;
							count++;
							// <, > and : may start a two-character delimiter,
							// so the next character has to be examined as well
						} else if ((ch == '>') || (ch == '<') || (ch == ':')) {
							// Look ahead one character; a blank stands in at the end of the line
							char next = (i + 1 < lineStr.length()) ? lineStr.charAt(i + 1) : ' ';
							String two = "" + ch + next;
							if (getDoubleKindCode(two) != 0) {
								// One of >=, <=, := or <>
								System.out.print("(" + getDoubleKindCode(two)
										+ "," + two + ")" + " ");
								i += 2;
							} else {
								// Not a double delimiter: print the single delimiter
								System.out.print("(" + getSingleKindCode(ch) + ","
										+ ch + ")" + " ");
								i++;
							}
							col = i + 1;
							count++;
						}
						// A slash is either division or the start of a comment;
						// a single quote starts a character constant
						else if ((ch == '/') || (ch == dyh.charAt(0))) {
							if (ch == '/' && i + 1 < lineStr.length()
									&& lineStr.charAt(i + 1) == '*') {
								// Comment: skip to the closing delimiter on the same line
								int close = lineStr.indexOf("*/", i + 2);
								if (close == -1) {
									System.out.print("Error: comment is missing its closing */, line "
											+ row + ", column " + (i + 1) + " ");
									// Comments cannot span lines, so resume at the next line
									i = lineStr.length();
								} else {
									i = close + 2;
								}
							} else if (ch == '/') {
								// Plain division operator
								System.out.print("(" + getSingleKindCode(ch) + ","
										+ ch + ")" + " ");
								i++;
								count++;
							} else {
								// Character constant: everything up to the next single quote
								int close = lineStr.indexOf(dyh, i + 1);
								if (close == -1) {
									System.out.print("Error: character constant is missing its closing quote, line "
											+ row + ", column " + (i + 1) + " ");
									// Character constants cannot span lines either
									i = lineStr.length();
								} else {
									System.out.print("(" + 38 + ","
											+ lineStr.substring(i + 1, close) + ")" + " ");
									count++;
									i = close + 1;
								}
							}
							col = i + 1;

						} else if (ch == '.') {
							if (i + 1 < lineStr.length() && lineStr.charAt(i + 1) == '.') {
								// Double delimiter ..
								System.out.print("(" + getDoubleKindCode("..")
										+ "," + ".." + ")" + " ");
								i += 2;
							} else {
								// A single period; the following character is left for the next round
								System.out.print("(" + getSingleKindCode(ch)
										+ "," + ch + ")" + " ");
								i++;
							}
							col = i + 1;
							count++;
						}
						if (count % 5 == 0) {
							System.out.println();
						}
					}
					// The token starts with a digit
					else if (isDigit(ch)) {
						StringBuffer sb = new StringBuffer();
						int start = i; // where the token begins, for error reporting
						while (i < lineStr.length() && isDigit(lineStr.charAt(i))) {
							sb.append(lineStr.charAt(i));
							i++;
						}
						if (i < lineStr.length() && isLetter(lineStr.charAt(i))) {
							// Digits directly followed by letters do not form a legal token
							while (i < lineStr.length()
									&& (isLetter(lineStr.charAt(i)) || isDigit(lineStr.charAt(i)))) {
								sb.append(lineStr.charAt(i));
								i++;
							}
							System.out.print("Error: illegal token " + sb.toString() + ", line "
									+ row + ", column " + (start + 1) + " ");
						} else {
							System.out.print("(" + 37 + "," + sb.toString()
									+ ")" + " ");
							count++;
						}
						col = i + 1;

						if (count % 5 == 0) {
							System.out.println();
						}
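					} else {
						// Whitespace is skipped; anything else is outside the character set
						if (!Character.isWhitespace(ch)) {
							System.out.print("Error: illegal character " + ch + ", line "
									+ row + ", column " + (i + 1) + " ");
						}
						i++;
						col++;
					}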

				}

			}

		} catch (Exception e) {
			// TODO: handle exception
			e.printStackTrace();
		}
	}

	/**
	 * Program entry point: prints the author information, then asks for the
	 * test program name.
	 * 
	 * @param args
	 *            unused
	 */
	public static void main(String[] args) {
		Scanner input = new Scanner(System.in);
		String testName;
		System.out.println("&&& Welcome to Xiaowu's compiler world &&&:");
		System.out.println("&Name:" + "巫文杰" + "\n" + "&Class:" + "Computer Science and Technology, Class 1 (2010)"
				+ "\n" + "&Student ID:" + "201038889071");
		System.out.println("Enter the test program name:");
		testName = input.nextLine();
		if (testName.equals("Test3")) {
			tokenAnysis();
		}
	}

}
