Lexical Analyzer ("No Martial Ethics" Java Edition) - CSDN Blog

Original article: https://blog.csdn.net/weixin_43894577/article/details/110148491

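I. Objective

Design, write and debug a lexical analysis program to deepen understanding of how lexical analysis works.

II. Equipment and Materials

One computer

Operating system: Windows 10

IDE: IntelliJ IDEA

III. Experiment Content and Principles

1. Lexical rules of the C-language subset to be analyzed

1) Keywords

main if else int return void while (all lowercase)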

2) Special symbols

= + - * / < <= > >= == != ; : , { } [ ] ( )
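Several of these symbols share a first character, such as < and <=, or = and ==. A scanner therefore has to prefer the longer match. The sketch below is a minimal illustration of that longest-match ("maximal munch") rule over the symbol list above. SymbolMatcher and longestMatch are hypothetical names and are not part of the project.

```java
import java.util.Set;

/** Hypothetical helper: longest-match recognition of the special symbols listed above. */
class SymbolMatcher {
    // One- and two-character special symbols of the C subset
    static final Set<String> SYMBOLS = Set.of(
            "=", "+", "-", "*", "/", "<", "<=", ">", ">=", "==", "!=",
            ";", ":", ",", "{", "}", "[", "]", "(", ")");

    /** Returns the longest symbol starting at index pos of src, or null if none matches. */
    static String longestMatch(String src, int pos) {
        // Try the two-character form first, then fall back to a single character
        if (pos + 2 <= src.length() && SYMBOLS.contains(src.substring(pos, pos + 2))) {
            return src.substring(pos, pos + 2);
        }
        if (pos + 1 <= src.length() && SYMBOLS.contains(src.substring(pos, pos + 1))) {
            return src.substring(pos, pos + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("<=10", 0)); // <=
        System.out.println(longestMatch("<10", 0));  // <
        System.out.println(longestMatch("!x", 0));   // null: '!' alone is not a symbol
    }
}
```

The project's processor follows the same rule: after reading <, >, = or !, it peeks one character ahead before deciding.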

3) Other tokens

STRING ::= " [^"]* "

ID::=letter(letter|digit)*

INT::=digit digit*

letter::= a|…|z|A|…|Z

digit::= 0|…|9

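4) Whitespace consists of blanks, tabs and newlines

Whitespace generally separates IDs, INTs, special symbols and keywords, and is usually discarded during lexical analysis.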
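The regular definitions above translate directly into Java regular expressions. The sketch below is minimal and assumes the whole lexeme has already been cut out of the input. TokenClasses and classify are hypothetical names. Keywords also match the ID pattern; the keyword-table lookup described in section 4 is what tells them apart.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: classify a complete lexeme with the regular definitions above. */
class TokenClasses {
    // STRING ::= " [^"]* "
    static final Pattern STRING = Pattern.compile("\"[^\"]*\"");
    // ID ::= letter (letter | digit)*
    static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // INT ::= digit digit*
    static final Pattern INT = Pattern.compile("[0-9]+");
    // Whitespace: blanks, tabs and line breaks
    static final Pattern BLANK = Pattern.compile("[ \t\r\n]+");

    static String classify(String lexeme) {
        if (STRING.matcher(lexeme).matches()) return "STRING";
        if (ID.matcher(lexeme).matches()) return "ID";
        if (INT.matcher(lexeme).matches()) return "INT";
        if (BLANK.matcher(lexeme).matches()) return "BLANK";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(classify("\"Hello\"")); // STRING
        System.out.println(classify("sum2"));      // ID
        System.out.println(classify("2sum"));      // UNKNOWN: an ID must start with a letter
        System.out.println(classify("42"));        // INT
    }
}
```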

2. Category codes of selected tokens (the table may be extended)

Token	Code	Token	Code
main	1	/	25
int	2	(	26
char	3	)	27
if	4	[	28
else	5	]	29
for	6	{	30
while	7	}	31
return	8	,	32
void	9	:	33
STRING	50	;	34
ID	10	>	35
INT	20	<	36
=	21	>=	37
+	22	<=	38
-	23	==	39
*	24	!=	40
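If two tokens share a code, they cannot be told apart in the output. The sketch below loads the table into a Java map and checks that every code is distinct. It assumes == has code 39, which continues the 35 to 40 run of relational operators (29 already belongs to ]). CodeTable and codesAreUnique are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: the category-code table as a Java map, with a uniqueness check. */
class CodeTable {
    static final Map<String, Integer> CODES = new LinkedHashMap<>();

    static {
        String[] tokens = {"main", "int", "char", "if", "else", "for", "while", "return", "void",
                "ID", "INT", "=", "+", "-", "*", "/", "(", ")", "[", "]", "{", "}", ",", ":", ";",
                ">", "<", ">=", "<=", "==", "!=", "STRING"};
        int[] codes = {1, 2, 3, 4, 5, 6, 7, 8, 9,
                10, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                35, 36, 37, 38, 39, 40, 50};
        for (int i = 0; i < tokens.length; i++) {
            CODES.put(tokens[i], codes[i]);
        }
    }

    /** Returns true if no two tokens share a code. */
    static boolean codesAreUnique() {
        Map<Integer, String> seen = new HashMap<>();
        for (Map.Entry<String, Integer> e : CODES.entrySet()) {
            // putIfAbsent returns the earlier token if this code was already used
            if (seen.putIfAbsent(e.getValue(), e.getKey()) != null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(CODES.get("=="));  // 39
        System.out.println(codesAreUnique()); // true
    }
}
```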

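3. Functions of the lexical analyzer

Input: the source program string, written in the given grammar

Output: a sequence of pairs (syn, token or sum). Here syn is the token's category code, token is the token's own string, and sum is the value of an integer constant. In the implementation, each pair can be represented by a struct.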
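In Java, a small immutable record can hold the pair instead of a C struct. The sketch below is minimal and requires Java 16 or later for records. The record Token, its of factory, and the convention that sum is filled only for INT (code 20) are illustrative assumptions; this is not the project's Lexical class.

```java
/** Hypothetical (syn, token or sum) pair as a Java record (Java 16+). */
record Token(int syn, String token, int sum) {
    // Category code of INT, from the code table in section 2
    static final int INT_CODE = 20;

    /** Builds a pair; for integer constants the numeric value is stored in sum. */
    static Token of(int syn, String text) {
        int value = (syn == INT_CODE) ? Integer.parseInt(text) : 0;
        return new Token(syn, text, value);
    }

    @Override
    public String toString() {
        // Integer constants print their value (sum); other tokens print their text (token)
        return "(" + syn + "," + (syn == INT_CODE ? String.valueOf(sum) : token) + ")";
    }

    public static void main(String[] args) {
        System.out.println(Token.of(1, "main")); // (1,main)
        System.out.println(Token.of(20, "007")); // (20,7)
    }
}
```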

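4. Main algorithm of the lexical analyzer

4.1 Main program outline

(1) Initial keyword table

Keywords are treated as special identifiers and stored in advance in a table called the keyword table. When the scanner recognizes an identifier, it looks the identifier up in this table. If a match is found, the word is a keyword; otherwise it is an ordinary identifier. The keyword table is an array of strings, declared as:

char *KEY_WORDS[] = {"main", "int", "char", "if", "else", "for", "while", "return", "void"};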
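In Java, a hash map from keyword to category code does the same job as the C array and returns the code in a single lookup. The sketch below is minimal and takes its codes from the table in section 2. KeywordTable and codeOf are hypothetical names. The project takes a similar approach: it keeps keywords in the same lexical.txt map as the symbols and falls back to the ID code when the lookup fails.

```java
import java.util.Map;

/** Hypothetical Java counterpart of the keyword table: scan an identifier, then look it up. */
class KeywordTable {
    // Keyword -> category code, from the code table in section 2
    static final Map<String, Integer> KEY_WORDS = Map.of(
            "main", 1, "int", 2, "char", 3, "if", 4, "else", 5,
            "for", 6, "while", 7, "return", 8, "void", 9);
    static final int ID_CODE = 10;

    /** Returns the keyword's code if word is reserved, otherwise the ID code. */
    static int codeOf(String word) {
        return KEY_WORDS.getOrDefault(word, ID_CODE);
    }

    public static void main(String[] args) {
        System.out.println(codeOf("while"));  // 7
        System.out.println(codeOf("whilex")); // 10: a longer word is an ordinary identifier
    }
}
```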

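(2) Main variables: syn, token and sum

4.2 Scanning subroutine

The main part of the scanning subroutine is shown in the flowchart below:

[Figure: flowchart of the scanning subroutine]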
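The flowchart itself is not reproduced here. As a textual companion, the sketch below follows the same scanning loop: skip whitespace; letter leads to keyword or ID; digit leads to INT; quote leads to STRING; anything else is tried as a two-character symbol, then a one-character symbol, or reported as an error. It scans an in-memory String rather than using the project's PushbackReader. MiniScanner is a hypothetical name, its tables are hard-coded from section 2, and == is assumed to be 39.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory scanner following the flowchart; codes from the table in section 2. */
class MiniScanner {
    static final Map<String, Integer> KEYWORDS = Map.of(
            "main", 1, "int", 2, "char", 3, "if", 4, "else", 5,
            "for", 6, "while", 7, "return", 8, "void", 9);
    static final Map<String, Integer> SYMBOLS = Map.ofEntries(
            Map.entry("=", 21), Map.entry("+", 22), Map.entry("-", 23), Map.entry("*", 24),
            Map.entry("/", 25), Map.entry("(", 26), Map.entry(")", 27), Map.entry("[", 28),
            Map.entry("]", 29), Map.entry("{", 30), Map.entry("}", 31), Map.entry(",", 32),
            Map.entry(":", 33), Map.entry(";", 34), Map.entry(">", 35), Map.entry("<", 36),
            Map.entry(">=", 37), Map.entry("<=", 38), Map.entry("==", 39), Map.entry("!=", 40));
    static final int ID = 10, INT = 20, STRING = 50;

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    /** Scans src and returns the (code,text) pairs as strings. */
    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                                  // skip whitespace
            } else if (isLetter(c)) {
                // letter (letter|digit)*, then keyword-table lookup
                while (i < src.length() && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) i++;
                String word = src.substring(start, i);
                out.add("(" + KEYWORDS.getOrDefault(word, ID) + "," + word + ")");
            } else if (isDigit(c)) {
                while (i < src.length() && isDigit(src.charAt(i))) i++;
                out.add("(" + INT + "," + src.substring(start, i) + ")");
            } else if (c == '"') {
                // everything up to the closing quote belongs to the string
                int end = src.indexOf('"', i + 1);
                if (end < 0) throw new IllegalStateException("unterminated string at index " + start);
                out.add("(" + STRING + "," + src.substring(i + 1, end) + ")");
                i = end + 1;
            } else {
                // longest match: a two-character symbol wins over a one-character one
                String two = i + 2 <= src.length() ? src.substring(i, i + 2) : "";
                String sym = SYMBOLS.containsKey(two) ? two : String.valueOf(c);
                Integer code = SYMBOLS.get(sym);
                if (code == null) throw new IllegalStateException("unknown symbol '" + sym + "' at index " + start);
                out.add("(" + code + "," + sym + ")");
                i += sym.length();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("while(sum <= 10) { sum = sum + 2; }"));
    }
}
```

Working on a String keeps the lookahead trivial (charAt(i + 1)). A stream-based scanner like the project's needs a PushbackReader to return the one character it over-reads.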

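The overall project structure is shown below:

[Figure: project structure]

Lexical is the token entity class. LexicalUtil reads lexical.txt, converts it into a hash map and provides helper lookups for lexical analysis. LexicalAnalysisProcessor performs the actual lexical analysis.

lexical.txt stores the mapping between symbols and category codes, one entry of the form (symbol,code) per line:

[Figure: contents of lexical.txt]

test.c is the source program used for testing.
Task: write a test program yourself, output its token pairs, and annotate them.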

IV. Experiment Record

Lexical.java:

/**
 * Token entity class: one (category code, text) pair
 * @author DELL
 * @create 2020/11/13 14:14
 */
public class Lexical {

    /**
     * Token category code (syn)
     */
    private Integer type;

    /**
     * The token's own text (for integer constants, its digits)
     */
    private String value;

    public Lexical() {
    }

    public Lexical(Integer type, String value) {
        this.type = type;
        this.value = value;
    }

    public Integer getType() {
        return type;
    }


    public void setType(Integer type) {
        this.type = type;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        // Printed as (type,value), e.g. (1,main)
        return "(" + type + "," + value + ")";
    }
}

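LexicalUtil.java:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

/**
 * Token table utility: loads the symbol-to-code mapping from lexical.txt
 * @author DELL
 * @create 2020/11/13 14:10
 */
public class LexicalUtil {

    /**
     * Mapping from token text to category code
     */
    private Map<String,Integer> wordTable = new HashMap<>();

    /**
     * Default path of the mapping file, resolved on the classpath
     */
    public static final String DEFAULT_LEXICAL_PATH = "lexical.txt";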

    public LexicalUtil() throws IOException{
        this(DEFAULT_LEXICAL_PATH);
    }

    public LexicalUtil(String path) throws IOException {
        this(new File(
                Thread.currentThread().getContextClassLoader().getResource(path).getPath()
        ));
    }

    public LexicalUtil(File file) throws IOException {
        // try-with-resources closes the reader even when a line is malformed
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
            int line = 0;
            String str;
            while((str = reader.readLine()) != null) {
                line++;
                str = str.trim();
                // Skip blank lines (charAt(0) would throw on an empty string)
                if(str.isEmpty()) {
                    continue;
                }
                if(str.charAt(0) != '(' || str.charAt(str.length()-1) != ')') {
                    throw new IOException("Malformed token table: "+file.getPath()+":"+line);
                }
                String[] ans = doAnalysis(str.toCharArray());
                wordTable.put(ans[0],Integer.valueOf(ans[1]));
            }
        }
    }

    /**
     * Extracts a and b from a string of the form (a,b).
     * Examples:
     * (,,32)   => a = ","     b = "32"
     * (main,1) => a = "main"  b = "1"
     * Note: the input string must be well-formed.
     * @param ch characters of one line of lexical.txt
     * @return {a, b}
     */
    private String[] doAnalysis(char[] ch) {
        String[] ans = new String[2];
        final char left_token = '(';
        final char separator = ',';
        boolean separator_exist = false;
        StringBuilder sb = new StringBuilder();
        LinkedList<Character> stack = new LinkedList<>();
        // Push the characters onto the stack from right to left
        for(int i = ch.length - 1;i >= 0;i--) {
            char c = ch[i];
            // At the first ',' from the right, pop everything and drop the trailing ')': the result is b
            if(c == separator && !separator_exist) {
                while(!stack.isEmpty()) {
                    sb.append(stack.pop());
                }
                sb.deleteCharAt(sb.length()-1);
                ans[1] = sb.toString();
                sb = new StringBuilder();
                separator_exist = true;
                continue;
            } else if(c == left_token && i == 0) {
                // At the '(' with index 0, popping the stack yields a
                while(!stack.isEmpty()) {
                    sb.append(stack.pop());
                }
                ans[0] = sb.toString();
                break;
            }
            stack.push(ch[i]);
        }
        return ans;
    }

    /**
     * Print the token mapping table
     */
    public void printLexicalTable() {
        for (Map.Entry<String, Integer> entry : wordTable.entrySet()) {
            System.out.println("('" + entry.getKey() + "'," + entry.getValue()+")");
        }
    }

    public Integer getType(String value) {
        if(!wordTable.containsKey(value)) {
            return -1;
        }
        return wordTable.get(value);
    }

    public Lexical getLexical(String value) {
        int type = getType(value);
        if(type == -1) {
            return null;
        }
        return new Lexical(type,value);
    }
}
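doAnalysis uses a stack to find the first comma from the right. Because b is always a number, only a (the symbol) can contain a comma, so String.lastIndexOf gives the same split more directly. The sketch below is minimal, and PairParser is a hypothetical name. Unlike doAnalysis, it rejects malformed entries instead of requiring the caller to validate them first.

```java
/** Hypothetical alternative to doAnalysis: split "(a,b)" at the last comma. */
class PairParser {
    /**
     * Parses an entry such as "(main,1)" or "(,,32)" into {a, b}.
     * Splitting at the last comma is safe because b is always a number.
     */
    static String[] parse(String entry) {
        String s = entry.trim();
        int comma = s.lastIndexOf(',');
        // Shortest valid entry is "(x,1)": 5 characters, comma at index 2, b non-empty
        if (s.length() < 5 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')'
                || comma < 2 || comma > s.length() - 3) {
            throw new IllegalArgumentException("Malformed entry: " + entry);
        }
        return new String[]{s.substring(1, comma), s.substring(comma + 1, s.length() - 1)};
    }

    public static void main(String[] args) {
        String[] p = parse("(,,32)");
        System.out.println(p[0] + " -> " + p[1]); // , -> 32
    }
}
```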

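Core lexical analysis class (LexicalAnalysisProcessor.java):

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Lexical analyzer: scans a source file and collects (code, text) pairs
 * @author DELL
 * @create 2020/9/26 11:19
 */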
public class LexicalAnalysisProcessor {

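    /**
     * Recognized tokens
     */
    private List<Lexical> ans = new ArrayList<>();

    /**
     * Pushback reader: lets one over-read character be returned to the stream
     */
    private PushbackReader reader;

    /**
     * Token table utility
     */
    private LexicalUtil util;

    /**
     * Buffer holding the current lexeme
     */
    private char[] buf = new char[1024];

    /**
     * Number of characters written to buf so far
     */
    private int offset = 0;

    /**
     * Current character
     */
    private char cur = ' ';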

    /**
     * Set once the reader has returned -1 (end of file)
     */
    private boolean eof = false;

    private static final char STRING_FLAG = '"';

    private static final char BLANK_SPACE = ' ';

    private static final char TAB = '\t';

    private static final char LINE_R = '\r';

    private static final char LINE_N = '\n';

    private static final String INT = "INT";

    private static final String ID = "ID";

    private static final String STRING = "STRING";

    public LexicalAnalysisProcessor() throws IOException {
        reader = null;
        util = new LexicalUtil();
    }

    public void doProcess(String path) throws IOException {
        doProcess(new File(
                Thread.currentThread().getContextClassLoader().getResource(path).getPath()
        ));
    }

    /**
     * Runs lexical analysis on a source file and prints the token pairs
     * @param file the source file to scan
     */
    public void doProcess(File file) throws IOException {
        reader = new PushbackReader(new FileReader(file));
        eof = false;
        int flag;
        // Keep going while the file has characters left
        while((flag = reader.read()) != -1) {
            cur = (char)flag;
            // Skip spaces, tabs and line breaks
            skipBlank();
            if(eof) {
                // Only whitespace was left
                break;
            }
            if(isLetter()) {
                // Letter: keep reading letters and digits
                while(isLetter() || isNumber()) {
                    buf[offset++] = cur;
                    goNext();
                }
                // Push back the first character after the word (nothing to push back at EOF)
                if(!eof) {
                    reader.unread(cur);
                }
                String name = new String(buf, 0, offset);
                offset = 0;
                int type = util.getType(name);
                if(type == -1) {
                    // Not a keyword, so it is an identifier
                    ans.add(new Lexical(util.getType(ID), name));
                } else {
                    // Keyword: its code comes straight from the mapping table
                    ans.add(new Lexical(type, name));
                }
            } else if(isNumber()) {
                // Digit: keep reading digits
                while(isNumber()) {
                    buf[offset++] = cur;
                    goNext();
                }
                if(!eof) {
                    reader.unread(cur);
                }
                String value = new String(buf, 0, offset);
                offset = 0;
                ans.add(new Lexical(util.getType(INT), value));
            } else {
                switch (cur) {
                    case '<': {
                        char next = next();
                        if(next == '=' || next == '<') {
                            // <= or <<
                            addSymbol(new String(new char[]{cur, next}));
                            goNext();
                        } else {
                            // <
                            addSymbol(String.valueOf(cur));
                        }
                        break;
                    }
                    case '>':
                    case '=':
                    case '!': {
                        char next = next();
                        if(next == '=') {
                            // >= == !=
                            addSymbol(new String(new char[]{cur, next}));
                            goNext();
                        } else {
                            // > = !
                            addSymbol(String.valueOf(cur));
                        }
                        break;
                    }
                    case STRING_FLAG: {
                        // STRING ::= " [^"]* " : read up to the closing quote, spaces and digits included
                        goNext();
                        while(!eof && cur != STRING_FLAG) {
                            buf[offset++] = cur;
                            goNext();
                        }
                        if(eof) {
                            throw new IllegalStateException("Lexical error: unterminated string literal");
                        }
                        ans.add(new Lexical(util.getType(STRING), new String(buf, 0, offset)));
                        offset = 0;
                        break;
                    }
                    default:
                        // Single-character symbol; anything not in the table is a lexical error
                        addSymbol(String.valueOf(cur));
                }
            }
        }
        reader.close();
        print();
    }

    /**
     * Looks a symbol up in the mapping table and records it
     */
    private void addSymbol(String symbol) {
        Lexical lexical = util.getLexical(symbol);
        if(lexical == null) {
            throw new IllegalStateException("Lexical error: unknown symbol '" + symbol + "'");
        }
        ans.add(lexical);
    }

    /**
     * Prints the pairs, five per line
     */
    private void print() {
        for(int i = 0;i < ans.size();i++) {
            System.out.print(ans.get(i)+"   ");
            if((i+1) % 5 == 0) System.out.println();
        }
    }

    /**
     * Reads the next character into cur and returns it.
     * At end of file it sets eof and returns LINE_N as a sentinel.
     */
    private char goNext() throws IOException {
        if(reader == null) {
            throw new IllegalStateException("Call doProcess before reading");
        }
        int next = reader.read();
        if(next == -1) {
            eof = true;
            cur = LINE_N;
        } else {
            cur = (char)next;
        }
        return cur;
    }

    /**
     * Peeks at the next character without consuming it; returns '@' at end of file
     */
    private char next() throws IOException {
        if(reader == null) {
            throw new IllegalStateException("Call doProcess before reading");
        }
        int next = reader.read();
        if(next == -1) {
            return '@';
        }
        // Push back only a real character: unread(-1) would insert the character 0xFFFF
        reader.unread(next);
        return (char)next;
    }

    /**
     * Skips spaces, tabs and line breaks, stopping at end of file
     */
    private void skipBlank() throws IOException {
        while(!eof && (cur == BLANK_SPACE || cur == TAB || cur == LINE_R || cur == LINE_N)) {
            goNext();
        }
    }

    private boolean isLetter() {
        return (cur >= 'a' && cur <= 'z')
                || (cur >= 'A' && cur <= 'Z');
    }

    private boolean isNumber() {
        return cur >= '0' && cur <= '9';
    }

    public static void main(String[] args) throws IOException {
        LexicalAnalysisProcessor processor = new LexicalAnalysisProcessor();
        processor.doProcess("test.c");
    }

}

V. Results and Analysis

The token mapping table (lexical.txt) is as follows:

[Figure: contents of lexical.txt]

5.1 Normal test:

Contents of the input source file:

void func()
{

}
int main()
{
    char a[6] = "Hello";
    int j,sum = 0;
    while(sum < 6) {
        if(sum != 3) {
            for(j = 0;j < i;j++) {
                sum = sum + 2 * j / 2 - 2
            }
        }
        cout << a[i] << endl;
    }
    return i;
}

The resulting token pairs are:

[Figure: analyzer output for the normal test]

5.2 Error test

In the program below, a character that is not in the mapping table (%) has been inserted:

void func()
{
%
}
int main()
{
    char a[6] = "Hello";
    int j,sum = 0;
    while(sum < 6) {
        if(sum != 3) {
            for(j = 0;j < i;j++) {
                sum = sum + 2 * j / 2 - 2
            }
        }
        cout << a[i] << endl;
    }
    return i;
}
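The stray % stops the run with an exception, but the message does not say on which line or column the character sits. The sketch below locates the offending character by line and column. It assumes the set of characters that can appear in the tokens of the C subset defined in section III, and it skips the contents of string literals, which may legally contain %. ErrorLocator is a hypothetical helper and is not part of the project.

```java
/** Hypothetical helper: finds the first character that no token of the C subset can contain. */
class ErrorLocator {
    // Letters, digits, whitespace, the string quote and the characters of the special symbols
    static boolean allowed(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || " \t\r\n\"=+-*/<>!;:,{}[]()".indexOf(c) >= 0;
    }

    /** Returns "line:column" (both 1-based) of the first disallowed character, or null if none. */
    static String locate(String src) {
        int line = 1, col = 1;
        boolean inString = false;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') {
                inString = !inString;          // entering or leaving a string literal
            } else if (!inString && !allowed(c)) {
                return line + ":" + col;
            }
            if (c == '\n') {
                line++;
                col = 1;
            } else {
                col++;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(locate("void func()\n{\n%\n}")); // 3:1
    }
}
```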