Compiler Principles: A CMM Lexical Analyzer

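This post describes how to implement a lexical analyzer for CMM. The source file is read into a single string, which is then parsed to distinguish reserved words, symbols, identifiers, and numbers. The TOK class represents the category of a token, the Token class carries the detailed information, and the design also includes an error-handling mechanism and a FileIO class for reading files. Lexical analysis itself is carried out by the Lex class.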

Overall approach:

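  To keep file reading separate from content processing, the whole file is read into a string and that string is then parsed. Each reserved word and each symbol is its own category, and its value is its own spelling. Identifiers, integer literals, and floating-point literals each form one category, and the value is the concrete text that was read.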
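The idea of reading first and scanning afterwards can be sketched as below. This is only an illustration under stated assumptions, not the author's Lex or FileIO code: the names `ScanSketch`, `readAll`, and `lexemes` are hypothetical, and the identifier and number rules (letters, digits, underscore; digits with an optional fractional part) are assumed, since the post does not spell out the CMM grammar.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: file reading and scanning are kept in separate methods.
final class ScanSketch {

    /** Reads the whole file into one string, so the scanner never touches I/O. */
    static String readAll(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    /** Splits source text into raw lexemes: words, numbers, and symbols. */
    static List<String> lexemes(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetter(c) || c == '_') {
                // Reserved word or identifier (assumed rule: letters, digits, '_').
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
            } else if (Character.isDigit(c)) {
                // Integer part.
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                // Optional fractional part, taken only when a digit follows the dot.
                if (i + 1 < src.length() && src.charAt(i) == '.'
                        && Character.isDigit(src.charAt(i + 1))) {
                    i++;
                    while (i < src.length() && Character.isDigit(src.charAt(i))) {
                        i++;
                    }
                }
            } else {
                // Single-character symbol, extended to <=, >=, == or != when '=' follows.
                i++;
                if (i < src.length() && src.charAt(i) == '=' && "<>=!".indexOf(c) >= 0) {
                    i++;
                }
            }
            out.add(src.substring(start, i));
        }
        return out;
    }
}
```

Because `lexemes` takes a plain string, it can be exercised without any file on disk, which is the practical benefit of separating the two steps.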

First, create the TOK class, which defines the token categories. There are 32 categories in total.

package CMMLex;

/**
 * Created by think on 2017/10/11.
 */
public class TOK {
    // Token type codes

    /* if */
    public static final int IF = 0;
    /* else */
    public static final int ELSE = 1;
    /* while */
    public static final int WHILE = 2;
    /* read */
    public static final int READ = 3;
    /* write */
    public static final int WRITE = 4;
    /* int */
    public static final int INT = 5;
    /* double */
    public static final int DOUBLE = 6;
    /* true */
    public static final int TRUE = 7;
    /* false */
    public static final int FALSE = 8;
    /* + */
    public static final int PLUS = 9;
    /* - */
    public static final int MINUS = 10;
    /* * */
    public static final int MUL = 11;
    /* / */
    public static final int DIV = 12;
    /* = */
    public static final int ASSIGN = 13;
    /* < */
    public static final int LT = 14;
    /* <= */
    public static final int LQT = 15;
    /* > */
    public static final int GT = 16;
    /* >= */
    public static final int GQT= 17;
    /* == */
    public static final int EQ = 18;
    /* != */
    public static final int NEQ = 19;
    /* ( */
    public static final int LPARENT= 20;
    /* ) */
    public static final int RPARENT = 21;
    /* ; */
    public static final int SEMI = 22;
    /* , */
    public static final int COMMA = 23;
    /* { */
    public static final int LBRACE = 24;
    /* } */
    public static final int RBRACE = 25;
    /* [ */
    public static final int LBRACKET = 26;
    /* ] */
    public static final int RBRACKET = 27;
    /* id */
    public static final int ID = 28;
    /* int literal */
    public static final int LITERAL_INT = 29;
    /* double (real) literal */
    public static final int LITERAL_DOUBLE = 30;
    /* end of file */
    public static final int EOF=31;

    // Number of token types
    public static final int TOK_NUM = 32;

    // Symbolic name of each token type
    public static final String[] GET_STRS = { "IF","ELSE", "WHILE","READ","WRITE", "INT",
            "DOUBLE","TRUE", "FALSE", "PLUS", "MINUS","MUL", "DIV", "ASSIGN", "LT","LQT",
            "GT", "GQT", "EQ","NEQ", "LPARENT", "RPARENT", "SEMI", "COMMA","LBRACE",
            "RBRACE",  "LBRACKET","RBRACKET","ID", "LITERAL_INT","LITERAL_DOUBLE","EOF"};


    // Source spelling (or a descriptive label) of each token type
    public static final String[] GET_LOCAL_STRS = { "if", "else", "while", "read","write", "int",
            "double", "true", "false", "+","-", "*", "/", "=", "<", "<=", ">", ">=","==",
            "!=", "(", ")", ";",",", "{", "}", "[", "]", "identifier", "integer", "double","end_of_file"};

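    // Returns the symbolic name of a token type.
    // Valid codes are 0..TOK_NUM-1, so TOK_NUM itself must be rejected.
    public static String getTokTypeStr(int type) {
        if (type < 0 || type >= TOK_NUM)
            return "undefine";
        else
            return GET_STRS[type];
    }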

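    // Returns the source spelling of a token type.
    // Valid codes are 0..TOK_NUM-1, so TOK_NUM itself must be rejected.
    public static String getTokTypeLocalStr(int type) {
        if (type < 0 || type >= TOK_NUM)
            return "undefine";
        else
            return GET_LOCAL_STRS[type];
    }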
}
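Since the first 28 entries of GET_LOCAL_STRS are the literal spellings of the reserved words and symbols, one way to turn a lexeme into its type code is a lookup table built from those spellings. The sketch below is an assumption about how such a lookup could work, not the author's Lex class: `TokLookupSketch`, `FIXED`, `CODE_OF`, and `classify` are hypothetical names, the table is copied here only to keep the example self-contained, and the lexeme is assumed to be well formed already.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map a lexeme to a TOK-style type code.
final class TokLookupSketch {

    // Spellings of the 28 fixed tokens, in the same order as TOK constants 0..27.
    static final String[] FIXED = { "if", "else", "while", "read", "write", "int",
            "double", "true", "false", "+", "-", "*", "/", "=", "<", "<=", ">", ">=", "==",
            "!=", "(", ")", ";", ",", "{", "}", "[", "]" };

    // Same numeric values as TOK.ID, TOK.LITERAL_INT and TOK.LITERAL_DOUBLE.
    static final int ID = 28;
    static final int LITERAL_INT = 29;
    static final int LITERAL_DOUBLE = 30;

    // Spelling -> type code, filled once when the class is loaded.
    static final Map<String, Integer> CODE_OF = new HashMap<>();
    static {
        for (int i = 0; i < FIXED.length; i++) {
            CODE_OF.put(FIXED[i], i);
        }
    }

    /** Reserved words and symbols map to their own code; everything else falls into a shared category. */
    static int classify(String lexeme) {
        Integer fixed = CODE_OF.get(lexeme);
        if (fixed != null) {
            return fixed;
        }
        if (lexeme.matches("[0-9]+")) {
            return LITERAL_INT;
        }
        if (lexeme.matches("[0-9]+\\.[0-9]+")) {
            return LITERAL_DOUBLE;
        }
        // Assumption: anything left is an identifier; a real lexer would report errors here.
        return ID;
    }
}
```

Checking the table before the identifier rule is what makes `while` a WHILE token rather than an identifier named "while".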

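Next, create the Token class, which has a category, a value, a line number, and start and end column numbers. It also declares all 32 built-in tokens as constants.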

package CMMLex;

/**
 * Created by think on 2017/10/12.
 */
public class Token {

    /* if */
    public static final Token IF = new Token(TOK.IF);
    /* else */
    public static final Token ELSE = new Token(TOK.ELSE);
    /* while */
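
The attributes listed for the Token class (category, value, line number, start column, end column) can be sketched roughly as follows. This is an illustration only and not the author's Token class: `TokenSketch`, its field names, and the `at` helper are hypothetical, and column numbers are assumed to be inclusive.

```java
// Hypothetical sketch of a token record with the attributes described in the post.
final class TokenSketch {
    final int type;      // category code, for example one of the TOK constants
    final String value;  // spelling for fixed tokens, concrete text for identifiers and literals
    final int line;      // line number where the token starts
    final int startCol;  // column of the first character
    final int endCol;    // column of the last character (assumed inclusive)

    TokenSketch(int type, String value, int line, int startCol, int endCol) {
        this.type = type;
        this.value = value;
        this.line = line;
        this.startCol = startCol;
        this.endCol = endCol;
    }

    /** Returns a positioned copy, so a shared constant token is never modified. */
    TokenSketch at(int line, int startCol) {
        return new TokenSketch(type, value, line, startCol, startCol + value.length() - 1);
    }

    @Override
    public String toString() {
        return "<" + type + ", " + value + "> @" + line + ":" + startCol + "-" + endCol;
    }
}
```

For instance, `new TokenSketch(2, "while", 0, 0, 0).at(3, 5)` describes a `while` token on line 3 spanning columns 5 to 9. Keeping the fields final means the predefined constants can be shared safely across the lexer.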
  