A Lexical Analyzer Written in C# (Compiler Principles)

Original post, 2007-09-21 20:31:00

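        Our compiler principles instructor recently asked us to build a lexical analyzer. I am learning C# at the moment, so I wrote one in C# for practice. After some preliminary testing it appears to meet the assignment's requirements, so I am posting the code here. Corrections are welcome.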

        First, create a new C# Windows application project; I named mine WordAnalysis.

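        Let me start with the requirements. The task is to analyze statements in a Pascal-like language. The textbook keeps things simple: the analyzer only has to recognize the keywords DIM, IF, DO, STOP, END, INT, variables (at most 8 characters long), numbers, the operators (=, *, **, +), commas, and parentheses. The results must be reported as two-tuples, so we use a struct to define one. The struct is defined as follows: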

        struct AnalysisResult
        {
            public AnalysisResult(string HelpStr, int code)
            {
                this.helpStr = HelpStr;
                this.AnsCode = code;
            }

            public override string ToString()
            {
                return "("+helpStr+","+AnsCode.ToString()+")";
            }

            public string helpStr;
            public int AnsCode;
        }

        Here I override the struct's ToString() method so that it produces output of the form ($DIM,1).

        The possible two-tuples are therefore:
        ($SPACE,0) ($DIM,1) ($IF,2) ($DO,3) ($STOP,4) ($END,5) ($ID,6) ($INT,7) ($ASSIGH,8)
        ($PLUS,9) ($STAR,10) ($POWER,11) ($COMMA,12) ($LPAR,13) ($RPAR,14) 
        ($ENTER,15) ($ERROR,16)      
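
        As a rough sketch of the same table in code, the codes can be collected in one place. The enum `TokenCode` and the helper `TokenFormat.Format` are hypothetical names that do not appear in the listing later in this post, which uses bare integers instead.

```csharp
using System;

// Hypothetical enum: member names mirror the labels in the list above.
public enum TokenCode
{
    SPACE = 0, DIM = 1, IF = 2, DO = 3, STOP = 4, END = 5, ID = 6, INT = 7,
    ASSIGH = 8, PLUS = 9, STAR = 10, POWER = 11, COMMA = 12, LPAR = 13,
    RPAR = 14, ENTER = 15, ERROR = 16
}

public static class TokenFormat
{
    // Builds the two-tuple text in the same shape as AnalysisResult.ToString().
    public static string Format(TokenCode code)
    {
        return "($" + code.ToString() + "," + ((int)code).ToString() + ")";
    }
}
```

        For example, `TokenFormat.Format(TokenCode.DIM)` yields the same text as the struct's ToString() for the DIM keyword.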

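        The textbook explains what each one means, and the English names make it clear what each recognizes. The key step now is the analysis itself. My approach is: first split the input into an array of lines, then analyze line by line; split each line on spaces into an array of strings, analyze each string, and finally output the results. If a lexical error is found along the way, output the error type as well. There are three error types here: illegal identifier, malformed expression, and identifier longer than 8 characters. You can extend these yourself.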
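
        The two-level split described above might look roughly like this. It is a minimal sketch: `LineSplitter` is a hypothetical name, and dropping empty chunks with `StringSplitOptions.RemoveEmptyEntries` is my assumption for handling consecutive spaces; the full listing below checks for empty strings in the loop instead.

```csharp
using System;
using System.Collections.Generic;

public static class LineSplitter
{
    // Splits the input into lines, then each line into space-separated chunks.
    public static List<string[]> Split(string input)
    {
        List<string[]> result = new List<string[]>();
        // Normalize Windows line endings so no '\r' is left at the end of a line.
        string[] lines = input.Replace("\r\n", "\n").Split('\n');
        foreach (string line in lines)
        {
            // Consecutive spaces would otherwise produce empty chunks.
            result.Add(line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        }
        return result;
    }
}
```

        Each inner array then holds the strings to be analyzed for one line, and the index in the outer list plus one gives the line number for error messages.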

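        Now for how each string is analyzed. My approach may be somewhat cumbersome and can certainly be optimized; treat it as a reference only. First check the length of the string. A string of length 1 clearly cannot be a keyword, which narrows the checks: test in turn whether it is a digit, an operator, or a letter (a variable name), and otherwise output an error (illegal identifier). If the length is not 1, proceed as follows: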
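
        The length-1 case can be sketched as a single function that returns the token code. `SingleCharClassifier` is a hypothetical name, and the sketch uses `char.IsDigit` where the listing below uses `Char.IsNumber`.

```csharp
public static class SingleCharClassifier
{
    // Returns the token code for a one-character string:
    // digit, then operator, then letter, otherwise the error code.
    public static int Classify(char c)
    {
        if (char.IsDigit(c)) return 7;   // $INT
        switch (c)
        {
            case '=': return 8;          // $ASSIGH
            case '+': return 9;          // $PLUS
            case '*': return 10;         // $STAR
            case ',': return 12;         // $COMMA
            case '(': return 13;         // $LPAR
            case ')': return 14;         // $RPAR
        }
        if (char.IsLetter(c)) return 6;  // $ID
        return 16;                       // $ERROR (illegal identifier)
    }
}
```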

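        First check whether the string contains an operator. If it does not, the content is a single unit; otherwise it is an expression. Without an operator: test in turn whether it is an integer, a keyword, or a variable name, and otherwise output an error (illegal identifier). With an operator, it is an expression and the analysis is a little more involved, so it is described separately: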
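
        For a string without operators, the integer, keyword, variable order might be sketched as follows. `WordClassifier` is a hypothetical name, and for brevity the sketch returns the error code 16 for both an illegal identifier and an over-long one, whereas the full listing reports them as separate error types.

```csharp
public static class WordClassifier
{
    // Returns the token code for a string that contains no operators.
    public static int Classify(string word)
    {
        if (string.IsNullOrEmpty(word)) return 16;
        if (IsAllDigits(word)) return 7;              // $INT
        switch (word.ToUpperInvariant())              // keywords are case-insensitive
        {
            case "DIM": return 1;
            case "IF": return 2;
            case "DO": return 3;
            case "STOP": return 4;
            case "END": return 5;
        }
        if (IsIdentifier(word) && word.Length <= 8) return 6;   // $ID
        return 16;                                    // $ERROR
    }

    private static bool IsAllDigits(string s)
    {
        foreach (char c in s)
        {
            if (!char.IsDigit(c)) return false;
        }
        return true;
    }

    // First character: letter or '_'; the rest: letters, digits, or '_'.
    private static bool IsIdentifier(string s)
    {
        if (!char.IsLetter(s[0]) && s[0] != '_') return false;
        for (int i = 1; i < s.Length; i++)
        {
            if (!char.IsLetterOrDigit(s[i]) && s[i] != '_') return false;
        }
        return true;
    }
}
```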

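        First convert the content to a character array and scan from the first character. When the n-th character is an operator, treat the characters before it (back to the previous operator) as one unit and test in turn whether it is a number, a keyword, or a variable; otherwise output an error. Then determine which operator the n-th character is. There is one special case: when it is *, check whether character n+1 is also *. If so, output ($POWER,11) and continue from character n+2; otherwise output ($STAR,10) and continue from character n+1. Repeat these steps until all characters have been analyzed.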
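
        The scan with one-character lookahead for ** could be sketched like this. `ExpressionScanner` and its helpers are hypothetical names; to keep the sketch short, operands are only classified as $INT, $ID, or $ERROR (keywords are left out), and only the token labels are returned.

```csharp
using System;
using System.Collections.Generic;

public static class ExpressionScanner
{
    private const string Operators = "=+*,()";

    // Returns the token labels found in a chunk that contains operators.
    public static List<string> Scan(string chunk)
    {
        List<string> tokens = new List<string>();
        int start = 0;   // index where the pending operand begins
        int i = 0;
        while (i < chunk.Length)
        {
            char c = chunk[i];
            if (Operators.IndexOf(c) < 0) { i++; continue; }
            // Everything between the previous operator and this one is one operand.
            if (i > start) tokens.Add(ClassifyOperand(chunk.Substring(start, i - start)));
            if (c == '*' && i + 1 < chunk.Length && chunk[i + 1] == '*')
            {
                tokens.Add("$POWER");   // "**" consumes two characters
                i += 2;
            }
            else
            {
                tokens.Add(OperatorName(c));
                i++;
            }
            start = i;
        }
        // Whatever follows the last operator is the final operand.
        if (start < chunk.Length) tokens.Add(ClassifyOperand(chunk.Substring(start)));
        return tokens;
    }

    private static string OperatorName(char c)
    {
        switch (c)
        {
            case '=': return "$ASSIGH";
            case '+': return "$PLUS";
            case '*': return "$STAR";
            case ',': return "$COMMA";
            case '(': return "$LPAR";
            default: return "$RPAR";
        }
    }

    // Simplified operand check: all digits, or an identifier-shaped name.
    private static string ClassifyOperand(string s)
    {
        bool allDigits = true;
        bool identifier = char.IsLetter(s[0]) || s[0] == '_';
        foreach (char ch in s)
        {
            if (!char.IsDigit(ch)) allDigits = false;
            if (!char.IsLetterOrDigit(ch) && ch != '_') identifier = false;
        }
        if (allDigits) return "$INT";
        return identifier ? "$ID" : "$ERROR";
    }
}
```

        Checking `i + 1 < chunk.Length` before reading the next character avoids relying on an exception when * is the last character of the chunk.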

         That is the general idea. Below is my actual implementation for reference.

         Create a new class file named Analysis.cs with the following code:

using System;
using System.Collections.Generic;
using System.Text;

namespace WordsAnalysis
{
    class Analysis
    {
        private string ansStr;
        private string[] ArrayByLine;
        private List<string> errList;
        private List<AnalysisResult> resultList;

        /// <summary>
        /// Analysis result as a two-tuple: token label and token code
        /// </summary>
        struct AnalysisResult
        {
            public AnalysisResult(string HelpStr, int code)
            {
                this.helpStr = HelpStr;
                this.AnsCode = code;
            }

            public override string ToString()
            {
                return "("+helpStr+","+AnsCode.ToString()+")";
            }

            public string helpStr;
            public int AnsCode;
        }
        /// <summary>
        /// Constructor of the lexical analysis class
        /// </summary>
        public Analysis()
        {
            errList=new List<string>();
            resultList = new List<AnalysisResult>();
        }

        /// <summary>
        /// Constructor of the lexical analysis class
        /// </summary>
        /// <param name="str">Content to analyze</param>
        public Analysis(string str)
        {
            errList = new List<string>();
            resultList = new List<AnalysisResult>();
            this.ansStr=str;
        }

        /// <summary>
        /// Content to analyze
        /// </summary>
        public string AnalysisString
        {
            get { return this.ansStr; }
            set { this.ansStr = value; }
        }

        /// <summary>
        /// Splits the input into an array of lines
        /// </summary>
        /// <param name="splitStr">Input content</param>
        private void SplitByLine(string splitStr)
        {
            // Normalize Windows line endings first so no '\r' is left on each line
            this.ArrayByLine=splitStr.Replace("\r\n", "\n").Split('\n');
        }

        /// <summary>
        /// Splits content into an array on spaces
        /// </summary>
        /// <param name="splitStr">Content to split</param>
        /// <returns>The resulting array</returns>
        private string[] SplitBySpace(string splitStr)
        {
            return splitStr.Split(' ');
        }

        /// <summary>
        /// Starts the analysis
        /// </summary>
        public void Anaslysis()
        {
            SplitByLine(ansStr);
            for (int i = 0; i < ArrayByLine.Length; i++)
            {
                string[] checkStrings = SplitBySpace(ArrayByLine[i]);
                foreach (string checkStr in checkStrings)
                {
                    if (checkStr != "")
                    {
                        check(checkStr, i + 1);
                        resultList.Add(new AnalysisResult("$SPACE", 0));
                    }
                }
                resultList.Add(new AnalysisResult("$ENTER", 15));
            }
        }

        /// <summary>
        /// Analyzes the given code fragment
        /// </summary>
        /// <param name="str">Code fragment</param>
        /// <param name="LineCode">Line number</param>
        private void check(string str, int LineCode)
        {
            if (str.Length == 1)
            {
                if(Char.IsNumber(str.ToCharArray()[0])==true)
                {
                    resultList.Add(new AnalysisResult("$INT",7));
                    return;
                }
                else if(CheckOperend(str)!=0)
                {
                    int n = CheckOperend(str);
                    switch (n)
                    {
                        case 8: resultList.Add(new AnalysisResult("$ASSIGH", 8)); break;
                        case 9: resultList.Add(new AnalysisResult("$PLUS", 9)); break;
                        case 10: resultList.Add(new AnalysisResult("$STAR", 10)); break;
                        case 12: resultList.Add(new AnalysisResult("$COMMA", 12)); break;
                        case 13: resultList.Add(new AnalysisResult("$LPAR", 13)); break;
                        case 14: resultList.Add(new AnalysisResult("$RPAR", 14)); break;
                    }
                    return;
                }
                else if (Char.IsLetter(str, 0) == true)
                {
                    resultList.Add(new AnalysisResult("$ID", 6));
                    return;
                }
                else
                {
                    // errMessage already appends ($ERROR,16) to resultList
                    errList.Add(errMessage(LineCode, 1, str));
                    return;
                }
               
            }
            if (HasOperend(str) == false)
            {
                if (IsNumeric(str.ToCharArray()) == true)
                {
                    resultList.Add(new AnalysisResult("$INT", 7));
                    return;
                }
                else if (CheckKeepWord(str.ToUpper()) != 0)
                {
                    int n = CheckKeepWord(str.ToUpper());
                    switch (n)
                    {
                        case 1: resultList.Add(new AnalysisResult("$DIM", 1)); break;
                        case 2: resultList.Add(new AnalysisResult("$IF", 2)); break;
                        case 3: resultList.Add(new AnalysisResult("$DO", 3)); break;
                        case 4: resultList.Add(new AnalysisResult("$STOP", 4)); break;
                        case 5: resultList.Add(new AnalysisResult("$END", 5)); break;
                    }
                    return;
                }
                else if (IsID(str.ToCharArray()) == true)
                {
                    if (str.Length <= 8)
                    {
                        resultList.Add(new AnalysisResult("$ID", 6));
                        return;
                    }
                    else
                    {
                        errList.Add(errMessage(LineCode, 3, str));
                        return;
                    }
                }
                else
                {
                    errList.Add(errMessage(LineCode, 1, str));
                    return;
                }
            }
            else if(HasOperend(str)==true)
            {
                char[] chars = str.ToCharArray();
                int k = 0;
                for (int i = 0; i < chars.Length; i++)
                {
                    if(IsOperend(chars[i])==true)
                    {
                        if ((i - k) == 0)
                        {
                            int n = CheckOperend(chars[i]);
                            switch (n)
                            {
                                case 8: resultList.Add(new AnalysisResult("$ASSIGH", 8)); break;
                                case 9: resultList.Add(new AnalysisResult("$PLUS", 9)); break;
                                case 10:
                                    {
                                        // Look ahead one character: "**" is the power operator
                                        if (i + 1 < chars.Length && chars[i + 1] == '*')
                                        {
                                            resultList.Add(new AnalysisResult("$POWER", 11));
                                            i++;
                                        }
                                        else
                                        {
                                            resultList.Add(new AnalysisResult("$STAR", 10));
                                        }
                                        break;
                                    }
                                case 12: resultList.Add(new AnalysisResult("$COMMA", 12)); break;
                                case 13: resultList.Add(new AnalysisResult("$LPAR", 13)); break;
                                case 14: resultList.Add(new AnalysisResult("$RPAR", 14)); break;
                                default: break;
                            }
                        }
                        else
                        {
                            char[] tempChar = TempChar(chars, k, i);
                            if (tempChar.Length == 1)
                            {
                                if (Char.IsNumber(tempChar[0]) == true)
                                {
                                    resultList.Add(new AnalysisResult("$INT", 7));
                                }
                                else if (Char.IsLetter(tempChar[0]) == true)
                                {
                                    resultList.Add(new AnalysisResult("$ID", 6));
                                }
                                else
                                {
                                    errList.Add(errMessage(LineCode, 2, str));
                                }
                                int n = CheckOperend(Convert.ToString(chars[i]));
                                switch (n)
                                {
                                    case 8: resultList.Add(new AnalysisResult("$ASSIGH", 8)); break;
                                    case 9: resultList.Add(new AnalysisResult("$PLUS", 9)); break;
                                    case 10:
                                        {
                                            // Look ahead one character: "**" is the power operator
                                            if (i + 1 < chars.Length && chars[i + 1] == '*')
                                            {
                                                resultList.Add(new AnalysisResult("$POWER", 11));
                                                i++;
                                            }
                                            else
                                            {
                                                resultList.Add(new AnalysisResult("$STAR", 10));
                                            }
                                            break;
                                        }
                                    case 12: resultList.Add(new AnalysisResult("$COMMA", 12)); break;
                                    case 13: resultList.Add(new AnalysisResult("$LPAR", 13)); break;
                                    case 14: resultList.Add(new AnalysisResult("$RPAR", 14)); break;
                                }
                            }
                            else
                            {
                                // Convert.ToString on a char[] returns the type name, so build the string explicitly
                                if (CheckKeepWord(new string(tempChar).ToUpper()) != 0)
                                {
                                    int n = CheckKeepWord(new string(tempChar).ToUpper());
                                    switch (n)
                                    {
                                        case 1: resultList.Add(new AnalysisResult("$DIM", 1)); break;
                                        case 2: resultList.Add(new AnalysisResult("$IF", 2)); break;
                                        case 3: resultList.Add(new AnalysisResult("$DO", 3)); break;
                                        case 4: resultList.Add(new AnalysisResult("$STOP", 4)); break;
                                        case 5: resultList.Add(new AnalysisResult("$END", 5)); break;
                                    }
                                }
                                else if (IsID(tempChar) == true)
                                {
                                    resultList.Add(new AnalysisResult("$ID", 6));
                                }
                                else if (IsNumeric(tempChar) == true)
                                {
                                    resultList.Add(new AnalysisResult("$INT", 7));
                                }
                                else
                                {
                                    errList.Add(errMessage(LineCode, 2, str));
                                }
                                int x = CheckOperend(Convert.ToString(chars[i]));
                                switch (x)
                                {
                                    case 8: resultList.Add(new AnalysisResult("$ASSIGH", 8)); break;
                                    case 9: resultList.Add(new AnalysisResult("$PLUS", 9)); break;
                                    case 10:
                                        {
                                            // Look ahead one character in the full array, with a bounds check
                                            if (i + 1 < chars.Length && chars[i + 1] == '*')
                                            {
                                                resultList.Add(new AnalysisResult("$POWER", 11));
                                                i++;
                                            }
                                            else
                                            {
                                                resultList.Add(new AnalysisResult("$STAR", 10));
                                            }
                                            break;
                                        }
                                    case 12: resultList.Add(new AnalysisResult("$COMMA", 12)); break;
                                    case 13: resultList.Add(new AnalysisResult("$LPAR", 13)); break;
                                    case 14: resultList.Add(new AnalysisResult("$RPAR", 14)); break;
                                }
                            }
                        }
                        k = i + 1;
                    }
                }
                // Classify whatever follows the last operator, however long it is
                if (k < chars.Length)
                {
                    char[] tempChar = TempChar(chars, k, chars.Length);
                    if (CheckKeepWord(new string(tempChar).ToUpper()) != 0)
                    {
                        int n = CheckKeepWord(new string(tempChar).ToUpper());
                        switch (n)
                        {
                            case 1: resultList.Add(new AnalysisResult("$DIM", 1)); break;
                            case 2: resultList.Add(new AnalysisResult("$IF", 2)); break;
                            case 3: resultList.Add(new AnalysisResult("$DO", 3)); break;
                            case 4: resultList.Add(new AnalysisResult("$STOP", 4)); break;
                            case 5: resultList.Add(new AnalysisResult("$END", 5)); break;
                        }
                    }
                    else if (IsID(tempChar) == true)
                    {
                        resultList.Add(new AnalysisResult("$ID", 6));
                    }
                    else if (IsNumeric(tempChar) == true)
                    {
                        resultList.Add(new AnalysisResult("$INT", 7));
                    }
                    else
                    {
                        errList.Add(errMessage(LineCode, 2, str));
                    }
                }
                return;
            }
        }

        /// <summary>
        /// Checks whether every character is a digit
        /// </summary>
        /// <param name="chars">Content to check</param>
        /// <returns>true if so, false otherwise</returns>
        private bool IsNumeric(char[] chars)
        {
            foreach(char c in chars)
            {
                if (!Char.IsNumber(c))
                {
                    return false;
                }
            }
            return true;
        }

        /// <summary>
        /// Checks whether a single character is a digit
        /// </summary>
        /// <param name="c">Character to check</param>
        /// <returns>true if so, false otherwise</returns>
        private bool IsOneNumeric(char c)
        {
            if (c >= '0' && c <= '9') return true;
            return false;
        }

        /// <summary>
        /// Checks whether a character is a letter
        /// </summary>
        /// <param name="c">Character to check</param>
        /// <returns>true if so, false otherwise</returns>
        private bool IsLetter(char c)
        {
            if (Char.IsLetter(c)) return true;
            return false;
        }

        /// <summary>
        /// Checks whether the string contains an operator
        /// </summary>
        /// <param name="str">Content to check</param>
        /// <returns>true if it does, false otherwise</returns>
        private bool HasOperend(string str)
        {
            int n = 0;
            char[] operends={'=','+','*',',','(',')'};
            n = str.IndexOfAny(operends);
            if (n >= 0)
                return true;
            return false;
        }

        /// <summary>
        /// Checks whether a character is an operator
        /// </summary>
        /// <param name="c">Character to check</param>
        /// <returns>true if so, false otherwise</returns>
        private bool IsOperend(char c)
        {
            int n = 0;
            string str = Convert.ToString(c);
            char[] operends ={ '=', '+', '*', ',', '(', ')' };
            n = str.IndexOfAny(operends);
            if (n >=0)
                return true;
            return false;
        }

        /// <summary>
        /// Returns a new char array covering the given range
        /// </summary>
        /// <param name="chars">Source array</param>
        /// <param name="start">Start index (inclusive)</param>
        /// <param name="end">End index (exclusive)</param>
        /// <returns>A new char array of length end-start</returns>
        private char[] TempChar(char[] chars, int start, int end)
        {
            char[] tempChar=new char[end-start];
            int n = 0;
            for (int i = start; i < end; i++)
            {
                tempChar[n]=chars[i];
                n++;
            }
            return tempChar;
        }

        /// <summary>
        /// Determines the operator type
        /// </summary>
        /// <param name="c">Character to check</param>
        /// <returns>Operator code, or 0 if not an operator</returns>
        private int CheckOperend(char c)
        {
            if (c == '=') return 8;
            else if (c == '+') return 9;
            else if (c == '*') return 10;
            else if (c == ',') return 12;
            else if (c == '(') return 13;
            else if (c == ')') return 14;
            else return 0;
        }

        /// <summary>
        /// Determines the operator type
        /// </summary>
        /// <param name="s">Content to check</param>
        /// <returns>Operator code, or 0 if not an operator</returns>
        private int CheckOperend(string s)
        {
            if (s == "=") return 8;
            else if (s == "+") return 9;
            else if (s == "*") return 10;
            else if (s == ",") return 12;
            else if (s == "(") return 13;
            else if (s == ")") return 14;
            else return 0;
        }

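        /// <summary>
        /// Checks whether the content is a valid identifier
        /// </summary>
        /// <param name="c">Content to check</param>
        /// <returns>true if so, false otherwise</returns>
        private bool IsID(char[] c)
        {
            // The first character must be a letter or an underscore
            if (Char.IsLetter(c[0])==false&&c[0]!='_')
            {
                return false;
            }
            // The remaining characters may be letters, digits, or underscores
            for (int i = 1; i < c.Length; i++)
            {
                if (Char.IsLetter(c[i]) == false && Char.IsNumber(c[i]) == false && c[i] != '_')
                {
                    return false;
                }
            }
            return true;
        }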

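        /// <summary>
        /// Checks whether the content is a reserved word
        /// </summary>
        /// <param name="c">Content to check</param>
        /// <returns>Reserved word code, or 0 if not a reserved word</returns>
        private int CheckKeepWord(char[] c)
        {
            // Convert.ToString on a char[] returns the type name, so build the string explicitly
            string str = new string(c).ToUpper();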
            int n = 0;
            switch (str)
            {
                case "DIM": n= 1; break;
                case "IF": n= 2; break;
                case "DO": n= 3; break;
                case "STOP": n= 4; break;
                case "END": n= 5; break;
                default: n= 0; break;
            }
            return n;
        }

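        /// <summary>
        /// Checks whether the content is a reserved word
        /// </summary>
        /// <param name="str">Content to check (already upper-cased)</param>
        /// <returns>Reserved word code, or 0 if not a reserved word</returns>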
        private int CheckKeepWord(string str)
        {
            int n = 0;
            switch (str)
            {
                case "DIM": n = 1; break;
                case "IF": n = 2; break;
                case "DO": n = 3; break;
                case "STOP": n = 4; break;
                case "END": n = 5; break;
                default: n = 0; break;
            }
            return n;
        }


        /// <summary>
        /// Returns the error messages
        /// </summary>
        /// <returns>All error messages</returns>
        public string ShowError()
        {
            string error = "";
            string[] errs = errList.ToArray();
            foreach (string err in errs)
            {
                error += err + "\r\n";
            }
            return error;
        }

        /// <summary>
        /// Returns the analysis results
        /// </summary>
        /// <returns>All analysis results</returns>
        public string ShowResult()
        {
            string result = "";
            AnalysisResult[] ans = resultList.ToArray();
            foreach (AnalysisResult an in ans)
            {
                if (an.AnsCode == 15)
                {
                    result += "\r\n";
                }
                else if(an.AnsCode==0)
                {
                    result += " ";
                }
                else
                {
                    result += an.ToString();
                }
            }
            return result;
        }

        /// <summary>
        /// Builds an error message and records ($ERROR,16) in the result list
        /// </summary>
        /// <param name="errLine">Line number of the error</param>
        /// <param name="errCode">Error code</param>
        /// <param name="errStr">Offending content</param>
        /// <returns>The error message</returns>
        private string errMessage(int errLine, int errCode, string errStr)
        {
            string errClass="";
            if (errCode == 1)
            {
                errClass = "Illegal identifier";
            }
            else if (errCode == 2)
            {
                errClass = "Malformed expression";
            }
            else if (errCode == 3)
            {
                errClass = "Identifier length must not exceed 8";
            }
            resultList.Add(new AnalysisResult("$ERROR", 16));
            return "Line "+errLine+": "+errClass+": "+errStr;
        }

    }
}



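        The form's controls are as follows:

There are three TextBox controls, named InputTxt, AnsResultTxt, and errTxt; the latter two are read-only. There is also a Button named OKBtn. That is the whole interface, and you can adapt it to your own needs. Then put the following code in the Click event handler of OKBtn: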

            Analysis ans = new Analysis();
            ans.AnalysisString = InputTxt.Text.ToString();
            ans.Anaslysis();
            AnsResultTxt.Text = ans.ShowResult();
            errTxt.Text = ans.ShowError();

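After that you can run and test it.

That is all there is to it. Give it a try, and please let me know promptly if you run into any problems. Thanks!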
