Software Engineering Individual Assignment: Progress Log

Word Frequency Counter: Development Document

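I. Program Overview

Word frequency counter: as the assignment requires, the program takes a path as input, traverses every folder under it, and counts the words in all files ending in ".txt", ".cs", ".h", or ".cpp". The results are written to a text file with the specified name in the form <word>  frequency, where <word> is the counted word and frequency is the number of times it occurs. Words are output in descending order of frequency; words with equal frequency are output in dictionary order.

Definition of a word: a string that begins with at least four letters and ends with one or more digits counts as a word. If two words have the same numeric part and the same letters when case is ignored, they are treated as the same word, and the frequency is the total number of such matching words. If the letter parts are identical including case, the word with the larger trailing number is the one output; if the letter parts differ in case, the one that is smaller in dictionary order is output.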
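The word definition above can be sketched in code. This is only a minimal illustration, not the program's actual parser: the class name WordRules, the regular expression, and both method names are assumptions introduced here, and the pattern accepts a word with no trailing digits, which matches what the parsing step later in this document does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordRules
{
    // Hypothetical pattern: an alphanumeric run that starts with at least four letters.
    // The lookbehind stops a match from starting in the middle of a longer token.
    private static readonly Regex WordPattern =
        new Regex(@"(?<![A-Za-z0-9])[A-Za-z]{4}[A-Za-z0-9]*");

    // Returns every word found in one line of text, in order of appearance.
    public static List<string> Extract(string line)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(line))
            words.Add(m.Value);
        return words;
    }

    // Splits a word into its leading part and its trailing run of digits.
    public static void Split(string word, out string stem, out string digits)
    {
        int end = word.Length;
        while (end > 0 && char.IsDigit(word[end - 1])) end--;
        stem = word.Substring(0, end);
        digits = word.Substring(end);
    }
}
```

For example, WordRules.Extract("file123 abc ab12cdef Windows95;test") yields file123, Windows95 and test, and WordRules.Split("Windows95", ...) gives the stem "Windows" and the digits "95".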

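II. Design

1. Input directory

On entry, the main function immediately prints a prompt asking the user to enter a path. The directory the user types is stored as a string in path, and path is then passed to the traversal function Getdir().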

            Console.WriteLine("Enter a path:");
            path = Console.ReadLine();

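2. Traversing folders

Given the path, Getdir() recursively visits every folder under it and checks the extension of each file it finds. If a file ends with one of the four extensions listed above, Getdir() calls file_get_contents() to read its contents, and continues until every file has been visited: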

    public static void Getdir(string path)
        {
            if (path != null)
            {
                // Recurse into every subdirectory first.
                foreach (string n in Directory.GetDirectories(path))
                {
                    //Console.WriteLine("Directory: {0}", n);
                    Getdir(n);
                }
                // Then process the matching files in this directory.
                foreach (string m in Directory.GetFiles(path))
                {
                    // Skip temp.txt, which the program itself creates.
                    if ((m.EndsWith(".txt") || m.EndsWith(".cs") || m.EndsWith(".cpp") || m.EndsWith(".h")) && (!m.Equals(Path.Combine(path, "temp.txt"))))
                    {
                        Console.WriteLine("File: {0}", m);
                        file_get_contents(m);
                    }
                }
            }
        }
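As an alternative to the hand-written recursion, the framework can walk the tree itself. The following is a rough sketch under assumptions: the names FileScan, IsTarget and FindTargets are invented for illustration, and extensions are matched case-insensitively here, which differs from the case-sensitive EndsWith() test above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileScan
{
    // The four extensions named in the assignment, compared without regard to case (an assumption).
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".cs", ".h", ".cpp" };

    // True when the file name ends in one of the target extensions.
    public static bool IsTarget(string file)
    {
        return Extensions.Contains(Path.GetExtension(file));
    }

    // Lazily yields every target file under root, including all subdirectories.
    public static IEnumerable<string> FindTargets(string root)
    {
        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            if (IsTarget(file)) yield return file;
    }
}
```

This sketch does not skip temp.txt, so it only fits a design that does not write a temporary file into the directory being scanned.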

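3. Reading file contents

file_get_contents() receives the name of a file whose extension Getdir() has already checked, and reads its contents. (A file named temp.txt is skipped because the program creates it itself; this is the !m.Equals(...) test in Getdir().) The file is read line by line with ReadLine(), and each line is passed as a string to def() for parsing, until the end of the file is reached: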

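public static void file_get_contents(string path)
        {
            // Open the file as GB2312 text; using closes the reader when the block ends.
            using (StreamReader sr = new StreamReader(path, System.Text.Encoding.GetEncoding("GB2312")))
            {
                string line;
                // ReadLine() returns null at the end of the file.
                while ((line = sr.ReadLine()) != null)
                {
                    Console.WriteLine(line);
                    def(line);
                }
            }
        }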

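4. Parsing the contents

def() takes each line passed on by file_get_contents(string path). Whenever it finds a string that meets the word standard (four or more letters at the start, followed only by letters or digits, with no other symbols), it calls write() to append it to the file "temp.txt", one word per line: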

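    public static void def(string str)
        {
            int i = str.Length, b = 0, m = -1;
            // Start every line with an empty buffer so nothing carries over from the previous line.
            strtemp = "";
            for (; b < i; )
            {
                if ((m == -1) && (b < i - 3))
                {
                    // A word starts with at least four consecutive letters.
                    if (isletter(str[b]) && isletter(str[b + 1]) && isletter(str[b + 2]) && isletter(str[b + 3]))
                        m = 0;
                    else
                    {
                        b++;
                    }
                }
                else if (m == 0)
                {
                    if (isletter(str[b]) || isnumber(str[b]))
                    {
                        strtemp += str[b].ToString();
                        b++;
                    }
                    else
                    {
                        // A separator ends the current word: write it out and reset.
                        b++;
                        Console.WriteLine(strtemp);
                        write(strtemp, path);
                        strtemp = "";
                        m = -1;
                    }
                }
                else break;
            }
            // Flush a word that runs to the end of the line.
            if (m == 0 && strtemp.Length > 0)
            {
                Console.WriteLine(strtemp);
                write(strtemp, path);
                strtemp = "";
            }
        }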

public static void write(string str, string path)
        {
            // Append one word per line to temp.txt; using closes the file even if the write fails.
            using (StreamWriter sw = new StreamWriter(Path.Combine(path, "temp.txt"), true, System.Text.Encoding.Default))
            {
                sw.WriteLine(str);
            }
        }

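5. Reading all the words

A struct array word{string str;int efc;int time;}wd[] stores the words, one per element. str is the word text; efc is 0 or 1, where 0 means the word is skipped in the final output and 1 means it is output. The size of the array comes from linum(), which counts the lines of "temp.txt". (In fact, a static counter incremented in write() during parsing would give the word count without reading the file a second time.) read1() then reads each word into wd[i].str until the end of the file;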

public static int linum(string path)
        {
            int i = 0;
            string line;
            // Count the lines of temp.txt, which is the number of words found.
            using (StreamReader sr = new StreamReader(Path.Combine(path, "temp.txt"), System.Text.Encoding.Default))
            {
                while ((line = sr.ReadLine()) != null)
                {
                    i++;
                    //Console.WriteLine("{0}:{1}",i,line);
                }
            }
            return i;
        }
        public static void read1(word[] wd, string path)
        {
            using (StreamReader sr = new StreamReader(Path.Combine(path, "temp.txt"), System.Text.Encoding.Default))
            {
                for (int i = 0; i < wd.Length; i++)
                {
                    // Stop early instead of looping forever if the file has fewer lines than expected.
                    if ((wd[i].str = sr.ReadLine()) == null)
                        break;
                }
            }
        }
    }
}

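6. Counting

Two nested for loops compare each word in turn with every later word whose efc is 1 (efc is initially 1 for all words). If the two are different words, the pair is skipped. If they are the same word, they are compared: the one that satisfies the output rule has its time increased, and the other has its efc set to 0. This continues until all words have been compared. Comparing two words gives the following cases:

(1) The words are identical: treated as the same word;

(2) The letter parts are identical and the numeric parts differ: treated as the same word, output according to the rule;

(3) The letter parts are the same when case is ignored: treated as the same word, output according to the rule;

(4) The words differ even when case is ignored: treated as two words;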

            // Each entry starts as one visible occurrence.
            for (i = 0; i < wd.Length; i++) { wd[i].efc = 1; wd[i].time = 1; }
            for (i = 0; i < wd.Length; i++) {
                if (wd[i].efc == 0) continue;
                // Start at i + 1 so that a word is never compared with itself.
                for (n = i + 1; n < wd.Length; n++) {
                    if (wd[n].efc == 0) continue;
                    if (wd[i].efc == 0) break;
                    string si = wd[i].str, sn = wd[n].str;
                    // Find where the trailing digits begin; the first four characters are always letters.
                    int bi = si.Length, bn = sn.Length;
                    while (bi > 4 && isnumber(si[bi - 1])) bi--;
                    while (bn > 4 && isnumber(sn[bn - 1])) bn--;
                    // sia/sna are the letter parts, sin/snn the trailing digits (possibly empty).
                    string sia = si.Substring(0, bi), sin = si.Substring(bi);
                    string sna = sn.Substring(0, bn), snn = sn.Substring(bn);
                    // Case (4): different even when case is ignored, so these are two words.
                    if (!(sia.ToUpper()).Equals(sna.ToUpper()))
                    {
                        continue;
                    }
                    bool keepI;
                    if (sia.Equals(sna))
                    {
                        // Cases (1) and (2): keep the word with the larger numeric suffix.
                        // An empty suffix is treated as 0 here, which is an assumption.
                        int inn = sin.Length > 0 ? tailer(sin) : 0;
                        int nnn = snn.Length > 0 ? tailer(snn) : 0;
                        keepI = inn >= nnn;
                    }
                    else
                    {
                        // Case (3): letters differ only in case; keep the smaller one by character code.
                        keepI = string.CompareOrdinal(sia, sna) < 0;
                    }
                    if (keepI)
                    {
                        // wd[i] absorbs everything counted so far for wd[n].
                        wd[i].time += wd[n].time;
                        wd[n].efc = 0;
                    }
                    else
                    {
                        wd[n].time += wd[i].time;
                        wd[i].efc = 0;
                        break;
                    }
                }
            }//for set

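7. Sorting and output

Sort the structs by time in descending order, then output the str and time of every struct whose efc is 1, following the output rules. However, because step 6 did not pass testing and time was short, the code for this final stage has not been written yet.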
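Since this stage was left unwritten, the following is only a sketch of how the ordering described above could be expressed, not the author's code. The class name FrequencyReport and the method Format are hypothetical, the input is assumed to be (word, count) pairs for the surviving entries, and the angle brackets are assumed to be a literal part of the output format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencyReport
{
    // Orders by count (highest first), then by word in ordinal order,
    // and renders each entry in the "<word>  frequency" form.
    public static List<string> Format(IEnumerable<KeyValuePair<string, int>> counts)
    {
        return counts
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => "<" + p.Key + ">  " + p.Value)
            .ToList();
    }
}
```

With the struct array from step 5, the pairs would be built from wd[i].str and wd[i].time for every element whose efc is 1.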

8. Deleting the temporary file
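No code is given for this step. A minimal sketch, assuming the temporary file is temp.txt directly under the root path entered by the user, might look like this; the class name TempCleanup and the method Remove are made up for illustration.

```csharp
using System;
using System.IO;

public static class TempCleanup
{
    // Deletes temp.txt under root. Returns true if a file was removed, false if none existed.
    public static bool Remove(string root)
    {
        string temp = Path.Combine(root, "temp.txt");
        if (!File.Exists(temp)) return false;
        File.Delete(temp);
        return true;
    }
}
```

Calling this once after the results have been written keeps the scanned directory unchanged between runs.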

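III. Problem Analysis

1.    The program parses the source text one line at a time as it reads it. Could everything be read in first and parsed afterwards? Personally, I feel that reading and splitting one line at a time is more practical;

2.    The program has to create a temporary file and delete it at the end. Because my C# is weak, I could not find a way to store the words without creating a file;

3.    During counting, words can only be stored in structs. Could they be stored directly in strings? A struct is a value type, so passing it around may cause problems. Also, when there are many words, the same number of structs is created in memory, so the memory overhead may be large and the comparison work heavy;

4.    ......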
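One possible direction for problems 2 and 3 is to keep the counts in memory in a Dictionary keyed by the letter part of the word, compared without regard to case. The sketch below is an assumption-laden illustration rather than the author's design: WordCounter, Entry, Add, Results and the helper methods are all hypothetical names, and the merge rule follows cases (1) to (4) of step 6.

```csharp
using System;
using System.Collections.Generic;

public class WordCounter
{
    private class Entry { public string Text; public int Count; }

    // Key: the word without its trailing digits, compared case-insensitively.
    private readonly Dictionary<string, Entry> entries =
        new Dictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);

    public void Add(string word)
    {
        string stem = Stem(word);
        Entry e;
        if (!entries.TryGetValue(stem, out e))
        {
            entries[stem] = new Entry { Text = word, Count = 1 };
            return;
        }
        e.Count++;
        if (Prefer(word, e.Text)) e.Text = word;
    }

    // Yields (representative word, count) for every distinct word.
    public IEnumerable<KeyValuePair<string, int>> Results()
    {
        foreach (Entry e in entries.Values)
            yield return new KeyValuePair<string, int>(e.Text, e.Count);
    }

    private static string Stem(string word)
    {
        int end = word.Length;
        while (end > 0 && char.IsDigit(word[end - 1])) end--;
        return word.Substring(0, end);
    }

    // True when candidate should replace current as the word that is output.
    private static bool Prefer(string candidate, string current)
    {
        string cs = Stem(candidate), ks = Stem(current);
        if (cs == ks)  // same letters including case: the larger number wins
            return CompareDigits(candidate.Substring(cs.Length), current.Substring(ks.Length)) > 0;
        return string.CompareOrdinal(cs, ks) < 0;  // otherwise the smaller letter part wins
    }

    // Compares two digit strings numerically without parsing, so long suffixes cannot overflow.
    private static int CompareDigits(string a, string b)
    {
        a = a.TrimStart('0'); b = b.TrimStart('0');
        if (a.Length != b.Length) return a.Length.CompareTo(b.Length);
        return string.CompareOrdinal(a, b);
    }
}
```

Each word is handled once when it is added, so no temporary file and no pairwise comparison of all words are needed. Adding file1, File2, file3 and FILE, for instance, leaves a single entry FILE with a count of 4.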

Reposted from: https://www.cnblogs.com/blogone/p/3336665.html
