Software Engineering Individual Assignment: How I Completed It

diaobu8124

Tags: c# c/c++

Original link: http://www.cnblogs.com/blogone/p/3336665.html

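Word Frequency Counter: Development Document

I. Program Overview

Word frequency counter: as the assignment requires, the program takes a path as input, walks every folder beneath it, and counts the words in all files ending in ".txt", ".cs", ".h", or ".cpp". It writes the results to a text file with a specified name, one "<word> frequency" entry per word, where <word> is the counted word and frequency is the number of times it occurs. Words are output in descending order of frequency; words with the same frequency are output in dictionary order.

Definition of a word: a word starts with at least four letters and ends with one or more digits. Two words whose digit parts are the same and whose letter parts are the same ignoring case count as the same word, and the frequency is the total number of such occurrences. If the letter parts are identical including case, the word with the larger trailing number is the one output; if the letter parts differ in case, the one that comes first in dictionary order is output.

II. Design

1. Enter the directory

On entry, the main function immediately prints a prompt asking the user for a path. The directory the user types is stored as a string in path, and path is passed to the traversal function Getdir().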

Console.WriteLine("Enter a path:");

path = Console.ReadLine();

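2. Traverse the folders

Given the input path, Getdir() recursively visits every folder under it and checks the extension of every file it finds. If a file ends with one of the four extensions listed above, Getdir() calls file_get_contents() to read its contents. This continues until every file has been visited: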

public static void Getdir(string path)
{
    if (path != null)
    {
        // Recurse into every subfolder first.
        foreach (string n in Directory.GetDirectories(path))
        {
            //Console.WriteLine("Directory: {0}", n);
            Getdir(n);
        }
        // Then process the matching files in this folder.
        foreach (string m in Directory.GetFiles(path))
        {
            // temp.txt is created by the program itself, so it is skipped.
            if ((m.EndsWith(".txt") || m.EndsWith(".cs") || m.EndsWith(".cpp") || m.EndsWith(".h")) && (!m.Equals(path + @"\temp.txt")))
            {
                Console.WriteLine("File: {0}", m);
                file_get_contents(m);
            }
        }
    }
}
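
The same traversal can also be written with the framework's built-in recursive enumeration. The following is only a minimal sketch, not the program's actual code: the names FileScanner and FindSourceFiles are hypothetical, and unlike Getdir() it compares extensions without regard to case and returns the matches instead of parsing them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileScanner
{
    // The four extensions named in the assignment.
    private static readonly string[] Extensions = { ".txt", ".cs", ".h", ".cpp" };

    // Returns every file under root, including subfolders, whose extension
    // is one of the four above. The result is sorted for a stable order.
    public static List<string> FindSourceFiles(string root)
    {
        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(f => Extensions.Contains(Path.GetExtension(f), StringComparer.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.Ordinal)
            .ToList();
    }
}
```

Each returned path could then be passed to file_get_contents(), with temp.txt filtered out in the same way as in Getdir().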

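3. Read the file contents

file_get_contents() receives the name of a file whose extension Getdir() has already checked, and reads its contents. (A file named temp.txt is skipped because the program creates it itself; that is the purpose of the !m.Equals(...) test in Getdir().) The file is read one line at a time with ReadLine(), and each line is passed as a string to def() for parsing, until the end of the file is reached: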

public static void file_get_contents(string path)
{
    // The input files are assumed to be GB2312-encoded.
    using (StreamReader sr = new StreamReader(path, System.Text.Encoding.GetEncoding("GB2312")))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            Console.WriteLine(line);
            def(line);
        }
    }
}
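
As a point of comparison, File.ReadLines also reads a file lazily, one line at a time, so only the current line is held in memory. This is a small sketch under assumed names (LineSource and ForEachLine are hypothetical); the handle callback stands in for def().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineSource
{
    // Calls handle once per line of the file, in order.
    // File.ReadLines streams the file instead of loading it whole.
    public static void ForEachLine(string file, Encoding encoding, Action<string> handle)
    {
        foreach (string line in File.ReadLines(file, encoding))
        {
            handle(line);
        }
    }
}
```

One caveat when running on .NET Core or later: Encoding.GetEncoding("GB2312") is only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.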

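4. Parse the contents

def() receives each line from file_get_contents(string path). Every piece of the line that meets the word standard (it starts with four or more letters and continues with letters or digits only, with no other symbols) is written by write() to the file "temp.txt", one word per line: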

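public static void def(string str)
{
    // strtemp (the word being built) and path (the root directory) are static fields.
    // m == -1: looking for the start of a word; m == 0: inside a word.
    int i = str.Length, b = 0, m = -1;
    for (; b < i; )
    {
        if ((m == -1) && (b < i - 3))
        {
            // A word starts where four consecutive letters begin.
            if (isletter(str[b]) && isletter(str[b + 1]) && isletter(str[b + 2]) && isletter(str[b + 3]))
                m = 0;
            else
            {
                b++;
            }
        }
        else if (m == 0)
        {
            if (isletter(str[b]) || isnumber(str[b]))
            {
                strtemp += str[b].ToString();
                b++;
            }
            else
            {
                // Any other character ends the word: write it out and reset.
                b++;
                Console.WriteLine(strtemp);
                write(strtemp, path);
                strtemp = null;
                m = -1;
            }
        }
        else break;
    }
    // A word that runs to the end of the line has not been written yet.
    if (m == 0)
    {
        Console.WriteLine(strtemp);
        write(strtemp, path);
        strtemp = null;
    }
}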

public static void write(string str, string path)
{
    // Append one word per line to temp.txt in the root directory.
    using (StreamWriter sw = new StreamWriter(path + @"\temp.txt", true, System.Text.Encoding.Default))
    {
        sw.WriteLine(str);
    }
}
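
The scan that def() performs by hand can also be described with a regular expression. The sketch below is an alternative under stated assumptions, not the program's code: WordExtractor and ExtractWords are hypothetical names, letters are taken to be ASCII only, and, like def(), it does not require a separator in front of the four leading letters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Four ASCII letters, then any run of ASCII letters or digits.
    private static readonly Regex WordPattern = new Regex("[A-Za-z]{4}[A-Za-z0-9]*");

    // Returns the words found in one line, in the order they appear.
    public static List<string> ExtractWords(string line)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(line))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

For the line "hello123 abc file2, word" this yields hello123, file2, and word; "abc" is dropped because it has fewer than four letters.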

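5. Read all the words

An array of structs, word{string str; int efc; int time;} wd[], stores the words, one per element. str is the word itself; efc is 0 or 1, where 0 means the word is skipped in the final output and 1 means it is output. The size of the array comes from linum(), which counts the lines of "temp.txt". (In fact, a static counter incremented in write() during parsing would give the word count without reading the file a second time.) read1() then reads each word into wd[i].str, until the end of the file: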

public static int linum(string path)
{
    // Count the lines (one word per line) in temp.txt.
    int i = 0;
    string line;
    using (StreamReader sr = new StreamReader(path + @"\temp.txt", System.Text.Encoding.Default))
    {
        while ((line = sr.ReadLine()) != null)
        {
            i++;
            //Console.WriteLine("{0}:{1}", i, line);
        }
    }
    return i;
}

public static void read1(word[] wd, string path)
{
    int i = 0;
    using (StreamReader sr = new StreamReader(path + @"\temp.txt", System.Text.Encoding.Default))
    {
        // Stop at the end of the file even if wd has not been filled,
        // so the loop cannot spin forever on a short file.
        while (i < wd.Length && (wd[i].str = sr.ReadLine()) != null)
        {
            i++;
        }
    }
}

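6. Count the words

Two nested for loops compare each word in turn with the later words whose efc is 1 (every efc starts at 1). If the two words are different, the pair is skipped. If they are the same word, they are compared: the one that satisfies the output rule has its time increased, and the other has its efc set to 0. This continues until all words have been compared. Comparing two words gives the following cases:

(1) The words are completely identical: they are the same word;

(2) The letter parts are completely identical but the digit parts differ: they are the same word, output according to the rule;

(3) The letter parts are the same when case is ignored: they are the same word, output according to the rule;

(4) The words differ even when case is ignored: they are two different words;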

// Every word starts as an output candidate (efc = 1) that has been seen once (time = 1).
for (i = 0; i < wd.Length; i++)
{
    wd[i].efc = 1;
    wd[i].time = 1;
}

for (i = 0; i < wd.Length; i++)
{
    if (wd[i].efc == 0) continue;
    for (n = i + 1; n < wd.Length; n++)
    {
        if (wd[n].efc == 0) continue;

        // Split each word into its letter part (sia, sna) and trailing digits (sin, snn).
        string si = wd[i].str, sn = wd[n].str;
        int bi = si.Length, bn = sn.Length;
        while (bi > 0 && isnumber(si[bi - 1])) bi--;
        while (bn > 0 && isnumber(sn[bn - 1])) bn--;
        string sia = si.Substring(0, bi), sin = si.Substring(bi);
        string sna = sn.Substring(0, bn), snn = sn.Substring(bn);

        // Case 4: the letter parts differ even ignoring case, so these are different words.
        if (!sia.ToUpper().Equals(sna.ToUpper())) continue;

        bool keepI;
        if (sia.Equals(sna))
        {
            // Cases 1 and 2: identical letters, keep the word with the larger number.
            // tailer() is not shown in this article; it is assumed to turn a digit string into an int.
            int inn = sin.Length > 0 ? tailer(sin) : 0;
            int nnn = snn.Length > 0 ? tailer(snn) : 0;
            keepI = inn >= nnn;
        }
        else
        {
            // Case 3: letters differ only in case, keep the one with the smaller character codes.
            keepI = string.CompareOrdinal(sia, sna) < 0;
        }

        if (keepI)
        {
            wd[i].time += wd[n].time;
            wd[n].efc = 0;
        }
        else
        {
            wd[n].time += wd[i].time;
            wd[i].efc = 0;
            break; // wd[i] has been merged into wd[n]
        }
    }
}//for set
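
The four cases can be isolated into small helper functions, which makes them easier to test on their own. This is a sketch under assumptions rather than the program's code: WordForm, Split, SameWord, and Prefer are hypothetical names, a word without trailing digits is treated as having the number 0, and digit runs are assumed to fit in a long.

```csharp
using System;

public static class WordForm
{
    // Splits a word such as "file123" into letters "file" and digits "123".
    public static void Split(string word, out string letters, out string digits)
    {
        int cut = word.Length;
        while (cut > 0 && word[cut - 1] >= '0' && word[cut - 1] <= '9')
        {
            cut--;
        }
        letters = word.Substring(0, cut);
        digits = word.Substring(cut);
    }

    // Cases 1 to 3: the letter parts match when case is ignored.
    public static bool SameWord(string a, string b)
    {
        string la, da, lb, db;
        Split(a, out la, out da);
        Split(b, out lb, out db);
        return string.Equals(la, lb, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the spelling to output for two words that SameWord() accepts.
    public static string Prefer(string a, string b)
    {
        string la, da, lb, db;
        Split(a, out la, out da);
        Split(b, out lb, out db);
        if (la == lb)
        {
            // Same letters: the larger trailing number wins.
            long na = da.Length == 0 ? 0 : long.Parse(da);
            long nb = db.Length == 0 ? 0 : long.Parse(db);
            return nb > na ? b : a;
        }
        // Letters differ only in case: the smaller character codes win.
        return string.CompareOrdinal(la, lb) <= 0 ? a : b;
    }
}
```

For example, Prefer("file1", "file23") returns "file23", and Prefer("File9", "file1") returns "File9" because an uppercase letter has a smaller character code than its lowercase form.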

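7. Sort and output

Sort the structs by time in descending order, then output the str and time of every struct whose efc is 1, following the output rules. However, because step 6 did not pass testing and time was short, I have not yet written the code for this final stage.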
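
Since this stage was never written, the following is only a sketch of how the ordering described above might look, not the author's code. FrequencyReport and Format are hypothetical names, the input is assumed to be a dictionary of final spellings and counts, "dictionary order" is read as character-code order, and treating the angle brackets in "<word> frequency" as literal output is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencyReport
{
    // Builds one "<word> frequency" line per word: highest count first,
    // ties broken by character-code order of the word.
    public static List<string> Format(IDictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => "<" + kv.Key + "> " + kv.Value)
            .ToList();
    }
}
```

The resulting lines could be written to the output file with File.WriteAllLines.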

8. Delete the temporary file

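III. Problem Analysis

1. The program parses the source text one line at a time as it reads it. Could it instead read everything in first and then parse? My feeling is that reading and splitting one line at a time is more workable;

2. The program has to create a temporary file and delete it at the end. Because my command of C# is weak, I could not find a way to store the words without creating a file;

3. During counting, the words can only be stored in structs. Could they be stored directly in strings? A struct is a value type, so passing it around may cause problems. Also, if there are many words, just as many structs are created in memory at once, so the memory overhead may be large and the comparison work is heavy;

4. ......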
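
One possible answer to problems 2 and 3 is to count the words in memory with a Dictionary, which needs neither a temporary file nor pairwise comparison. The sketch below is an assumption-laden illustration, not the program's design: WordCounter and its members are hypothetical names, the key is the upper-cased letter part of the word, a word without trailing digits counts as number 0, and digit runs are assumed to fit in a long.

```csharp
using System;
using System.Collections.Generic;

public class WordCounter
{
    private class Entry
    {
        public string Spelling; // the spelling that would be output
        public int Count;       // how many times the word was seen
    }

    // Key: letter part in upper case, so spellings that differ only in
    // case or in trailing digits share one entry.
    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    private static string LetterPart(string word)
    {
        int cut = word.Length;
        while (cut > 0 && word[cut - 1] >= '0' && word[cut - 1] <= '9') cut--;
        return word.Substring(0, cut);
    }

    private static long Number(string word)
    {
        string digits = word.Substring(LetterPart(word).Length);
        return digits.Length == 0 ? 0 : long.Parse(digits);
    }

    // Same letters: larger number wins. Different case: smaller character codes win.
    private static bool IsPreferred(string candidate, string current)
    {
        string lc = LetterPart(candidate), lo = LetterPart(current);
        if (lc == lo) return Number(candidate) > Number(current);
        return string.CompareOrdinal(lc, lo) < 0;
    }

    public void Add(string word)
    {
        string key = LetterPart(word).ToUpperInvariant();
        Entry e;
        if (entries.TryGetValue(key, out e))
        {
            e.Count++;
            if (IsPreferred(word, e.Spelling)) e.Spelling = word;
        }
        else
        {
            entries[key] = new Entry { Spelling = word, Count = 1 };
        }
    }

    public int CountOf(string word)
    {
        Entry e;
        return entries.TryGetValue(LetterPart(word).ToUpperInvariant(), out e) ? e.Count : 0;
    }

    public string SpellingOf(string word)
    {
        Entry e;
        return entries.TryGetValue(LetterPart(word).ToUpperInvariant(), out e) ? e.Spelling : null;
    }
}
```

Each Add() call is a single lookup, so the total work grows roughly linearly with the number of words instead of with its square, and only one entry per distinct word stays in memory.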

Reposted from: https://www.cnblogs.com/blogone/p/3336665.html
