Week 2 Assignment: Wordcount master


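License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/zhangyixuan/p/8609247.html

Tags: #java

This post describes a WordCount tool implemented in Java. It counts the words, characters, and lines in a file and supports several options, such as specifying an output file and ignoring particular words. The post covers the design approach, the implementation details, and the test cases.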

(1) GitHub repository

https://github.com/zyxxuan/Wordcount-master

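(2) PSP

PSP2.1 table

PSP2.1	Description	Estimated time (min)	Actual time (min)
Planning	Planning	30	45
· Estimate	· Estimate how long the task will take	15	10
Development	Development	600	1400
· Analysis	· Requirements analysis (including learning new technologies)	200	420
· Design Spec	· Write the design document	100	180
· Design Review	· Design review (review the design document with colleagues)	100	80
· Coding Standard	· Coding standard (define suitable conventions for this development)	100	180
· Design	· Detailed design	45	40
· Coding	· Coding	30	60
· Code Review	· Code review	20	30
· Test	· Testing (self-testing, fixing code, committing changes)	300	420
Reporting	Reporting	100	180
· Test Report	· Test report	100	210
· Size Measurement	· Measure the amount of work	15	15
· Postmortem & Process Improvement Plan	· Postmortem and process improvement plan	30	60
	Total	1785	3330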

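(3) Approach

Writing the code itself is fairly straightforward. The most important step is to understand what the assignment actually requires.

In other words, rather than rushing into coding, we should first read the instructor's specific requirements and the blogs the assignment is based on, namely those of Luo Jie (罗杰) of Beihang University and Zou Xin (邹欣).

Zou Xin's blog: http://www.cnblogs.com/xinz/p/7426280.html

Luo Jie's blog (Beihang University): http://www.cnblogs.com/jiel/p/7545780.html

On requirements analysis: the assignment document says any language may be used, but from what was said in class, Java seems a better fit for the follow-up assignments.

First:

The basic functions are -w, -c, and -l, which count the words, characters, and lines in the target file respectively. These are simple and can be implemented directly.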
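As an illustration of the three basic counts, here is a minimal sketch. It is not the code from the repository: the class name `BasicCountSketch`, the in-memory `String` input, and the separator set (space, comma, tab, carriage return, newline) are all assumptions made for this example, since the post does not show how `-w`, `-c`, and `-l` are implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class BasicCountSketch {
    // Assumed separator set; the original post does not list its separators.
    static boolean isSep(char ch) {
        return ch == ' ' || ch == ',' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // -c: every character of the text counts, including separators.
    static long countChars(String text) {
        return text.length();
    }

    // -w: a word is a maximal run of non-separator characters.
    static long countWords(String text) {
        long count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            if (isSep(text.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                count++;
            }
        }
        return count;
    }

    // -l: count newlines; a last line without a trailing newline still counts.
    static long countLines(String text) {
        if (text.isEmpty()) {
            return 0;
        }
        long count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                count++;
            }
        }
        return text.charAt(text.length() - 1) == '\n' ? count : count + 1;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a small sample.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "int a, b;\nreturn a;\n";
        System.out.println("chars: " + countChars(text));
        System.out.println("words: " + countWords(text));
        System.out.println("lines: " + countLines(text));
    }
}
```

Working on a `String` keeps the counting rules separate from file handling, which makes each rule easy to check on small inputs.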

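Next come the -a and -e options, which can also be written directly by applying what we learned in last semester's compiler course. For -s, which processes all files under a directory recursively, directory traversal has to be implemented.

Reference: 《编译原理》 (Compiler Principles, which the author calls the "wolf book")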

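(4) Program design and implementation

The code is divided into three classes: Main, ArgParser, and WordCounter.

Main

Controls the overall program flow: it calls the argument-parsing function, uses the WordCounter class to compute the statistics, and then outputs the results.

This is clearest when shown as a diagram.

ArgParser

This class exposes four main interfaces:

/*
 * Parse the raw argument list supplied by Main.
 * @param args the argument list passed to the main function
 * @return 0 on success, -1 if parsing fails
 */
public int parse(String[] args);

/*
 * Get the target files, i.e. the files to analyze.
 * Because of the -s option there may be more than one.
 * @return the paths of all files to analyze
 */
public String[] getTarget();

/*
 * Check whether a given option was supplied.
 * @return true if the option is present, false otherwise
 */
public boolean containsKey(String key);

/*
 * Get the value attached to an option, e.g. the file name given after -o or -e.
 * @return the value stored for the option
 */
public String get(String key);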

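(5) Code notes

readFile: reads a file and counts its words, characters, and lines.

readFileByLine: reads a file and counts its code lines, blank lines, and comment lines.

writeFile: writes the recorded results to the specified file.

getFilePath: traverses all files under the current directory and adds those that match to an ArrayList.

readIgnoreFile: reads the ignore-word file so that those words can be handled when counting words.

// Implements the -c option
public long countChar(String filename);

// Implements the -e option
public void buildEscapeWord(String filename);

// Implements the -w option
public long countWords(String filename);

// Implements the -l option
public long countLines(String filename);

// Implements the -a option; the long array holds code lines / blank lines / comment lines
public long[] countALines(String filename);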

1. Argument parsing

for (int i = 0; i < args.length; i++) {
    // The length check keeps charAt(1) safe for an empty string or a bare "-"
    if (args[i].length() > 1 && args[i].charAt(0) == '-') {
        char arg = args[i].charAt(1);
        switch (arg) {
            case 'c':
            case 'w':
            case 'l':
            case 'a':
            case 's':
                // Flags that apply to the target files; the targets themselves are stored in target
                this.args.put(String.valueOf(arg), "");
                break;
            case 'o':
            case 'e':
                // These options are followed by a file name, so consume it here
                if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                    this.args.put(String.valueOf(arg), args[i + 1]);
                    i++;
                } else {
                    System.out.println(args[i] + " must follow a file");
                    return -1;
                }
                break;
        }
    } else {
        // Target file
        this.target.add(args[i]);
    }
}
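To show how the parsing loop and the four ArgParser interfaces could fit together, here is a self-contained sketch. The class name `ArgParserSketch` is hypothetical, and rejecting unknown options with -1 is an assumption of this example rather than something the post states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArgParserSketch {
    private final Map<String, String> args = new HashMap<>();
    private final List<String> target = new ArrayList<>();

    // Returns 0 on success and -1 when the argument list is malformed.
    int parse(String[] argv) {
        for (int i = 0; i < argv.length; i++) {
            String a = argv[i];
            if (!a.startsWith("-")) {
                target.add(a); // anything that is not an option is a target file
                continue;
            }
            char flag = a.length() == 2 ? a.charAt(1) : '?';
            if ("cwlas".indexOf(flag) >= 0) {
                args.put(String.valueOf(flag), ""); // plain flag, no value
            } else if (flag == 'o' || flag == 'e') {
                // -o and -e take the next argument as their file name
                if (i + 1 < argv.length && !argv[i + 1].startsWith("-")) {
                    args.put(String.valueOf(flag), argv[++i]);
                } else {
                    System.out.println(a + " must be followed by a file");
                    return -1;
                }
            } else {
                // Assumption of this sketch: unknown options are an error
                System.out.println("Unknown option: " + a);
                return -1;
            }
        }
        return 0;
    }

    boolean containsKey(String key) {
        return args.containsKey(key);
    }

    String get(String key) {
        return args.get(key);
    }

    String[] getTarget() {
        return target.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        ArgParserSketch parser = new ArgParserSketch();
        int rc = parser.parse(new String[] {"-w", "-c", "test.java", "-o", "out.txt"});
        System.out.println("rc=" + rc + " w=" + parser.containsKey("w")
                + " o=" + parser.get("o")
                + " targets=" + String.join(",", parser.getTarget()));
    }
}
```

Storing plain flags with an empty value and valued options with their file name lets Main answer both "was -w given?" and "which file follows -o?" from a single map.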

2. Recursive directory traversal

// Excerpt from buildTarget in WordCount
int buildTarget(String path) {
    File dir = new File(path);
    if (!dir.exists()) { // error handling
        System.out.println("No such file or directory: " + path);
        return -1;
    }
    File[] files = dir.listFiles();
    if (files == null) { // listFiles() returns null when path is not a readable directory
        return 0;
    }
    for (File file : files) { // walk the entries
        if (file.isDirectory()) {
            // Recurse into the subdirectory, then continue with the remaining entries
            buildTarget(file.getPath());
        } else if (file.isFile()) {
            String name = file.getPath();
            if (name.endsWith(this.exten)) {
                this.target.add(name);
            }
        }
    }
    return 0;
}

3. Handling the -a option

public long[] countALines(String filename) {
    long[] lines = {0, 0, 0}; // code lines / blank lines / comment lines
    // The work is line-oriented, so use a BufferedReader; try-with-resources closes it
    try (BufferedReader file = new BufferedReader(new FileReader(filename))) {
        // Tracks whether we are inside a /* */ block; text inside a comment never counts as code
        boolean intoComment = false;
        String line;
        while ((line = file.readLine()) != null) {
            long charCount = 0;
            // Whether this line contains any comment text
            boolean hasComment = intoComment;
            for (int i = 0; i < line.length(); i++) {
                char ch = line.charAt(i);
                boolean hasNext = i + 1 < line.length();
                if (ch == ' ' || ch == '\t' || ch == '\n') {
                    continue; // skip whitespace
                } else if (ch == '/' && !intoComment) {
                    if (hasNext && line.charAt(i + 1) == '*') { // start of a /* comment
                        intoComment = true;
                        hasComment = true;
                        i++;
                    } else if (hasNext && line.charAt(i + 1) == '/') { // start of a // comment
                        hasComment = true;
                        break; // the rest of the line is comment text, so stop early
                    } else {
                        charCount++; // a lone '/' is a code character
                    }
                } else if (ch == '*' && intoComment) {
                    if (hasNext && line.charAt(i + 1) == '/') { // */ ends the comment block
                        intoComment = false;
                        i++;
                    }
                } else if (!intoComment) {
                    charCount++; // only characters outside comments are code characters
                }
            }
            // Classify the line once it has been scanned: more than one code character
            // makes a code line; otherwise it is a comment line if it holds a comment,
            // and a blank line if it does not
            if (charCount > 1) {
                lines[0]++;
            } else if (hasComment) {
                lines[2]++;
            } else {
                lines[1]++;
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return lines;
}

4. Implementing -e

public void buildEscapeWord(String filename) {
    this.escapeWord = new HashSet<String>();
    try (FileReader file = new FileReader(filename)) {
        int ch = file.read();
        while (ch != -1) {
            StringBuilder word = new StringBuilder();
            // Skip separators
            while (ch != -1 && isSep((char) ch)) {
                ch = file.read();
            }
            // Read one word
            while (ch != -1 && !isSep((char) ch)) {
                word.append((char) ch);
                ch = file.read();
            }
            // Trailing separators leave an empty word, which must not be stored
            if (word.length() > 0) {
                this.escapeWord.add(word.toString());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

This function reads all the words to be ignored and stores them in a HashSet.

countWords() then performs the following check:

if (this.escapeWord != null && this.escapeWord.contains(word)) {
    count--;
}

This excludes the ignored words from the count.
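The effect of the ignore list on the word count can be sketched as follows. This is a simplified illustration, not the author's `countWords`: it works on an in-memory string, the class name `StopWordCountSketch` is hypothetical, and the separator set (space, comma, tab, carriage return, newline) is assumed because the post does not show `isSep`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWordCountSketch {
    // Counts the words in text, leaving out any word found in escapeWord.
    // Matching is exact and case-sensitive, as with HashSet.contains.
    static long countWords(String text, Set<String> escapeWord) {
        long count = 0;
        // Assumed separators: space, comma, tab, CR, LF
        for (String word : text.split("[ ,\\t\\r\\n]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            if (escapeWord != null && escapeWord.contains(word)) {
                continue; // ignored word
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> ignore = new HashSet<>(Arrays.asList("if", "return"));
        String text = "if (a) return a, b";
        System.out.println("all words:     " + countWords(text, null));
        System.out.println("after ignores: " + countWords(text, ignore));
    }
}
```

Skipping an ignored word before incrementing gives the same total as counting it and then decrementing, which is the approach the snippet above takes.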

5. The code below recursively traverses all files under a directory. The matching files end up in the allFile ArrayList, which is then processed in main.

public void getFilePath(ArrayList<String> allFile, String path, String form) {
    File file = new File(path);
    if (file.isDirectory()) { // the current path is a directory
        String[] filelist = file.list();
        if (filelist == null) { // the directory could not be read
            return;
        }
        for (int i = 0; i < filelist.length; i++) {
            String name = filelist[i];
            String child = path + File.separator + name;
            File readfile = new File(child);
            if (readfile.isDirectory()) {
                // Recurse into the subdirectory
                getFilePath(allFile, child, form);
            } else if (form.contains("*")) {
                // With a * wildcard, compare the file's extension with the extension in form;
                // the index checks avoid an exception when either has no "."
                int dot = name.lastIndexOf(".");
                int formDot = form.lastIndexOf(".");
                if (dot >= 0 && formDot >= 0
                        && name.substring(dot).equals(form.substring(formDot))) {
                    allFile.add(child);
                }
            } else if (name.equals(form)) {
                // Without a wildcard, add only a file whose name matches exactly
                allFile.add(child);
            }
        }
    } else {
        // The path is a single file
        allFile.add(path);
    }
}
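For comparison, the same kind of recursive collection can be written with the standard `java.nio.file.Files.walk` API. This is an alternative technique, not what the post's code does, and the class name `WalkSketch` and the method `findByExtension` are hypothetical names chosen for this sketch. It matches by file extension only, which corresponds to the `*.java` style of wildcard.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkSketch {
    // Collects every regular file under root whose name ends with extension (e.g. ".java").
    static List<String> findByExtension(Path root, String extension) throws IOException {
        // Files.walk visits root and everything below it; the stream must be closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(extension))
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Search the directory given on the command line, or the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String file : findByExtension(root, ".java")) {
            System.out.println(file);
        }
    }
}
```

Sorting the result is optional; it is done here only so the output order does not depend on the file system.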

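(6) Test design

1. wc.exe -w -c -l test.java

(Takes the path with no -o, -e, or -a; the file name resolves against the default current directory.)

2. wc.exe -w -c D:\test\test.java

(Same as above, but without -l; the file is given as an absolute path.)

3. wc.exe -w -s *.java -o ouput.txt

(Same as above, but takes the -s path with a wildcard and uses -o to name the output file.)

4. wc.exe -w -s D:\test\*.java

(Uses an absolute path together with a wildcard.)

5. wc.exe -c -l -w -a test.java -e ignore.txt

(Tests -a and the -e ignore list. Note that -w must be present so that the word count is displayed.)

6. wc.exe -c -l -s test.java -e ignore.txt

(Takes the -s and -e paths, using the current directory and no wildcard.)

7. wc.exe -l -a -s D:\test\*.java -o output.txt

(Combines -a and -s, with the results written to an output file.)

8. wc.exe -c -a test.java -o

(Checks the response to invalid input.)

10. wc.exe -w -s test.jva -o output.txt

(The input file name is wrong; an error message is shown.)

11. wc.exe -e ignore.txt -w -s test.java

(-e appears before the options that apply to the input file.)

12. wc.exe -o output.txt -w -l D:\test\test.java

(-o appears before the options that operate on the input file.)

13. wc.exe -w -l D:\test\*.java

(Checks what happens when a wildcard is used without the traversal option -s.)

14. wc.exe -w -l -c -s -a D:\test\*.java -o output.txt -e ignore.txt

(Tests all features together.)

15. wc.exe -e ignore.txt

(No input file is given.)

16. wc.exe -e ignor.txt

(The ignore file does not exist.)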

References:

https://www.cnblogs.com/Berryxiong/p/6232373.html Using split in Java

http://blog.csdn.net/sunling_sz/article/details/30476483 Reading a file line by line in Java

http://www.blogjava.net/baizhihui19870626/articles/372872.html Recursively traversing all files under a directory

Zou Xin's blog: http://www.cnblogs.com/xinz/p/7426280.html

Luo Jie's blog (Beihang University): http://www.cnblogs.com/jiel/p/7545780.html

《编译原理》 (Compiler Principles, which the author calls the "wolf book")
