Week 4 Group Assignment: WordCount Optimization

weixin_30271335

Tags: java, testing

Original link: http://www.cnblogs.com/jakejian/p/8735660.html

GitHub project: https://github.com/chaseMengdi/wcPro

stage1: Coding + unit testing

PSP table

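PSP2.1	PSP stage	Estimated time (min)	Actual time (min)
Planning	Planning	25	30
Estimate	Estimate how long the task will take	25	30
Development	Development	300	302
Analysis	Requirements analysis	20	20
Design Spec	Write design document	20	15
Design Review	Design review	20	15
Coding Standard	Coding standard	20	15
Design	Detailed design	20	25
Coding	Implementation	90	80
Code Review	Code review	20	30
Test	Testing	60	80
Reporting	Reporting	80	95
Test Report	Test report	30	50
Size Measurement	Workload measurement	30	25
Postmortem	Postmortem summary	20	20
	Total	405	430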

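Code design approach

Interface design

public static HashMap<String, Integer> count(String thefile)

Splits the text into words and counts them.

Interface implementation

count() takes a file name (a txt file). It reads the file line by line, converts each line to lowercase, and splits it with split(). I found that when a line starts with a non-letter character, split() returns an empty string as the first element (split() keeps a leading empty string when the input begins with a delimiter and only drops trailing ones), so the leading non-letter characters are stripped before calling split(). Hyphens also need care: for input such as -word-word--- or ---, the leading and trailing "-" must be removed while the "-" between word parts is kept. The words are then counted and the result is stored in a HashMap<String, Integer>.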

// Split the text into words and count them
// Requires: import java.io.*; import java.util.HashMap;
public static HashMap<String, Integer> count(String thefile) {
    File file = new File(thefile);
    HashMap<String, Integer> map = new HashMap<>();
    if (file.exists()) {
        try {
            FileInputStream fis = new FileInputStream(file);
            InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
            BufferedReader br = new BufferedReader(isr);
            String line = new String("");
            StringBuffer sb = new StringBuffer();
            while ((line = br.readLine()) != null) {
                // Convert to lowercase
                line = line.toLowerCase();
                // Skip leading characters that are neither a letter nor "-";
                // the bounds check keeps empty or letter-free lines from throwing
                int k = 0;
                while (k < line.length()) {
                    char first = line.charAt(k);
                    if ((first >= 'a' && first <= 'z') || first == '-') {
                        break;
                    }
                    k++;
                }
                line = line.substring(k);
                // Split on whitespace runs, digits, and punctuation
                String[] split = line
                        .split("\\s++|0|1|2|3|4|5|6|7|8|9|\\_|\\'|\\.|\\,|\\;|\\(|\\)|\\~|\\!|"
                                + "\\@|\\#|\\$|\\%|\\&|\\*|\\?|\""
                                + "|\\[|\\]|\\<|\\>|\\=|\\+|\\*|\\/|\\{|\\}|\\:|\\||\\^|\\`");
                for (int i = 0; i < split.length; i++) {
                    // Look up the current count of this word
                    Integer integer = map.get(split[i]);
                    // Handle tokens that end or start with "-"
                    if ((split[i].endsWith("-") || split[i].startsWith("-"))
                            && !(split[i].equals("-"))) {
                        String[] sp = split[i].split("\\s++|\\-");
                        // Token consists only of hyphens, e.g. ----
                        if (sp.length == 0) {
                            split[i] = "-";
                            integer = map.get(split[i]);
                        }
                        // Strip leading hyphens, e.g. --dan
                        else if (split[i].startsWith("-")) {
                            int j = 0;
                            char si = split[i].charAt(0);
                            while (split[i].charAt(j) == si)
                                j++;
                            split[i] = split[i].substring(j);
                            integer = map.get(split[i]);
                        }
                        sp = split[i].split("\\s+|\\-");
                        // Token consists only of hyphens, e.g. ----
                        if (sp.length == 0) {
                            split[i] = "-";
                            integer = map.get(split[i]);
                        }
                        // Strip trailing hyphens, e.g. dn-dan---, by rejoining the parts with "-"
                        else {
                            String tmp = sp[0];
                            for (int j = 1; j < sp.length; j++) {
                                tmp = tmp + "-" + sp[j];
                            }
                            split[i] = tmp;
                            integer = map.get(split[i]);
                        }
                    }
                    if (!split[i].equals("") && !split[i].equals("-")) {
                        // Word not in the map yet: store 1
                        if (null == integer) {
                            map.put(split[i], 1);
                        } else {
                            // Otherwise increment the existing count
                            map.put(split[i], ++integer);
                        }
                    }
                }
            }
            sb.append(line);
            br.close();
            isr.close();
            fis.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    } else {
        System.out.print("File does not exist\n");
    }
    return map;
}
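For comparison, the hyphen rules described above (drop leading and trailing "-", keep "-" between word parts) can also be expressed as one regular expression that matches words instead of splitting on delimiters. This is only a sketch, not the project's code: the class name WordPatternSketch and the method countLine are made up for illustration, and unlike count() it treats every character other than a-z and "-" as a separator.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPatternSketch {
    // A word is a run of letters, optionally followed by groups of
    // "one or more hyphens + letters". Leading and trailing hyphens
    // can never be part of a match.
    private static final Pattern WORD = Pattern.compile("[a-z]+(?:-+[a-z]+)*");

    // Adds the words found in one line to the given map (hypothetical helper).
    public static Map<String, Integer> countLine(String line, Map<String, Integer> map) {
        Matcher m = WORD.matcher(line.toLowerCase(Locale.ROOT));
        while (m.find()) {
            map.merge(m.group(), 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        countLine("--Dan dn-dan--- 42 dan, ---", map);
        // "dan" is counted twice and "dn-dan" once; "42" and "---" are ignored.
        System.out.println(map);
    }
}
```

Matching words directly avoids the empty leading element that split() produces, at the cost of a slightly different definition of a separator.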

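Test design process

Test design for count() is mainly black-box. Each test first creates and initializes a HashMap<String, Integer> with the expected result and a txt file with the input, then passes the txt file name to count() and uses assertions to compare the actual output with the expected output.

The 20 test cases are designed as follows: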

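Test Case ID	Test Item (module or function)	Test Case Title	Test Criticality	Pre-condition	Input	Expected output / Actual result	Status	Remark
count_01	Word splitting and counting	One word, lowercase	L	txt file exists	01.txt	Words and frequencies	OK	Black-box test
count_02	Word splitting and counting	Multiple words, lowercase	L	txt file exists	02.txt	Words and frequencies	OK	Black-box test
count_03	Word splitting and counting	One word, mixed case	L	txt file exists	03.txt	Words and frequencies	OK	Black-box test
count_04	Word splitting and counting	Multiple words, mixed case	L	txt file exists	04.txt	Words and frequencies	OK	Black-box test
count_05	Word splitting and counting	One word, mixed case + hyphens	L	txt file exists	05.txt	Words and frequencies	OK	Black-box test
count_06	Word splitting and counting	Multiple words, mixed case + hyphens	L	txt file exists	06.txt	Words and frequencies	OK	Black-box test
count_07	Word splitting and counting	One word, mixed case + hyphens (any position)	L	txt file exists	07.txt	Words and frequencies	OK	Black-box test
count_08	Word splitting and counting	Multiple words, mixed case + hyphens (any position)	L	txt file exists	08.txt	Words and frequencies	OK	Black-box test
count_09	Word splitting and counting	One word, mixed case + single quotes	L	txt file exists	09.txt	Words and frequencies	OK	Black-box test
count_10	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes	L	txt file exists	10.txt	Words and frequencies	OK	Black-box test
count_11	Word splitting and counting	One word, mixed case + hyphens + double quotes	L	txt file exists	11.txt	Words and frequencies	OK	Black-box test
count_12	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes + double quotes	L	txt file exists	12.txt	Words and frequencies	OK	Black-box test
count_13	Word splitting and counting	One word, mixed case + hyphens (any position) + double quotes + digits	M	txt file exists	13.txt	Words and frequencies	OK	Black-box test
count_14	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes + double quotes + digits	M	txt file exists	14.txt	Words and frequencies	OK	Black-box test
count_15	Word splitting and counting	One word, mixed case + hyphens (any position) + double quotes + digits (any position)	M	txt file exists	15.txt	Words and frequencies	OK	Black-box test
count_16	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes + double quotes + digits (any position)	M	txt file exists	16.txt	Words and frequencies	OK	Black-box test
count_17	Word splitting and counting	One word, mixed case + hyphens (any position) + double quotes + digits (any position) + common symbols	M	txt file exists	17.txt	Words and frequencies	OK	Black-box test
count_18	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes + double quotes + digits (any position) + common symbols	H	txt file exists	18.txt	Words and frequencies	OK	Black-box test
count_19	Word splitting and counting	One word, mixed case + hyphens (any position) + double quotes + digits (any position) + common symbols	H	txt file exists	19.txt	Words and frequencies	OK	Black-box test
count_20	Word splitting and counting	Multiple words, mixed case + hyphens (any position) + single quotes + double quotes + digits (any position) + common symbols + equal word frequencies	H	txt file exists	20.txt	Words and frequencies	OK	Black-box test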
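The black-box procedure used by these cases (prepare an expected map, write the input to a txt file, call the counter with the file name, compare) can be sketched roughly as follows. The names CountBlackBoxSketch, runCase and naiveCount are hypothetical; naiveCount is only a whitespace-splitting stand-in so that the sketch runs on its own, and the project's real count() would be passed in its place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class CountBlackBoxSketch {
    // Writes content to a temporary txt file, runs the counter on its path,
    // and reports whether the actual map equals the expected map.
    public static boolean runCase(Function<String, Map<String, Integer>> counter,
                                  String content, Map<String, Integer> expected) {
        try {
            Path file = Files.createTempFile("count_case", ".txt");
            try {
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                return expected.equals(counter.apply(file.toString()));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for count(): lowercases and splits on whitespace only.
    public static Map<String, Integer> naiveCount(String fileName) {
        Map<String, Integer> map = new HashMap<>();
        try {
            for (String line : Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8)) {
                for (String w : line.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                    if (!w.isEmpty()) {
                        map.merge(w, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> expected = new HashMap<>();
        expected.put("hello", 2);
        expected.put("world", 1);
        // Prints true when the counter's result matches the expected map.
        System.out.println(runCase(CountBlackBoxSketch::naiveCount, "Hello world\nhello", expected));
    }
}
```

In a JUnit test the boolean returned by runCase would typically be wrapped in an assertion, one test method per row of the table.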

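Test run and evaluation

The unit tests for count(String thefile) increase gradually in data volume and complexity, which follows the test case design guidelines, and all unit tests pass.

count(String thefile) passes all 20 cases: it splits the input and counts the words correctly for every tested input.

Group contribution

I was responsible for splitting and counting words, which is the second step of the program's workflow. An error here would cascade and keep the whole program from running correctly, so the correctness of this part is very important.

At first I split the string with split() alone, but then found that when the string begins with a delimiter, the first substring returned is empty. This is how split() works: a delimiter at the very start of the input produces a leading empty string, and only trailing empty strings are removed. I therefore added a step that drops leading characters until the first letter (or "-") appears. Because words containing "-" come in several forms, those also need to be detected and handled.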
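The split() behavior described here is easy to reproduce in isolation. The short demo below is a sketch with made-up names (SplitLeadingEmptyDemo, splitWords) and a simplified delimiter class (everything except a-z), not the project's actual regular expression.

```java
import java.util.Arrays;

public class SplitLeadingEmptyDemo {
    // Drops leading non-letter characters before splitting, so the
    // result never starts with an empty string (hypothetical helper).
    public static String[] splitWords(String line) {
        String trimmed = line.replaceFirst("^[^a-z]+", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("[^a-z]+");
    }

    public static void main(String[] args) {
        // Delimiter at the start: the first element is an empty string.
        System.out.println(Arrays.toString(",hello,world".split(",")));     // [, hello, world]
        // Delimiters at the end: trailing empty strings are discarded.
        System.out.println(Arrays.toString("hello,world,,".split(",")));    // [hello, world]
        // Stripping the leading delimiters first avoids the empty element.
        System.out.println(Arrays.toString(splitWords("42 hello, world"))); // [hello, world]
    }
}
```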

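My code accounts for 0.51 of the project's lines of code, mainly because this part has many requirements and many cases to consider. I think I completed it quite well.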

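stage2: Static testing

Understanding of the coding standard

The Alibaba Java Development Manual (《阿里巴巴Java开发手册》) states:

2. [Mandatory] Names in code must not mix pinyin and English, and using Chinese directly is even less acceptable. Note: correct English spelling and grammar make code easy for readers to understand and avoid ambiguity. Names written entirely in pinyin should be avoided as well.
Positive examples: alibaba / taobao / youku / hangzhou and other internationally recognized names can be treated as English.

Negative examples: DaZhePromotion [打折, discount] / getPingfenByName() [评分, rating] / int 某变量 = 3

My understanding:

I think this rule is very good, because correct English spelling and grammar make code easy to understand and avoid ambiguity, while mixed pinyin-English names and Chinese names make code harder to read and get in the way of performance tuning and peer review. All names in my word-counting module are internationally common English, so the module complies with mandatory rule 2 of the Alibaba Java Development Manual.

Teammate code review

I chose the code of 刘博谦 (17070) for analysis.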

// Sort words by frequency
    public static ArrayList<String> sort(HashMap<String, Integer> map) {
        // Sort by key
        TreeMap treemap = new TreeMap(map);
        // Sort by value
        ArrayList<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(
                treemap.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1,
                    Map.Entry<String, Integer> o2) {
                // Descending order
                return o2.getValue() - o1.getValue();
                // Ascending order would be: o1.getValue() - o2.getValue()
            }
        });
        ArrayList<String> str = new ArrayList<String>();
        int i = 0;
        for (Map.Entry<String, Integer> string : list) {
            // Skip "-" and empty strings
            if (!(string.getKey().equals("")) && !(string.getKey().equals("-"))) {
                str.add(string.getKey());
                str.add(string.getValue().toString());
                // Output the first 1000 words
                if (i > 1000)
                    break;
                i++;
            }
        }
        return str;
    }

刘博谦's code complies with mandatory rule 2 of the Alibaba Java Development Manual. The naming is standard and needs no improvement.
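Beyond naming, the same ordering (frequency descending, ties in alphabetical order) can be written without subtracting Integer values, which in general can overflow, and with an explicit limit. Note that the loop quoted above only breaks once i exceeds 1000, so it can admit slightly more than 1000 words. The following is a sketch under my own assumptions: SortSketch and sortTop are hypothetical names, and the limit is passed as a parameter rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortSketch {
    // Returns alternating word/count strings for at most `limit` words.
    public static List<String> sortTop(Map<String, Integer> map, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
        // Higher frequency first; equal frequencies are ordered by word.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        List<String> out = new ArrayList<>();
        int taken = 0;
        for (Map.Entry<String, Integer> e : entries) {
            if (taken == limit) {
                break;
            }
            // Skip "-" and empty strings, as the original does.
            if (e.getKey().isEmpty() || e.getKey().equals("-")) {
                continue;
            }
            out.add(e.getKey());
            out.add(e.getValue().toString());
            taken++;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("b", 2);
        map.put("a", 2);
        map.put("c", 5);
        map.put("-", 9);
        System.out.println(sortTop(map, 2)); // [c, 5, a, 2]
    }
}
```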

Static code check

Tool: FindBugs 3.0.1

Download link: http://findbugs.sourceforge.net/

The check reported the following:

String line = new String("");

Defect information:

Method invokes inefficient new String(String) constructor

Using the java.lang.String(String) constructor wastes memory because the object so constructed will be functionally indistinguishable from the String passed as a parameter. Just use the argument String directly.

Bug kind and pattern: Dm - DM_STRING_CTOR

Analysis: the new String("") constructor is inefficient; assigning line = "" directly is enough.

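Personal code improvements

My code complies with mandatory rule 2 of the Alibaba Java Development Manual. After fixing the defect reported by FindBugs, the code is as follows.

(The complete code is too long to repeat, so only the part containing the defect is shown after the fix.)

            FileInputStream fis = new FileInputStream(file);
            InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
            BufferedReader br = new BufferedReader(isr);
            String line = "";
            StringBuffer sb = new StringBuffer();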

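Group code analysis

(This part was completed by me and 刘博谦 (17070).)

1. FileWriter writer = new FileWriter("result.txt", true);

The file writer may not be closed when an exception occurs; try/finally should be used to make sure it is always closed.

The original code uses try/catch but no finally block. If an error jumps to the catch block and the program keeps running, the writer is never closed and keeps holding the file, which may eventually crash the program.

We recommend adding a finally block that closes the writer.

2. FileWriter writer = new FileWriter("result.txt", true);

This statement relies on the platform default encoding to work correctly. To prevent hidden bugs, an encoding should be specified explicitly; given the program's requirements, we specify UTF-8.

3. String line = new String("");

The new String("") constructor is inefficient; assigning line = "" directly is enough.

4. message += (str.get(i) + " " + str.get(i + 1) + "\r\n");

Concatenating strings with + inside a loop takes quadratic time. We recommend using StringBuffer.append(String) instead to improve efficiency.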
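Findings 1, 2 and 4 can be addressed together. The sketch below is one possible shape, not the project's actual fix: ResultWriterSketch, format and write are hypothetical names, the loop step of 2 is assumed from the word/count layout of str, try-with-resources is used in place of an explicit finally block, and StringBuilder (the unsynchronized counterpart of StringBuffer) is used for the appends.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ResultWriterSketch {
    // Joins alternating word/count entries into "word count\r\n" lines
    // with a single builder instead of repeated string concatenation.
    public static String format(List<String> str) {
        StringBuilder message = new StringBuilder();
        for (int i = 0; i + 1 < str.size(); i += 2) {
            message.append(str.get(i)).append(' ').append(str.get(i + 1)).append("\r\n");
        }
        return message.toString();
    }

    // Appends the formatted result to the file using an explicit UTF-8 encoding.
    // try-with-resources closes the writer even if write() throws.
    public static void write(String path, List<String> str) throws IOException {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path, true), StandardCharsets.UTF_8))) {
            writer.write(format(str));
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> str = Arrays.asList("hello", "2", "world", "1");
        write("result.txt", str);
    }
}
```

On Java versions without try-with-resources, the same guarantee comes from closing the writer in a finally block, as recommended in finding 1.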

References

1. Word frequency counting with descending sort (code post)

2. Alibaba Java Development Manual
