Java English Word Frequency Counting: An Implementation in Java

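This post shows how to count English word frequencies in Java: read a file, count how many times each word occurs, sort the counts, and print the result. The file is read with BufferedReader, the words and their counts are stored in a HashMap, the Map is then converted to a List for sorting, and the sorted result is printed.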

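Requirements

1. Read a file. The file may contain English letters, common punctuation, spaces, and line breaks.

2. Count how many times each English word occurs in the file.

3. Sort the counts.

4. Display the sorted result.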

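Analysis

1. The file can be read line by line with the BufferedReader class.

2. Split each line into words on a set of delimiters, and record each word and its count in a Map from java.util. Either HashMap or TreeMap works; TreeMap is the natural choice if the result should be ordered alphabetically. This example uses HashMap.

3. Copy the key-value pairs from the Map into a List whose elements are two-element String arrays, then sort with the List's sort method, using the count as the sort key. This requires implementing the Comparator.compare method.

4. Loop over the String[] elements of the List and print the result.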
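Point 2 mentions TreeMap as the option for alphabetical output. The sketch below illustrates that variant only; the class name `AlphabeticalCount` and the `countWords` helper are hypothetical, and it takes the text as a string and splits on whitespace instead of reading a file.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of the original program.
final class AlphabeticalCount {

    // Counts whitespace-separated tokens; TreeMap keeps its keys in natural String order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer st = new StringTokenizer(text); // default delimiters: whitespace
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Iteration follows key order, so no separate sort step is needed.
        for (Map.Entry<String, Integer> e : countWords("pear apple pear fig").entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Natural String order compares character codes, so capitalized words come before lowercase ones. Sorting by count, as the post does, still needs the List step.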

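Partial implementation

The methods below belong to one class and rely on classes from java.io and java.util.

Printing the result

    // Prints every word with its count, most frequent first.
    public void printSortedWordGroupCount(String filename) {
        List<String[]> result = getSortedWordGroupCount(filename);
        if (result == null) {
            System.out.println("no result");
            return;
        }
        for (String[] sa : result) {
            // sa[1] is the word, sa[0] is its count
            System.out.println(sa[1] + ": " + sa[0]);
        }
    }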

Counting

    // Reads the file line by line and counts every token; returns null on failure.
    public Map<String, Integer> getWordGroupCount(String filename) {
        // try-with-resources closes the reader even when an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String content = "";
            Map<String, Integer> result = new HashMap<String, Integer>();
            while ((content = br.readLine()) != null) {
                // '.' is not among the delimiters, so "products." is counted as its own token
                StringTokenizer st = new StringTokenizer(content, "!&(){}+-= ':;<> /\",");
                while (st.hasMoreTokens()) {
                    String key = st.nextToken();
                    if (result.containsKey(key))
                        result.put(key, result.get(key) + 1);
                    else
                        result.put(key, 1);
                }
            }
            return result;
        } catch (FileNotFoundException e) {
            System.out.println("failed to open file:" + filename);
            e.printStackTrace();
        } catch (Exception e) {
            System.out.println("some exception occurred");
            e.printStackTrace();
        }
        return null;
    }
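The containsKey/put branch in the counting loop can be collapsed into a single `Map.merge` call (available since Java 8). This is a minimal sketch, not the author's code: the class name `MergeCount` is hypothetical, and it reads from any `Reader` so that it can be tried without a file. The delimiter string is the one used in the post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical class, shown only to illustrate Map.merge.
final class MergeCount {

    // Same delimiters as the post; '.' is still not one of them.
    static final String DELIMS = "!&(){}+-= ':;<> /\",";

    static Map<String, Integer> count(Reader source) {
        Map<String, Integer> result = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line, DELIMS);
                while (st.hasMoreTokens()) {
                    // Inserts 1 for a new word, otherwise adds 1 to the stored count.
                    result.merge(st.nextToken(), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(new StringReader("to be, or not to be\nthat is the question")));
    }
}
```

For a real file, the caller would pass `new FileReader(filename)` as the source. Because the result is a HashMap, the order in which it prints is unspecified.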

Sorting

    // Converts the counts into a list of {count, word} pairs sorted by count, largest first.
    public List<String[]> getSortedWordGroupCount(String filename) {
        Map<String, Integer> result = getWordGroupCount(filename);
        if (result == null)
            return null;
        List<String[]> list = new LinkedList<String[]>();
        Set<String> keys = result.keySet();
        Iterator<String> iter = keys.iterator();
        while (iter.hasNext()) {
            String key = iter.next();
            String[] item = new String[2];
            item[1] = key;                  // the word
            item[0] = "" + result.get(key); // its count, stored as a string
            list.add(item);
        }
        list.sort(new Comparator<String[]>() {
            public int compare(String[] s1, String[] s2) {
                // arguments swapped so that larger counts come first
                return Integer.compare(Integer.parseInt(s2[0]), Integer.parseInt(s1[0]));
            }
        });
        return list;
    }
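Storing the count as a string means it has to be parsed again on every comparison. As an alternative sketch (not what the post does), the Map entries can be sorted directly. The class name `EntrySort` is hypothetical, and the alphabetical tie-break between equal counts is an added assumption that makes the output order deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: sorts Map entries instead of String[] pairs.
final class EntrySort {

    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            // Larger counts first; equal counts fall back to alphabetical order (assumption).
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 2);
        counts.put("a", 3);
        counts.put("of", 2);
        for (Map.Entry<String, Integer> e : sortByCountDesc(counts)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
        // prints:
        // a: 3
        // of: 2
        // the: 2
    }
}
```

Copying the entry set into an ArrayList leaves the original Map untouched, and the integer counts are compared without any parsing.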

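Simple automatic test-case generation for this program

Random numbers are used to produce random words and spaces, with line breaks inserted at random positions.

    // Writes wordNum random lowercase words (1 to 10 letters each) to filePosition.
    // Returns true on success, false if writing or closing the file fails.
    public boolean gernerateWordList(String filePosition, long wordNum) {
        FileWriter fw = null;
        StringBuffer sb = new StringBuffer("");
        try {
            fw = new FileWriter(filePosition);

            for (int j = 0; j < wordNum; j++) {
                int length = (int) (Math.random() * 10 + 1);
                for (int i = 0; i < length; i++) {
                    char ch = (char) ('a' + (int) (Math.random() * 26));
                    sb.append(ch);
                }
                fw.write(sb.toString());
                fw.write(" ");
                sb = new StringBuffer("");
                // Random line break. The test uses wordNum, not j, so a wordNum that
                // no value in [4, 11] divides never produces a line break.
                if (wordNum % (int) (Math.random() * 8 + 4) == 0)
                    fw.write("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        } finally {
            try {
                // fw is still null if the FileWriter constructor threw
                if (fw != null)
                    fw.close();
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("failed to close file");
                return false;
            }
        }
        return true;
    }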
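`Math.random()` cannot be seeded, so every run of the generator produces a different file. For test input that can be reproduced, one option is `java.util.Random` with a fixed seed. The sketch below is an assumption-laden variant, not the author's generator: the class name `SeededWords`, the `generate` method, and the line-break policy (end the line after roughly one word in six) are all hypothetical, and it builds a string instead of writing a file.

```java
import java.util.Random;

// Hypothetical class: reproducible random word lists.
final class SeededWords {

    // Returns wordNum random lowercase words, each followed by a space or a newline.
    static String generate(long seed, int wordNum) {
        Random rnd = new Random(seed); // the same seed always yields the same text
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < wordNum; j++) {
            int length = rnd.nextInt(10) + 1; // 1 to 10 letters
            for (int i = 0; i < length; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            // Assumed policy: newline with probability 1/6, otherwise a space.
            sb.append(rnd.nextInt(6) == 0 ? '\n' : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(42L, 20));
    }
}
```

The returned string can be written out with a FileWriter and then passed to the counting method. Because the seed fixes the text, a failing run can be repeated exactly.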

Results on a real input

The input is an excerpt from the Wikipedia article "Specification".

"Specification" redirects here. For other uses, see Specification (disambiguation).

There are different types of specifications, which generally are mostly types of documents, forms or orders or relates to information in databases. The word specification is defined as "to state explicitly or in detail" or "to be specific". A specification may refer to a type of technical standard (the main topic of this page).

Using a word "specification" without additional information to what kind of specification you refer to is confusing and considered bad practice within systems engineering.

A requirement specification is a set of documented requirements to be satisfied by a material, design, product, or service.[1]

A functional specification is closely related to the requirement specification and may show functional block diagrams.

A design or product specification describes the features of the solutions for the Requirement Specification, referring to the designed solution or final produced solution. Sometimes the term specification is here used in connection with a data sheet (or spec sheet). This may be confusing. A data sheet describes the technical characteristics of an item or product as designed and/or produced. It can be published by a manufacturer to help people choose products or to help use the products. A data sheet is not a technical specification as described in this article.

A "in-service" or "maintained as" specification, specifies the conditions of a system or object after years of operation, including the effects of wear and maintenance (configuration changes).

Specifications may also refer to technical standards, which may be developed by any of various kinds of organizations, both public and private. Example organization types include a corporation, a consortium (a small group of corporations), a trade association (an industry-wide group of corporations), a national government (including its military, regulatory agencies, and national laboratories and institutes), a professional association (society), a purpose-made standards organization such as ISO, or vendor-neutral developed generic requirements. It is common for one organization to refer to (reference, call out, cite) the standards of another. Voluntary standards may become mandatory if adopted by a government or business contract.

Excerpt of the results

a: 16

of: 16

or: 15

to: 14

the: 12

specification: 11

is: 7

A: 7

and: 7

may: 6

in: 5

.: 5

as: 5

be: 5

standards: 4

by: 4

technical: 4

sheet: 4

refer: 4

product: 3

Specification: 3

organization: 3

data: 3

types: 3

an: 2

word: 2

functional: 2

association: 2

It: 2

government: 2

are: 2

national: 2

describes: 2

help: 2

information: 2

developed: 2

group: 2

which: 2

including: 2

this: 2

corporations: 2

for: 2

design: 2

designed: 2

requirement: 2

mostly: 1

practice: 1

bad: 1

products.: 1

considered: 1

type: 1

without: 1

years: 1

professional: 1
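In the results above, "a" and "A" are counted separately, as are "specification" and "Specification", and "products." keeps its trailing period because '.' is not among the tokenizer's delimiters. If these should be merged, one possible normalization is to lowercase the text and split on every non-letter. This is only a sketch of that idea; the class name `NormalizedCount` is hypothetical, and the post's own program does not do this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical class: case-insensitive counting that ignores punctuation.
final class NormalizedCount {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase first, then treat every run of non-letters as a separator.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // split yields "" when the text starts with a non-letter
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("A data sheet. a Sheet of products."));
    }
}
```

The trade-off is that hyphenated or slashed forms such as "in-service" and "and/or" are split into separate words, and digits are dropped entirely.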

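Possible extensions

Add an overloaded method with an extra parameter so that, when the user asks for it, the result is sorted in the opposite order, that is, from the least frequent word to the most frequent.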
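One way this extension could look is sketched below. It keeps the post's {count, word} String[] representation, but the class name `SortOrderSketch`, the method name `sortedCounts`, and the `ascending` flag are hypothetical, and the method takes an already-built Map instead of a filename.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: overloads that choose the sort direction.
final class SortOrderSketch {

    // Default overload: most frequent first, as in the post.
    static List<String[]> sortedCounts(Map<String, Integer> counts) {
        return sortedCounts(counts, false);
    }

    // ascending == true sorts from the least frequent word to the most frequent.
    static List<String[]> sortedCounts(Map<String, Integer> counts, boolean ascending) {
        List<String[]> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // item[0] is the count, item[1] is the word, matching the post's layout
            list.add(new String[] { String.valueOf(e.getValue()), e.getKey() });
        }
        Comparator<String[]> byCount = Comparator.comparingInt(sa -> Integer.parseInt(sa[0]));
        list.sort(ascending ? byCount : byCount.reversed());
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 3);
        counts.put("b", 1);
        counts.put("c", 2);
        for (String[] sa : sortedCounts(counts, true)) {
            System.out.println(sa[1] + ": " + sa[0]);
        }
    }
}
```

Existing callers keep the descending behavior through the one-argument overload, and only callers that want the reverse order pass the flag.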

git@git.coding.net:jx8zjs/wordCount.git
