- * 使用正则表达式来计算一段文本当中所有以小写字母开头的
- * 单词的出现次数
- */
- public void testRegex(){
- String POEM =
- "Towas brillig, and the slithy toves/n" +
- "Did gyre and Gimble in the wabe./n" +
- "All mimsy were the borogoves,/n" +
- "And the mome raths outgrabe./n/n" +
- "Beware the Jabberwock, my son,/n" +
- "The jaws that bite, the claws that catch./n" +
- "Beware the Jubjub bird, and shun/n" +
- "The frumious Bandersnatch.";
- Map<String, Integer> wordCount = new HashMap<String, Integer>();
- b用来指定单词的边界,这里在单词的开头和结尾都使用了//b。用来
- //区分各个单词。中间的//w+指明一个活着多个单词字符(word character)
- Matcher m = Pattern.compile("//b([a-z]//w+[a-zA-Z]){1}//b").matcher(POEM);
- while(m.find()) {
- if(wordCount.containsKey(m.group(0))){
- Integer count = wordCount.get(m.group(0));
- wordCount.put(m.group(0), count+1);
- }else{
- wordCount.put(m.group(0), 1);
- }
- }
- System.out.println(wordCount.toString());
- }