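import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Counts how many times each word occurs in a text.
 *
 * @param text
 *            the text to scan
 * @param ignoreCase
 *            whether to ignore case; if true, words are lowercased before counting
 * @return a map from each word to its count, in order of first appearance
 */
public static Map<String, Integer> countEachWorld(String text,
        boolean ignoreCase) {
    // \w+ matches runs of letters, digits and underscores,
    // so "java.util.regex" yields three separate words
    Matcher m = Pattern.compile("\\w+").matcher(text);
    // LinkedHashMap keeps the words in the order they first appear
    Map<String, Integer> map = new LinkedHashMap<>();
    while (m.find()) {
        String word = m.group();
        if (ignoreCase) {
            // Locale.ROOT keeps lowercasing independent of the default locale
            word = word.toLowerCase(Locale.ROOT);
        }
        Integer count = map.get(word);
        map.put(word, count != null ? count + 1 : 1);
    }
    return map;
}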
Text to match:
Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.
Results:
1. Ignoring case
countEachWorld(text, true);
{java=3, provides=1, the=2, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, perl=1, programming=1, language=1, and=2, easy=1, learn=1, a=4, expression=1, is=1, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, they=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
2. Case-sensitive
countEachWorld(text, false);
{Java=2, provides=1, the=2, java=1, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, Perl=1, programming=1, language=1, and=2, easy=1, learn=1, A=1, expression=1, is=1, a=3, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, They=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
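To try the method without setting up a project, the following is a minimal self-contained sketch. The wrapper class `WordCountDemo` and the short sample sentence are assumptions made for illustration and are not part of the original code. The sketch also swaps in `Map.merge` (Java 8+) for the get/put pair and lowercases with `Locale.ROOT`; otherwise it is meant to mirror the `countEachWorld` logic above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, present only so the example compiles on its own.
class WordCountDemo {

    // Counts each \w+ token, keeping the words in order of first appearance.
    static Map<String, Integer> countEachWorld(String text, boolean ignoreCase) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            String word = ignoreCase ? m.group().toLowerCase(Locale.ROOT) : m.group();
            // merge inserts 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Short sample input chosen for this sketch, not the article's text.
        String text = "Java regex: java.util.regex makes Java text matching easy.";

        // prints {java=3, regex=2, util=1, makes=1, text=1, matching=1, easy=1}
        System.out.println(countEachWorld(text, true));

        // prints {Java=2, regex=2, java=1, util=1, makes=1, text=1, matching=1, easy=1}
        System.out.println(countEachWorld(text, false));
    }
}
```

Because `\w` also matches digits and underscores, tokens such as `item_1` would be counted as a single word. If only alphabetic words are wanted, a pattern like `[A-Za-z]+` could be substituted.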