Counting the occurrences of each word in a text

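import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

	/**
	 * Counts the occurrences of each word in a text.
	 * 
	 * @param text
	 *            the text to scan
	 * @param ignoreCase
	 *            whether to ignore case (words are lowercased before counting)
	 * @return a map from each word to its count, in order of first appearance
	 */
	public static Map<String, Integer> countEachWorld(String text,
			boolean ignoreCase) {
		// \w+ matches runs of letters, digits, and underscores
		Matcher m = Pattern.compile("\\w+").matcher(text);
		// LinkedHashMap preserves the order in which words are first seen
		Map<String, Integer> map = new LinkedHashMap<>();
		while (m.find()) {
			String word = m.group();
			if (ignoreCase) {
				word = word.toLowerCase();
			}
			// Insert 1 for a new word, otherwise add 1 to the existing count
			map.merge(word, 1, Integer::sum);
		}
		return map;
	}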
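
The `\w+` pattern treats any character that is not a letter, digit, or underscore as a separator, which affects what counts as a "word". The sketch below is only an illustration: the class `WordTokens` and the helper `tokens` are hypothetical names that do not appear in the original post. It lists the matches for a short phrase so the splitting behaviour is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTokens {
    // Hypothetical helper (not part of the original post):
    // collects every \w+ match in the order it is found.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Dots are not word characters, so the package name
        // is split into three separate words.
        System.out.println(tokens("the java.util.regex package"));
        // prints [the, java, util, regex, package]
    }
}
```

This is why a dotted name such as `java.util.regex` contributes one count each to `java`, `util`, and `regex` rather than being counted as a single word.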

Text to match:

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.


Results:

1. Ignoring case

countEachWorld(text, true);

{java=3, provides=1, the=2, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, perl=1, programming=1, language=1, and=2, easy=1, learn=1, a=4, expression=1, is=1, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, they=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}

2. Case-sensitive

countEachWorld(text, false);

{Java=2, provides=1, the=2, java=1, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, Perl=1, programming=1, language=1, and=2, easy=1, learn=1, A=1, expression=1, is=1, a=3, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, They=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
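
To try both modes on a smaller input, the following self-contained sketch may help. It assumes a wrapper class named `WordCountDemo` (a hypothetical name, not from the original post) and includes a condensed copy of `countEachWorld` with the same behaviour, so that the file compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountDemo {
    // Condensed copy of the method from this post; the wrapper class
    // name WordCountDemo is an assumption made for this example.
    public static Map<String, Integer> countEachWorld(String text,
            boolean ignoreCase) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        Map<String, Integer> map = new LinkedHashMap<>();
        while (m.find()) {
            String word = ignoreCase ? m.group().toLowerCase() : m.group();
            map.merge(word, 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "A regular expression is a special sequence";
        // "A" and "a" are merged into one key when case is ignored
        System.out.println(countEachWorld(text, true));
        // prints {a=2, regular=1, expression=1, is=1, special=1, sequence=1}
        // "A" and "a" stay separate keys when case matters
        System.out.println(countEachWorld(text, false));
        // prints {A=1, regular=1, expression=1, is=1, a=1, special=1, sequence=1}
    }
}
```

In both modes the keys appear in order of first occurrence, because the counts are stored in a `LinkedHashMap`.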

