Java regular expressions and Unicode: do Java regular expressions support Unicode?

To match the letters A to Z, we would use the regex:

[A-Za-z]

How can a regex be made to match UTF-8 characters entered by the user, for example Chinese words like 环保部?
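To make the problem concrete, here is a minimal sketch (the class and method names are hypothetical) showing that `[A-Za-z]+` accepts an ASCII word but rejects 环保部. Note that once input has been decoded into a Java `String`, the regex engine sees Unicode characters rather than UTF-8 bytes, so the real question is how to match non-ASCII letters.

```java
import java.util.regex.Pattern;

class AsciiLetterDemo {
    // ASCII-only letter class from the question: covers A-Z and a-z, nothing else
    static final Pattern ASCII_LETTERS = Pattern.compile("[A-Za-z]+");

    // matches() requires the entire input to consist of ASCII letters
    static boolean isAsciiWord(String s) {
        return ASCII_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAsciiWord("Hello")); // true
        System.out.println(isAsciiWord("环保部")); // false: Chinese characters lie outside A-Z/a-z
    }
}
```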

Solution

What you are looking for are Unicode properties.

For example, \p{L} matches any kind of letter from any language.

So a regex to match such a Chinese word could be something like

\p{L}+
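A minimal Java sketch of this approach (class and method names are hypothetical). In a Java string literal the backslash must be doubled, so the regex \p{L}+ is written as `"\\p{L}+"`. The `words` helper shows how `find()` can pull letter runs out of mixed text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeLetterDemo {
    // The regex \p{L}+ as a Java string literal: one or more letters from any script
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // True if the whole input consists of letters
    static boolean isWord(String s) {
        return LETTERS.matcher(s).matches();
    }

    // Collects every maximal run of letters found in the input
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LETTERS.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isWord("环保部"));              // true
        System.out.println(isWord("abc123"));             // false: digits are not letters
        System.out.println(words("环保部, 2024 report")); // [环保部, report]
    }
}
```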

There are many such properties; for more details, see regular-expressions.info.
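As an illustration of other properties, the sketch below (class and field names are hypothetical) contrasts the general category \p{L} with the script property \p{IsHan}, which Java supports since Java 7 (also writable as \p{script=Han}), and with the uppercase-letter category \p{Lu}. \p{IsHan} is useful if only Chinese characters, not letters in general, should be accepted.

```java
import java.util.regex.Pattern;

class PropertyDemo {
    // General category L: letters from any script
    static final Pattern ANY_LETTERS = Pattern.compile("\\p{L}+");
    // Script property: Han (Chinese) characters only
    static final Pattern HAN_ONLY = Pattern.compile("\\p{IsHan}+");
    // General category Lu: uppercase letters only
    static final Pattern UPPERCASE = Pattern.compile("\\p{Lu}+");

    // True if the pattern matches the entire input
    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(ANY_LETTERS, "Hello环保部")); // true: both scripts are letters
        System.out.println(full(HAN_ONLY, "环保部"));         // true
        System.out.println(full(HAN_ONLY, "Hello"));          // false: Latin, not Han
        System.out.println(full(UPPERCASE, "ABC"));           // true
    }
}
```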

Another option is to use the modifier

Pattern.UNICODE_CHARACTER_CLASS

Java 7 introduced the flag Pattern.UNICODE_CHARACTER_CLASS, which enables the Unicode versions of the predefined character classes (see my answer here for more details and links).

You could do something like this

Pattern p = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

and \w would then match letters and digits from any language (as well as combining marks and connector punctuation such as _).
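A minimal sketch comparing \w with and without the flag (class and field names are hypothetical). It also shows the embedded flag expression (?U), which Java accepts as an inline equivalent of UNICODE_CHARACTER_CLASS. Per the `Pattern` documentation, the flag also switches \d, \s and POSIX classes such as \p{Alpha} to their Unicode versions, and it may carry a performance cost.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Default: \w is ASCII-only, i.e. [a-zA-Z_0-9]
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    // With UNICODE_CHARACTER_CLASS, \w follows Unicode word-character rules
    static final Pattern UNICODE_WORD = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
    // Same effect using the embedded flag expression (?U)
    static final Pattern UNICODE_WORD_INLINE = Pattern.compile("(?U)\\w+");

    // True if the pattern matches the entire input
    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "环保部_2024";
        System.out.println(full(ASCII_WORD, s));          // false
        System.out.println(full(UNICODE_WORD, s));        // true
        System.out.println(full(UNICODE_WORD_INLINE, s)); // true
    }
}
```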
