Java Regular Expressions

coolibin

Article link: https://blog.csdn.net/libinjlu/article/details/23875201

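1. Generally speaking, a regular expression is a way of describing strings.

In other languages, \\ means "I want to insert a plain (literal) backslash into the regular expression; please don't give it any special meaning." In Java, \\ means "I am inserting a regular expression backslash, so the character that follows has special meaning." This is because the Java compiler processes the escapes in the string literal before the regex engine sees the pattern. For example, to match a single digit, the regular expression string should be \\d. To match a literal backslash, write \\\\. Newlines, tabs, and similar characters need only a single backslash: \n\t.

? means the preceding element is optional. For example, -? means there may be a minus sign in front.

+ means one or more occurrences of the preceding expression.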
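
The escaping rules and the ? and + operators described above can be sketched as follows. The class name EscapeDemo and its two helper methods are hypothetical names chosen for this illustration; only String.matches comes from the JDK.

```java
public class EscapeDemo {
    // Hypothetical helper: true if s is an optionally signed integer.
    static boolean isInteger(String s) {
        // "\\d" in Java source is the regex \d (one digit);
        // -? makes the minus sign optional, + requires one or more digits.
        return s.matches("-?\\d+");
    }

    // Hypothetical helper: true if s contains a literal backslash.
    static boolean containsBackslash(String s) {
        // "\\\\" in Java source is the regex \\, which matches one backslash.
        return s.matches(".*\\\\.*");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-1234"));              // true
        System.out.println(isInteger("1234"));               // true
        System.out.println(isInteger("-"));                  // false
        System.out.println(containsBackslash("C:\\temp"));   // true
        // A tab can be written with a single backslash in the pattern.
        System.out.println("a\tb".matches("a\tb"));          // true
    }
}
```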

2. The String class has built-in regular expression utilities:

1) matches: checks whether the entire string matches the regular expression

"-1234".matches("-?\\d+"); //true

2) split: splits the string wherever the regular expression matches (the matched parts are removed)

"you must do it".split("\\W+");//you, must, do, it

3) replaceFirst, replaceAll: replace the first match, or every match, with the given replacement

"you found it".replaceFirst("f\\w+","located"); //"you located it"
"you found it".replaceAll("f\\w+","located"); //"you located it"

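When the input contains only one match, replaceFirst and replaceAll give the same result. The difference shows up with an input that has several matches. A minimal sketch follows; the class name StringRegexDemo, its helper methods, and the sample sentence are made up for this illustration.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // Hypothetical helpers wrapping the String methods discussed above.
    static String[] words(String s) {
        return s.split("\\W+");              // split on runs of non-word characters
    }

    static String replaceFirstF(String s) {
        return s.replaceFirst("f\\w+", "located");   // only the first word starting with f
    }

    static String replaceAllF(String s) {
        return s.replaceAll("f\\w+", "located");     // every word starting with f
    }

    public static void main(String[] args) {
        String s = "you found it and fixed it";
        System.out.println(Arrays.toString(words(s)));
        // [you, found, it, and, fixed, it]
        System.out.println(replaceFirstF(s));
        // you located it and fixed it
        System.out.println(replaceAllF(s));
        // you located it and located it
    }
}
```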
3. Creating regular expressions (import java.util.regex)
The complete list of the constructs below is on the java.util.regex.Pattern page of the JDK documentation.

Construct	Matches

Characters
x	The character x
`\\`	The backslash character
`\0`n	The character with octal value `0`n (0 `<=` n `<=` 7)
`\0`nn	The character with octal value `0`nn (0 `<=` n `<=` 7)
`\0`mnn	The character with octal value `0`mnn (0 `<=` m `<=` 3, 0 `<=` n `<=` 7)
`\x`hh	The character with hexadecimal value `0x`hh
`\u`hhhh	The character with hexadecimal value `0x`hhhh
`\x`{h...h}	The character with hexadecimal value `0x`h...h (`Character.MIN_CODE_POINT` <= `0x`h...h <= `Character.MAX_CODE_POINT`)
`\t`	The tab character (`'\u0009'`)
`\n`	The newline (line feed) character (`'\u000A'`)
`\r`	The carriage-return character (`'\u000D'`)
`\f`	The form-feed character (`'\u000C'`)
`\a`	The alert (bell) character (`'\u0007'`)
`\e`	The escape character (`'\u001B'`)
`\c`x	The control character corresponding to x

Character classes
`[abc]`	`a`, `b`, or `c` (simple class)
`[^abc]`	Any character except `a`, `b`, or `c` (negation)
`[a-zA-Z]`	`a` through `z` or `A` through `Z`, inclusive (range)
`[a-d[m-p]]`	`a` through `d`, or `m` through `p`: `[a-dm-p]` (union)
`[a-z&&[def]]`	`d`, `e`, or `f` (intersection)
`[a-z&&[^bc]]`	`a` through `z`, except for `b` and `c`: `[ad-z]` (subtraction)
`[a-z&&[^m-p]]`	`a` through `z`, and not `m` through `p`: `[a-lq-z]` (subtraction)

Predefined character classes
`.`	Any character (may or may not match line terminators)
`\d`	A digit: `[0-9]`
`\D`	A non-digit: `[^0-9]`
`\h`	A horizontal whitespace character: `[ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]`
`\H`	A non-horizontal whitespace character: `[^\h]`
`\s`	A whitespace character: `[ \t\n\x0B\f\r]`
`\S`	A non-whitespace character: `[^\s]`
`\v`	A vertical whitespace character: `[\n\x0B\f\r\x85\u2028\u2029]`
`\V`	A non-vertical whitespace character: `[^\v]`
`\w`	A word character: `[a-zA-Z_0-9]`
`\W`	A non-word character: `[^\w]`
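
A few of the character classes in the tables above can be checked directly with String.matches. This is only a small sketch; the class name CharClassDemo and the helper test are hypothetical names for this illustration.

```java
public class CharClassDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean test(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(test("[a-d[m-p]]", "m"));      // true  (union)
        System.out.println(test("[a-d[m-p]]", "e"));      // false
        System.out.println(test("[a-z&&[def]]", "e"));    // true  (intersection)
        System.out.println(test("[a-z&&[^bc]]", "b"));    // false (subtraction)
        System.out.println(test("[a-z&&[^bc]]", "d"));    // true
        System.out.println(test("\\w+", "x_1"));          // true  (letters, digits, underscore)
        System.out.println(test("\\w+", "a b"));          // false (space is not a word character)
        System.out.println(test("\\s+", " \t\n"));        // true  (whitespace)
    }
}
```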

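4. Quantifiers
A quantifier describes how a pattern absorbs input text:
1) Greedy: the default. It consumes as much input as possible, backtracking only when needed for the rest of the pattern to match.
2) Reluctant: specified with a question mark (for example X*?). It matches the minimum number of characters needed to satisfy the pattern.
3) Possessive: specified with a plus sign (for example X*+). It consumes as much as possible like the greedy form, but never gives characters back through backtracking.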
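
The three quantifier types can be compared on the same input. The sketch below is illustrative only: the class name QuantifierDemo, the helper firstMatch, and the sample input "<a><b>" are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ takes everything, then backs off just enough to match the final >.
        System.out.println(firstMatch("<.+>", input));    // <a><b>
        // Reluctant: .+? stops at the first > it can reach.
        System.out.println(firstMatch("<.+?>", input));   // <a>
        // Possessive: .++ takes everything and never gives the final > back.
        System.out.println(firstMatch("<.++>", input));   // null
    }
}
```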
5. CharSequence
The CharSequence interface abstracts a general definition of a character sequence from the CharBuffer, String, StringBuffer, and StringBuilder classes:

interface CharSequence {
    char charAt(int i);
    int length();
    CharSequence subSequence(int start, int end);
    String toString();
}
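
Because Pattern.matcher accepts a CharSequence, the same regular expression code can work on any of these types. A minimal sketch, in which the class name CharSequenceDemo and the helper countDigits are hypothetical:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Hypothetical helper: counts the digits in any character sequence.
    static int countDigits(CharSequence text) {
        Matcher m = Pattern.compile("\\d").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22"));                   // 3 (String)
        System.out.println(countDigits(new StringBuilder("x9")));   // 1 (StringBuilder)
        System.out.println(countDigits(new StringBuffer("none")));  // 0 (StringBuffer)
        System.out.println(countDigits(CharBuffer.wrap("007")));    // 3 (CharBuffer)
    }
}
```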

6. Pattern and Matcher

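import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("abc+");
Matcher m = p.matcher("abcabcac");
while (m.find()) { // each call to find() advances to the next match
    System.out.println("Match \"" + m.group() + "\" at positions " + m.start() + "-" + (m.end() - 1));
    // Match "abc" at positions 0-2
    // Match "abc" at positions 3-5
}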

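Matcher also has matches (checks whether the entire input matches the pattern),
lookingAt (checks whether the beginning of the input matches the pattern),
start (returns the start index of the previous match),
end (returns the index of the last matched character plus one),
replaceFirst and replaceAll (split, by contrast, is a method of Pattern, not of Matcher),
appendReplacement, which performs incremental replacement (each matched substring is replaced one at a time, for example converting each matched vowel to its uppercase form),
appendTail(StringBuffer sbuf), which is called after one or more appendReplacement calls to copy the remainder of the input string into sbuf,
and reset, which applies an existing Matcher object to a new character sequence.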
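
The methods listed above can be tried out in a short program. This is a sketch under assumptions: the class name MatcherDemo and the helpers upperVowels and firstMatchAfterReset are hypothetical, while the Matcher calls themselves are from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Hypothetical helper: uppercases each vowel with appendReplacement and appendTail.
    static String upperVowels(String input) {
        Matcher m = Pattern.compile("[aeiou]").matcher(input);
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            // Copies the text before the match, then the replacement for this match.
            m.appendReplacement(sbuf, m.group().toUpperCase());
        }
        m.appendTail(sbuf); // copies whatever follows the last match
        return sbuf.toString();
    }

    // Hypothetical helper: reuses one Matcher on a new input via reset.
    static String firstMatchAfterReset(Matcher m, CharSequence newInput) {
        m.reset(newInput);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("abc+").matcher("abcabcac");
        System.out.println(m.matches());    // false: the whole input is not abc+
        System.out.println(m.lookingAt());  // true: the input starts with a match
        System.out.println(m.start() + "-" + m.end()); // 0-3
        System.out.println(firstMatchAfterReset(m, "xabcc")); // abcc
        System.out.println(upperVowels("you found it"));      // yOU fOUnd It
    }
}
```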