Common Java Regular Expression Operations

JackComeOn

Category: JavaSE. Tags: java

Article link: https://blog.csdn.net/JackComeOn/article/details/85399358

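Contents

Regex-related methods in the String class
The Pattern and Matcher classes
The Pattern class
- Static methods
- Instance methods
The Matcher class
Common operations
Regular expression syntax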

Regex-related methods in the String class

boolean matches(String regex)
Tells whether or not this string matches the given regular expression.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given regular expression with the given replacement.
String[] split(String regex)
Splits this string around matches of the given regular expression.
String[] split(String regex, int limit)
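Splits this string around matches of the given regular expression; when limit is positive, the pattern is applied at most limit - 1 times and the last entry holds the rest of the input.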
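
To make the differences between these methods concrete, here is a minimal sketch. The class name StringRegexDemo and the helper isAllDigits are made up for illustration; only the String methods listed above are used.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // Hypothetical helper: true only if the whole string consists of digits.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    public static void main(String[] args) {
        String input = "abc123de45";

        // matches() must cover the entire string, not just a part of it.
        System.out.println(isAllDigits(input));    // false
        System.out.println(isAllDigits("12345"));  // true

        // replaceAll replaces every match; replaceFirst only the first one.
        System.out.println(input.replaceAll("\\d+", "#"));    // abc#de#
        System.out.println(input.replaceFirst("\\d+", "#"));  // abc#de45

        // With a limit of 2 the pattern is applied at most once.
        System.out.println(Arrays.toString("a,b,c".split(",")));     // [a, b, c]
        System.out.println(Arrays.toString("a,b,c".split(",", 2)));  // [a, b,c]
    }
}
```

Each of these calls compiles the regular expression again, so for a pattern used in a loop it is usually better to compile it once with Pattern.compile.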

The Pattern and Matcher classes

https://www.cnblogs.com/ggjucheng/p/3423731.html
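java.util.regex is the library package for matching strings against patterns written as regular expressions. Its two main classes are Pattern and Matcher.
Pattern: a Pattern is the compiled representation of a regular expression.
Matcher: a Matcher is a stateful engine that checks a string against the pattern held by a Pattern object. First, a Pattern instance is created by compiling a regular expression whose syntax is similar to Perl's; a Matcher instance then performs the matching on an input string under the control of that Pattern.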

The Pattern class

https://docs.oracle.com/javase/8/docs/api/

Static methods

static Pattern compile(String regex)
Compiles the given regular expression into a pattern.
static Pattern compile(String regex, int flags)
Compiles the given regular expression into a pattern with the given flags.
static boolean matches(String regex, CharSequence input)
Compiles the given regular expression and attempts to match the given input against it.
static String quote(String s)
Returns a literal pattern String for the specified String.
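
A short sketch of the static methods, under the assumption that one wants a reusable case-insensitive pattern and a literal search. The names PatternStaticDemo, HELLO and containsLiteral are illustrative only.

```java
import java.util.regex.Pattern;

public class PatternStaticDemo {
    // Compiled once and reused; CASE_INSENSITIVE is one of the flags accepted by compile(String, int).
    static final Pattern HELLO = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: searches for the text literally, so '.' and other
    // metacharacters in it lose their special meaning.
    static boolean containsLiteral(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(HELLO.matcher("HeLLo").matches());       // true
        System.out.println(Pattern.matches("\\d+", "2024"));        // true
        System.out.println(containsLiteral("version 1.5", "1.5"));  // true
        System.out.println(containsLiteral("version 1x5", "1.5"));  // false
    }
}
```

Pattern.matches is convenient for a one-off check, but it compiles the expression on every call, just like String.matches.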

Instance methods

String[] split(CharSequence input)
Splits the given input sequence around matches of this pattern.
String[] split(CharSequence input, int limit)
Splits the given input sequence around matches of this pattern.
Matcher matcher(CharSequence input)
Creates a matcher that will match the given input against this pattern.
String pattern()
Returns the regular expression from which this pattern was compiled.
String toString()
Returns the string representation of this pattern.
int flags()
Returns this pattern’s match flags.
Stream<String> splitAsStream(CharSequence input)
Creates a stream from the given input sequence around matches of this pattern.
Predicate<String> asPredicate()
Creates a predicate which can be used to match a string.
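
The instance methods can be tried out with a sketch like the following. The class PatternInstanceDemo and its helpers fields and withDigits are hypothetical names chosen for this example; splitAsStream and asPredicate require Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternInstanceDemo {
    // A comma with optional surrounding whitespace.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a line lazily and collects the pieces into a list.
    static List<String> fields(String line) {
        return COMMA.splitAsStream(line).collect(Collectors.toList());
    }

    // Keeps the words in which the pattern is found at least once.
    static List<String> withDigits(List<String> words) {
        Predicate<String> hasDigit = Pattern.compile("\\d").asPredicate();
        return words.stream().filter(hasDigit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(COMMA.split("a , b,c")));      // [a, b, c]
        System.out.println(fields("x, y ,z"));                            // [x, y, z]
        System.out.println(withDigits(Arrays.asList("ab", "a1", "22")));  // [a1, 22]
        System.out.println(COMMA.pattern());                              // \s*,\s*
        System.out.println(COMMA.flags());                                // 0
    }
}
```

Note that the predicate returned by asPredicate behaves like find(), not matches(): it accepts a string as soon as some part of it matches.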

The Matcher class

https://docs.oracle.com/javase/8/docs/api/
boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
boolean matches()
Attempts to match the entire region against the pattern.

String group()
Returns the input subsequence matched by the previous match.
String group(int group)
Returns the input subsequence captured by the given group during the previous match operation.
String group(String name)
Returns the input subsequence captured by the given named-capturing group during the previous match operation.
int groupCount()
Returns the number of capturing groups in this matcher’s pattern.

Pattern pattern()
Returns the pattern that is interpreted by this matcher.

String replaceAll(String replacement)
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
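
The following sketch ties the Matcher methods together. The pattern, the class name MatcherDemo and the helper firstYear are assumptions made for the example, not part of the API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Two named capturing groups: "year" and "month".
    static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Hypothetical helper: the "year" group of the first date found, or null.
    static String firstYear(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group("year") : null;
    }

    public static void main(String[] args) {
        String text = "from 2021-03 to 2022-05";
        Matcher m = DATE.matcher(text);

        // matches() fails because the pattern does not cover the whole input.
        System.out.println(m.matches());     // false
        System.out.println(m.groupCount());  // 2

        // Start again from the beginning and walk through every match.
        m.reset();
        while (m.find()) {
            System.out.println(m.group() + " -> " + m.group(1) + " / " + m.group("month"));
        }

        // Groups can be referenced in the replacement string by name or number.
        System.out.println(m.replaceAll("${month}/${year}"));  // from 03/2021 to 05/2022
    }
}
```

The key distinction is that matches() requires the entire input to match, while find() looks for the next matching subsequence.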

Common operations

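import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Print every digit found in the input, one per line: 1, 2, 3.
String input = "abc123de";
Pattern p = Pattern.compile("\\d");
Matcher m = p.matcher(input);
while (m.find()) {
	System.out.println(m.group());
}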

Regular expression syntax

Summary of regular-expression constructs
Construct Matches

Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\x{h...h} The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT)
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x

Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
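
Union, intersection and subtraction are easier to see on real input. The sketch below uses a hypothetical helper, keep, that concatenates every character of the input matched by a given class; the class name CharClassDemo is likewise made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: keeps only the characters matched by the given character class.
    static String keep(String input, String charClass) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "abcdefmnopqxyz";
        System.out.println(keep(input, "[a-d[m-p]]"));        // union: abcdmnop
        System.out.println(keep(input, "[a-z&&[def]]"));      // intersection: def
        System.out.println(keep("abcdef", "[a-z&&[^bc]]"));   // subtraction: adef
        System.out.println(keep("klmnopqr", "[a-z&&[^m-p]]")); // subtraction: klqr
    }
}
```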

Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\h A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H A non-horizontal whitespace character: [^\h]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\v A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V A non-vertical whitespace character: [^\v]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
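
A few everyday uses of the predefined classes, as a minimal sketch. The class PredefinedClassDemo and its three helpers are illustrative names, not library methods.

```java
public class PredefinedClassDemo {
    // Collapses every run of whitespace (\s) into a single space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Removes every non-digit (\D), leaving only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True if the string consists only of word characters (\w): letters, digits, underscore.
    static boolean isWordLike(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("  a \t b\n c "));  // a b c
        System.out.println(digitsOnly("+1 (555) 010-9999"));      // 15550109999
        System.out.println(isWordLike("user_01"));                // true
        System.out.println(isWordLike("user-01"));                // false
    }
}
```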

POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
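
The "US-ASCII only" remark matters in practice, as the sketch below tries to show. PosixClassDemo, stripPunctuation and isHex are hypothetical names; Pattern.UNICODE_CHARACTER_CLASS is the standard flag that switches the POSIX classes to their Unicode versions.

```java
import java.util.regex.Pattern;

public class PosixClassDemo {
    // Removes ASCII punctuation characters.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True if the string is a non-empty sequence of hexadecimal digits.
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, world!"));  // Hello world
        System.out.println(isHex("CAFEbabe"));                  // true

        String eAcute = "\u00e9";  // the letter e with an acute accent
        // By default \p{Alpha} only covers ASCII letters.
        System.out.println(eAcute.matches("\\p{Alpha}"));       // false
        // With UNICODE_CHARACTER_CLASS it covers any alphabetic character.
        Pattern unicodeAlpha = Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS);
        System.out.println(unicodeAlpha.matcher(eAcute).matches());  // true
    }
}
```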

java.lang.Character classes (simple java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()

Classes for Unicode scripts, blocks, categories and binary properties
\p{IsLatin} A Latin script character (script)
\p{InGreek} A character in the Greek block (block)
\p{Lu} An uppercase letter (category)
\p{IsAlphabetic} An alphabetic character (binary property)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
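
The Unicode classes can be exercised with a sketch like this one. Non-ASCII characters are written as Unicode escapes so the file does not depend on its encoding; the class and helper names are made up for the example.

```java
public class UnicodeClassDemo {
    // True if every character lies in the Greek block.
    static boolean isGreekBlock(String s) {
        return s.matches("\\p{InGreek}+");
    }

    // True if every character belongs to the Latin script.
    static boolean isLatinScript(String s) {
        return s.matches("\\p{IsLatin}+");
    }

    // Replaces every upper-case letter (category Lu) with an underscore.
    static String maskUppercase(String s) {
        return s.replaceAll("\\p{Lu}", "_");
    }

    public static void main(String[] args) {
        String alphaBetaGamma = "\u03b1\u03b2\u03b3";
        System.out.println(isGreekBlock(alphaBetaGamma));   // true
        System.out.println(isLatinScript(alphaBetaGamma));  // false
        System.out.println(isLatinScript("abc"));           // true
        System.out.println(maskUppercase("Hello World"));   // _ello _orld

        // The euro sign is a currency symbol (category Sc).
        System.out.println("\u20ac".matches("\\p{Sc}"));    // true

        // Subtraction: letters that are not upper-case.
        System.out.println("aBc1".replaceAll("[\\p{L}&&[^\\p{Lu}]]", "*"));  // *B*1
    }
}
```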

Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
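
Boundary matchers consume no characters; they only assert a position. Below is a small sketch with hypothetical helpers replaceWord and firstWords. One point worth knowing: by default ^ and $ only match at the start and end of the whole input, and the MULTILINE flag makes them match at every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Whole-word replacement: \b keeps "cat" from matching inside "concatenate".
    static String replaceWord(String text, String word, String replacement) {
        return text.replaceAll("\\b" + Pattern.quote(word) + "\\b",
                Matcher.quoteReplacement(replacement));
    }

    // With MULTILINE, ^ matches at the start of every line.
    static List<String> firstWords(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("^\\w+", Pattern.MULTILINE).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("cat concatenate cat.", "cat", "dog"));  // dog concatenate dog.
        System.out.println(firstWords("one two\nthree four"));                  // [one, three]

        // \Z tolerates a final line terminator, \z does not.
        System.out.println(Pattern.compile("abc\\Z").matcher("abc\n").find());  // true
        System.out.println(Pattern.compile("abc\\z").matcher("abc\n").find());  // false
    }
}
```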

Linebreak matcher
\R Any Unicode linebreak sequence; equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times

Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
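
The three tables above describe the same repetition counts; what differs is how much the quantifier takes and whether it gives anything back. A minimal sketch, using a hypothetical helper firstMatch that returns the first match or null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";

        // Greedy: takes as much as possible, then backtracks to the last </b>.
        System.out.println(firstMatch("<b>.*</b>", html));   // <b>one</b><b>two</b>
        // Reluctant: takes as little as possible, stopping at the first </b>.
        System.out.println(firstMatch("<b>.*?</b>", html));  // <b>one</b>
        // Possessive: takes everything and never gives it back, so </b> cannot match.
        System.out.println(firstMatch("<b>.*+</b>", html));  // null

        System.out.println(firstMatch("\\d{2,3}", "12345"));   // 123
        System.out.println(firstMatch("\\d{2,3}?", "12345"));  // 12
    }
}
```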

Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group

Back references
\n Whatever the nth capturing group matched
\k<name> Whatever the named-capturing group "name" matched
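
Back references let a pattern require that the same text appears again. The sketch below shows one numbered and one named reference; BackReferenceDemo, dedupeWords and isQuoted are names invented for the example.

```java
public class BackReferenceDemo {
    // Collapses immediately repeated words such as "is is" into a single word.
    // \1 refers to whatever the first capturing group matched.
    static String dedupeWords(String s) {
        return s.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
    }

    // True if the string starts and ends with the same kind of quote.
    // \k<q> refers to whatever the group named "q" matched.
    static boolean isQuoted(String s) {
        return s.matches("(?<q>['\"]).*\\k<q>");
    }

    public static void main(String[] args) {
        System.out.println(dedupeWords("this is is a test test test"));  // this is a test
        System.out.println(isQuoted("'abc'"));      // true
        System.out.println(isQuoted("\"abc\""));    // true
        System.out.println(isQuoted("\"abc'"));     // false
    }
}
```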

Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
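
Quoting is what makes it safe to match text that contains metacharacters. A short sketch, assuming the literal text "1+1" should be matched as-is (the class name QuotationDemo is hypothetical):

```java
import java.util.regex.Pattern;

public class QuotationDemo {
    public static void main(String[] args) {
        // Unquoted, "1+" means "one or more 1s", so the literal text does not match.
        System.out.println("1+1=2".matches("1+1=2"));           // false
        System.out.println("11=2".matches("1+1=2"));            // true

        // A single backslash quotes the next character.
        System.out.println("1+1=2".matches("1\\+1=2"));         // true

        // \Q...\E quotes everything in between.
        System.out.println("1+1=2".matches("\\Q1+1\\E=2"));     // true

        // Pattern.quote builds the \Q...\E form for an arbitrary string.
        System.out.println(Pattern.quote("1+1"));               // \Q1+1\E
    }
}
```

For text that comes from a user, Pattern.quote is the safer choice, since it also handles input that itself contains \E.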

Special constructs (named-capturing and non-capturing)
(?<name>X) X, as a named-capturing group
(?:X) X, as a non-capturing group
(?idmsuxU-idmsuxU) Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
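
Lookahead and lookbehind check what surrounds a position without including it in the match. The sketch below shows a few typical uses; the password rule, the class name LookaroundDemo and the helper names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Positive lookahead: at least 8 characters, containing a digit and an upper-case letter.
    static boolean isStrong(String password) {
        return password.matches("(?=.*\\d)(?=.*[A-Z]).{8,}");
    }

    // Positive lookbehind: numbers directly preceded by '$'; the '$' is not part of the match.
    // (?:...) groups the optional cents without capturing them.
    static List<String> dollarAmounts(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("(?<=\\$)\\d+(?:\\.\\d{2})?").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Inserts a comma at every position followed by a multiple of three digits up to the end.
    static String groupThousands(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(?:\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(isStrong("Passw0rdX"));   // true
        System.out.println(isStrong("password1"));   // false
        System.out.println(dollarAmounts("cost $15.50, was $20 or 30"));  // [15.50, 20]
        System.out.println(groupThousands("1234567"));                    // 1,234,567

        // The inline flag (?i) turns on case-insensitive matching.
        System.out.println("HELLO".matches("(?i)hello"));  // true
    }
}
```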