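Understanding Java: Pattern and Regular Expression Usage

1. Overview

Regular Expressions

String processing: matching, searching, and replacing characters

2. JDK support

Relevant JDK classes: java.lang.String, java.util.regex.Pattern, java.util.regex.Matcher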
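
As a rough illustration of how these three classes relate, the sketch below performs the three tasks named above (matching, searching, replacing) using the String convenience methods and a Pattern/Matcher pair. The class name RegexOverviewDemo, its helper methods, and the sample strings are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOverviewDemo {
    // Matching: String.matches requires the entire string to match.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // Searching: return the first run of digits, or null if there is none.
    static String firstNumber(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // Replacing: replace every run of digits with '#'.
    static String maskNumbers(String s) {
        return s.replaceAll("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));              // true
        System.out.println(isAllDigits("year 2024"));         // false
        System.out.println(firstNumber("room 42, floor 7"));  // 42
        System.out.println(maskNumbers("room 42, floor 7"));  // room #, floor #
    }
}
```

The String methods compile the pattern on every call, so when the same expression is applied many times it is usually worth compiling a Pattern once and reusing it.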

 

Summary of regular-expression constructs

Construct    Matches

Characters
x            The character x
\\           The backslash character
\0n          The character with octal value 0n (0 <= n <= 7)
\0nn         The character with octal value 0nn (0 <= n <= 7)
\0mnn        The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh         The character with hexadecimal value 0xhh
\uhhhh       The character with hexadecimal value 0xhhhh
\t           The tab character ('\u0009')
\n           The newline (line feed) character ('\u000A')
\r           The carriage-return character ('\u000D')
\f           The form-feed character ('\u000C')
\a           The alert (bell) character ('\u0007')
\e           The escape character ('\u001B')
\cx          The control character corresponding to x
 
Character classes
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
 
Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
 
POSIX character classes (US-ASCII only)
\p{Lower}     A lower-case alphabetic character: [a-z]
\p{Upper}     An upper-case alphabetic character: [A-Z]
\p{ASCII}     All ASCII: [\x00-\x7F]
\p{Alpha}     An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}     A decimal digit: [0-9]
\p{Alnum}     An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}     Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     A visible character: [\p{Alnum}\p{Punct}]
\p{Print}     A printable character: [\p{Graph}\x20]
\p{Blank}     A space or a tab: [ \t]
\p{Cntrl}     A control character: [\x00-\x1F\x7F]
\p{XDigit}    A hexadecimal digit: [0-9a-fA-F]
\p{Space}     A whitespace character: [ \t\n\x0B\f\r]
 
java.lang.Character classes (simple java character type)
\p{javaLowerCase}     Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}     Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}    Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}      Equivalent to java.lang.Character.isMirrored()
 
Classes for Unicode blocks and categories
\p{InGreek}           A character in the Greek block (simple block)
\p{Lu}                An uppercase letter (simple category)
\p{Sc}                A currency symbol
\P{InGreek}           Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]    Any letter except an uppercase letter (subtraction)
 
Boundary matchers
^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input
 
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
 
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
 
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
 
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
 
Back references
\n Whatever the nth capturing group matched
 
Quotation
\     Nothing, but quotes the following character
\Q    Nothing, but quotes all characters until \E
\E    Nothing, but ends quoting started by \Q
 
Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
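
A few of the less familiar constructs in the summary are easier to follow when run. The sketch below is only an illustration: the class name ConstructDemo, the helper fullMatch, and all sample inputs are assumptions made for this example.

```java
import java.util.regex.Pattern;

class ConstructDemo {
    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Subtraction: a through z, except b and c
        System.out.println(fullMatch("[a-z&&[^bc]]", "a")); // true
        System.out.println(fullMatch("[a-z&&[^bc]]", "b")); // false

        // Back reference: \1 must repeat whatever group 1 matched
        System.out.println(fullMatch("(\\w)\\1", "aa")); // true
        System.out.println(fullMatch("(\\w)\\1", "ab")); // false

        // Quotation: inside \Q...\E the dot and plus are literal characters
        System.out.println(fullMatch("\\Q1.5+\\E", "1.5+")); // true
        System.out.println(fullMatch("\\Q1.5+\\E", "1x55")); // false

        // Zero-width positive lookahead: "foo" only when followed by "bar"
        System.out.println(found("foo(?=bar)", "foobar")); // true
        System.out.println(found("foo(?=bar)", "foobaz")); // false
    }
}
```

Note that every backslash in the table has to be doubled inside a Java string literal, which is why \w appears as "\\w" in the code.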

 

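Greedy, reluctant, and possessive quantifiers

The reluctant and possessive forms are written by appending ? or + to the greedy form. All three express the same repetition counts; they differ in how the match is attempted.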

 

A greedy quantifier starts by consuming as much of the input as it can (for a pattern such as .*, the entire string) and then checks for a match. If no match is found, it gives back the last character and tries again. If a match is still not found, another character is given back, and the process repeats until a match is found or there are no characters left to give back. Greedy is the default: a quantifier with no suffix is greedy.

A reluctant quantifier starts by consuming as little as possible. If that is not enough for a match, it reads in the next character and tries again. If still no match is found, it continues to add characters one at a time until either a match is found or the entire string has been checked without a match. Reluctant quantifiers work in the reverse of greedy quantifiers.

A possessive quantifier consumes as much as it can, like a greedy one, but never gives anything back. If that attempt does not produce a match, no further attempt is made. Possessive quantifiers are, in a manner of speaking, a one-shot deal.

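Greedy quantifiers are called "greedy" because they force the matcher to read in (or eat) the entire input string before attempting the first match. If the first match attempt (against the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try to match against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: they start at the beginning of the input string and reluctantly eat one character at a time while looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.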
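
The difference between the three kinds is easiest to see by running the same input through each. The following is a minimal sketch, assuming the sample input xfooxxxxxxfoo and a made-up helper class QuantifierDemo; it collects every match each pattern finds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collect every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* eats everything, then backs off until "foo" fits
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? grows one character at a time
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ eats everything and never backs off, so "foo" cannot match
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```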

 

 

 

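The Matcher class:
    The most important concept to understand when using the Matcher class is the group. In a regular expression, a pair of parentheses () defines a group. Because one regular expression can contain many groups, we first describe how groups are delimited and how they correspond to group indices.

A small example illustrates this:

\w(\d\d)(\w+)

This regular expression has three groups, counting the implicit group 0:
The whole expression \w(\d\d)(\w+) is group 0, group(0)
(\d\d) is group 1, group(1)
(\w+) is group 2, group(2)

   Now consider a string that matches this regular expression, x99SuperJava:
group(0) is the part of the string that matches the whole expression: x99SuperJava
group(1) is the part matched by group 1, (\d\d): 99
group(2) is the part matched by group 2, (\w+): SuperJava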

   The following program verifies this:

package edu.jlu.fuliang;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
	public static void main(String[] args) {
		String regex = "\\w(\\d\\d)(\\w+)";
		String candidate = "x99SuperJava";
		
		Pattern p = Pattern.compile(regex);
		Matcher matcher = p.matcher(candidate);
		if(matcher.find()){
			int gc = matcher.groupCount();
			for(int i = 0; i <= gc; i++)
				System.out.println("group " + i + " :" + matcher.group(i));
		}
	}
}
                  

Output:

group 0 :x99SuperJava
group 1 :99
group 2 :SuperJava



Next, let us look at the methods the Matcher class provides:
public Pattern pattern()
Returns the Pattern object that created this Matcher.

A small example demonstrates this:

import java.util.regex.*;
public class MatcherPatternExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     Pattern p = Pattern.compile("\\d");
     Matcher m1 = p.matcher("55");
     Matcher m2 = p.matcher("fdshfdgdfh");
     System.out.println(m1.pattern() == m2.pattern());
     // prints true: both matchers were created by the same Pattern object
  }
}
   

public Matcher reset()
Resets this Matcher to its initial state.

public Matcher reset(CharSequence input)
Resets this Matcher and sets its input character sequence to input. The effect is the same as creating a new Matcher, except that the existing object is reused.
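
To make the two reset variants concrete, here is a small sketch. The class name MatcherResetDemo, its methods, and the sample inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherResetDemo {
    // Advance past two matches, reset, and return the next match found.
    static String firstAfterReset() {
        Matcher m = Pattern.compile("\\d+").matcher("a1 b22 c333");
        m.find();   // matches "1"
        m.find();   // matches "22"
        m.reset();  // back to the start of the same input
        return m.find() ? m.group() : null;
    }

    // Reuse the same Matcher on a different input.
    static String firstAfterResetWithInput() {
        Matcher m = Pattern.compile("\\d+").matcher("a1 b22 c333");
        m.find();
        m.reset("x9 y88"); // same pattern, new input
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAfterReset());          // 1
        System.out.println(firstAfterResetWithInput()); // 9
    }
}
```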

public int start()
Returns the index in the input string at which the text matched by the Matcher begins:
A small example:

import java.util.regex.*;
public class MatcherStartExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Matcher and use the Matcher.start() method
     String candidateString = "My name is Bond. James Bond.";
     String matchHelper[] =
      {"          ^","                      ^"};
     Pattern p = Pattern.compile("Bond");
     Matcher matcher = p.matcher(candidateString);
     //Find the starting point of the first 'Bond'
      matcher.find();
      int startIndex = matcher.start();
      System.out.println(candidateString);
      System.out.println(matchHelper[0] + startIndex);
     //Find the starting point of the second 'Bond'
      matcher.find();
      int nextIndex = matcher.start();
      System.out.println(candidateString);
      System.out.println(matchHelper[1] + nextIndex);
  }
}
                  


Output:
My name is Bond. James Bond. 
          ^11 
My name is Bond. James Bond. 
                      ^23 

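public int start(int group)
Takes the index of the subgroup you are interested in and returns the start index of the text that subgroup matched.

public int end()
The counterpart of start(): returns the offset just past the last character matched by the previous match operation.
start and end are often used together to extract the matched substring.

public int end(int group)
The counterpart of start(int group): returns the index of the last character matched by the given subgroup, plus one.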
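
The relationship between start, end, and the matched text can be sketched as follows, reusing the x99SuperJava example from earlier. The class name StartEndDemo and the describe helper are hypothetical, and the sketch assumes every group takes part in the match (for a group that does not, start and end return -1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {
    // For each group, report its [start,end) range and the substring it covers.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            // substring(start, end) yields the same text as m.group(g)
            String text = input.substring(m.start(g), m.end(g));
            sb.append(g).append(":[").append(m.start(g)).append(',')
              .append(m.end(g)).append(")=").append(text).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("\\w(\\d\\d)(\\w+)", "x99SuperJava"));
        // 0:[0,12)=x99SuperJava 1:[1,3)=99 2:[3,12)=SuperJava
    }
}
```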

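public String group()
Returns the input subsequence matched by the previous match operation.
This is a convenient shortcut: it is equivalent to calling start and end and then taking substring(start, end) of the input string.
A small example: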

import java.util.regex.*;
public class MatcherGroupExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
      //create a Pattern
      Pattern p = Pattern.compile("Bond");
      //create a Matcher and use the Matcher.group() method
      String candidateString = "My name is Bond. James Bond.";
      Matcher matcher = p.matcher(candidateString);
      //extract the group
      matcher.find();
      System.out.println(matcher.group());
  }
}


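public String group(int group)
Returns the input subsequence captured by the specified group during the previous match operation.
Because these two methods are used so often, here is another small example: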

import java.util.regex.*;
public class MatcherGroupParamExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("B(ond)");
     //create a Matcher and use the Matcher.group(int) method
     String candidateString = "My name is Bond. James Bond.";
     //create a helpful index for the sake of output
     Matcher matcher = p.matcher(candidateString);
     //Find group number 0 of the first find
      matcher.find();
      String group_0 = matcher.group(0);
      String group_1 = matcher.group(1);
      System.out.println("Group 0 " + group_0);
      System.out.println("Group 1 " + group_1);
      System.out.println(candidateString);
     //Find group number 1 of the second find
      matcher.find();
      group_0 = matcher.group(0);
      group_1 = matcher.group(1);
      System.out.println("Group 0 " + group_0);
      System.out.println("Group 1 " + group_1);
      System.out.println(candidateString);
  }
}

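public int groupCount()

Returns the number of capturing groups in this matcher's pattern. Group 0, which stands for the whole match, is not included in the count.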
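
A quick way to check how groups are counted is to ask a matcher directly. This is only a sketch; the class name GroupCountDemo and its count helper are made up, and the empty input string is used because groupCount depends on the pattern alone.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // groupCount depends only on the pattern, so any input will do.
    static int count(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(count("\\w(\\d\\d)(\\w+)")); // 2 (group 0 is not counted)
        System.out.println(count("Bond"));              // 0
        System.out.println(count("(?:a)(b)"));          // 1 (non-capturing groups are not counted)
    }
}
```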


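public boolean matches()

Attempts to match the entire region against the pattern. The whole input string must match the regular expression.

This differs from find, which searches the input string for a matching substring.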

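public boolean find()

find searches the input for the next substring that matches the pattern. A typical usage pattern is:
 while(matcher.find()){
    // for the current match, use group, start, end, etc. to inspect it or build a replacement
 }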

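public boolean find(int start)
Resets the matcher and then searches from the given start index of the input string.

public boolean lookingAt()
Essentially a more relaxed version of matches: it attempts to match the input against the pattern starting at the beginning of the region, but does not require the entire region to match.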
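
The differences between matches, lookingAt, find, and find(int) show up clearly when the same pattern is applied to a few inputs. The sketch below uses a hypothetical class MatchModesDemo and sample strings chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    // Returns { matches(), lookingAt(), find() } for the given regex and input.
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // whole input must match
            p.matcher(input).lookingAt(), // a prefix of the input must match
            p.matcher(input).find()       // any substring may match
        };
    }

    // Start index of the first match at or after 'from', or -1 if there is none.
    static int startFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(modes("Bond", "Bond")));              // [true, true, true]
        System.out.println(Arrays.toString(modes("Bond", "Bond. James Bond."))); // [false, true, true]
        System.out.println(Arrays.toString(modes("Bond", "James Bond")));        // [false, false, true]
        System.out.println(startFrom("Bond", "My name is Bond. James Bond.", 12)); // 23
    }
}
```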

public Matcher appendReplacement (StringBuffer sb, String replacement)
Suppose you want to replace Bond with Smith in "My name is Bond. James Bond. I would like a martini."

StringBuffer sb = new StringBuffer();
String replacement = "Smith";
Pattern pattern = Pattern.compile("Bond");
Matcher matcher =pattern.matcher("My name is Bond. James Bond. I would like a martini.");
while(matcher.find()){
  matcher.appendReplacement(sb,replacement);// after the loop, sb holds: My name is Smith. James Smith
}

The Matcher object keeps track of the append position, which is why repeated calls to appendReplacement can replace every match.

public StringBuffer appendTail(StringBuffer sb)
Simply appends the unmatched remainder of the input to the StringBuffer. Adding one more statement to the end of the previous example:
matcher.appendTail(sb);
gives the result My name is Smith. James Smith. I would like a martini.
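
Putting appendReplacement and appendTail together, a complete runnable version of the Bond example might look like the sketch below. The class name AppendDemo and the replaceEach helper are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Replace every match of regex in input, one match at a time.
    static String replaceEach(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // copies the text since the previous match, then the replacement
            m.appendReplacement(sb, replacement);
        }
        // copies whatever follows the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "My name is Bond. James Bond. I would like a martini.";
        System.out.println(replaceEach("Bond", input, "Smith"));
        // My name is Smith. James Smith. I would like a martini.
    }
}
```

The loop form is mainly useful when each replacement has to be computed from the match itself. In the replacement string, $ and \ have special meaning, so a replacement that should be taken literally can be passed through Matcher.quoteReplacement first.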

public String replaceAll(String replacement)
A more convenient method: to replace every match, simply call replaceAll.
It is shorthand for:

while(matcher.find()){
  matcher.appendReplacement(sb,replacement);// after the loop, sb holds: My name is Smith. James Smith
}
matcher.appendTail(sb);

public String replaceFirst(String replacement)

The counterpart of replaceAll, and easy to understand: it replaces only the first match.
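
For comparison, the two convenience methods can be exercised side by side. This is a small sketch with an invented class name, ReplaceDemo, using the same Bond sentence as before.

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern BOND = Pattern.compile("Bond");

    // Replace every occurrence of "Bond".
    static String all(String input) {
        return BOND.matcher(input).replaceAll("Smith");
    }

    // Replace only the first occurrence of "Bond".
    static String first(String input) {
        return BOND.matcher(input).replaceFirst("Smith");
    }

    public static void main(String[] args) {
        String input = "My name is Bond. James Bond.";
        System.out.println(all(input));   // My name is Smith. James Smith.
        System.out.println(first(input)); // My name is Smith. James Bond.
    }
}
```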

 

 

 

 

 
