Regular Expressions

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/fengkaungdewoniu/article/details/52728953

Reference article: 神秘的正则表达式 ("The Mysterious Regular Expression")

Reference book: Thinking in Java (4th edition)

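Definition: generally speaking, a regular expression is a way of describing strings. So you can say: "If a string contains these things, it is what I'm looking for."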

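Regular expressions are a powerful and flexible text-processing tool. With them you can programmatically build complex text patterns and search input strings for them. Once you have found the parts that match these patterns, you can process them however you like. (From Thinking in Java)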

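Table 1:

Code        Meaning
\b          Matches the beginning or end of a word (a word boundary)
.           Matches any character except a line terminator such as a newline
\d          Matches a digit
\w          Matches a word character: a letter, digit or underscore (in Java, [a-zA-Z_0-9] by default; Chinese characters are not included unless the UNICODE_CHARACTER_CLASS flag is set)
^           Matches the beginning of the string
$           Matches the end of the string
\s          Matches any whitespace character
*           Repeats zero or more times
+           Repeats one or more times
?           Repeats zero or one time
{n}         Repeats exactly n times
{n,}        Repeats n or more times
{n,m}       Repeats n to m times
\W          Matches any character that is not a word character (not a letter, digit or underscore)
\S          Matches any character that is not whitespace
\D          Matches any character that is not a digit
\B          Matches a position that is not the beginning or end of a word
[^A]        Matches any character except A
[^ABCDE]    Matches any character except A, B, C, D and E
*?          Repeats any number of times, but as few times as possible (lazy)
+?          Repeats one or more times, but as few times as possible
??          Repeats zero or one time, but as few times as possible
{n,m}?      Repeats n to m times, but as few times as possible
{n,}?       Repeats n or more times, but as few times as possible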
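Two rows of Table 1 are easiest to understand by running them: greedy versus lazy quantifiers, and the word boundary \b. The following is a minimal sketch; the class GreedyVsLazy and its helpers firstMatch and count are illustrative names, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Returns the first substring of input that matches regex, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Counts how many non-overlapping matches of regex occur in input
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .* takes as much as it can, so the match runs to the LAST '>'
        System.out.println(firstMatch("<.*>", html));   // <b>bold</b>
        // Lazy: .*? takes as little as it can, so the match stops at the FIRST '>'
        System.out.println(firstMatch("<.*?>", html));  // <b>
        // Without \b, "cat" is also found inside "concatenate" and "category"
        System.out.println(count("cat", "concatenate cat category"));       // 3
        // With \b on both sides, only the standalone word matches
        System.out.println(count("\\bcat\\b", "concatenate cat category")); // 1
    }
}
```

The lazy form is the usual fix when a greedy .* "eats too much", for example when pulling individual tags out of a line of HTML.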

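Table 2:

Code          Meaning
B             The specific character B
\xhh          The character with hex value 0xhh
\uhhhh        The Unicode character with hex value 0xhhhh
\t            Tab
\n            Newline
\r            Carriage return
\f            Form feed
\e            Escape
[abc]         Any of a, b or c (same as a|b|c)
[^abc]        Any character except a, b and c (negation)
[a-zA-Z]      Any character from a to z or from A to Z (range)
[abc[hij]]    Any of a, b, c, h, i, j (union)
[a-z&&[hij]]  Any of h, i, j (intersection)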
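The character-class rows of Table 2 can be checked directly with String.matches(), which succeeds only when the whole string matches. A small sketch follows (CharClasses and accepts are hypothetical names chosen for this example); it also shows that Java writes a character with four hex digits using the \u form next to the two-digit \x form:

```java
public class CharClasses {
    // Returns true if the whole string c matches the given character class
    static boolean accepts(String charClass, String c) {
        return c.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[abc]", "b"));         // true: same as a|b|c
        System.out.println(accepts("[^abc]", "b"));        // false: negated class
        System.out.println(accepts("[abc[hij]]", "h"));    // true: union of the two sets
        System.out.println(accepts("[a-z&&[hij]]", "h"));  // true: h is in both sets
        System.out.println(accepts("[a-z&&[hij]]", "a"));  // false: a is not in [hij]
        System.out.println(accepts("\\x41", "A"));         // true: hex 41 is 'A'
        System.out.println(accepts("\\u0041", "A"));       // true: same character, four hex digits
    }
}
```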

Here are three small examples that introduce how regular expressions are used:

1. Matching integers:

public class IntegerMatch {

	public static void main(String[] args) {
		/*
		 * ?: repeat zero or one time, so -? means the string may start with "-" or with no sign
		 * +: repeat one or more times
		 * \d: matches a digit
		 * "-?\\d+": an integer of one or more digits
		 * (-|\\+)?: starts with a minus sign, a plus sign, or no sign
		 * In a regular expression, parentheses group a subexpression and "|" means OR.
		 * Because "+" has a special meaning in regular expressions, it is escaped here with "\\".
		 */
		System.out.println("1809".matches("-?\\d+"));           // true
		System.out.println("+1809".matches("-?\\d+"));          // false
		System.out.println("+1809".matches("(-|\\+)?\\d+"));    // true
		System.out.println("-1809".matches("(-|\\+)?\\d+"));    // true
		System.out.println("1809".matches("(-|\\+)?\\d+"));     // true
		System.out.println("+1809".matches("\\+?\\d+"));        // true: starts with a plus sign or no sign
	}

}
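Since [abc] means the same as a|b|c, the optional sign (-|\\+)? above can also be written as the character class [-+]?, where + needs no escaping inside the brackets. A minimal sketch comparing the two forms (SignedInteger and its methods are illustrative names, not from the original):

```java
public class SignedInteger {
    // [-+]? is a character class meaning "an optional - or +"; inside [...] the + needs no escape
    static boolean isIntegerClass(String s) {
        return s.matches("[-+]?\\d+");
    }

    // The grouped form used in the example above
    static boolean isIntegerGroup(String s) {
        return s.matches("(-|\\+)?\\d+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1809", "+1809", "-1809", "18a09", "+"}) {
            // Both forms should give the same answer for every input
            System.out.println(s + " -> " + isIntegerClass(s) + " / " + isIntegerGroup(s));
        }
    }
}
```

The character class is a little shorter and avoids both the group and the escape, which is why it is a common way to write an optional sign.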
2. Splitting strings:

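import java.util.Arrays;

public class MatchesString {
/*
 * The String class has a very useful built-in regex tool, split(), which cuts the string
 * apart wherever the regular expression matches
 */
	public static String str = "If you wan,,,,,, the best the world has to offer ,"
			+ "offer the world your best , "
			+ "you should try your best to catch what you wan,,,,";
	public static void split(String regex){
		System.out.println(Arrays.toString(str.split(regex)));
	}
	/**
	 * "\W" matches any non-word character (anything other than a letter, digit or underscore);
	 * "\w" matches a single word character.
	 * The parts that match the regular expression are removed from the result.
	 * @param args
	 */
	public static void main(String[] args){
		split(" ");      // split on each single space
		split("\\W+");   // split on every run of non-word characters, which removes the punctuation
		split("n\\W+");  // split on an "n" followed by non-word characters; the "n" and those characters are removed
	}
}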

The output is as follows:

[If, you, wan,,,,,,, the, best, the, world, has, to, offer, ,offer, the, world, your, best, ,, you, should, try, your, best, to, catch, what, you, wan,,,,]
[If, you, wan, the, best, the, world, has, to, offer, offer, the, world, your, best, you, should, try, your, best, to, catch, what, you, wan]
[If you wa, the best the world has to offer ,offer the world your best , you should try your best to catch what you wa]
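The last output ends with "wa" rather than an empty string, even though the text ends with ",,,,". That is because split(regex) behaves like split(regex, 0), which drops trailing empty strings; a negative limit keeps them. A small sketch of that behavior (the class SplitTrailing and the helper show are hypothetical names):

```java
import java.util.Arrays;

public class SplitTrailing {
    // split(regex) is the same as split(regex, 0)
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        String s = "a,b,,c,,";
        // limit 0 (the default): trailing empty strings are dropped, inner ones are kept
        System.out.println(show(s, ",", 0));    // [a, b, , c]
        // negative limit: trailing empty strings are kept
        System.out.println(show(s, ",", -1));   // [a, b, , c, , ]
        // \\W+ treats a run of separators as a single delimiter
        System.out.println(show(s, "\\W+", 0)); // [a, b, c]
    }
}
```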

3. Replacing strings:

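public class RegexReplace {
	static String s = "try your best offter the wordfhh ofof fo";
	/*
	 * The String class provides replaceFirst() and replaceAll(): the first replaces only the
	 * first substring that matches the regular expression, the second replaces every match
	 */
	public static void main(String[] args) {
		// Replace each "f" followed by one or more word characters with "replacement"
		System.out.println(s.replaceFirst("f\\w+", "replacement")); // replaces only the first match
		System.out.println(s.replaceAll("f\\w+", "replacement"));   // replaces all matches
	}

}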

The output is as follows:

try your best oreplacement the wordfhh ofof fo
try your best oreplacement the wordreplacement oreplacement replacement
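replaceAll() can also reuse what a group captured: in the replacement string, $1 stands for the text matched by the first parenthesized group. A minimal sketch on the same input (ReplaceWithGroup and bracketAfterF are illustrative names):

```java
public class ReplaceWithGroup {
    // (\\w+) captures the characters after each "f"; $1 in the replacement inserts that capture
    static String bracketAfterF(String input) {
        return input.replaceAll("f(\\w+)", "[$1]");
    }

    public static void main(String[] args) {
        String s = "try your best offter the wordfhh ofof fo";
        System.out.println(bracketAfterF(s)); // try your best o[fter] the word[hh] o[of] [o]
    }
}
```

Because $ is special in the replacement string, a literal dollar sign there has to be escaped as \$.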

Creating regular expressions

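Using the two tables above, write the simplest regular expression that covers the requirement, and add only the parts that are actually necessary.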
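As an illustration of "simplest first, then only what is necessary", here is a sketch for a hypothetical requirement: a date written as yyyy-mm-dd. The class name and both patterns are assumptions made for this example, not from the original.

```java
public class BuildRegex {
    // Step 1: four digits, a hyphen, two digits, a hyphen, two digits
    static final String SIMPLE = "\\d{4}-\\d{2}-\\d{2}";
    // Step 2, only if the requirement needs it: restrict the month to 01-12
    static final String STRICTER = "\\d{4}-(0[1-9]|1[0-2])-\\d{2}";

    public static void main(String[] args) {
        System.out.println("2016-10-03".matches(SIMPLE));   // true
        System.out.println("2016-13-03".matches(SIMPLE));   // true: the simple form cannot see month 13
        System.out.println("2016-13-03".matches(STRICTER)); // false
    }
}
```

Whether step 2 is worth it depends on the requirement; if the dates are validated elsewhere, the simple form is easier to read and maintain.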


  • Applying regular expressions

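Java implements regular expressions through the Pattern and Matcher classes in the java.util.regex package.
Pattern represents a compiled regular expression, that is, a matching pattern. Its constructor is private, so it cannot be instantiated directly; instead, you create one with the static factory method Pattern.compile(String regex). A Matcher is the engine that interprets the input string and performs match operations on it. Like Pattern, Matcher has no public constructor; you obtain one by calling the matcher() method of a Pattern object. The argument to compile() is the regular expression, and the argument to matcher() is the string to be matched. (Quoted from the reference blog.)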

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatch {
	public static void main(String[] args) {
		/*
		 * Pattern.compile(regex) compiles the given regular expression into a pattern.
		 * It throws PatternSyntaxException if the expression's syntax is invalid.
		 * Pattern.compile(regex, Pattern.CASE_INSENSITIVE) matches without regard to case.
		 */
		Pattern pat = Pattern.compile(""); // the regular expression (empty placeholder)
		/*
		 * pat.matcher(input) creates a new matcher that will match the given
		 * input (any CharSequence) against this pattern.
		 */
		Matcher matcher = pat.matcher(""); // the input string (empty placeholder)
		/*
		 * Whole match: matches() attempts to match the entire region against the pattern.
		 * It returns true if, and only if, the entire region matches the pattern.
		 * If the match succeeds, start(), end() and group() give more information.
		 */
		boolean b = matcher.matches();
		/*
		 * Partial search: find() attempts to find the next subsequence of the input that
		 * matches the pattern. It starts at the beginning of the matcher's region or, if a
		 * previous invocation of the method was successful and the matcher has not since
		 * been reset, at the first character not matched by the previous match.
		 * It returns true if, and only if, a subsequence of the input matches the pattern;
		 * if the match succeeds, start(), end() and group() give more information.
		 */
		matcher.find();
		System.out.println(b); // true: the empty pattern matches the empty input
	}
}
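Because the PatternMatch skeleton uses empty strings, its output says little. The sketch below (MatchesVsFind is an illustrative name) fills in real input, uses the Pattern.CASE_INSENSITIVE flag mentioned in the comments, and creates a separate Matcher for each call so that the two methods cannot affect each other:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // CASE_INSENSITIVE lets "java" also match "Java" or "JAVA"
    static final Pattern JAVA = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String input = "I love Java";

        // matches(): the WHOLE input must match the pattern
        System.out.println(JAVA.matcher(input).matches()); // false

        // find(): it is enough that some part of the input matches
        Matcher m = JAVA.matcher(input);
        if (m.find()) {
            System.out.println(m.group() + " at " + m.start() + "-" + m.end()); // Java at 7-11
        }
    }
}
```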

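The example above also shows the difference between matcher.find() and matcher.matches(). The following excerpt from the JDK source of java.util.regex.Matcher explains why, on the same Matcher object, the order in which find() and matches() are called changes their return values: both methods update the shared fields first, last and oldLast. matches() always starts at the beginning of the region (from), whereas find() resumes at last, the end of the previous match. So after a successful matches(), a find() on the same Matcher starts where that match ended and, if matches() consumed the whole input, returns false.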

    /**
     * The storage used by groups. They may contain invalid values if a group was skipped during the matching.
     */
    int[] groups;

    /**
     * The range within the sequence that is to be matched. Anchors will match at these "hard" boundaries. Changing the region changes these values.
     */
    int from, to;

    /**
     * Lookbehind uses this value to ensure that the subexpression match ends at the point where the lookbehind was encountered.
     */
    int lookbehindTo;

    /**
     * The original string being matched.
     */
    CharSequence text;

    /**
     * Matcher state used by the last node. NOANCHOR is used when a match does not have to consume all of the input. ENDANCHOR is the mode used for matching all the input.
     */
    static final int ENDANCHOR = 1;
    static final int NOANCHOR = 0;
    int acceptMode = NOANCHOR;

    /**
     * The range of string that last matched the pattern. If the last match failed then first is -1; last initially holds 0 then it
     * holds the index of the end of the last match (which is where the next search starts).
     */
    int first = -1, last = 0;

    /**
     * The end index of what matched in the last match operation.
     */
    int oldLast = -1;

    /**
     * Boolean indicating whether or not more input could change the results of the last match.
     * If hitEnd is true, and a match was found, then more input might cause a different match to be found.
     * If hitEnd is true and a match was not found, then more input could cause a match to be found.
     * If hitEnd is false and a match was found, then more input will not change the match.
     * If hitEnd is false and a match was not found, then more input will not cause a match to be found.
     */
    boolean hitEnd;

    /**
     * Boolean indicating whether or not more input could change a positive match into a negative one.
     *
     * If requireEnd is true, and a match was found, then more input could cause the match to be lost.
     * If requireEnd is false and a match was found, then more input might change the match but the match won't be lost.
     * If a match was not found, then requireEnd has no meaning.
     */
    boolean requireEnd;

    public boolean matches() {
        return match(from, ENDANCHOR);
    }

    /**
     * Initiates a search for an anchored match to a Pattern within the given
     * bounds. The groups are filled with default values and the match of the
     * root of the state machine is called. The state machine will hold the
     * state of the match as it proceeds in this matcher.
     */
    boolean match(int from, int anchor) {
        this.hitEnd = false;
        this.requireEnd = false;
        from        = from < 0 ? 0 : from;
        this.first  = from;
        this.oldLast = oldLast < 0 ? from : oldLast;
        for (int i = 0; i < groups.length; i++)
            groups[i] = -1;
        acceptMode = anchor;
        boolean result = parentPattern.matchRoot.match(this, from, text);
        if (!result)
            this.first = -1; // no match: the start index of the match becomes -1

        this.oldLast = this.last; // this match's end becomes the "previous match" for the next operation

        return result;
    }


    public boolean find() {
        int nextSearchIndex = last; // resume where the previous match (from find() or matches()) ended
        if (nextSearchIndex == first)
            nextSearchIndex++;

        // If next search starts before region, start it at region
        if (nextSearchIndex < from)
            nextSearchIndex = from;

        // If next search starts beyond region then it fails
        if (nextSearchIndex > to) {
            for (int i = 0; i < groups.length; i++)
                groups[i] = -1;
            return false;
        }
        return search(nextSearchIndex);
    }

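To remove this interference, call matcher.reset() between the two calls. reset() discards the previous match state, so find() starts again at the beginning of the input.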
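A minimal sketch of the interference and its fix (ResetDemo and run are hypothetical names): with the pattern \d+ and the input "2016", matches() consumes the whole input, so the following find() has nothing left to search until reset() is called.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetDemo {
    // Returns {matches(), find() right after matches(), find() after reset()} on ONE Matcher
    static boolean[] run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();          // match ends at the end of the input
        boolean findAfterMatches = m.find();  // resumes at that end index
        m.reset();                            // discard the previous match state
        boolean findAfterReset = m.find();    // searches again from index 0
        return new boolean[] {whole, findAfterMatches, findAfterReset};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run("\\d+", "2016"))); // [true, false, true]
    }
}
```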

Using group(), start() and end()

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFunc {

	public static void main(String[] args) {
		String str = "my name is example xiamd. xian.";
		Pattern pattern = Pattern.compile("x(\\w+)(\\.)");
		Matcher matcher = pattern.matcher(str);
		int lon = matcher.groupCount(); // 2 here: the number of capturing groups in the pattern, not the number of matches
		while (matcher.find()) {
			System.out.println("Group 0:" + matcher.group(0)); // group 0: the entire match
			System.out.println("Group 1:" + matcher.group(1)); // group 1: the text matched by (\\w+)
			System.out.println("Group 2:" + matcher.group(2)); // group 2: the text matched by (\\.); a group is a parenthesized subexpression

			System.out.println("Start 0:" + matcher.start(0) + " End 0:"
					+ matcher.end(0)); // indexes of the entire match
			System.out.println("Start 1:" + matcher.start(1) + " End 1:"
					+ matcher.end(1)); // indexes of group 1
			System.out.println("Start 2:" + matcher.start(2) + " End 2:"
					+ matcher.end(2)); // indexes of group 2
			System.out.println(str.substring(matcher.start(0), matcher.end(1))); // from the start of the whole match to the end of group 1: "xiamd"
			// replaceAll() does not modify str; it returns a new string. It also resets the matcher
			// and consumes every remaining match, which is why this loop runs only once.
			System.out.println("replaceAll : " + matcher.replaceAll("WHYNOT"));
		}
	}

}

The output is as follows:

Group 0:xiamd.
Group 1:iamd
Group 2:.
Start 0:19 End 0:25
Start 1:20 End 1:24
Start 2:24 End 2:25
xiamd
replaceAll : my name is example WHYNOT WHYNOT
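The output shows only one match because replaceAll() resets the matcher and consumes the remaining matches inside the first loop iteration. A sketch that visits both matches by calling replaceAll() only after the loop (AllMatches and describe are illustrative names; the input string is the one from the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllMatches {
    static final Pattern P = Pattern.compile("x(\\w+)(\\.)");

    // Describes every match as "group0 [start0,end0) group1"
    static List<String> describe(String str) {
        Matcher matcher = P.matcher(str);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(0) + " [" + matcher.start(0) + "," + matcher.end(0) + ") " + matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "my name is example xiamd. xian.";
        System.out.println(describe(str)); // [xiamd. [19,25) iamd, xian. [26,31) ian]
        // replaceAll() runs once, on a fresh Matcher, after the loop has finished
        System.out.println("replaceAll : " + P.matcher(str).replaceAll("WHYNOT"));
    }
}
```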

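Commonly used regular expressions (written in plain regex syntax; inside a Java string literal, each backslash must be doubled):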


  • Validate an IPv4 address:

((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)

  • Validate a password: at least 6 characters, upper- or lower-case letters and digits only, no special characters:

^[a-z0-9A-Z]{6,}$

  • Validate a mainland China resident ID number (15 digits, or 18 characters where the last may be X):
(^[1-9]\d{14}$)|(^[1-9]\d{16}([0-9]|X)$)
  • Validate an email address:

^([a-z0-9A-Z]+[-_.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\.)+[a-zA-Z]{2,}$

  • Validate a mainland China mobile number: matches the 13x, 145, 147, 15x, 176, 177 and 18x prefixes; add any carrier prefixes not covered here yourself:

^(13[0-9]|14[57]|15\d|17[67]|18\d)\d{8}$
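A quick way to try the patterns above is a handful of String.matches() calls. In this sketch the patterns are written as Java string literals (each backslash doubled); the email separator class is written [-_.] and the mobile pattern uses 14[57] and 17[67], because a | inside square brackets is a literal character rather than alternation. The class name and sample values are made up, and the ID and mobile checks test format only.

```java
public class CommonRegexes {
    static final String IPV4 = "((2[0-4]\\d|25[0-5]|[01]?\\d\\d?)\\.){3}(2[0-4]\\d|25[0-5]|[01]?\\d\\d?)";
    static final String PASSWORD = "^[a-z0-9A-Z]{6,}$";
    static final String ID_CARD = "(^[1-9]\\d{14}$)|(^[1-9]\\d{16}([0-9]|X)$)";
    static final String EMAIL = "^([a-z0-9A-Z]+[-_.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";
    static final String MOBILE = "^(13[0-9]|14[57]|15\\d|17[67]|18\\d)\\d{8}$";

    public static void main(String[] args) {
        // String.matches() tests the whole string, so the IPv4 pattern works without ^ and $
        System.out.println("192.168.1.1".matches(IPV4));                 // true
        System.out.println("256.1.1.1".matches(IPV4));                   // false
        System.out.println("abc123".matches(PASSWORD));                  // true
        System.out.println("abc!23".matches(PASSWORD));                  // false
        System.out.println("11010519491231002X".matches(ID_CARD));       // true
        System.out.println("first.last@example.com".matches(EMAIL));     // true
        System.out.println("13812345678".matches(MOBILE));               // true
        System.out.println("12812345678".matches(MOBILE));               // false
    }
}
```

If the IPv4 pattern is used with find() instead of matches(), it needs ^ and $ (or \b) around it; otherwise it also finds an address inside a longer string such as "1192.168.1.1".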

