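Forecaster's Source Code Study Series, Part 2: The Pattern and Matcher Classes

Regular Expressions

I have recently been studying regular expressions in Java. The two most important classes in the java.util.regex package are Pattern and Matcher.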

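What the Pattern class does

The Pattern class turns the regular expression string we write into a compiled representation, a Pattern object. Pattern has no public constructor; you obtain instances through its static compile methods.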
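To make the compile step concrete, here is a minimal sketch assuming Java 8 or later. The class CompileDemo and the helper tryCompile are hypothetical names for illustration; Pattern.compile, Pattern.pattern() and PatternSyntaxException are the real java.util.regex API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CompileDemo {

    // Hypothetical helper: compiles a regex string into a Pattern, or returns
    // null when the syntax is invalid. PatternSyntaxException is unchecked.
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regex: " + e.getDescription());
            return null;
        }
    }

    public static void main(String[] args) {
        Pattern digits = tryCompile("\\d+");
        // pattern() returns the source string the Pattern was compiled from
        System.out.println(digits.pattern());                 // prints \d+
        System.out.println(digits.matcher("2024").matches()); // prints true

        // "(abc" has an unclosed group, so compilation fails
        System.out.println(tryCompile("(abc") == null);       // prints true
    }
}
```

Catching PatternSyntaxException is mainly useful when the regex comes from user input; for hard-coded patterns a syntax error is a programming bug and is usually left to surface.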

Important methods of Pattern

/**
 * Compiles the given regular expression into a pattern.
 *
 * @param  regex
 *         The expression to be compiled
 * @return the given regular expression compiled into a pattern
 * @throws  PatternSyntaxException
 *          If the expression's syntax is invalid
 */
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

/**
 * Compiles the given regular expression into a pattern with the given
 * flags.
 *
 * @param  regex
 *         The expression to be compiled
 *
 * @param  flags
 *         Match flags, a bit mask that may include
 *         {@link #CASE_INSENSITIVE}, {@link #MULTILINE}, {@link #DOTALL},
 *         {@link #UNICODE_CASE}, {@link #CANON_EQ}, {@link #UNIX_LINES},
 *         {@link #LITERAL}, {@link #UNICODE_CHARACTER_CLASS}
 *         and {@link #COMMENTS}
 *
 * @return the given regular expression compiled into a pattern with the given flags
 * @throws  IllegalArgumentException
 *          If bit values other than those corresponding to the defined
 *          match flags are set in <tt>flags</tt>
 *
 * @throws  PatternSyntaxException
 *          If the expression's syntax is invalid
 */
public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}

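Pattern pattern = Pattern.compile(regex, flags);
Passing different flags changes the matching behavior. The flags are bit masks, so several can be combined with |, for example Pattern.CASE_INSENSITIVE | Pattern.MULTILINE. The available flags are listed in the compile Javadoc above.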
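A small sketch of how flags change matching, assuming Java 8 or later. FlagsDemo and its methods are hypothetical names; the flag constants are the ones listed in the Javadoc above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {

    // No flags: matching is case-sensitive
    static boolean plain(String input) {
        return Pattern.compile("hello").matcher(input).matches();
    }

    // CASE_INSENSITIVE: letters match regardless of case
    static boolean ignoreCase(String input) {
        return Pattern.compile("hello", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Flags are bits, so they combine with |. With MULTILINE, ^ matches at the
    // start of every line instead of only at the start of the whole input.
    static int countLinesStartingWithHello(String text) {
        Pattern p = Pattern.compile("^hello", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(plain("HELLO"));      // prints false
        System.out.println(ignoreCase("HELLO")); // prints true
        System.out.println(countLinesStartingWithHello("Hello\nhello\nsay hello")); // prints 2
    }
}
```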

/**
 * Splits the given input sequence around matches of this pattern.
 *
 * <p> The array returned by this method contains each substring of the
 * input sequence that is terminated by another subsequence that matches
 * this pattern or is terminated by the end of the input sequence.  The
 * substrings in the array are in the order in which they occur in the
 * input. If this pattern does not match any subsequence of the input then
 * the resulting array has just one element, namely the input sequence in
 * string form.
 *
 * <p> When there is a positive-width match at the beginning of the input
 * sequence then an empty leading substring is included at the beginning
 * of the resulting array. A zero-width match at the beginning however
 * never produces such empty leading substring.
 *
 * <p> The <tt>limit</tt> parameter controls the number of times the
 * pattern is applied and therefore affects the length of the resulting
 * array.  If the limit <i>n</i> is greater than zero then the pattern
 * will be applied at most <i>n</i>&nbsp;-&nbsp;1 times, the array's
 * length will be no greater than <i>n</i>, and the array's last entry
 * will contain all input beyond the last matched delimiter.  If <i>n</i>
 * is non-positive then the pattern will be applied as many times as
 * possible and the array can have any length.  If <i>n</i> is zero then
 * the pattern will be applied as many times as possible, the array can
 * have any length, and trailing empty strings will be discarded.
 *
 * <p> The input <tt>"boo:and:foo"</tt>, for example, yields the following
 * results with these parameters:
 *
 * <blockquote><table cellpadding=1 cellspacing=0
 *              summary="Split examples showing regex, limit, and result">
 * <tr><th align="left"><i>Regex&nbsp;&nbsp;&nbsp;&nbsp;</i></th>
 *     <th align="left"><i>Limit&nbsp;&nbsp;&nbsp;&nbsp;</i></th>
 *     <th align="left"><i>Result&nbsp;&nbsp;&nbsp;&nbsp;</i></th></tr>
 * <tr><td align=center>:</td>
 *     <td align=center>2</td>
 *     <td><tt>{ "boo", "and:foo" }</tt></td></tr>
 * <tr><td align=center>:</td>
 *     <td align=center>5</td>
 *     <td><tt>{ "boo", "and", "foo" }</tt></td></tr>
 * <tr><td align=center>:</td>
 *     <td align=center>-2</td>
 *     <td><tt>{ "boo", "and", "foo" }</tt></td></tr>
 * <tr><td align=center>o</td>
 *     <td align=center>5</td>
 *     <td><tt>{ "b", "", ":and:f", "", "" }</tt></td></tr>
 * <tr><td align=center>o</td>
 *     <td align=center>-2</td>
 *     <td><tt>{ "b", "", ":and:f", "", "" }</tt></td></tr>
 * <tr><td align=center>o</td>
 *     <td align=center>0</td>
 *     <td><tt>{ "b", "", ":and:f" }</tt></td></tr>
 * </table></blockquote>
 *
 * @param  input
 *         The character sequence to be split
 *
 * @param  limit
 *         The result threshold, as described above
 *
 * @return  The array of strings computed by splitting the input
 *          around matches of this pattern
 */
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

/**
 * Splits the given input sequence around matches of this pattern.
 *
 * <p> This method works as if by invoking the two-argument {@link
 * #split(java.lang.CharSequence, int) split} method with the given input
 * sequence and a limit argument of zero.  Trailing empty strings are
 * therefore not included in the resulting array. </p>
 *
 * <p> The input <tt>"boo:and:foo"</tt>, for example, yields the following
 * results with these expressions:
 *
 * <blockquote><table cellpadding=1 cellspacing=0
 *              summary="Split examples showing regex and result">
 * <tr><th align="left"><i>Regex&nbsp;&nbsp;&nbsp;&nbsp;</i></th>
 *     <th align="left"><i>Result</i></th></tr>
 * <tr><td align=center>:</td>
 *     <td><tt>{ "boo", "and", "foo" }</tt></td></tr>
 * <tr><td align=center>o</td>
 *     <td><tt>{ "b", "", ":and:f" }</tt></td></tr>
 * </table></blockquote>
 *
 *
 * @param  input
 *         The character sequence to be split
 *
 * @return  The array of strings computed by splitting the input
 *          around matches of this pattern
 */
public String[] split(CharSequence input) {
    return split(input, 0);
}
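The Javadoc above distinguishes positive-width from zero-width matches at the start of the input. The sketch below (hypothetical class SplitLeadingDemo, assuming Java 8 or later, where the zero-width rule applies) shows both cases, plus the no-match case handled by the `index == 0` check in split.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLeadingDemo {

    // Thin wrapper around Pattern.split(CharSequence), i.e. limit 0
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Positive-width match at index 0: an empty leading string is kept
        System.out.println(Arrays.toString(split(":", ":a:b"))); // prints [, a, b]
        // Zero-width match at index 0: skipped, so no empty leading string
        System.out.println(Arrays.toString(split("", "abc")));   // prints [a, b, c]
        // No match at all: the whole input is returned as a single element
        System.out.println(Arrays.toString(split(",", "abc")));  // prints [abc]
    }
}
```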

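String[] strArr = pattern.split(input, limit);
Pattern.split() breaks the input into pieces around each match of the regular expression and returns them as a String[].
Interestingly, String also has a split method, and in the JDK source it delegates to Pattern: apart from a fast path for single-character literal delimiters, String.split(regex, limit) calls Pattern.compile(regex).split(this, limit). In other words, String does not reimplement splitting; it builds on the functionality Pattern already provides.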
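The sketch below reproduces a few rows of the split Javadoc table for "boo:and:foo" and compares them with String.split. SplitLimitDemo and its split wrapper are hypothetical names; the expected results are the ones given in the Javadoc table.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimitDemo {

    // Thin wrapper so the calls below read like the Javadoc table
    static String[] split(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";

        // limit > 0: the pattern is applied at most limit - 1 times
        System.out.println(Arrays.toString(split(":", input, 2)));  // prints [boo, and:foo]
        // limit < 0: applied as often as possible, trailing empty strings kept
        System.out.println(Arrays.toString(split("o", input, -2))); // prints [b, , :and:f, , ]
        // limit == 0: applied as often as possible, trailing empty strings dropped
        System.out.println(Arrays.toString(split("o", input, 0)));  // prints [b, , :and:f]

        // String.split(regex) yields the same result as the limit-0 call above
        System.out.println(Arrays.toString(input.split("o")));      // prints [b, , :and:f]
    }
}
```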

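What the Matcher class does

A Matcher performs match operations on one specific input and keeps the resulting state and match results. The usual flow is: compile the regex string into a Pattern, call pattern.matcher(input) to get a Matcher for that input, then ask the Matcher for whatever you need (matches(), find(), group(), start(), end(), and so on).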
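The compile, matcher, query flow might look like this in practice. This is a minimal sketch: the key=value input format, FindDemo and describePairs are hypothetical, while find(), group(int), start() and end() are standard Matcher methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {

    // Step 1: compile once. Hypothetical format: word=number pairs.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    static List<String> describePairs(String input) {
        Matcher m = PAIR.matcher(input); // step 2: a Matcher bound to this input
        List<String> out = new ArrayList<>();
        while (m.find()) {               // step 3: query the Matcher's state
            // group(1)/group(2) are the capture groups; start()/end() are offsets
            out.add(m.group(1) + " -> " + m.group(2) + " @" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        describePairs("a=1, bb=22").forEach(System.out::println);
        // prints:
        // a -> 1 @0-3
        // bb -> 22 @5-10
    }
}
```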

Important methods of Matcher

/**
 * Attempts to match the entire region against the pattern.
 *
 * <p> If the match succeeds then more information can be obtained via the
 * <tt>start</tt>, <tt>end</tt>, and <tt>group</tt> methods.  </p>
 *
 * @return  <tt>true</tt> if, and only if, the entire region sequence
 *          matches this matcher's pattern
 */
public boolean matches() {
    return match(from, ENDANCHOR);
}

Matcher matcher = pattern.matcher(input);
boolean result = matcher.matches(); // true only if the entire input matches the pattern
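matches() is only one of three ways a Matcher can test its input. As a hedged illustration (the class MatchModesDemo and the helper modes are hypothetical), this sketch contrasts matches(), lookingAt() and find() on the same pattern.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {

    // Returns {matches, lookingAt, find}, each on a fresh Matcher
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // the entire input must match
            p.matcher(input).lookingAt(), // a match must start at the beginning
            p.matcher(input).find()       // a match may occur anywhere
        };
    }

    public static void main(String[] args) {
        boolean[] r = modes("\\d+", "123abc");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false true true
        r = modes("\\d+", "abc123");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false false true
    }
}
```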

Examples of using Java regular expressions

Basic usage:

System.out.println(Pattern.compile("regex").matcher("text to match").matches());
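The one-liner above has two shorter equivalents. According to the JDK documentation, Pattern.matches(regex, input) behaves exactly like Pattern.compile(regex).matcher(input).matches(), and String.matches(regex) calls Pattern.matches. A minimal sketch, with MatchesShortcutsDemo and allThree as hypothetical names:

```java
import java.util.regex.Pattern;

public class MatchesShortcutsDemo {

    // Returns the results of three equivalent ways to test a full match
    static boolean[] allThree(String regex, String input) {
        boolean a = Pattern.compile(regex).matcher(input).matches(); // explicit Pattern + Matcher
        boolean b = Pattern.matches(regex, input);                   // static convenience method
        boolean c = input.matches(regex);                            // String.matches delegates to Pattern.matches
        return new boolean[] {a, b, c};
    }

    public static void main(String[] args) {
        boolean[] r = allThree("[a-z]+\\d*", "abc123");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints true true true
        r = allThree("[a-z]+\\d*", "ABC123");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false false false
    }
}
```

Both shortcuts compile the regex again on every call, so they suit one-off checks rather than hot loops.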

An example I have used at work:

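import java.util.regex.Pattern;

// A simple check: rejects dots in the local part, hyphens in the domain, and uppercase TLDs
public static boolean checkEmail(String email) {
    String regex = "\\w+@\\w+\\.[a-z]+(\\.[a-z]+)?";
    return Pattern.matches(regex, email);
}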
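Because checkEmail calls Pattern.matches, it recompiles the regex on every invocation. The Pattern Javadoc notes that compiling once and reusing the Pattern is more efficient when a pattern is used repeatedly, and that Pattern objects are immutable and safe to share across threads (Matcher objects are not). A sketch of that variant, using a hypothetical EmailCheck class and a few inputs that show what this simple regex accepts and rejects:

```java
import java.util.regex.Pattern;

public class EmailCheck {

    // Same regex as checkEmail above, compiled once and shared (Pattern is thread-safe)
    private static final Pattern EMAIL =
            Pattern.compile("\\w+@\\w+\\.[a-z]+(\\.[a-z]+)?");

    static boolean checkEmail(String email) {
        // A new Matcher per call, since Matcher instances are not thread-safe
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(checkEmail("user_1@example.com"));     // prints true
        System.out.println(checkEmail("user@example.com.cn"));    // prints true: optional second suffix
        System.out.println(checkEmail("first.last@example.com")); // prints false: \w does not match '.'
        System.out.println(checkEmail("user@my-site.com"));       // prints false: \w does not match '-'
        System.out.println(checkEmail("user@EXAMPLE.COM"));       // prints false: TLD must be [a-z]+
    }
}
```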