Java Regex Tutorial

Java and Regular Expressions

This tutorial introduces the usage of regular expressions and describes their use with Java. It also provides several Java regular expression examples.


1. Regular Expressions

1.1. Overview

A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined by the regular expression may match once, several times, or not at all for a given string.

The abbreviation for regular expression is regex.

Analyzing or modifying a text with a regex is described as applying the regular expression to the text (string).

The pattern defined by the regex is applied to the text from left to right. Once a source character has been used in a match, it cannot be reused. For example, the regex aba will match ababababa only two times (aba_aba__).
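
The left-to-right, non-overlapping behavior can be checked with a small sketch. The class and method names below are chosen for illustration only; the sketch uses the java.util.regex classes which are introduced later in this tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingMatches {
  // Counts how often the regex matches in the text, scanning left to right.
  // Each find() continues after the end of the previous match.
  static int countMatches(String regex, String text) {
    Matcher matcher = Pattern.compile(regex).matcher(text);
    int count = 0;
    while (matcher.find()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    // The matches start at index 0 and index 4; the characters between
    // and after them cannot start a further match.
    System.out.println(countMatches("aba", "ababababa")); // prints 2
  }
}
```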

A simple example of a regular expression is a (literal) string. For example, the Hello World regex will match the "Hello World" string.

. (dot) is another example of a regular expression. A dot matches any single character; it would match, for example, "a" or "z" or "1".

1.2. Support for regular expressions in programming languages

Regular expressions are supported by most programming languages, e.g. Java, Perl, Groovy, etc. Unfortunately, each language supports regular expressions slightly differently.

This tutorial describes the usage of regular expressions within the Java programming language and within the Eclipse IDE.

2. Prerequisites

The following tutorial assumes that you have basic knowledge of the Java programming language.

Some of the following examples use JUnit to validate the result. You should be able to adjust them if you do not want to use JUnit. To learn about JUnit, please see the JUnit Tutorial.

3. Rules of writing regular expressions

The following description is an overview of the available metacharacters which can be used in regular expressions. This chapter is meant to be a reference for the different regex elements.

3.1. Common matching symbols

Table 1. 

Regular Expression   Description
.                    Matches any single character (by default, line terminators are excluded)
^regex               regex must match at the beginning of the line
regex$               regex must match at the end of the line
[abc]                Set definition; matches the letter a or b or c
[abc][vz]            Set definition; matches a or b or c followed by either v or z
[^abc]               A "^" as the first character inside [] negates the set; matches any character except a or b or c
[a-d1-7]             Ranges; matches one letter between a and d or one digit from 1 to 7, so it does not match d1
X|Z                  Finds X or Z
XZ                   Finds X directly followed by Z
$                    Checks if a line end follows
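
A few of these symbols can be tried out with the following sketch. The class name, the helper found() and the sample strings are made up for illustration. Note that String.matches() requires the whole string to match, so the anchors ^ and $ are shown with Matcher.find() instead.

```java
import java.util.regex.Pattern;

class CommonSymbols {
  // True if the regex matches somewhere inside the text (not necessarily all of it)
  static boolean found(String regex, String text) {
    return Pattern.compile(regex).matcher(text).find();
  }

  public static void main(String[] args) {
    // ^ and $ anchor the match to the start and the end of the input
    System.out.println(found("^Hello", "Hello World"));  // true
    System.out.println(found("^World", "Hello World"));  // false
    System.out.println(found("World$", "Hello World"));  // true

    // Sets, negated sets and ranges each match exactly one character
    System.out.println("bz".matches("[abc][vz]"));       // true
    System.out.println("x".matches("[^abc]"));           // true
    System.out.println("d1".matches("[a-d1-7]"));        // false, two characters
    System.out.println("d1".matches("[a-d1-7]{2}"));     // true

    // Alternation
    System.out.println("Z".matches("X|Z"));              // true
  }
}
```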


3.2. Metacharacters

The following metacharacters have a pre-defined meaning and make certain common patterns easier to use, e.g. \d instead of [0-9].

Table 2. 

Regular Expression   Description
\d                   Any digit, short for [0-9]
\D                   A non-digit, short for [^0-9]
\s                   A whitespace character, short for [ \t\n\x0b\r\f]
\S                   A non-whitespace character, short for [^\s]
\w                   A word character, short for [a-zA-Z_0-9]
\W                   A non-word character, short for [^\w]
\S+                  Several non-whitespace characters
\b                   Matches a word boundary. A word character is [a-zA-Z0-9_] and \b matches its boundaries.
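
The following sketch applies some of these metacharacters. The class name, the method names and the sample strings are assumptions made for this illustration.

```java
class Metacharacters {
  // Replaces every digit (\d) with '#'
  static String maskDigits(String text) {
    return text.replaceAll("\\d", "#");
  }

  // Replaces "cat" only where it stands as a whole word (\b on both sides)
  static String replaceWholeWord(String text) {
    return text.replaceAll("\\bcat\\b", "dog");
  }

  public static void main(String[] args) {
    System.out.println(maskDigits("Order 66 shipped on day 7"));
    // prints: Order ## shipped on day #

    System.out.println(replaceWholeWord("cat concat cat"));
    // prints: dog concat dog

    // \s+ treats any run of whitespace (spaces, tabs, ...) as one separator
    System.out.println("a  b\tc".split("\\s+").length); // prints 3
  }
}
```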


3.3. Quantifier

A quantifier defines how often an element can occur. The symbols ?, *, + and {} define how often the preceding element of the regular expression may occur.

Table 3. 

Regular Expression   Description                                                                   Examples
*                    Occurs zero or more times, short for {0,}                                     X* finds zero or more letters X; .* finds any character sequence
+                    Occurs one or more times, short for {1,}                                      X+ finds one or more letters X
?                    Occurs zero or one time, short for {0,1}                                      X? finds zero or exactly one letter X
{X}                  Occurs X times; {} states how often the preceding element occurs              \d{3} finds three digits; .{10} finds any character sequence of length 10
{X,Y}                Occurs between X and Y times                                                  \d{1,4} means \d must occur at least once and at most four times
*?                   ? after a quantifier makes it reluctant; it tries to find the smallest match
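
The difference between a greedy and a reluctant quantifier is easiest to see on a concrete input. The sketch below uses a made-up helper firstMatch() and a made-up sample string; it is meant as an illustration, not as a complete treatment of quantifiers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Quantifiers {
  // Returns the first match of the regex in the text, or null if there is none
  static String firstMatch(String regex, String text) {
    Matcher matcher = Pattern.compile(regex).matcher(text);
    return matcher.find() ? matcher.group() : null;
  }

  public static void main(String[] args) {
    String text = "<b>bold</b> and <i>italic</i>";

    // Greedy: .* takes as much as possible, up to the last '>'
    System.out.println(firstMatch("<.*>", text));   // <b>bold</b> and <i>italic</i>
    // Reluctant: .*? takes as little as possible, up to the first '>'
    System.out.println(firstMatch("<.*?>", text));  // <b>

    System.out.println("123".matches("\\d{3}"));      // true
    System.out.println("12345".matches("\\d{1,4}"));  // false, five digits
    System.out.println("".matches("X*"));             // true, zero occurrences allowed
    System.out.println("".matches("X+"));             // false, at least one X required
  }
}
```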


3.4. Grouping and Backreference

You can group parts of your regular expression. In your pattern you group elements via round brackets, e.g. "()". This allows you to assign a repetition operator to a complete group.

In addition these groups also create a backreference to the part of the regular expression. This captures the group. A backreference stores the part of the String which matched the group. This allows you to use this part in the replacement.

Via the $ you can refer to a group. $1 is the first group, $2 the second, etc.

For example, assume you want to remove all whitespace between a word character and a following period or comma. The period or comma has to be part of the pattern, but it should still be included in the result.

// Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3")); 

This example replaces a title element with the text between its opening and closing tags.

// Extract the text between the two title elements
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2"); 
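
The two fragments above rely on an EXAMPLE_TEST constant defined elsewhere. A self-contained sketch with the same two patterns could look as follows; the class name, the method names and the sample inputs are assumptions made for illustration.

```java
class GroupingExample {
  // Keeps group 1 (the word character) and group 3 (the punctuation),
  // and drops group 2 (the whitespace between them)
  static String removeSpaceBeforePunctuation(String text) {
    return text.replaceAll("(\\w)(\\s+)([\\.,])", "$1$3");
  }

  // Replaces each title element with group 2, the text between the tags
  static String unwrapTitle(String html) {
    return html.replaceAll("(?i)(<title.*?>)(.+?)(</title>)", "$2");
  }

  public static void main(String[] args) {
    System.out.println(removeSpaceBeforePunctuation("Hello , world ."));
    // prints: Hello, world.

    // (?i) makes the match case-insensitive, so upper-case tags are found too
    System.out.println(unwrapTitle("<TITLE>My page</TITLE>"));
    // prints: My page
  }
}
```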

3.5. Negative Lookahead

Negative lookahead provides the possibility to exclude a pattern. With this you can say that a string should not be followed by another string.

Negative lookaheads are defined via (?!pattern). For example, the following will match a if a is not followed by b.

a(?!b) 
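
A short sketch of this pattern in Java, using a made-up class name and sample string. The lookahead only checks what follows; it does not consume the following character, so only the a itself is replaced.

```java
class NegativeLookahead {
  // Replaces every 'a' that is not directly followed by 'b' with 'X'
  static String markUnfollowedA(String text) {
    return text.replaceAll("a(?!b)", "X");
  }

  public static void main(String[] args) {
    // The first a is followed by b and stays; the a before c and the
    // a at the end of the string are replaced.
    System.out.println(markUnfollowedA("ab ac a")); // prints: ab Xc X
  }
}
```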

3.6. Backslashes in Java

The backslash is an escape character in Java Strings, i.e. the backslash has a predefined meaning in Java. You have to use "\\" to define a single backslash. If you want to define \w, then you must use "\\w" in your regex. If you want to use a backslash as a literal, you have to type "\\\\", as \ is also an escape character in regular expressions.
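
The two levels of escaping can be verified with a small sketch. The class name and the sample path are made up for this illustration.

```java
class BackslashExample {
  // True if the text is exactly C:\temp; the regex needs four backslashes
  // in Java source to describe one literal backslash
  static boolean isTempPath(String text) {
    return text.matches("C:\\\\temp");
  }

  public static void main(String[] args) {
    // In Java source "\\w" is the two-character string \w
    System.out.println("\\w".length());     // prints 2
    System.out.println("a".matches("\\w")); // prints true

    String path = "C:\\temp";               // the seven-character string C:\temp
    System.out.println(isTempPath(path));   // prints true

    // Splitting at a literal backslash also needs "\\\\"
    System.out.println(path.split("\\\\").length); // prints 2
  }
}
```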

4. Using Regular Expressions with String.matches()

4.1. Overview

Strings in Java have built-in support for regular expressions. Strings offer several built-in methods for regular expressions, e.g. matches(), split() and replaceAll(). Note that replace() treats its arguments as literal text, not as a regular expression.

These methods are not optimized for performance. We will later use classes which are optimized for performance.

Table 4. 

Method                                Description
s.matches("regex")                    Evaluates whether "regex" matches s. Returns true only if the WHOLE string can be matched.
s.split("regex")                      Creates an array with substrings of s divided at each occurrence of "regex". "regex" is not included in the result.
s.replaceAll("regex", "replacement")  Replaces all occurrences of "regex" with "replacement".
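
One detail is worth checking when choosing among the replacement methods: String.replace() treats its first argument as literal text, while replaceAll() and replaceFirst() interpret it as a regular expression. The sketch below, with a made-up class name and sample string, shows the difference.

```java
class ReplaceVariants {
  public static void main(String[] args) {
    String s = "a.b.c";

    // replace(): "." is taken literally
    System.out.println(s.replace(".", "-"));        // prints: a-b-c

    // replaceAll(): "." is a regex and matches every character
    System.out.println(s.replaceAll(".", "-"));     // prints: -----

    // replaceAll() with an escaped dot behaves like the literal version
    System.out.println(s.replaceAll("\\.", "-"));   // prints: a-b-c

    // replaceFirst(): regex, but only the first match is replaced
    System.out.println(s.replaceFirst("\\.", "-")); // prints: a-b.c
  }
}
```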


For the following example, create the Java project de.vogella.regex.test.

package de.vogella.regex.test;

public class RegexTestStrings {
  public static final String EXAMPLE_TEST = "This is my small example "
      + "string which I'm going to " + "use for pattern matching.";

  public static void main(String[] args) {
    System.out.println(EXAMPLE_TEST.matches("\\w.*"));
    String[] splitString = (EXAMPLE_TEST.split("\\s+"));
    System.out.println(splitString.length);// Should be 14
    for (String string : splitString) {
      System.out.println(string);
    }
    // Replace all whitespace with tabs
    System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
  }
} 

4.2. Examples

The following class gives several examples for the usage of regular expressions with strings. See the comments for the purpose of each method.

If you want to test these examples, create the Java project de.vogella.regex.string.

package de.vogella.regex.string;

public class StringMatcher {
  // Returns true if the string matches exactly "true"
  public boolean isTrue(String s){
    return s.matches("true");
  }
  // Returns true if the string matches exactly "true" or "True"
  public boolean isTrueVersion2(String s){
    return s.matches("[tT]rue");
  }
  
  // Returns true if the string matches exactly "true" or "True"
  // or "yes" or "Yes"
  public boolean isTrueOrYes(String s){
    return s.matches("[tT]rue|[yY]es");
  }
  
  // Returns true if the string contains exactly "true"
  public boolean containsTrue(String s){
    return s.matches(".*true.*");
  }
  

  // Returns true if the string consists of exactly three letters
  public boolean isThreeLetters(String s){
    return s.matches("[a-zA-Z]{3}");
    // Equivalent longer form:
//    return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
  }
  


  // Returns true if the string does not have a number at the beginning
  public boolean isNoNumberAtBeginning(String s){
    return s.matches("^[^\\d].*");
  }
  // Returns true if the string consists of an arbitrary number of word characters except b
  public boolean isIntersection(String s){
    return s.matches("([\\w&&[^b]])*");
  }
  // Returns true if the string contains a number less than 300
  public boolean isLessThenThreeHundret(String s){
    return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
  }
  
} 

The following small JUnit test validates the examples.

package de.vogella.regex.string;

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
  private StringMatcher m;

  @Before
  public void setup(){
    m = new StringMatcher();
  }

  @Test
  public void testIsTrue() {
    assertTrue(m.isTrue("true"));
    assertFalse(m.isTrue("true2"));
    assertFalse(m.isTrue("True"));
  }

  @Test
  public void testIsTrueVersion2() {
    assertTrue(m.isTrueVersion2("true"));
    assertFalse(m.isTrueVersion2("true2"));
    assertTrue(m.isTrueVersion2("True"));
  }

  @Test
  public void testIsTrueOrYes() {
    assertTrue(m.isTrueOrYes("true"));
    assertTrue(m.isTrueOrYes("yes"));
    assertTrue(m.isTrueOrYes("Yes"));
    assertFalse(m.isTrueOrYes("no"));
  }

  @Test
  public void testContainsTrue() {
    assertTrue(m.containsTrue("thetruewithin"));
  }

  @Test
  public void testIsThreeLetters() {
    assertTrue(m.isThreeLetters("abc"));
    assertFalse(m.isThreeLetters("abcd"));
  }
  
  @Test
  public void testisNoNumberAtBeginning() {
    assertTrue(m.isNoNumberAtBeginning("abc"));
    assertFalse(m.isNoNumberAtBeginning("1abcd"));
    assertTrue(m.isNoNumberAtBeginning("a1bcd"));
    assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
  }
  
  @Test
  public void testisIntersection() {
    assertTrue(m.isIntersection("1"));
    assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
    assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
  }
  

  @Test
  public void testLessThenThreeHundret() {
    assertTrue(m.isLessThenThreeHundret("288"));
    assertFalse(m.isLessThenThreeHundret("3288"));
    assertFalse(m.isLessThenThreeHundret("328 8"));
    assertTrue(m.isLessThenThreeHundret("1"));
    assertTrue(m.isLessThenThreeHundret("99"));
    assertFalse(m.isLessThenThreeHundret("300"));
  }

} 

5. Pattern and Matcher

For advanced regular expressions the java.util.regex.Pattern and java.util.regex.Matcher classes are used.

You first create a Pattern object which defines the regular expression. This Pattern object allows you to create a Matcher object for a given string. This Matcher object then allows you to do regex operations on a String.

package de.vogella.regex.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
  public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

  public static void main(String[] args) {
    Pattern pattern = Pattern.compile("\\w+");
    // To match case-insensitively, you could compile the pattern with a flag
    // instead:
    // Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(EXAMPLE_TEST);
    // Check all occurrences
    while (matcher.find()) {
      System.out.print("Start index: " + matcher.start());
      System.out.print(" End index: " + matcher.end() + " ");
      System.out.println(matcher.group());
    }
    // Now create a new pattern and matcher to replace whitespace with tabs
    Pattern replace = Pattern.compile("\\s+");
    Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
    System.out.println(matcher2.replaceAll("\t"));
  }
} 
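
Two things that Pattern and Matcher make possible are compiling a regex once for repeated use and reading individual capturing groups of each match. The following is a minimal sketch under assumed names: the class, the KEY_VALUE pattern and the key=value input format are hypothetical and only serve as an illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroups {
  // Compiled once and reused for every call
  private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

  // Collects all key=number pairs found in the text, in order of appearance
  static Map<String, Integer> parse(String text) {
    Map<String, Integer> result = new LinkedHashMap<>();
    Matcher matcher = KEY_VALUE.matcher(text);
    while (matcher.find()) {
      // group(1) is the key, group(2) the digits after '='
      result.put(matcher.group(1), Integer.valueOf(matcher.group(2)));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("width=80 height=24")); // prints: {width=80, height=24}
  }
}
```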

6. Java Regex Examples

The following lists typical examples for the usage of regular expressions. I hope you find similarities to your examples.

6.1. Or

Task: Write a regular expression which matches a text line if this text line contains either the word "Joe" or the word "Jim" or both.

Create a project de.vogella.regex.eitheror and the following class.

package de.vogella.regex.eitheror;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck {
  @Test
  public void testSimpleTrue() {
    String s = "humbapumpa jim";
    assertTrue(s.matches(".*(jim|joe).*"));
    s = "humbapumpa jom";
    assertFalse(s.matches(".*(jim|joe).*"));
    s = "humbaPumpa joe";
    assertTrue(s.matches(".*(jim|joe).*"));
    s = "humbapumpa joe jim";
    assertTrue(s.matches(".*(jim|joe).*"));
  }
} 

6.2. Phone number

Task: Write a regular expression which matches any phone number.

A phone number in this example consists either of 7 digits in a row, or of 3 digits, a whitespace character or a dash, and then 4 digits.

package de.vogella.regex.phonenumber;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;


public class CheckPhone {
  
  @Test
  public void testSimpleTrue() {
    // Three digits, an optional whitespace character or dash, four digits
    String pattern = "\\d\\d\\d([-\\s])?\\d\\d\\d\\d";
    String s= "1233323322";
    assertFalse(s.matches(pattern));
    s = "1233323";
    assertTrue(s.matches(pattern));
    s = "123 3323";
    assertTrue(s.matches(pattern));
  }
} 

6.3. Check for a certain number range

The following example checks whether a text contains at least three consecutive digits.

Create the Java project "de.vogella.regex.numbermatch" and the following class.

package de.vogella.regex.numbermatch;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckNumber {

  
  @Test
  public void testSimpleTrue() {
    String s= "1233";
    assertTrue(test(s));
    s= "0";
    assertFalse(test(s));
    s = "29 Kasdkf 2300 Kdsdf";
    assertTrue(test(s));
    s = "99900234";
    assertTrue(test(s));
  }
  

  
  
  public static boolean test(String s) {
    Pattern pattern = Pattern.compile("\\d{3}");
    Matcher matcher = pattern.matcher(s);
    // find() is true as soon as three consecutive digits occur anywhere in s
    return matcher.find();
  }

} 

6.4. Building a link checker

The following example allows you to extract all valid links from a webpage. It does not consider links which start with "javascript:" or "mailto:".

Create a Java project called de.vogella.regex.weblinks and the following class:

package de.vogella.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
  private Pattern htmltag;
  private Pattern link;

  public LinkGetter() {
    htmltag = Pattern.compile("<a\\b[^>]*href=\"[^>]*>(.*?)</a>");
    link = Pattern.compile("href=\"[^>]*\">");
  }

  public List<String> getLinks(String url) {
    List<String> links = new ArrayList<String>();
    // try-with-resources closes the reader automatically
    try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new URL(url).openStream()))) {
      String s;
      StringBuilder builder = new StringBuilder();
      while ((s = bufferedReader.readLine()) != null) {
        builder.append(s);
      }

      Matcher tagmatch = htmltag.matcher(builder.toString());
      while (tagmatch.find()) {
        Matcher matcher = link.matcher(tagmatch.group());
        // Skip anchors whose href does not have the expected form;
        // calling group() without a successful find() would throw
        if (!matcher.find()) {
          continue;
        }
        String href = matcher.group().replaceFirst("href=\"", "")
            .replaceFirst("\">", "")
            .replaceFirst("\"[\\s]?target=\"[a-zA-Z_0-9]*", "");
        if (valid(href)) {
          links.add(makeAbsolute(url, href));
        }
      }
    } catch (MalformedURLException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
    return links;
  }

  private boolean valid(String s) {
    if (s.matches("javascript:.*|mailto:.*")) {
      return false;
    }
    return true;
  }

  private String makeAbsolute(String url, String link) {
    // Already absolute (http or https)
    if (link.matches("https?://.*")) {
      return link;
    }
    // Relative link and the url already ends with a slash
    if (link.matches("[^/].*") && url.matches(".*/")) {
      return url + link;
    }
    if (link.matches("[^/].*") && url.matches(".*[^/]")) {
      return url + "/" + link;
    }
    if (link.matches("/.*") && url.matches(".*[/]")) {
      return url + link;
    }
    if (link.matches("/.*") && url.matches(".*[^/]")) {
      return url + link;
    }
    throw new RuntimeException("Cannot make the link absolute. Url: " + url
        + " Link " + link);
  }
} 

6.5. Finding duplicated words

The following regular expression matches duplicated words.

\b(\w+)\s+\1\b 

\b is a word boundary and \1 refers to the captured match of the first group, i.e. the first word.

The (?!-in)\b(\w+) \1\b finds duplicate words if they do not start with "-in".
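
The basic duplicate-word pattern can be used from Java as sketched below. The class name, the method names and the sample sentence are assumptions for illustration; the regex is the first one shown in this section, with the backslashes doubled for the Java string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {
  private static final Pattern DUPLICATE = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

  // Returns each word that is directly followed by itself
  static List<String> findDuplicates(String text) {
    List<String> words = new ArrayList<>();
    Matcher matcher = DUPLICATE.matcher(text);
    while (matcher.find()) {
      words.add(matcher.group(1));
    }
    return words;
  }

  // Collapses each duplicated pair into a single occurrence of the word
  static String removeDuplicates(String text) {
    return DUPLICATE.matcher(text).replaceAll("$1");
  }

  public static void main(String[] args) {
    String text = "This is is a test test sentence";
    System.out.println(findDuplicates(text));   // prints: [is, test]
    System.out.println(removeDuplicates(text)); // prints: This is a test sentence
  }
}
```

The leading \b keeps the "is" inside "This" from being treated as the first half of a duplicate.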


http://www.vogella.com/articles/JavaRegularExpressions/article.html
