Regular Expression Summary

雪山飞狐YCH

Category: JavaSE

Article link: https://blog.csdn.net/YCH1035235541/article/details/24809529

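Regular expressions

1. First look at the metacharacters . * + ?

. matches any single character (by default, any character except a line terminator)

* matches the preceding element zero or more times

+ matches the preceding element one or more times

? matches the preceding element zero or one time

Examples (matches() succeeds only if the whole string matches the pattern; print is assumed to be a small helper around System.out.println):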

    print("aaa".matches("a."));//false: a. matches exactly two characters
    print("aaa".matches("a*"));//true
    print("aaa".matches("a+"));//true
    print("aaa".matches("a?"));//false: at most one a
    print("".matches("a."));//false
    print("".matches("a*"));//true
    print("".matches("a+"));//false
    print("".matches("a?"));//true
    print("hax".matches("h.*"));//true: h followed by zero or more arbitrary characters
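
A quantifier applies to the element immediately before it, not to "any character". The sketch below illustrates that point. The class name QuantifierScope and the helper full are made up for this illustration, and the expected results in the comments follow from the definitions above.

```java
import java.util.regex.Pattern;

final class QuantifierScope {
    // Hypothetical helper: true only if the whole input matches the regex,
    // which is the same rule String.matches applies.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("ab*", "abbb"));   // true: * repeats only the b
        System.out.println(full("ab*", "abab"));   // false: the second a is not covered
        System.out.println(full("(ab)*", "abab")); // true: * repeats the whole group
        System.out.println(full("a.*", "a#1"));    // true: .* is "any character, repeated"
        System.out.println(full("a?", "aa"));      // false: ? allows at most one a
    }
}
```

To repeat "any character", combine the quantifier with ., as in .* or .+.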

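2. Using X{} X* X+ X?

X{} specifies how many times the preceding pattern X is repeated

X{n,} means at least n times

X{n,m} means at least n and at most m times

X{n} means exactly n times

X* means zero or more times

X+ means one or more times

X? means zero or one time

Note: X can be ., and more generally any single element such as a character class or a parenthesized group.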
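
As a rough illustration of counted repetition, the sketch below builds an X{n,m} pattern around an arbitrary literal. The class CountedQuantifiers and the helper repeats are hypothetical names introduced here. Pattern.quote is used so that the literal is not itself interpreted as a regex.

```java
import java.util.regex.Pattern;

final class CountedQuantifiers {
    // Hypothetical helper: true if input consists of `unit` repeated
    // at least min and at most max times.
    static boolean repeats(String unit, int min, int max, String input) {
        String regex = "(?:" + Pattern.quote(unit) + "){" + min + "," + max + "}";
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(repeats("a", 2, 5, "aa"));     // true
        System.out.println(repeats("a", 2, 5, "aaaaaa")); // false: six repetitions exceed the maximum
        System.out.println(repeats("ab", 2, 2, "abab"));  // true: {2,2} behaves like {2}
        System.out.println("x-9".matches(".{3}"));        // true: X is . here
        System.out.println("aaaa".matches("a{3,}"));      // true: at least three
    }
}
```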

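3. [] restricts a single character

[A-Z] matches any one character from A to Z

[a-z] matches any one character from a to z

[0-9] matches any one digit from 0 to 9

Note: inside [], - is the range separator

For example: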

    print("aa".matches("a{2,5}"));//true
    print("446538446345".matches("\\d{3,100}"));//true
    print("192.168.40.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));//false: aaa is not digits
    print("192".matches("[0-2][0-9][0-9]"));//true
    print("a".matches("[abc]"));//true
    print("a".matches("[^abc]"));//false: ^ at the start of [] negates the class
    print("a".matches("[a-z]"));//true
    print("a".matches("[b-z]"));//false

    print("a".matches("[a-zA-Z]"));//true; this and the next two patterns mean the same thing
    print("A".matches("[a-z]|[A-Z]"));//true
    print("A".matches("[a-z[A-Z]]"));//true
    print("A".matches("[A-Z&&[RFG]]"));//false: && is intersection, and A is not one of R, F, G
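
Union, intersection and negation can be combined inside one character class. The following is a small sketch under that assumption. CharClassOps and its two helper methods are illustrative names, not part of any library.

```java
final class CharClassOps {
    // Intersection with a negated class acts as subtraction:
    // a-z, minus the vowels.
    static boolean isConsonant(char c) {
        return String.valueOf(c).matches("[a-z&&[^aeiou]]");
    }

    // Listing several ranges in one class is a union.
    static boolean isHexDigit(char c) {
        return String.valueOf(c).matches("[0-9a-fA-F]");
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('b')); // true
        System.out.println(isConsonant('a')); // false: vowels are subtracted
        System.out.println(isConsonant('B')); // false: upper case is outside a-z
        System.out.println(isHexDigit('F'));  // true
        System.out.println(isHexDigit('g'));  // false
    }
}
```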

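4. Other metacharacters: \d \D \s \S \w \W \\

\d a digit: [0-9]

\D a non-digit: [^0-9]

\s a whitespace character: [ \t\n\x0B\f\r]

\S a non-whitespace character: [^\s]

\w a word character: [a-zA-Z_0-9]

\W a non-word character: [^\w]

\ the escape character; for example, \. matches a literal . and \\ matches a literal \. In a Java string literal every backslash must itself be doubled, so the regex \d is written "\\d" and the regex \\ is written "\\\\".

Examples: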

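    print(" \r\n\t".matches("\\s{4}"));//true: four whitespace characters (space, \r, \n, \t)
    print("".matches("\\S"));//false
    print("a_8".matches("\\w{3}"));//true
    print("abc888&^%".matches("[a-z]{1,3}\\d+[#&^%]+"));//true
    print("\\".matches("\\\\"));//true: the string is one backslash, the regex is an escaped backslash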
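
Escaping is the part that most often goes wrong in practice, so here is a short sketch. EscapeDemo, words and isLiteral are hypothetical names. Pattern.quote and String.split are standard library calls.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class EscapeDemo {
    // Splits on runs of whitespace (\s+) after trimming both ends.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // True if input is exactly the given literal text.
    // Pattern.quote escapes every metacharacter in the literal.
    static boolean isLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words(" a\tb \n c "))); // [a, b, c]
        System.out.println("axb".matches("a.b"));    // true: an unescaped . matches any character
        System.out.println("axb".matches("a\\.b"));  // false: \. matches only a literal dot
        System.out.println("1+1".matches("1+1"));    // false: + is a quantifier here
        System.out.println(isLiteral("1+1", "1+1")); // true: the + is quoted
    }
}
```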

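5. Boundary matchers

^ the beginning of a line (by default only the beginning of the input; with Pattern.MULTILINE, the beginning of every line)

$ the end of a line (by default the end of the input, allowing for a final line terminator; with Pattern.MULTILINE, the end of every line)

\b a word boundary

\B a non-word boundary

Examples: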

    print("hellosir".matches("^h.*"));//true: starts with h
    print("h".matches("^h.*"));//true: starts with h, followed by any number of arbitrary characters
    print("hellosir".matches(".*ir$"));//true: ends with ir
    print("hellosir".matches(".*r$"));//true: ends with r

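Note: ^ means "beginning" only when it is used as an anchor outside []. As the first character inside [] it negates the class, and anywhere else inside [] it is the literal character ^. Outside [], a literal ^ must be written \^.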
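
Because matches() always tests the whole string, anchors are easier to observe with find(). The sketch below counts matches with and without Pattern.MULTILINE and with \b. BoundaryDemo and count are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    // Hypothetical helper: counts the non-overlapping matches of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat\ncatalog\nbobcat";
        System.out.println(count(Pattern.compile("cat"), text));                     // 3
        System.out.println(count(Pattern.compile("^cat"), text));                    // 1: start of input only
        System.out.println(count(Pattern.compile("^cat", Pattern.MULTILINE), text)); // 2: start of lines 1 and 2
        System.out.println(count(Pattern.compile("\\bcat\\b"), text));               // 1: the standalone word
    }
}
```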

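6. The Matcher methods matches(), find(), reset() and lookingAt()

matches() tries to match the entire input against the pattern

find() looks for the next matching substring, resuming where the previous match ended

lookingAt() matches a prefix: it always starts at the beginning of the input but does not have to consume all of it

reset() resets the matcher so that the next find() starts from the beginning again

Note: matches() and lookingAt() always start from the beginning of the input, whereas find() depends on the matcher's current position. An earlier match on the same Matcher can therefore change what the next find() returns, so call reset() before starting a new scan.

Example: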

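    Pattern p = Pattern.compile("\\d{3,5}");
    String s = "123-4568-12-58";
    Matcher m = p.matcher(s);

    print(m.matches());//false: the whole string is not 3 to 5 digits
    //print(m.matches());
    m.reset();
    print(m.find());//true; find and matches both move the matcher's position, reset restores it
    print(m.start() + "-" + m.end());//0-3
    print(m.find());//true
    print(m.start() + "-" + m.end());//4-8; start() is the index of the first matched character, end() the index just past the last one
    print(m.find());//false: "12" and "58" are too short
    //print(m.start() + "-" + m.end());//start() and end() are valid only after a successful match; here they would throw IllegalStateException

    print(m.lookingAt());//true
    print(m.lookingAt());//true
    print(m.lookingAt());//true: lookingAt always starts from the beginning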
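
To compare the three matching modes side by side, the following sketch runs each of them on a few inputs. MatchModes, describe and spans are illustrative names. Each call uses a fresh Matcher, and reset() is called before find() so the result does not depend on the earlier calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchModes {
    // Reports matches(), lookingAt() and find() as "whole/prefix/anywhere".
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();
        boolean prefix = m.lookingAt();
        m.reset(); // start find() from the beginning regardless of the calls above
        boolean anywhere = m.find();
        return whole + "/" + prefix + "/" + anywhere;
    }

    // Collects "start-end" for every match found by repeated find() calls.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d{3,5}", "12345"));       // true/true/true
        System.out.println(describe("\\d{3,5}", "123-4568"));    // false/true/true
        System.out.println(describe("\\d{3,5}", "ab-4568"));     // false/false/true
        System.out.println(describe("\\d{3,5}", "ab-45"));       // false/false/false
        System.out.println(spans("\\d{3,5}", "123-4568-12-58")); // [0-3, 4-8]
    }
}
```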

7. String replacement

Example:

    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
    String s = "java Java Java JaVa jAva JAVA IloveJava HeHeateJava_javaadajerklase";
    Matcher m = p.matcher(s);
    while(m.find()){
        print(m.group());
    }
    print(m.replaceAll("JAVA"));//replace every match with JAVA (replaceAll resets the matcher first)

    //replace odd-numbered matches with java and even-numbered matches with JAVA
    m.reset();
    int i = 0;
    StringBuffer sb = new StringBuffer();
    while(m.find()){
        i++;
        if(i%2==0){
            m.appendReplacement(sb, "JAVA");
        }else{
            m.appendReplacement(sb, "java");
        }
    }
    m.appendTail(sb);//copy the remaining unmatched tail
    print(sb);
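
The appendReplacement/appendTail loop can be wrapped in a reusable method. The sketch below is one possible shape. ReplaceDemo and alternate are hypothetical names. Matcher.quoteReplacement is used because $ and \ have a special meaning in replacement strings, which the second example shows with $1 and $2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    // Replaces odd-numbered matches with `odd` and even-numbered ones with `even`.
    static String alternate(String input, String regex, String odd, String even) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while (m.find()) {
            i++;
            String replacement = (i % 2 == 0) ? even : odd;
            // quoteReplacement makes any $ or \ in the replacement literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(alternate("java Java JAVA", "(?i)java", "java", "JAVA")); // java JAVA java
        // In a replaceAll replacement, $1 and $2 refer to captured groups.
        System.out.println("a=1, b=2".replaceAll("(\\w+)=(\\w+)", "$2=$1")); // 1=a, 2=b
    }
}
```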

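8. Groups: group()

A regular expression uses () to group parts of a pattern. Group 0 is always the entire match. Because () can be nested, groups are numbered by their opening parenthesis: group k is the subpattern enclosed by the k-th left parenthesis, counting from left to right.

Example: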

    Pattern p = Pattern.compile("(\\d{3,5}([A-Z]{2,3}))([a-z]{2})");
    String s = "234MHaa-37928XTxy-7973QWEdd-xxxx";
    Matcher m = p.matcher(s);
    for(int i=0; i<=m.groupCount(); i++){
        print("------" + i + "---------");
        m.reset();
        while(m.find()){
            print(m.group(i));//group i is delimited by the i-th left parenthesis; group 0 is the whole match
        }
    }
Output:
------0---------
234MHaa
37928XTxy
7973QWEdd
------1---------
234MH
37928XT
7973QWE
------2---------
MH
XT
QWE
------3---------
aa
xy
dd
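
Numbered groups can also be given names with (?<name>...), available since Java 7. The sketch below reuses the pattern from the example and names only the outer group. GroupDemo, groups and code are illustrative names, and the group name "code" is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Same shape as the pattern above; the outer group is additionally named "code".
    private static final Pattern P =
            Pattern.compile("(?<code>\\d{3,5}([A-Z]{2,3}))([a-z]{2})");

    // Returns groups 0..groupCount() of the first match, or an empty list if there is none.
    static List<String> groups(String input) {
        Matcher m = P.matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    // Returns the named group "code" of the first match, or null if there is none.
    static String code(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group("code") : null;
    }

    public static void main(String[] args) {
        System.out.println(groups("37928XTxy")); // [37928XTxy, 37928XT, XT, xy]
        System.out.println(code("37928XTxy"));   // 37928XT, the same as group(1)
    }
}
```

A named group still keeps its number, so group("code") and group(1) return the same text here.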

9. Exercise: extract all e-mail addresses from a web page

    StringBuffer sb = new StringBuffer();
    //try-with-resources closes the reader even if reading fails
    try(BufferedReader br = new BufferedReader(new FileReader("Test.htm"))){
        String line = null;
        while((line = br.readLine()) != null){
            sb.append(line).append('\n');//keep the line break so text on adjacent lines does not run together
        }
    }catch (IOException e) {//also covers FileNotFoundException
        e.printStackTrace();
    }
    //print(sb);
    //Pattern p = Pattern.compile("[\\w\\._\\-]{6,18}@[\\w]+\\.[\\w\\.]+");
    Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
    Matcher m = p.matcher(sb);
    while(m.find()){
        print(m.group());
    }
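
The matching step can be tried without a file by running it on a string. The sketch below does that with a pattern equivalent to the one above, written as a single character class. EmailFinder and find are hypothetical names, and the addresses are made-up sample data. The pattern is deliberately loose and is not a full validator of e-mail syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailFinder {
    // Word characters, dots and hyphens, then @, then a domain ending in ".word".
    private static final Pattern EMAIL = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

    // Returns every substring of text that looks like an e-mail address.
    static List<String> find(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "Contact: alice@example.com, bob.smith@mail.example.org; not-an-email@";
        System.out.println(find(html)); // [alice@example.com, bob.smith@mail.example.org]
    }
}
```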

10. Counting the lines of code in the .java files under D:\JavaLearnTest\src

package RegExp;
import java.io.*;
public class FileLineCount {
	public static int whiteLines = 0;
	public static int normalLines = 0;
	public static int commentLines = 0;
	public static void main(String[] args){
		File f = new File("D:/JavaLearnTest/src");
		lineCount(f);
		System.out.println("whiteLines:" + whiteLines);
		System.out.println("commentLines:" + commentLines);
		System.out.println("normalLines:" + normalLines);
	}
	private static void lineCount(File f){
		File[] files = f.listFiles();
		if(files == null) return;//f is not a directory, or an I/O error occurred
		for(int i=0; i<files.length; i++){
			if(files[i].isDirectory()) lineCount(files[i]);
			else {
				if(files[i].getName().matches(".+\\.java$")){
					parse(files[i]);
				}
			}
		}
	}

	private static void parse(File file) {
		BufferedReader br = null;
		String line = "";
		boolean flag = false;
		int i = 0;
		try {
			br = new BufferedReader(new FileReader(file));
			while((line = br.readLine())!=null){
				i++;
				if(flag){
					commentLines ++;
				}else{
					if(line.matches("^[\\s&&[^\\n]]*$")){//readLine() strips the line terminator, so the pattern does not need a trailing \\n
						whiteLines ++;
					}else{
						if(line.trim().startsWith("/*")){
							commentLines ++;
							//System.out.println("line " + i + ": comment starts");
							flag = true;
						}else if(line.trim().startsWith("//")){
							commentLines ++;
						}else{
							normalLines ++;
						}
					}
				}
			
				if(line.trim().endsWith("*/") && !line.trim().startsWith("//")){
					System.out.println("line " + i + ": comment ends");
					flag = false;
				}
			}
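		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e){
			e.printStackTrace();
		} finally {
			//close the reader so file handles are not leaked while walking the tree
			if(br != null){
				try { br.close(); } catch (IOException e) { e.printStackTrace(); }
			}
		}
	}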
}
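
The per-line decision in parse can be isolated so that it is testable without touching the file system. The following is a sketch of that idea, not the program above: LineClassifier and Kind are invented names, and it deviates slightly in that a closing */ is honored only while a block comment is open.

```java
final class LineClassifier {
    enum Kind { BLANK, COMMENT, CODE }

    // True while we are between "/*" and the closing "*/".
    private boolean inBlockComment = false;

    // Classifies one line (without its line terminator) and updates the block-comment state.
    Kind classify(String line) {
        String t = line.trim();
        Kind kind;
        if (inBlockComment) {
            kind = Kind.COMMENT;
        } else if (line.matches("\\s*")) {
            kind = Kind.BLANK;
        } else if (t.startsWith("/*")) {
            kind = Kind.COMMENT;
            inBlockComment = true;
        } else if (t.startsWith("//")) {
            kind = Kind.COMMENT;
        } else {
            kind = Kind.CODE;
        }
        if (inBlockComment && t.endsWith("*/")) {
            inBlockComment = false; // the block comment closes on this line
        }
        return kind;
    }

    public static void main(String[] args) {
        LineClassifier c = new LineClassifier();
        String[] lines = { "", "/* start", " * middle", " */", "int x = 1;", "// note" };
        for (String line : lines) {
            System.out.println(c.classify(line)); // BLANK, COMMENT, COMMENT, COMMENT, CODE, COMMENT
        }
    }
}
```

Like the original program, this treats a line as a comment only when the comment starts the line, so code followed by a trailing // comment counts as code.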
