Heima Programmer: JAVA Basics---Regular Expressions---Overview, Matching, Splitting, Replacing, Extracting, Web Crawler


星空下的开普勒


Article link: https://blog.csdn.net/tian_wang/article/details/44570707


-----------Android training, Java training, Java learning-oriented tech blog; looking forward to exchanging ideas with you!------------

Lecture 1. Overview

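Regular expression: an expression that follows a certain set of rules.

Used specifically for operating on strings.
Uses particular symbols to stand for pieces of code logic, which shortens what you have to write.
Learning regular expressions means learning how these special symbols are used.
Advantage: simplifies complex string operations.
Disadvantage: the more symbols, the longer the expression and the harder it is to read.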
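To make the "simplifies complex string operations" point concrete, here is a minimal sketch that checks a QQ number (5 to 15 digits, first digit not 0, the same rule used in the matching example) once by hand and once with a single regex. The class QqCheck and its sample inputs are illustrative assumptions; the manual version compares against '0'..'9' rather than Character.isDigit because \d matches only ASCII digits by default.

```java
class QqCheck {
    // Hand-written check: 5 to 15 characters, all ASCII digits, first digit not '0'
    static boolean checkManually(String qq) {
        int len = qq.length();
        if (len < 5 || len > 15) {
            return false;
        }
        if (qq.charAt(0) == '0') {
            return false;
        }
        for (int i = 0; i < len; i++) {
            char c = qq.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // The same rule expressed as one regular expression
    static boolean checkWithRegex(String qq) {
        return qq.matches("[1-9]\\d{4,14}");
    }

    public static void main(String[] args) {
        String[] samples = {"13405", "01234", "1234", "12a45"};
        for (String s : samples) {
            System.out.println(s + " manual=" + checkManually(s) + " regex=" + checkWithRegex(s));
        }
    }
}
```

Both methods agree on every input; the regex version simply packs the three checks into one pattern, at the cost of being harder to read for someone who does not know the symbols.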

Lecture 2. What regular expressions are used for

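Matching: the String method boolean matches(String regex) checks the entire string against the rule; as soon as any part fails to match, matching stops and false is returned.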

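class MatchesDemo
{
	public static void main(String[] args) 
	{
		matchesDemo("13405","[1-9]\\d{4,14}");   // QQ number check: 5 to 15 digits, first digit not 0
		matchesDemo("f7","[a-zA-Z]\\d?");        // first character a letter, then a digit (0 or 1 times)
		/* Character classes: some common rules taken from the JDK documentation
		[abc]         a, b, or c (simple class)
		[^abc]        any character except a, b, or c (negation)
		[a-zA-Z]      a through z or A through Z, inclusive (range)
		[a-d[m-p]]    a through d, or m through p: [a-dm-p] (union)
		[a-z&&[def]]  d, e, or f (intersection)
		Greedy quantifiers
		X?            X, once or not at all
		X*            X, zero or more times
		X+            X, one or more times
		X{n}          X, exactly n times
		X{n,}         X, at least n times
		X{n,m}        X, at least n but not more than m times */
	}
	private static void matchesDemo(String str,String reg){
		System.out.println(str.matches(reg)); // true only if the whole string matches reg
	}
}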
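matches() tests the whole string, while Matcher.find() looks for the pattern anywhere inside it. This minimal sketch (the class MatchesRules and its inputs are hypothetical) contrasts the two and tries the union and intersection classes from the JDK rules listed in the comment above, plus the subtraction form [a-z&&[^bc]] from the same Pattern documentation.

```java
import java.util.regex.Pattern;

class MatchesRules {
    // matches() succeeds only if the WHOLE string fits the pattern
    static boolean whole(String s, String regex) {
        return s.matches(regex);
    }

    // find() succeeds if ANY substring fits the pattern
    static boolean somewhere(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("abc123", "\\d+"));     // false: "abc" is not digits
        System.out.println(somewhere("abc123", "\\d+")); // true: "123" is found inside
        System.out.println(whole("n", "[a-d[m-p]]"));    // true: n lies in the union a-d, m-p
        System.out.println(whole("e", "[a-z&&[def]]"));  // true: e lies in the intersection
        System.out.println(whole("a", "[a-z&&[def]]"));  // false: a is not one of d, e, f
        System.out.println(whole("b", "[a-z&&[^bc]]"));  // false: b is removed by subtraction
    }
}
```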

Splitting: the String method String[] split(String regex)

class SplitDemo
{
	public static void main(String[] args) 
	{
		splitDemo("a   sd   ff"," +");                 // split on one or more spaces
		splitDemo("df.fsf.dsfe.S","\\.");             // split on "."; a bare "." means "any character", so escape it as "\\."
		splitDemo("c:\\txt\\a.txt","\\\\");           // split on a backslash: the regex \\ is written "\\\\" in Java source
		splitDemo("adkkfeqqfizzo","(.)\\1");          // split on doubled characters (kk, qq, zz); \1 repeats group 1
		splitDemo("adkkkkkkfeqqfizzzo","(.)\\1+");    // split on runs of two or more identical characters
	}
	private static void splitDemo(String str,String reg){
		String[] strs = str.split(reg);
		System.out.println(strs.length);
		for(String item:strs){
			System.out.println(item);
		}
	}
}
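split has one behavior that often surprises: trailing empty strings are dropped unless you pass a limit. The sketch below (hypothetical class SplitDetails, illustrative input) uses the two-argument split(String regex, int limit); split(regex) behaves like split(regex, 0).

```java
import java.util.Arrays;

class SplitDetails {
    // Split with an explicit limit and render the pieces as "[a, b, ...]"
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        String csv = "a,b,,c,,";
        System.out.println(show(csv, ",", 0));  // [a, b, , c]      limit 0 drops trailing empty strings
        System.out.println(show(csv, ",", -1)); // [a, b, , c, , ]  a negative limit keeps them
        System.out.println(show(csv, ",", 2));  // [a, b,,c,,]      at most 2 pieces; the rest stays in the last
        System.out.println(show("adkkkkkkfeqqfizzzo", "(.)\\1+", 0)); // [ad, fe, fi, o]
    }
}
```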

Replacing: the String method String replaceAll(String regex, String replacement)

class ReplaceAllDemo
{
	public static void main(String[] args) 
	{
		String str = "wer23132395034ty23423423iod234523434";
		replaceAllDemo(str,"\\d{5,}","#");       // replace every run of 5 or more digits with #
		String str1 = "adkkkkkkfeqqfizzzo";
		replaceAllDemo(str1,"(.)\\1+","$1");     // collapse repeated letters, e.g. zzz --> z; note the group: $1 is what group 1 captured
	}
	private static void replaceAllDemo(String str,String reg,String newStr){
		str = str.replaceAll(reg,newStr);
		System.out.println(str);
	}
}
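Two details of replaceAll are worth seeing in code: numbered groups can reorder text, and $ and \ in the replacement string are special, so a literal dollar sign has to pass through Matcher.quoteReplacement. The class ReplaceDetails, the yyyy-MM-dd input format, and the sample strings are illustrative assumptions.

```java
import java.util.regex.Matcher;

class ReplaceDetails {
    // Reorder a yyyy-MM-dd date into dd/MM/yyyy using groups $1..$3
    static String reorderDate(String s) {
        return s.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // '$' and '\' are special in replacements; quoteReplacement makes them literal
    static String maskDigits(String s, String literal) {
        return s.replaceAll("\\d+", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(reorderDate("2021-02-15"));       // 15/02/2021
        System.out.println(maskDigits("cost 42 yuan", "$")); // cost $ yuan
        System.out.println("aaa".replaceFirst("a", "b"));    // baa: only the first match is replaced
    }
}
```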

Extracting: pulling out the substrings of a string that match the rule

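import java.util.regex.*;

class GetDemo
{
	public static void main(String[] args)
	{
		getDemo();
	}
	private static void getDemo(){
		String str ="ming tian jiu yao fang jia le, da jia";
		String reg = "\\b[a-zA-Z]{4}\\b"; // words of exactly 4 letters; \b is a word boundary

		// Compile the rule into a Pattern object
		Pattern p = Pattern.compile(reg);
		// Associate the string with the rule to obtain a Matcher
		Matcher m = p.matcher(str); // String's matches and replaceAll are implemented with Pattern and Matcher under the hood
		while(m.find()) // look for the next match; the search position moves forward
		{
			System.out.println(m.group()); // the input subsequence matched by the previous match
			System.out.println(m.start()+"...."+m.end()); // start index and end index (exclusive)
		}
	}
}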
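find() combined with numbered groups lets you pull out parts of each match rather than the whole match. A minimal sketch, assuming a hypothetical key=value input format and the hypothetical class name ExtractPairs: group(1) and group(2) return the text captured by the first and second pair of parentheses in the most recent match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractPairs {
    // Collect every key=value pair; group(1) is the key, group(2) the value
    static List<String> pairs(String text) {
        Pattern p = Pattern.compile("(\\w+)=(\\w+)");
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("name=tom; age=18")); // [name -> tom, age -> 18]
    }
}
```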

Exercises: three small exercises

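import java.util.*;
import java.util.regex.*;
class RegTest 
{
	public static void main(String[] args) 
	{
		test_1();
		test_2();
		checkMail();
	}
	// Exercise 1: turn the string below into "我要学编程" ("I want to learn programming")
	public static void test_1(){
		String str = "我我..我我...我要....要要..要要..学学学..学学...编编编..编程..程..程程...程..程";
		String reg = "[^\\.]"; // any character that is not a dot
		Pattern p = Pattern.compile(reg);
		StringBuilder sb = new StringBuilder();
		Matcher m = p.matcher(str);
		while(m.find())
			sb.append(m.group()); // keep only the non-dot characters
		str = sb.toString();
		//str = str.replaceAll("\\.+",""); // a single replaceAll can also delete the dots
		str = str.replaceAll("(.)\\1+","$1"); // collapse each run of a repeated character into one
		System.out.println(str);
	}
	// Exercise 2: sort IP addresses field by field. The idea is to pad every field with zeros to exactly
	// 3 digits, after which plain string (natural) ordering gives the right order. A neat use of groups.
	public static void test_2(){
		String ip = "192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.30";
		ip = ip.replaceAll("(\\d+)","00$1");       // prefix every field with two zeros
		ip = ip.replaceAll("0*(\\d{3})","$1");     // keep only the last three digits of each field
		String[] strs = ip.split(" ");
		Arrays.sort(strs);
		ip = Arrays.toString(strs);
		ip = ip.substring(1,ip.length()-1);        // strip the surrounding [ ]
		ip = ip.replaceAll("\\b0{1,2}(\\d)","$1"); // drop leading zeros only; \b stops 102 from becoming 12
		System.out.println(ip);
	}
	// Exercise 3: validate an e-mail address, which is quite practical
	public static void checkMail(){
		String mail = "abc123@126.com.cn";
		String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}"; // each domain part after a dot is one or more letters
		//String reg = "\\w+@\\w+(\\.\\w+)+"; // a looser, less precise match
		System.out.println(mail.matches(reg));
	}
}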
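For comparison, the same IP ordering can be done without zero padding by parsing each field as a number and sorting with a Comparator. This is a swapped-in technique, not the one the exercise teaches; the class name IpSortByComparator is hypothetical, and the sketch assumes every field is a valid IPv4 octet (0 to 255). Unlike the padding trick, it leaves each address exactly as written, so 102.49.23.013 is printed with its leading zero.

```java
import java.util.Arrays;
import java.util.Comparator;

class IpSortByComparator {
    // Fold "a.b.c.d" into one number so numeric order equals field-by-field order
    static long toNumber(String ip) {
        long value = 0;
        for (String part : ip.split("\\.")) {
            value = value * 256 + Integer.parseInt(part); // "013" parses as 13
        }
        return value;
    }

    // Split on runs of spaces and sort by the numeric value of each address
    static String[] sortIps(String ips) {
        String[] arr = ips.trim().split(" +");
        Arrays.sort(arr, Comparator.comparingLong(IpSortByComparator::toNumber));
        return arr;
    }

    public static void main(String[] args) {
        String ips = "192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.30";
        System.out.println(Arrays.toString(sortIps(ips)));
    }
}
```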

Lecture 3. Web crawler (spider): an application of regular expressions that can harvest e-mail addresses and other data

import java.io.*;
import java.util.regex.*;
class Spider 
{
	public static void main(String[] args) throws IOException
	{
		getMails();
	}
	public static void getMails() throws IOException
	{
		// Read from a local text file; InputStreamReader has no String constructor, so use FileReader
		BufferedReader in = new BufferedReader(new FileReader("mail.txt"));
		String mailReg = "\\w+@\\w+(\\.\\w+)+";
		Pattern p = Pattern.compile(mailReg);
		String line;
		while((line = in.readLine())!=null){
			Matcher m = p.matcher(line);
			while(m.find())
				System.out.println(m.group()); // print every e-mail address found on this line
		}
		in.close();
	}
}
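The Spider class above reads a local file, while a real crawler reads the page itself. The sketch below is an assumption-based variant (the class WebMailSpider is hypothetical) that separates extraction from the source, so the same loop works on a file, a string, or a URL stream. It assumes the page is UTF-8; pages in GBK or other charsets would need a different Charset, and a production crawler would also need timeouts and link following, which are not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WebMailSpider {
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Scan any character source line by line and collect every e-mail-like match
    static List<String> extractMails(Reader source) throws IOException {
        List<String> mails = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) { // closing this also closes source
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group());
                }
            }
        }
        return mails;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Hypothetical usage: pass a page URL as the first argument (page assumed to be UTF-8)
            Reader page = new InputStreamReader(URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8);
            extractMails(page).forEach(System.out::println);
        } else {
            // Offline demo on an in-memory "page"
            String html = "<p>Contact: abc123@126.com.cn or zhang_san@qq.com</p>";
            extractMails(new StringReader(html)).forEach(System.out::println);
        }
    }
}
```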
