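Regular Expression Study Notes
(1) Regular expression: an expression that follows a certain set of rules. Purpose: operating on strings.
(2) Matching a mobile phone number: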
package cn.cx.regex.study;
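public class Demo01 {
    public static void main(String[] args) {
        String str = "14823451230";
        // 1, then 3, 5 or 8, then nine more digits: 11 digits in total
        String reg = "[1][358]\\d{9}";
        // matches() checks the whole string; prints false here because the second digit is 4
        System.out.println(str.matches(reg));
    }
}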
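String.matches applies the pattern to the whole string, as if it were anchored at both ends. A minimal sketch contrasting it with Matcher.find(), which looks for the pattern anywhere in the string; the class and method names are hypothetical, and the pattern is the note's simplified phone rule, not a complete list of carrier prefixes.

```java
import java.util.regex.Pattern;

class PhoneCheck {
    // The note's rule: 1, then 3/5/8, then nine more digits (11 digits total)
    static final Pattern PHONE = Pattern.compile("1[358]\\d{9}");

    // Whole-string check, equivalent to s.matches("1[358]\\d{9}")
    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    // Substring search: true if a phone-shaped run appears anywhere in s
    static boolean containsPhone(String s) {
        return PHONE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("13812345678"));           // true
        System.out.println(isPhone("14823451230"));           // false: second digit is 4
        System.out.println(isPhone("tel:13812345678"));       // false: extra prefix
        System.out.println(containsPhone("tel:13812345678")); // true
    }
}
```

Use matches() for validating a whole input field, and find() when the number is embedded in longer text.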
(3) Splitting and replacing:
package cn.cx.regex.study;
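public class SplitDemo {
    public static void main(String[] args) {
        //splitDouble();
        replaceDouble();
    }

    public static void split01() {
        // In Java source "\\" is one backslash, so str is c:\asd\add\adf\ss
        String str = "c:\\asd\\add\\adf\\ss";
        // The regex \\ matches one literal backslash; in Java source it is written "\\\\"
        String reg = "\\\\";
        String[] arr = str.split(reg);
        System.out.println(arr.length); // 5
        for (String s : arr) {
            System.out.println(s);
        }
    }

    public static void splitDouble() {
        String str = "sdgdsssbnjjfvfggeo";
        // Split on runs of a repeated character. Parentheses form a group that wraps part of the rule.
        // Groups are numbered from 1; \1 (written "\\1" in Java source) refers back to group 1
        String reg = "(.)\\1+";
        String[] arr = str.split(reg);
        System.out.println(arr.length); // 4: sdgd, bn, fvf, eo
        for (String s : arr) {
            System.out.println(s);
        }
    }

    public static void replaceDouble() {
        String str = "sdgdsssbnjjfvfggeo";
        String reg = "(.)\\1+";
        // $1 stands for the text captured by the regex's first group,
        // so each run of a repeated character collapses to a single character
        str = str.replaceAll(reg, "$1");
        System.out.println(str); // sdgdsbnjfvfgeo
    }
}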
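split01 needs four backslashes because both the Java string literal and the regex treat \ as an escape character. Other metacharacters such as . and | need the same care. A sketch with a hypothetical helper, splitLiteral, comparing a hand-escaped pattern with Pattern.quote, which turns a literal string into a regex that matches it verbatim:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEscapes {
    // Hypothetical helper: split on a delimiter taken literally, not as a regex
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String ip = "192.168.1.10";

        // "." alone means "any character": every position is a delimiter,
        // and split drops trailing empty strings, so nothing is left
        System.out.println(ip.split(".").length); // 0

        // "\\." in Java source is the regex \. (a literal dot)
        System.out.println(Arrays.toString(ip.split("\\."))); // [192, 168, 1, 10]

        // Pattern.quote gives the same result without manual escaping
        System.out.println(Arrays.toString(splitLiteral(ip, "."))); // [192, 168, 1, 10]

        // Backslash-separated path, as in split01
        System.out.println(Arrays.toString(splitLiteral("c:\\asd\\add", "\\"))); // [c:, asd, add]
    }
}
```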
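(4) Extraction: pulling out the substrings that match a rule. The steps are:
1. Compile the regular expression into a Pattern object.
2. Associate the Pattern with the string to be processed.
3. The association yields a Matcher, the regex matching engine.
4. Use the Matcher to operate on the matching substrings, for example to extract them.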
package cn.cx.regex.study;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
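public class PatternDemo {
    public static void main(String[] args) {
        getSubString();
    }

    public static void getSubString() {
        String str = "ming tian jiu yao fang jia la haha ooo";
        // \b matches a word boundary, so this finds words of exactly three word characters
        String regex = "\\b\\w{3}\\b";
        // Compile the rule into a Pattern object
        Pattern p = Pattern.compile(regex);
        // Associate the Pattern with the target string to get a Matcher
        Matcher m = p.matcher(str);
        // Apply the rule to the string, finding each matching substring in turn
        while (m.find()) {
            // Get the matched text; prints jiu, yao, jia, ooo
            System.out.println(m.group());
        }
    }
}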
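Besides group(), a Matcher can report what each capturing group captured and where each match sits in the input. A sketch with hypothetical names that reuses the three-letter-word rule, wraps the word in a group, and records the start()/end() offsets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPositions {
    // Collects "word@start-end" for every three-letter word in text
    static List<String> threeLetterWords(String text) {
        Pattern p = Pattern.compile("\\b(\\w{3})\\b");
        Matcher m = p.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // group(1) is the first capturing group; start()/end() are the
            // [start, end) character offsets of the whole match
            found.add(m.group(1) + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [jiu@10-13, yao@14-17, jia@23-26, ooo@35-38]
        System.out.println(threeLetterWords("ming tian jiu yao fang jia la haha ooo"));
    }
}
```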
(5) Exercise 1: turn the stuttered string below into “我要学编程” ("I want to learn programming")
package cn.cx.regex.study;
public class Test01 {
    public static void main(String[] args) {
        String str = "我我..我我..我我..我要..要.要.要.要.要...学学学.学.编编编程..程程程";
        // Step 1: remove every run of dots
        str = str.replaceAll("\\.+", "");
        System.out.println(str);
        // Step 2: collapse each run of a repeated character to one character
        str = str.replaceAll("(.)\\1+", "$1");
        System.out.println(str); // 我要学编程
    }
}
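The order of the two replaceAll calls matters: the dots separate identical characters, so (.)\1+ only sees the complete runs once the dots are gone. A sketch with hypothetical method names that runs both orders on the same input:

```java
class OrderMatters {
    static final String INPUT = "我我..我我..我我..我要..要.要.要.要.要...学学学.学.编编编程..程程程";

    // Order used in Test01: strip the dots, then collapse repeated characters
    static String dotsFirst(String s) {
        return s.replaceAll("\\.+", "").replaceAll("(.)\\1+", "$1");
    }

    // Reversed order: runs are collapsed while dots still split them apart
    static String collapseFirst(String s) {
        return s.replaceAll("(.)\\1+", "$1").replaceAll("\\.+", "");
    }

    public static void main(String[] args) {
        System.out.println(dotsFirst(INPUT));     // 我要学编程
        System.out.println(collapseFirst(INPUT)); // repeated characters remain
    }
}
```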
(6) Matching an email address:
package cn.cx.regex.study;
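public class MatchMail {
    public static void main(String[] args) {
        String mail = "124dgg@hjgd12.com.cn.cv.cv";
        // Letters/digits, @, letters/digits, then one to three ".suffix" parts
        String regex = "[a-zA-Z0-9]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}";
        // Prints false: this address has four suffixes (.com.cn.cv.cv), more than {1,3} allows
        System.out.println(mail.matches(regex));
    }
}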
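The rule above is a simplified teaching pattern, not the full address syntax of RFC 5322. A sketch with hypothetical names that compiles it once and checks a few sample addresses, to show what it accepts and rejects:

```java
import java.util.regex.Pattern;

class MailCheck {
    // Same simplified rule as MatchMail
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}");

    static boolean isMail(String s) {
        return MAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc@qq.com",                 // true
            "abc@mail.com.cn",            // true: two suffixes
            "124dgg@hjgd12.com.cn.cv.cv", // false: four suffixes
            "first.last@qq.com",          // false: no dot allowed before @
            "abc@qq"                      // false: at least one suffix required
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isMail(s));
        }
    }
}
```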
(7) Web crawler (extracting email addresses from a web page):
package cn.cx.regex.study;
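import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetMail {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/aa/1.html");
        URLConnection uc = url.openConnection();
        Pattern p = Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}");
        // try-with-resources closes the reader and the underlying stream when done
        try (BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
            String line;
            // Read the page line by line and print every email-shaped substring
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }
}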
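Running GetMail requires a server at that URL. To try the same line-by-line extraction without one, the sketch below reads from a StringReader instead of the URL stream; this substitution, the sample page text, and the names GetMailOffline/extract are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GetMailOffline {
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}");

    // Same loop as GetMail, but reading from any Reader instead of a URLConnection
    static List<String> extract(Reader source) throws IOException {
        List<String> found = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Made-up page content for testing
        String page = "<p>contact: abc@qq.com</p>\n"
                + "<p>sales: sales01@example.com.cn, bob@test.org</p>";
        // prints [abc@qq.com, sales01@example.com.cn, bob@test.org]
        System.out.println(extract(new StringReader(page)));
    }
}
```

Because extract accepts any Reader, the real crawler could pass new InputStreamReader(uc.getInputStream()) to the same method.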