- String matching (match)
- String searching (find)
- String replacement (replace)
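Related classes:
java.lang.String
- matches() matches the entire string, not a substring
- replaceAll() replaces every substring that matches the regular expression
- split() splits the string, using the regular expression to match the delimiters
java.util.regex.Pattern: the compiled pattern to match against
java.util.regex.Matcher: performs the match on an input and holds the result
- matches() matches the entire input, like String.matches()
- find() searches for the next substring that matches the pattern; each later call resumes at the first character not consumed by the previous match
- lookingAt() always starts at the beginning of the input; unlike matches(), it only needs a prefix of the input to match
java.util.regex.PatternSyntaxException: unchecked exception thrown when a pattern has a syntax error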
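The difference between matches(), lookingAt() and find() is easiest to see side by side. The following is a minimal sketch; the class name MatchModes and its helper methods are made up for illustration and are not part of the original notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class MatchModes {
    // True only if the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if some prefix of the input matches the regex.
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Collects every non-overlapping match produced by repeated find() calls.
    static List<String> all(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d+", "123abc456"));  // false
        System.out.println(prefix("\\d+", "123abc456")); // true
        System.out.println(all("\\d+", "123abc456"));    // [123, 456]
    }
}
```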
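MetaCharacters:
- . any single character except a line terminator (unless Pattern.DOTALL is set)
- + one or more
- ? zero or one
- * zero or more
- {n} exactly n times, {n,} at least n times, {n,m} between n and m times
- \d a digit, same effect as [0-9]
- [] matches one character from a set or range. Inside the brackets, a leading ^ negates the class and && intersects two classes; ranges written side by side form a union. | is a literal character inside brackets; as alternation it only works outside them.
[a-zA-Z] is equivalent to [a-z]|[A-Z]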
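As a small sketch of how character classes combine in java.util.regex (the class CharClassDemo and its method names are hypothetical):

```java
// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {
    // Union: ranges placed side by side inside one class.
    static boolean isAsciiLetter(String s) {
        return s.matches("[a-zA-Z]");
    }

    // Negation: a leading ^ inverts the class.
    static boolean isNotDigit(String s) {
        return s.matches("[^0-9]");
    }

    // Intersection: lowercase letters that are not vowels.
    static boolean isLowercaseConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    // Inside brackets '|' is a literal, so this class contains 'a', '|' and 'b'.
    static boolean inClassWithPipe(String s) {
        return s.matches("[a|b]");
    }

    public static void main(String[] args) {
        System.out.println(isAsciiLetter("Q"));        // true
        System.out.println(isNotDigit("7"));           // false
        System.out.println(isLowercaseConsonant("e")); // false
        System.out.println(inClassWithPipe("|"));      // true
    }
}
```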
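Predefined character classes:
\d a digit, [0-9]
\D a non-digit, [^0-9]
\s a whitespace character
\S a non-whitespace character
\w = [0-9a-zA-Z_]
\W = [^\w]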
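A short sketch of the predefined classes in everyday String calls; PredefinedClassDemo and its methods are illustrative names, not from the original notes.

```java
// Hypothetical demo class; the names are illustrative only.
class PredefinedClassDemo {
    // \D removes everything that is not a digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ treats any run of whitespace as a single delimiter.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \w+ accepts only letters, digits and underscore.
    static boolean isWordChars(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 555-1234"));      // 5551234
        System.out.println(words("hello   world\tfoo").length); // 3
        System.out.println(isWordChars("user_name1"));        // true
        System.out.println(isWordChars("user-name"));         // false
    }
}
```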
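Boundaries:
^ start of the input (start of each line when Pattern.MULTILINE is set)
$ end of the input, or just before a final line terminator (end of each line when Pattern.MULTILINE is set)
\b a word boundary: a position where exactly one of the two neighbouring characters is \w
\B a non-word boundary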
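The effect of the anchors can be checked by counting matches. This is a sketch only; BoundaryDemo and countMatches are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "foo bar\nbaz qux";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(countMatches(Pattern.compile("^\\w+"), text)); // 1
        // With MULTILINE, ^ also matches after each line terminator.
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // 2
        // \b keeps "cat" from matching inside "concat".
        System.out.println(countMatches(Pattern.compile("\\bcat\\b"), "cat concat cat.")); // 2
        // \B requires that "cat" does NOT start at a word boundary.
        System.out.println(countMatches(Pattern.compile("\\Bcat"), "cat concat cat.")); // 1
    }
}
```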
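Grouping:
Matching an IP address - ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
Every capturing group automatically gets a group number. Group 0 is the whole expression; the other groups are numbered from 1 by the position of their opening parenthesis, left to right. In Java, named groups (?<name>exp) are numbered in the same sequence as unnamed ones (the rule "unnamed groups first, named groups in a second pass" comes from .NET and does not apply here). The (?:exp) syntax excludes a group from numbering.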
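The following sketch applies the IP pattern above and shows how Java numbers groups. GroupDemo, the DATE pattern and the sample date format are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupDemo {
    // The IP pattern from the notes, with backslashes doubled for a Java literal.
    static final Pattern IPV4 = Pattern.compile(
            "((2[0-4]\\d|25[0-5]|[01]?\\d\\d?)\\.){3}(2[0-4]\\d|25[0-5]|[01]?\\d\\d?)");

    // One named group, one non-capturing group, one unnamed group.
    static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?:\\d{2})-(\\d{2})");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    // Returns {group("year"), group(1), group(2)}, or null if the input does not match.
    static String[] dateGroups(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("year"), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.1")); // true
        System.out.println(isIpv4("256.1.1.1"));   // false
        String[] g = dateGroups("2024-05-17");
        // The named group is also group 1; (?:...) gets no number, so the day is group 2.
        System.out.println(g[0] + " " + g[1] + " " + g[2]); // 2024 2024 17
    }
}
```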
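- (?=exp)
Positive lookahead: asserts that exp matches right after this position; it only matches a position and consumes no characters
- (?<=exp)
Positive lookbehind: asserts that exp matches right before this position
- (?!exp)
Negative lookahead: asserts that exp does not match right after this position; it only matches a position and consumes no characters
- (?<!exp)
Negative lookbehind: asserts that exp does not match right before this position
Example:
Matching the content of a tag - (?<=<(\w+)>).*(?=<\/\1>)
Note: Java has traditionally required a lookbehind to have a bounded maximum length, so depending on the JDK version the unbounded \w+ inside (?<=...) may be rejected with a PatternSyntaxException. A bounded quantifier such as \w{1,20}, or a plain capturing group around the content, avoids the problem.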
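A sketch of the four lookaround forms in Java follows. The tag example here uses capturing groups instead of a lookbehind, which is a substitution on my part to stay independent of the lookbehind length restriction. LookaroundDemo and first are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LookaroundDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    static String first(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Positive lookahead: word characters that are followed by "ing".
        System.out.println(first("\\w+(?=ing)", "singing", 0));              // sing
        // Positive lookbehind: digits preceded by a dollar sign.
        System.out.println(first("(?<=\\$)\\d+", "cost $42 or 17", 0));      // 42
        // Negative lookahead: a whole number not followed by '%'.
        System.out.println(first("\\b\\d+\\b(?!%)", "50% of 80", 0));        // 80
        // Negative lookbehind: a number not preceded by a dollar sign.
        System.out.println(first("(?<!\\$)\\b\\d+", "cost $42 or 17", 0));   // 17
        // Tag content via a capturing group and a backreference to the tag name.
        System.out.println(first("<(\\w+)>(.*?)</\\1>", "<b>bold</b> <i>x</i>", 2)); // bold
    }
}
```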
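Greedy vs. lazy:
Input: aabab
a.*b greedily matches the whole string, while a lazy quantifier matches as few characters as possible: a.*?b returns aab as its first match
- greedy (default): takes as much as possible and gives characters back (backtracks) when the rest of the pattern fails
- reluctant (append ?): lazy matching, takes as little as possible
- possessive (append +): takes as much as possible and never gives anything back, even if the overall match then fails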
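The three quantifier modes can be compared on the input aabab from above. This is a minimal sketch; QuantifierDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aabab";
        System.out.println(firstMatch("a.*b", input));  // aabab (greedy)
        System.out.println(firstMatch("a.*?b", input)); // aab   (reluctant)
        // Possessive: .*+ swallows the final 'b' and never gives it back, so nothing matches.
        System.out.println(firstMatch("a.*+b", input)); // null
    }
}
```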
Practices:
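1. Basics review
package TestExpression;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
    public static void main(String[] args) {
        // An unescaped "$" is the end-of-input anchor, so this only appends "~": lalala$$lala~
        System.out.println("lalala$$lala".replaceAll("$", "~"));
        // The escaped "\\$" matches a literal dollar sign: lalala~~lala
        System.out.println("lalala$$lala".replaceAll("\\$", "~"));
        System.out.println(Arrays.asList("192.168.1.1".split("\\.")));
        // Printing the array directly shows its type and hash code, not its elements
        System.out.println("192.168.1.1".split("\\."));
        System.out.println("curry2".matches(".+\\d")); // a double backslash in the Java literal produces one regex backslash
        Pattern p = Pattern.compile("[a-z]{3}");
        Matcher m = p.matcher("ab3");
        System.out.println(m.matches()); // returns a boolean telling whether the whole input matches
        "ab3".matches("[a-z]{3}"); // shorthand for the three lines above (result unused here)
        System.out.println("192.168.0.224".matches("(\\d{1,3}\\.){3}\\d{1,3}"));
        System.out.println(" \n\r\t".matches("\\s{4}"));
        System.out.println("|".matches("\\|"));
        System.out.println("&&".matches("&\\&"));
        // false: outside [] "&&" is two literal ampersands; intersection only works inside a class, e.g. [a-z&&[^a]]
        System.out.println("b".matches("a&&b"));
        System.out.println("\\".matches("\\\\"));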
        Pattern pattern = Pattern.compile("\\d{3,5}");
        Matcher matcher = pattern.matcher("127-123-1981-092-98461-00");
        //System.out.println(matcher.matches());
        //matcher.reset();
        if (matcher.lookingAt())
            System.out.println(matcher.start() + "," + matcher.end()); // 0,3
        if (matcher.find()) // resumes at the first character not consumed by the previous match
            System.out.println(matcher.start() + "," + matcher.end()); // 4,7
        if (matcher.find())
            System.out.println(matcher.start() + "," + matcher.end()); // 8,12
        // Calling start()/end() after a failed find() throws java.lang.IllegalStateException: No match available
        /*System.out.println(matcher.find());
        System.out.println(matcher.start()+","+matcher.end());
        */
        matcher.reset();
        System.out.println(matcher.replaceAll("*")); // String.replaceAll simply delegates to Matcher.replaceAll
        // For reference: Matcher.replaceAll as found in older JDK sources
        /*public String replaceAll(String replacement) {
            reset();
            boolean result = find();
            if (result) {
                StringBuffer sb = new StringBuffer();
                do {
                    appendReplacement(sb, replacement);
                    result = find();
                } while (result);
                appendTail(sb);
                return sb.toString();
            }
            return text.toString();
        }*/
        // The same loop written by hand, with room for extra work per match
        /*public String replaceString(String source, String regex, String replacement, int flags){
            Pattern pattern = Pattern.compile(regex, flags);
            Matcher matcher = pattern.matcher(source);
            StringBuffer buffer = new StringBuffer();
            while(matcher.find()){
                matcher.appendReplacement(buffer, replacement);
                // other operations
            }
            matcher.appendTail(buffer);
            return buffer.toString();
        }*/
        System.out.println("hello hello".matches("(\\b\\w+\\b).\\1")); // true: \1 must repeat what group 1 captured
        System.out.println("hello world".matches("(\\b\\w+\\b).\\1")); // false
        // Named group; Java only accepts the <P1> form, the 'P1' form is .NET syntax
        System.out.println("hello hello".matches("(?<P1>\\b\\w+\\b).\\k<P1>"));
        //Pattern p2 = Pattern.compile("\\w+(?=ing)");
        Pattern p2 = Pattern.compile("(?=ing)\\w+");
        Matcher m2 = p2.matcher("singing");
        if (m2.find())
            System.out.println(m2.start() + "," + m2.end()); // 1,7
        //Pattern p3 = Pattern.compile("(?<=ad)\\w+");
        Pattern p3 = Pattern.compile("\\w+(?<=ad)");
        Matcher m3 = p3.matcher("reading");
        if (m3.find())
            System.out.println(m3.start() + "," + m3.end()); // 0,4
        Pattern p4 = Pattern.compile("(\\d{3,5})(\\w{3})");
        Matcher m4 = p4.matcher("215vsdv6346cas534sdd");
        while (m4.find()) {
            System.out.println("group : " + m4.group());
            System.out.println("group1 : " + m4.group(1) + ";" + " group2 : " + m4.group(2));
        }
        System.out.println(m4.groupCount()); // 2
        // compile(regex, flags): flags are constants such as Pattern.CASE_INSENSITIVE
    }
}
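The two-argument Pattern.compile(regex, flags) mentioned at the end of the listing can be sketched as follows. FlagDemo and its method names are hypothetical; Pattern.CASE_INSENSITIVE and Pattern.DOTALL are standard constants, and flags are combined with the bitwise OR operator.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagDemo {
    // Whole-input match that ignores letter case.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Ignores case and additionally lets '.' match line terminators.
    static boolean matchesIgnoringCaseDotAll(String regex, String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("HeLLo".matches("hello"));                  // false
        System.out.println(matchesIgnoringCase("hello", "HeLLo"));     // true
        System.out.println("HeLLo".matches("(?i)hello"));              // true, inline flag
        System.out.println(matchesIgnoringCase("a.b", "A\nB"));        // false, '.' stops at \n
        System.out.println(matchesIgnoringCaseDotAll("a.b", "A\nB"));  // true
    }
}
```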
2. Email address extraction
"[\\w.-]+@[\\w.-]+\\.\\w+"
3. Counting lines of code
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

public class CodeCounter {
    private static final String path = "D:\\Eclipse\\Review\\src";
    private static int whiteSpaces = 0;
    private static int comments = 0;
    private static int normalLines = 0;

    public static void main(String[] args) {
        File f = new File(path);
        if (f.exists() && f.isDirectory()) {
            listFiles(f); // walk the directory tree
        } else if (f.exists() && f.isFile() && f.getName().matches(".+\\.java$")) {
            count(f); // a single .java file
        } else {
            System.out.println("file doesn't exist!");
            System.exit(-1);
        }
        System.out.println("whitespace lines: " + whiteSpaces);
        System.out.println("comment lines: " + comments);
        System.out.println("normal lines: " + normalLines);
    }
    private static void count(File f) {
        boolean comment = false; // true while inside a multi-line /* ... */ comment
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.matches("[\\s&&[^\\n]]*$")) { // readLine() already strips the line terminator
                    whiteSpaces++;
                    continue;
                }
                // A one-line /* ... */ comment or a // comment; startsWith is needed because matches("//") only accepts the exact string "//"
                if ((line.startsWith("/*") && line.endsWith("*/")) || line.startsWith("//")) {
                    comments++;
                    continue;
                }
                if (line.startsWith("/*") && !line.endsWith("*/")) { // a multi-line comment opens here
                    comment = true;
                    comments++;
                    continue;
                }
                if (comment) { // still inside a multi-line comment
                    comments++;
                    if (line.endsWith("*/"))
                        comment = false;
                    continue;
                }
                normalLines++;
            }
        } catch (FileNotFoundException e) {
            System.out.println("file not found!");
            e.printStackTrace();
        } catch (IOException e1) {
            System.out.println("something wrong when handling the file");
            e1.printStackTrace();
        }
    }
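    private static void listFiles(File f) {
        File[] files = f.listFiles();
        if (files == null) // listFiles() returns null if the directory cannot be read
            return;
        for (File file : files) {
            if (file.isDirectory())
                listFiles(file); // recurse into subdirectories
            else if (file.getName().matches(".+\\.java$"))
                count(file);
        }
    }
}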
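The counting rules can also be tried on in-memory lines, without the hard-coded directory. The sketch below is a simplified variant, not the program above: LineClassifier is a hypothetical name, and unlike CodeCounter it checks the "inside a block comment" state before anything else, so blank lines inside a block comment are counted as comment lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, only for illustration.
class LineClassifier {
    // Returns the counts as {blank, comment, code}.
    static int[] classify(List<String> lines) {
        int blank = 0, comment = 0, code = 0;
        boolean inBlock = false; // true while inside an open /* ... */ comment
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlock) {
                comment++;
                if (line.endsWith("*/")) {
                    inBlock = false;
                }
            } else if (line.matches("\\s*")) {
                blank++;
            } else if (line.startsWith("//")) {
                comment++;
            } else if (line.startsWith("/*")) {
                comment++;
                inBlock = !line.endsWith("*/");
            } else {
                code++;
            }
        }
        return new int[] { blank, comment, code };
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "", "// c", "/* a", "b */", "int x;", "/* one */", "   ");
        System.out.println(Arrays.toString(classify(sample))); // [2, 4, 1]
    }
}
```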
Reference:
1. http://m.blog.csdn.net/article/details?id=51107412
2. http://www.cnblogs.com/deerchao/archive/2006/08/24/zhengzhe30fengzhongjiaocheng.html
3. https://docs.oracle.com/javase/7/docs/api/