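When we need to process a long string, we can use the Pattern class or the handy regular-expression methods built into String.
String's commonly used regex methods include split(), replaceFirst(), replaceAll(), and matches(). (replace() treats its argument as literal text, not as a regex.)
Code example: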
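import java.util.Arrays;

public class Test
{
    public static void main(String[] args)
    {
        // The string to match against
        String s="<div class=\"sTitle\">可能存在类似的问题</div><div class=\"sFooter\"><a class=\"sFirstNewAsk\">我想提一个新问题</a></div>";
        // Regex that matches a run of Chinese characters
        String rege="[\u4e00-\u9fa5]+";
        // Split on the Chinese text, which leaves the HTML tags and attributes
        System.out.println(Arrays.toString(s.split(rege)));
        // Replace the first run of Chinese text
        System.out.println(s.replaceFirst(rege, "beyondboy"));
        // Replace every run of Chinese text
        System.out.println(s.replaceAll(rege, "scau"));
        // matches() is true only if the entire string matches the regex
        System.out.println(s.matches(rege));
    }
}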
Output:
[<div class="sTitle">, </div><div class="sFooter"><a class="sFirstNewAsk">, </a></div>]
<div class="sTitle">beyondboy</div><div class="sFooter"><a class="sFirstNewAsk">我想提一个新问题</a></div>
<div class="sTitle">scau</div><div class="sFooter"><a class="sFirstNewAsk">scau</a></div>
false
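Two details of the String methods are easy to trip over: matches() succeeds only when the regex covers the whole string (which is why the last line above is false), and replace() treats its first argument as literal text while replaceAll() treats it as a regex. The sketch below illustrates both. The class name StringRegexSketch, its helper methods, and the sample inputs are made up for illustration and are not part of the original example.

```java
import java.util.regex.Pattern;

public class StringRegexSketch {
    // Same character class as the example above: a run of Chinese characters.
    static final String CJK = "[\u4e00-\u9fa5]+";

    // True only when the entire string is Chinese text.
    static boolean wholeMatch(String s) {
        return s.matches(CJK);
    }

    // True when Chinese text occurs anywhere in the string.
    static boolean containsMatch(String s) {
        return Pattern.compile(CJK).matcher(s).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two Chinese characters wrapped in a tag.
        String html = "<b>\u95ee\u9898</b>";
        System.out.println(wholeMatch(html));              // false: the tags are not Chinese
        System.out.println(containsMatch(html));           // true
        System.out.println(wholeMatch("\u95ee\u9898"));    // true

        // replace() is literal; replaceAll() reads "." as "any character".
        System.out.println("a.b.c".replace(".", "-"));      // a-b-c
        System.out.println("a.b.c".replaceAll(".", "-"));   // -----
        System.out.println("a.b.c".replaceAll("\\.", "-")); // a-b-c
    }
}
```

When the question is "does this string contain a match?", Matcher.find() is usually the better fit; matches() answers "is the whole string a match?".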
For more complex text processing we can combine Pattern and Matcher. Commonly used methods: find(), group(), start(), end(), matches(), lookingAt(), appendReplacement(), appendTail(), reset(), and so on.
Code example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
    public static void main(String[] args)
    {
        // The text to search
        String s="58.27.82.161@02/10/2005\n" +
                 "204.45.234.40@02/11/2005\n" +
                 "58.27.82.161@02/11/2005\n" +
                 "58.27.82.161@02/12/2005\n" +
                 "58.27.82.161@02/12/2005\n" +
                 "[Next log section with different data format]";
        // Regex that matches an "IP address@date" log entry
        String regex="(\\d+[.]){3}\\d+@(\\d{2}/){2}\\d+";
        Pattern pattern=Pattern.compile(regex);
        // split() cuts the text at every match; print how many pieces remain
        System.out.println(pattern.split(s).length);
        // Create a Matcher over the text
        Matcher matcher=pattern.matcher(s);
        while (matcher.find())
        {
            // The whole match for (\\d+[.]){3}\\d+@(\\d{2}/){2}\\d+
            System.out.println("matcher.group():\""+matcher.group()+ "\" at positions " +
                matcher.start() + "-" + (matcher.end() - 1));
            // Group 1, (\\d+[.]): a repeated group keeps only its last repetition
            System.out.println("matcher.group(1):"+matcher.group(1));
            // Group 2, (\\d{2}/): again only the last repetition
            System.out.println("matcher.group(2):"+matcher.group(2));
        }
        // Compare matches() with lookingAt()
        Matcher matcher2=test("beyondboy from scau","\\w+yond");
        // lookingAt() succeeds if the regex matches a prefix of the input
        System.out.println(matcher2.lookingAt());
        // matches() succeeds only if the regex matches the entire input
        System.out.println(matcher2.matches());
        matcher2=test("beyondboy from scau",".*yond.*");
        System.out.println(matcher2.lookingAt());
        System.out.println(matcher2.matches());
        // Using appendReplacement() and appendTail()
        Matcher matcher3=test("12345789123456987654","345");
        StringBuffer buffer=new StringBuffer();
        while (matcher3.find())
        {
            // Appends the text before the match, then the replacement
            matcher3.appendReplacement(buffer, "4");
            System.out.println("matcher3.appendReplacement():"+buffer);
            // Appends everything after the current match; calling it inside the loop duplicates text
            matcher3.appendTail(buffer);
            System.out.println("matcher3.appendTail():"+buffer);
        }
        buffer.setLength(0);
        matcher3.reset();
        // The usual combination: appendReplacement() in the loop, appendTail() once afterwards
        while(matcher3.find())
            matcher3.appendReplacement(buffer, "4");
        matcher3.appendTail(buffer);
        System.out.println("matcher3.appendTail():"+buffer);
    }
    // Compiles the regex and returns a Matcher over s
    private static Matcher test(String s,String regex)
    {
        Pattern pattern=Pattern.compile(regex);
        Matcher matcher=pattern.matcher(s);
        return matcher;
    }
}
Output:
6
matcher.group():"58.27.82.161@02/10/2005" at positions 0-22
matcher.group(1):82.
matcher.group(2):10/
matcher.group():"204.45.234.40@02/11/2005" at positions 24-47
matcher.group(1):234.
matcher.group(2):11/
matcher.group():"58.27.82.161@02/11/2005" at positions 49-71
matcher.group(1):82.
matcher.group(2):11/
matcher.group():"58.27.82.161@02/12/2005" at positions 73-95
matcher.group(1):82.
matcher.group(2):12/
matcher.group():"58.27.82.161@02/12/2005" at positions 97-119
matcher.group(1):82.
matcher.group(2):12/
true
false
true
true
matcher3.appendReplacement():124
matcher3.appendTail():124789123456987654
matcher3.appendReplacement():124789123456987654789124
matcher3.appendTail():1247891234569876547891246987654
matcher3.appendTail():1247891246987654
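The group(1) lines in the output show only "82." or "234." rather than the full address because a capturing group inside a quantifier such as {3} keeps just its last repetition. A minimal sketch of one way around this, assuming we want the whole IP and the whole date, is to give each field its own group. The class name GroupCaptureSketch, the helper methods, and the group names ip and date are hypothetical and chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCaptureSketch {
    // The pattern from the example above: group 1 is repeated three times.
    static final Pattern REPEATED =
        Pattern.compile("(\\d+[.]){3}\\d+@(\\d{2}/){2}\\d+");

    // Assumed alternative: one named group per field, nothing repeated.
    static final Pattern NAMED =
        Pattern.compile("(?<ip>\\d+(?:[.]\\d+){3})@(?<date>\\d{2}/\\d{2}/\\d{4})");

    // Returns what group(1) holds after a match, or null if nothing matches.
    static String lastRepetition(String line) {
        Matcher m = REPEATED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns {ip, date}, or null if the line is not a log entry.
    static String[] ipAndDate(String line) {
        Matcher m = NAMED.matcher(line);
        return m.find() ? new String[] { m.group("ip"), m.group("date") } : null;
    }

    public static void main(String[] args) {
        String line = "58.27.82.161@02/10/2005";
        System.out.println(lastRepetition(line));          // 82.
        String[] parts = ipAndDate(line);
        System.out.println(parts[0] + " on " + parts[1]);  // 58.27.82.161 on 02/10/2005
    }
}
```

The (?:...) form groups without capturing, so the repeated part no longer occupies a group number.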
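The Scanner class can also use regular expressions to control how input is split into tokens. Commonly used methods: hasNext(), useDelimiter(), and next().
Code example: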
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test
{
    public static void main(String[] args)
    {
        // The text to scan
        String s="<<<beyondboy>>>>> from >>scau ";
        Scanner scanner=new Scanner(s);
        scanner.useDelimiter("(<* *>* *)");
        // Print the English letters; this delimiter can also match the empty
        // string, so every letter comes back as its own token
        while(scanner.hasNext())
        {
            System.out.println(scanner.next());
        }
        // The text to scan
        String threatData =
            "58.27.82.161@02/10/2005\n" +
            "204.45.234.40@02/11/2005\n" +
            "58.27.82.161@02/11/2005\n" +
            "58.27.82.161@02/12/2005\n" +
            "58.27.82.161@02/12/2005\n" +
            "[Next log section with different data format]";
        String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@" +
            "(\\d{2}/\\d{2}/\\d{4})";
        Scanner scanner2=new Scanner(threatData);
        while(scanner2.hasNext(pattern))
        {
            // Print the token that matched the pattern
            System.out.println(scanner2.next(pattern));
            MatchResult match = scanner2.match();
            String ip = match.group(1);
            String date = match.group(2);
            // Print the date, (\\d{2}/\\d{2}/\\d{4}), then the IP, (\\d+[.]\\d+[.]\\d+[.]\\d+)
            System.out.format("Threat on %s from %s\n", date,ip);
        }
    }
}
Output:
b
e
y
o
n
d
b
o
y
f
r
o
m
s
c
a
u
58.27.82.161@02/10/2005
Threat on 02/10/2005 from 58.27.82.161
204.45.234.40@02/11/2005
Threat on 02/11/2005 from 204.45.234.40
58.27.82.161@02/11/2005
Threat on 02/11/2005 from 58.27.82.161
58.27.82.161@02/12/2005
Threat on 02/12/2005 from 58.27.82.161
58.27.82.161@02/12/2005
Threat on 02/12/2005 from 58.27.82.161
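The first part of the output prints one letter per line because the delimiter "(<* *>* *)" can match the empty string, so Scanner finds a delimiter between every pair of characters. If whole words are wanted instead, one option is a delimiter that must consume at least one character. The sketch below assumes the delimiter "[<> ]+" (one or more of '<', '>' or space); that regex, the class name ScannerDelimiterSketch, and the tokens() helper are illustrative choices, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDelimiterSketch {
    // Collects every token the Scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "<<<beyondboy>>>>> from >>scau ";
        // Assumed delimiter: a non-empty run of '<', '>' or space.
        System.out.println(tokens(s, "[<> ]+")); // [beyondboy, from, scau]
    }
}
```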