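Part of a series that explains regular expressions in detail through code examples.
Source code download: http://download.csdn.net/detail/gnail_oug/9504094
What this example does:
1. Given a URL, extract all email addresses from the page
Step 1: Read the page content from the URL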
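import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Reads the content of a web page from a URL.
 * @date 2016-04-27 10:34:13
 * @author sgl
 * @param urlStr the address of the page to read
 * @return the page text; if an error occurs, whatever was read before it (possibly empty)
 */
public static String readHtml(String urlStr){
    StringBuilder sb=new StringBuilder();
    BufferedReader br=null;
    try {
        URL url=new URL(urlStr);
        HttpURLConnection conn=(HttpURLConnection)url.openConnection();
        InputStream in=conn.getInputStream();
        // Decode explicitly instead of relying on the platform default charset
        // (UTF-8 is an assumption about the target page).
        br=new BufferedReader(new InputStreamReader(in,StandardCharsets.UTF_8));
        String line=null;
        while((line=br.readLine())!=null){
            // Keep the line break so text from adjacent lines is not fused together.
            sb.append(line).append('\n');
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally{
        // Always release the reader, even when reading failed.
        if(br!=null){
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return sb.toString();
}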
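As a possible variant of the method above, the following sketch adds connection and read timeouts and uses try-with-resources so the reader is closed automatically. The class name `PageReader`, the `timeoutMillis` parameter, and the UTF-8 decoding are assumptions made for illustration; they are not part of the original download. The stream-reading part is split into its own method so it can be tried without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch.
class PageReader {

    // Reads every line from the stream, decoding it as UTF-8 (an assumption
    // about the page) and keeping a line break after each line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Fetches a page with connect and read timeouts so that a stalled
    // server cannot block the caller indefinitely.
    static String readHtml(String urlStr, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration: feed readAll an in-memory "page".
        byte[] page = "<p>a@qq.com</p>\n<p>b@qq.com</p>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page)));
    }
}
```

Unlike the original method, this version lets `IOException` propagate to the caller instead of printing the stack trace, which is a design choice rather than a requirement.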
Step 2: Find the email addresses in the page content
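import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Finds the email addresses in a string.
 * @date 2016-04-27 10:35:27
 * @author sgl
 * @param str the text to search
 * @return every match in order of appearance, duplicates included
 */
public static List<String> findEmail(String str){
    // [\w-]+        local part: letters, digits, underscores, hyphens
    // @             separator
    // [\w\.-]*\w+   domain labels, ending in a word character
    // \.\w{2,6}     a dot followed by a top-level domain of 2 to 6 word characters
    Pattern p=Pattern.compile("[\\w-]+@[\\w\\.-]*\\w+\\.\\w{2,6}");
    Matcher m=p.matcher(str);
    List<String> list=new ArrayList<String>();
    // find() advances to the next match on each call.
    while(m.find()){
        list.add(m.group());
    }
    return list;
}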
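To see what this pattern accepts and rejects, the following sketch runs the same regular expression against a short hand-written string. The class name `EmailRegexDemo` and the sample text are made up for illustration; only the pattern itself comes from the method above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class EmailRegexDemo {

    // Same pattern as in findEmail above.
    static final Pattern EMAIL =
            Pattern.compile("[\\w-]+@[\\w\\.-]*\\w+\\.\\w{2,6}");

    static List<String> findEmail(String str) {
        List<String> list = new ArrayList<>();
        Matcher m = EMAIL.matcher(str);
        while (m.find()) {
            list.add(m.group());
        }
        return list;
    }

    public static void main(String[] args) {
        String sample = "a: 530180782@qq.com, "
                + "b: first.last@mail.example.com, "
                + "c: user@localhost";
        System.out.println(findEmail(sample));
        // prints [530180782@qq.com, last@mail.example.com]
    }
}
```

Two things stand out. The local part `[\w-]+` does not include `.`, so `first.last@mail.example.com` is reported as `last@mail.example.com`. And `user@localhost` is skipped because the pattern requires at least one dot in the domain. For pages that list simple addresses such as numeric QQ mailboxes this is sufficient, but it is not a general-purpose email validator.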
Step 3: Extract the email addresses
public static void main(String[] args) {
    // Download the page.
    String htmlTxt=Demo02.readHtml("http://blog.sina.com.cn/s/blog_515617e60101e151.html");
    // Extract the addresses from the page text.
    List<String> list=Demo02.findEmail(htmlTxt);
    for(String email:list){
        System.out.println(email);
    }
    // Total number of matches found.
    System.out.println(list.size());
}
Output (middle portion omitted):
530180782@qq.com
243678025@qq.com
398018489@qq.com
....
398018489@qq.com
595064131@qq.com
362483245@qq.com
285340035@qq.com
448280012@qq.com
438
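The output above lists 398018489@qq.com more than once, so the final count includes repeated addresses. If only distinct addresses are wanted, one option is to pass the list through a `LinkedHashSet`, which drops duplicates while keeping first-seen order. This is a minimal sketch: the class name `UniqueEmails` and the helper `unique` are hypothetical, and the sample list is a shortened stand-in for the real result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class UniqueEmails {

    // Removes duplicates while preserving the order of first appearance.
    static List<String> unique(List<String> emails) {
        return new ArrayList<>(new LinkedHashSet<>(emails));
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "398018489@qq.com", "595064131@qq.com", "398018489@qq.com");
        List<String> distinct = unique(found);
        System.out.println(distinct);        // prints [398018489@qq.com, 595064131@qq.com]
        System.out.println(distinct.size()); // prints 2
    }
}
```

The comparison here is case-sensitive, so addresses that differ only in letter case would still be counted separately.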