Java Web Crawler with Regular Expressions


Paula-柒月拾



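Web crawler

import java.net.*;
import java.io.*;
import java.util.regex.*;

class findMail
{
    public static void main(String[] args) throws Exception
    {
        // Reading from a local file would associate the reader with that file:
        //BufferedReader bin = new BufferedReader(new FileReader("mail.txt"));

        // To read data from a web page, get the input stream from the server side
        // through URLConnection.getInputStream()
        URL url = new URL("http://127.0.0.1:8080/myweb/mail.html");
        URLConnection conn = url.openConnection();

        // Define the regex rule for the e-mail address format
        String mailreg = "\\w{2,13}@\\w{2,5}(\\.[a-z]+)+";

        // Compile the regex rule into a Pattern object
        Pattern p = Pattern.compile(mailreg);

        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader bin = new BufferedReader(new InputStreamReader(conn.getInputStream())))
        {
            String line = null;
            while ((line = bin.readLine()) != null)
            {
                Matcher m = p.matcher(line); // associate the pattern with the string
                // while, not if: a single line may contain several addresses
                while (m.find())
                {
                    System.out.println(m.group());
                }
                //System.out.println(line);
            }
        }
    }
}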
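The matching logic does not depend on where the text comes from, so it can be tried without the local server at http://127.0.0.1:8080/myweb/mail.html. Below is a minimal sketch, assuming you want the regex step separated from the input source: the class `MailExtractor` and its `extract` method are hypothetical names chosen for illustration, and the sample page text is made up. `extract` accepts any `Reader`, so the same code works with a `FileReader` (as in the commented-out line) or with `new InputStreamReader(conn.getInputStream())`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailExtractor
{
    // Same e-mail rule as in the crawler listing
    private static final Pattern MAIL = Pattern.compile("\\w{2,13}@\\w{2,5}(\\.[a-z]+)+");

    // Hypothetical helper: collects every match on every line of the given reader.
    // The caller decides where the text comes from (FileReader, URLConnection stream, ...).
    public static List<String> extract(Reader source) throws IOException
    {
        List<String> mails = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                Matcher m = MAIL.matcher(line);
                while (m.find()) // while, not if: a line may hold several addresses
                {
                    mails.add(m.group());
                }
            }
        }
        return mails;
    }

    public static void main(String[] args) throws IOException
    {
        // Made-up page content standing in for mail.html
        String page = "<p>Contact: abc123@sina.com.cn, xyz@qq.com</p>\n"
                    + "<p>No address here</p>\n"
                    + "<p>a@b.com is too short</p>\n";
        for (String mail : extract(new StringReader(page)))
        {
            System.out.println(mail);
        }
    }
}
```

Running `main` prints `abc123@sina.com.cn` and then `xyz@qq.com`; `a@b.com` is skipped because the rule requires at least two word characters both before the `@` and in the domain label. Note also that `\\w{2,5}` caps that domain label at five characters, so an address such as `someone@hotmail.com` is not matched by this rule at all.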
