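The complete steps are as follows:
1. Provide a URL
2. Download the resource
3. Analyze the data (regular expressions can be used)
4. Extract the data
5. Clean the data
6. Store the data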
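Steps 3 to 6 can be sketched with nothing but the JDK. The class below is a minimal illustration, not the author's code. It assumes the page has already been downloaded into a String (the sample HTML is made up). It uses a simple regular expression to pull out href values (steps 3 and 4). It then cleans them by trimming whitespace, keeping only absolute http(s) links and dropping duplicates (step 5), and writes them to a file named links.txt (step 6). The class name LinkPipeline, its methods and the file name are hypothetical. A regex is enough for a quick demo, but it is not a full HTML parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkPipeline {
    // Steps 3 and 4: analyze the HTML with a regex and extract every href="..." value
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Step 5: clean the data by trimming, keeping only absolute http(s) links, and dropping duplicates
    static List<String> clean(List<String> raw) {
        Set<String> unique = new LinkedHashSet<>();  // keeps first-seen order
        for (String link : raw) {
            String s = link.trim();
            if (s.startsWith("http://") || s.startsWith("https://")) {
                unique.add(s);
            }
        }
        return new ArrayList<>(unique);
    }

    // Step 6: store the cleaned links as a UTF-8 text file, one link per line
    static void store(List<String> links, Path file) throws IOException {
        Files.write(file, links, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Made-up HTML standing in for a page downloaded in step 2
        String html = "<a href=\"https://www.jd.com/a\">A</a>"
                + "<a HREF = \" https://www.jd.com/a \">duplicate</a>"
                + "<a href=\"/relative\">relative</a>";
        List<String> links = clean(extractLinks(html));
        System.out.println(links);  // prints [https://www.jd.com/a]
        store(links, Paths.get("links.txt"));
    }
}
```

Keeping extraction, cleaning and storage in separate methods means a different pattern or a different storage target (a database, for example) can be swapped in without touching the other steps.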
The examples below only cover the first two steps.
Simple example (this site accepts plain requests by default): provide a URL and download the resource
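import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleCrawler {
    public static void main(String[] args) throws IOException {
        // Step 1: provide a URL
        URL url = new URL("https://www.jd.com");
        // Step 2: download the resource; try-with-resources closes the stream (UTF-8 assumed)
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String msg;
            while ((msg = reader.readLine()) != null) {
                System.out.println(msg);
            }
        }
    }
}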
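The example above prints the page line by line. For steps 3 to 5 it is usually more convenient to hold the whole page in one String. Below is a minimal sketch, assuming the page is UTF-8 encoded; the class PageDownloader and its readAll method are hypothetical names. Reading all the bytes first and decoding once avoids splitting a multi-byte character between two reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageDownloader {
    // Read the whole stream into memory, then decode it once with the given charset
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // With a real page (needs network access):
        //   try (InputStream is = new java.net.URL("https://www.jd.com").openStream()) {
        //       String html = readAll(is, StandardCharsets.UTF_8);
        //   }
        // Offline demo: "\u00e9" takes 2 bytes in UTF-8 but is 1 char
        byte[] bytes = "<title>caf\u00e9</title>".getBytes(StandardCharsets.UTF_8);
        String html = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(bytes.length + " bytes -> " + html.length() + " chars");  // prints 20 bytes -> 19 chars
    }
}
```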
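Simple example (this site rejects plain requests by default, but the page can be fetched by sending a request that imitates a browser): provide a URL and download the resource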
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BrowserLikeCrawler {
    public static void main(String[] args) throws IOException {
        URL url = new URL("https://www.dianping.com");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        // Send a browser User-Agent so the request looks like a normal browser visit
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String msg;
            while ((msg = reader.readLine()) != null) {
                System.out.println(msg);
            }
        } finally {
            connection.disconnect();
        }
    }
}
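When a site rejects a request, getInputStream() throws an IOException for a 4xx status such as 403. It can therefore help to check the status code first, and to set timeouts so the crawler does not hang. The sketch below wraps the example above in two hypothetical helpers, openAsBrowser and fetch. The timeout values and the UTF-8 decoding are assumptions, and what a given site actually returns may change over time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BrowserFetcher {
    // Browser User-Agent copied from the example above
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36";

    // Prepare a GET request that carries the browser User-Agent.
    // openConnection() only creates the object; nothing is sent until getResponseCode() or getInputStream().
    static HttpURLConnection openAsBrowser(URL url) throws IOException {
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setRequestMethod("GET");
        c.setRequestProperty("User-Agent", USER_AGENT);
        c.setConnectTimeout(5000);  // milliseconds, assumed value
        c.setReadTimeout(10000);    // milliseconds, assumed value
        return c;
    }

    // Send the request and return the body only for HTTP 200; any other status
    // (for example 403 when the site rejects the request) becomes an IOException
    static String fetch(URL url) throws IOException {
        HttpURLConnection c = openAsBrowser(url);
        try {
            int status = c.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (InputStream in = c.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumes the page is UTF-8 encoded
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            c.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Needs network access; the server's actual response may differ from what the article describes
        String html = fetch(new URL("https://www.dianping.com"));
        System.out.println(html.length() + " characters downloaded");
    }
}
```

Because openConnection() sends nothing by itself, the request settings returned by openAsBrowser can be inspected before any network traffic happens.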