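Java Crawler: Scraping Maoyan Movie Data and Saving It to Excel
Scraping targets and content
Targets: the TOP100 board, the Most Anticipated board, the Now Showing Word-of-Mouth board, the Domestic Box Office board, and the North American Box Office board.
Content: poster image, movie title, release date, cast, movie link, rating, total number of ratings, number of people who want to watch, and number of people who have watched.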
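A ranking board is usually split across several pages, so the crawler needs one URL per page before it can collect the fields listed above. The following is a minimal sketch of that step, not the article's own code. It assumes the TOP100 board is served at https://maoyan.com/board/4 and is paged through an `offset` query parameter in steps of 10; neither detail is stated in this article, and the class name `BoardPages` and method `pageUrls` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BoardPages {
    // Assumption: URL of the TOP100 board (not given in the article).
    static final String TOP100_URL = "https://maoyan.com/board/4";

    // Builds one URL per page, assuming paging via "?offset=N".
    static List<String> pageUrls(String boardUrl, int totalItems, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (int offset = 0; offset < totalItems; offset += pageSize) {
            urls.add(boardUrl + "?offset=" + offset);
        }
        return urls;
    }

    public static void main(String[] args) {
        // 100 movies at 10 per page gives 10 page URLs.
        for (String url : pageUrls(TOP100_URL, 100, 10)) {
            System.out.println(url);
        }
    }
}
```

Each generated URL can then be fetched and parsed in turn. Boards that fit on a single page would simply use the base URL.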
Maven dependencies used:
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.58</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.10</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.16</version>
</dependency>
Wrapping the extracted information in a class
public class Mao {
    private String picLink;     // link to the movie poster image
    private String movie;       // movie title
    private String releaseTime; // release date
    private String star;        // cast
    private String movieLink;   // link to the movie page
    private String score;       // movie rating
    private String snum;        // presumably the total number of ratings
    private String watched;     // number of people who have watched
    private String num;         // presumably the number of people who want to watch

    public Mao(String picLink, String movie, String releaseTime, String star, String movieLink,
               String score, String snum, String watched, String num) {
        this.picLink = picLink;
        this.movie = movie;
        this.releaseTime = releaseTime;
        this.star = star;
        this.movieLink = movieLink;
        this.score = score;
        this.snum = snum;
        this.watched = watched;
        this.num = num;
    }
}
Getting the total number of ratings, the want-to-watch count, and the watched count
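The method that follows begins by taking the text after the last slash of the movie link as the movie ID. That step in isolation might look like the sketch below. The helper name `extractMovieId` and the link form `/films/1203` are assumptions made for illustration; the sketch also strips a query string, a fragment, and a trailing slash, which the original one-liner does not handle.

```java
class MovieIdDemo {
    // Returns the last path segment of a movie link, treated here as the movie ID.
    static String extractMovieId(String movieLink) {
        String path = movieLink;

        // Drop a query string or fragment if present (e.g. "?from=board").
        int cut = path.indexOf('?');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        cut = path.indexOf('#');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }

        // Drop trailing slashes so "/films/1203/" still yields "1203".
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }

        // lastIndexOf returns -1 when there is no slash, so the whole string is kept.
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // The link below is a hypothetical example.
        System.out.println(extractMovieId("/films/1203"));
    }
}
```

Normalizing the link first keeps the ID clean when the scraped `href` carries tracking parameters, so the request URL built from it stays valid.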
public List<String> getComment(String movieLink){
    List<String> list = new ArrayList<>(3);
    String movieId = movieLink.substring(movieLink.lastIndexOf("/")+1,movieLink.length());
    String request = "http://m.maoyan.com/asgard/a