Scraping Harmonica Tabs and Backing Tracks from the Blues Harmonica Site (蓝调口琴网) with a Java Crawler

ybqdren

Article link: https://blog.csdn.net/qq_43795348/article/details/113250175

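I. Foreword

I needed to collect blues harmonica tabs, so I spent roughly 2 to 3 days, on and off, writing this crawler.

For now it can only collect blues harmonica tabs and backing-track audio; crawling of text tutorials and video tutorials will be added gradually.

PS: I use a Cookie here to obtain viewing permission, so this crawler is only usable by people who have a member account on the site.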

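II. Details

1. Login

Login uses Cookie authentication: the Cookie and User-Agent values are read from a properties object and set as headers on each request: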

	httpGet.setHeader("Cookie", prop.getProperty("Cookie"));
	httpGet.setHeader("User-Agent", prop.getProperty("User-Agent"));
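
The header setup above can be sketched as a small self-contained program. This is only an illustration under assumptions: it uses the JDK's built-in `java.net.http.HttpRequest` (Java 11+) instead of the Apache HttpClient `HttpGet` used in this article, so that it runs without external dependencies. The class name `CookieRequestSketch`, the helper methods, the `example.com` URL, and the placeholder Cookie value are all hypothetical. The only things taken from the original code are the property keys `Cookie` and `User-Agent`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Properties;

class CookieRequestSketch {

    // Parse key=value settings. In practice the text would come from a
    // .properties file that holds the Cookie copied from a logged-in browser.
    static Properties loadConfig(String text) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prop;
    }

    // Build a GET request that carries the configured Cookie and User-Agent headers.
    static HttpRequest buildRequest(String url, Properties prop) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", prop.getProperty("Cookie"))
                .header("User-Agent", prop.getProperty("User-Agent"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; a real Cookie is specific to the member account.
        Properties prop = loadConfig("Cookie=sessionid=PLACEHOLDER\nUser-Agent=Mozilla/5.0\n");
        HttpRequest request = buildRequest("https://example.com/member/page", prop);
        System.out.println(request.headers().map());
    }
}
```

Keeping the Cookie in a properties file rather than in the source means the crawler can be shared without exposing anyone's session.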

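2. How resources are crawled

Each resource goes through three steps: obtain the resource link -> request the resource -> download it. Taking tab images as an example: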

		// Get the URL of the current image and send a new request for it
		url = imageMap.get("url");
		httpGet = new HttpGet(url);
		response = httpClient.execute(httpGet);
		httpEntity = response.getEntity();
		file_img = new File(rootDir + "/" + name + ".png");
		// Read the whole image into memory
		byte[] byt = EntityUtils.toByteArray(httpEntity);
		// Opening the stream without append mode overwrites any existing file,
		// so a separate delete is not needed; try-with-resources always closes it
		try (BufferedOutputStream bw_img = new BufferedOutputStream(new FileOutputStream(file_img))) {
			bw_img.write(byt);
		}
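
The final "download" step can also be written so that the response body is streamed to disk instead of being loaded into a byte array first, which matters more for backing-track audio than for small images. The following is a minimal sketch using only the JDK, not the code from this crawler: `ResourceSaverSketch` and `save` are hypothetical names, and the `ByteArrayInputStream` stands in for the stream that a real HTTP response would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ResourceSaverSketch {

    // Stream the body to <dir>/<name>.<ext>, replacing any existing file.
    static Path save(InputStream body, Path dir, String name, String ext) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(name + "." + ext);
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tabs");
        // Stand-in for the bytes of a downloaded tab image.
        InputStream body = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path saved = save(body, dir, "demo", "png");
        System.out.println(saved + " (" + Files.size(saved) + " bytes)");
    }
}
```

With Apache HttpClient, the stream passed to such a method would presumably come from `httpEntity.getContent()`; `REPLACE_EXISTING` plays the role of the delete-then-write logic in the original snippet.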

3. Java libraries used

The main dependencies are:

    <!-- Logging -->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>

    <!-- Crawler libraries -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.5</version>
    </dependency>