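Today I spent a little time scraping the popular folk playlists on NetEase Cloud Music (网易云音乐), 1,500 playlists in total. I'll crawl the other categories later when I have time.
Below are my notes on the crawling process in Java.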
Crawling process
1. First, fetch the URL and title of each playlist
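The method in this step calls a helper named InputStream2String, whose body isn't shown in these notes. Here is a minimal sketch of what such a helper might look like, assuming it just reads the whole stream and decodes it with the given charset. The class name StreamUtil is hypothetical, and the real helper may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical container class; the name StreamUtil is an assumption.
final class StreamUtil {

    // Assumed behavior: read the stream to the end and decode the bytes
    // using the named charset (for example "UTF-8").
    // The caller stays responsible for closing the stream.
    static String InputStream2String(InputStream in, String charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once at the end so multi-byte characters are never split
        return buffer.toString(charset);
    }
}
```

Buffering all bytes before decoding keeps multi-byte characters (common on a Chinese page) intact, which a chunk-by-chunk `new String(...)` would not guarantee.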
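// Fetches one playlist listing page and selects the cover element of each playlist.
public static void DoPachong(String url_str, String charset) throws ClientProtocolException, IOException {
    // Note: DefaultHttpClient is deprecated since HttpClient 4.3.
    HttpClient hc = new DefaultHttpClient();
    HttpGet hg = new HttpGet(url_str);
    HttpResponse response = hc.execute(hg);
    HttpEntity entity = response.getEntity();
    InputStream htm_in = null;
    if (entity != null) {
        htm_in = entity.getContent();
        // Decode the response body with the given charset, then parse it with Jsoup.
        String htm_str = InputStream2String(htm_in, charset);
        Document doc = Jsoup.parse(htm_str);
        // Walk down to the playlist grid; each matched cover div should correspond to one playlist.
        Elements links = doc.select("div[class=g-bd]")
                .select("div[class=g-wrap p-pl f-pr]")
                .select("ul[class=m-cvrlst f-cb]")
                .select("div[class=u-cover u-cover-1]");
        for (Element link : link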