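I've been unemployed and stuck in the dorm for 6 days now. It's the end of the year, and jobs are hard to find. This is my first year-end since I started working, so I had no experience with it and assumed finding a job would be easy.
Bored with nothing to do, so I wrote a simple crawler.
Tencent city news crawler exercise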
http://fj.qq.com/dc_column_article/TagsList.htm?tags=福州
Crawler packet capture analysis
It's a GET request; just fill in the corresponding parameters.
Core code:
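As a rough sketch of what "filling in the parameters" might look like, the snippet below assembles the request URL from its parts. The names `TagUrlBuilder` and `buildUrl` are made up for illustration, and reading `p` as the page number and `l` as the page size is only a guess from the captured URL, not something documented. The `callback` and `_` values are copied unchanged from the capture.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original crawler.
class TagUrlBuilder {
    // Endpoint and fixed values taken from the captured request.
    static final String ENDPOINT = "http://tags.open.qq.com/interface/tag/articles.php";
    static final String CALLBACK = "jQuery18202606978673085389_1417534641913";
    static final String CACHE_BUSTER = "1417534648230";

    // Assumption: p is the page number and l is the number of items per page.
    static String buildUrl(String tag, int page, int pageSize, String site)
            throws UnsupportedEncodingException {
        // Percent-encode the tag as UTF-8, matching ie=utf-8 in the capture.
        String encodedTag = URLEncoder.encode(tag, "UTF-8");
        return ENDPOINT
                + "?callback=" + CALLBACK
                + "&p=" + page
                + "&l=" + pageSize
                + "&tag=" + encodedTag
                + "&oe=gbk&ie=utf-8"
                + "&site=" + site
                + "&_=" + CACHE_BUSTER;
    }

    public static void main(String[] args) throws Exception {
        // "\u798f\u5dde" is the tag for Fuzhou, written with escapes
        // so the source file compiles under any encoding.
        for (int page = 1; page <= 3; page++) {
            System.out.println(buildUrl("\u798f\u5dde", page, 20, "fj"));
        }
    }
}
```

The string returned for each page can be passed straight to `new HttpGet(...)`.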
// Fuzhou news (i is defined outside this snippet; presumably the page number)
HttpGet httpGet = new HttpGet(
"http://tags.open.qq.com/interface/tag/articles.php?callback=jQuery18202606978673085389_1417534641913&p="
+ i
+ "&l=20&tag=%E7%A6%8F%E5%B7%9E&oe=gbk&ie=utf-8&site=fj&_=1417534648230");
// Set the HttpGet request headers
httpGet.setHeader("Accept", "application/javascript, */*;q=0.8");
httpGet.setHeader("Accept-Charset", "GB2312,utf-8;q=0.7,*;q=0.7");
httpGet.setHeader("Accept-Encoding", "gzip, deflate");
httpGet.setHeader("Accept-Language", "zh-CN");
httpGet.setHeader("Connection", "Keep-Alive");
httpGet.setHeader("DNT", "1");
httpGet.setHeader(
"Cookie",
"pgv_info=ssid=s7418086336; ac=1,019,001; pt2gguin=o1023746826; RK=2dlDJvBBFu; ptcz=a7000fdd9a7d79d08c3356b93cd78e526ae2be2327ee843f9d94a415e5fb4a7f; pgv_pvid=4808498400; uin_cookie=1023746826; euin_cookie=9628486DA91468B319D8EB65692F106CB376CEC227892C37; o_cookie=1023746826");
// httpGet.setHeader("Host", "tags.open.qq.com");
// Note: this Referer is the js.qq.com tag page for Suzhou, not the Fuzhou page listed above
httpGet.setHeader("Referer",
"http://js.qq.com/dc_column_article/TagsList.htm?tags=%E8%8B%8F%E5%B7%9E");
httpGet.setHeader("User-Agent",
"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)");
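Because the request carries a `callback` parameter, the response is presumably JSONP, that is, JSON wrapped in a call to the callback function. The original post does not show the response, so the following is only a minimal sketch under that assumption. `JsonpUnwrapper` and `unwrap` are hypothetical names, and the sample payload is invented for the demonstration.

```java
// Hypothetical helper, not part of the original crawler.
class JsonpUnwrapper {
    // Returns the text between the first '(' and the last ')'.
    static String unwrap(String jsonp) {
        int open = jsonp.indexOf('(');
        int close = jsonp.lastIndexOf(')');
        if (open < 0 || close <= open) {
            throw new IllegalArgumentException("not a JSONP payload");
        }
        return jsonp.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        // Invented sample; the real response body is not shown in the post.
        String sample = "jQuery18202606978673085389_1417534641913({\"code\":0})";
        System.out.println(unwrap(sample));
    }
}
```

The unwrapped string can then be handed to whichever JSON parser you prefer.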