What is scraped: the Weibo posts of the 京东客服 (JD Customer Service) account and the comments on them.
Approach: access Sina Weibo's API through the mobile site, then filter the returned data.
This is the URL of the Weibo site once you are logged in.
You can also log in to Sina Weibo at
https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F
[Screenshot: the interface you will see after logging in]
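A minimal sketch of "accessing the API as a mobile client" using only Python's standard library. It assumes that sending a mobile-browser User-Agent is enough to get the m.weibo.cn responses, and that the logged-in session is carried over by pasting the browser's Cookie header. The `MOBILE_UA` string and the `build_mobile_request` helper are hypothetical, not part of any Weibo SDK.

```python
import urllib.request
from typing import Optional

# Assumption: any current mobile-browser User-Agent string should work here.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
)


def build_mobile_request(url: str, cookie: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that presents itself as a mobile browser.

    `cookie` is the raw Cookie header copied from a browser session that is
    already logged in to m.weibo.cn (hypothetical way of reusing the login).
    """
    headers = {"User-Agent": MOBILE_UA}
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers)
```

The request can then be opened with `urllib.request.urlopen(req, timeout=10)` and the body decoded with `json.loads`.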
The content scraped here is:
posts (说说) and the comments under each post.
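Simple as it is, I have to vent: the scraping process was rough, and my IP kept getting blocked. From my own testing, sleeping 30 seconds between requests did not work noticeably better; 10 seconds is enough, because if they are going to block your IP, they will block it anyway. In my case the problem was most likely IP blocking.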
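The fixed delay described above can be enforced with a small throttle. This is a sketch, not the author's code: `Throttle` is a hypothetical helper, the 10-second default simply mirrors the figure above, and spacing requests out does not by itself prevent an IP block. The clock and sleep functions are injectable so the class can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval            # seconds between requests; 10 s per the note above
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was let through

    def wait(self) -> float:
        """Sleep just long enough to honour the interval; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Call `throttle.wait()` immediately before each request; the first call returns at once, later calls pause only for whatever part of the interval has not already elapsed.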
The method is to fetch JSON data through the mobile API and then extract the fields we need.
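Once a page of JSON is decoded, extraction is plain dictionary access. The sketch below assumes the response shape shown in the sample later in this post: a top-level `cards` list whose post entries have `card_type` 9 and an `mblog` object holding `idstr`, `created_at`, `text` and `comments_count`. Because `text` is HTML with emoticons embedded as `<img alt="[...]">`, `clean_text` keeps the alt text and strips the remaining tags with regular expressions, which is adequate for this markup but is not a general HTML parser. Both function names are hypothetical.

```python
import html
import re
from typing import Any, Dict, List

# Emoticons appear as <img ... alt="[二哈]">: keep the alt text, drop every other tag.
_IMG_ALT = re.compile(r'<img[^>]*\balt="([^"]*)"[^>]*>')
_TAG = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Turn the HTML in an mblog's `text` field into plain text."""
    text = _IMG_ALT.sub(r"\1", raw)
    text = _TAG.sub("", text)
    return html.unescape(text).strip()


def extract_posts(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the fields of interest out of one decoded page of the posts API."""
    posts = []
    for card in data.get("cards", []):
        mblog = card.get("mblog")
        # Assumption: only cards with card_type 9 and an mblog object are posts.
        if card.get("card_type") != 9 or not mblog:
            continue
        posts.append({
            "idstr": mblog["idstr"],  # needed later to build the comment URLs
            "created_at": mblog.get("created_at"),
            "text": clean_text(mblog.get("text", "")),
            "comments_count": mblog.get("comments_count", 0),
        })
    return posts
```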
A Firefox plugin can be used for this:
Main APIs:
Posts API:
It looks something like this:
Comment details API:
Each post has an idstr, for example 4137390568546147,
which leads to that post's comment detail page:
https://m.weibo.cn/status/4137390568546147
Comment pages are built like this:
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=1
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=2
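The URL patterns above can be generated rather than typed by hand. In this sketch `status_page_url` and `comments_page_url` follow the two patterns shown, and `iter_comment_pages` walks page=1, 2, 3, ... until a page comes back empty. The stop condition, the `max_pages` cap and the `fetch_page` callable (which would download one URL and return that page's comments) are assumptions, since the layout of the comments response is not shown here.

```python
from typing import Any, Callable, Iterator, List
from urllib.parse import urlencode

STATUS_URL = "https://m.weibo.cn/status/{idstr}"
COMMENTS_API = "https://m.weibo.cn/api/comments/show"


def status_page_url(idstr: str) -> str:
    """Detail page of a single post."""
    return STATUS_URL.format(idstr=idstr)


def comments_page_url(idstr: str, page: int) -> str:
    """Comment list URL for one page, following the id=...&page=N pattern."""
    return f"{COMMENTS_API}?{urlencode({'id': idstr, 'page': page})}"


def iter_comment_pages(idstr: str,
                       fetch_page: Callable[[str], List[Any]],
                       max_pages: int = 50) -> Iterator[List[Any]]:
    """Yield each non-empty page of comments, stopping at the first empty one.

    `fetch_page` is hypothetical: download the URL (throttled) and return
    the list of comments found on that page.
    """
    for page in range(1, max_pages + 1):
        comments = fetch_page(comments_page_url(idstr, page))
        if not comments:
            break
        yield comments
```

For the post above, `comments_page_url("4137390568546147", 1)` produces the first URL in the list.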
Here is a sample of the JSON returned by the posts API:
{
"cardlistInfo": {
"containerid": "1076035650743478",
"v_p": 42,
"show_style": 1,
"total": 3264,
"page": 2
},
"cards": [
{
"card_type": 9,
"itemid": "1076035650743478_-_4137858652321796",
"scheme": "https://m.weibo.cn/status/FfSSl9K0k?mblogid=FfSSl9K0k&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"mblog": {
"created_at": "2小时前",
"id": "4137858652321796",
"mid": "4137858652321796",
"idstr": "4137858652321796",
"text": "明天又要上班了,用四个字描述下你现在的心情吧<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span> ",
"textLength": 50,
"source": "微博 weibo.com",
"favorited": false,
"thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",
"bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",
"original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",
"user": {
"id": 5650743478,
"screen_name": "京东客服",
"profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",
"profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"statuses_count": 3245,
"verified": true,
"verified_type": 2,
"verified_type_ext": 0,
"verified_reason": "北京京东世纪贸易有限公司",
"description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",
"gender": "f",
"mbtype": 2,
"urank": 29,
"mbrank": 2,
"follow_me": false,
"following": false,
"followers_count": 18427,
"follow_count": 235,
"cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"
},
"reposts_count": 0,
"comments_count": 4,
"attitudes_count": 2,
"isLongText": false,
"visible": {
"type": 0,
"list_id": 0
},
"mblogtype": 0,
"bid": "FfSSl9K0k",
"pics": [
{
"pid": "006apWvQgy1fi7tkjguy4j309q09qt8q",
"url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",
"size": "orj360",
"geo": {
"width": "350",
"height": "350",
"croped": false
},
"large": {
"size": "large",
"url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",
"geo": {
"width": "350",
"height": "350",
"croped": false
}
}
}
]
},
"show_type": 0,
"openurl": ""
},
{
"card_type": 9,
"itemid": "1076035650743478_-_4137692553365577",
"scheme": "https://m.weibo.cn/status/FfOyre7xv?mblogid=FfOyre7xv&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"mblog": {
"created_at": "13小时前",
"id": "4137692553365577",
"mid": "4137692553365577",
"idstr": "4137692553365577",
"text": "你觉得举办哪种《中国有_____》比赛,你能进入决赛? ",
"textLength": 49,
"source": "微博 weibo.com",
"favorited": false,
"thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",
"bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",
"original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",
"user": {
"id": 5650743478,
"screen_name": "京东客服",
"profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",
"profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"statuses_count": 3245,
"verified": true,
"verified_type": 2,
"verified_type_ext": 0,
"verified_reason": "北京京东世纪贸易有限公司",
"description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",
"gender": "f",
"mbtype": 2,
"urank": 29,
"mbrank": 2,
"follow_me": false,
"following": false,
"followers_count": 18427,
"follow_count": 235,
"cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"
},
"reposts_count": 0,
"comments_count": 13,
"attitudes_count": 1,
"isLongText": false,
"visible": {
"type": 0,
"list_id": 0
},
"mblogtype": 0,
"bid": "FfOyre7xv",
"pics": [
{
"pid": "006apWvQgy1fi7ul9n9rfj30k00lsgnj",
"url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 392,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",
"geo": {
"width": "720",
"height": "784",
"croped": false
}
}
}
]
},
"show_type": 0,
"openurl": ""
},
{
"card_type": 9,
"itemid": "1076035650743478_-_4137390568546147",
"scheme": "https://m.weibo.cn/status/FfGHmzRf5?mblogid=FfGHmzRf5&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"mblog": {
"created_at": "昨天 14:24",
"id": "4137390568546147",
"mid": "4137390568546147",
"idstr": "4137390568546147",
"text": "周末就是买买买,吃吃吃<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_huaixiao-bb5966dcc6.png\" style=\"width:1em;height:1em;\" alt=\"[坏笑]\"></span> ",
"textLength": 28,
"source": "微博 weibo.com",
"favorited": false,
"thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7taijr9pg307e05kgvl.gif",
"bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7taijr9pg307e05kgvl.gif",
"original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif",
"user": {
"id": 5650743478,
"screen_name": "京东客服",
"profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",
"profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",
"statuses_count": 3245,
"verified": true,
"verified_type": 2,
"verified_type_ext": 0,
"verified_reason": "北京京东世纪贸易有限公司",
"description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",
"gender": "f",
"mbtype": 2,
"urank": 29,
"mbrank": 2,
"follow_me": false,
"following": false,
"followers_count": 18427,
"follow_count": 235,
"cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"
},
"reposts_count": 0,
"comments_count": 19,
"attitudes_count": 1,
"isLongText": false,
"visible": {