Scraping Sina Weibo User Information with Scrapy

The complete code is available in Knowsmore.

The data comes from Sina Weibo's mobile H5 pages (m.weibo.cn).

Profile info API: https://m.weibo.cn/profile/in...【user ID】

Posted-statuses API: https://m.weibo.cn/api/contai...【user ID】_-_WEIBO_SECOND_PROFILE_WEIBO&page_type=03&page=【page number, starting at 1】
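
Both endpoints return JSON. Before wiring them into the spider, it helps to confirm the response shape with a one-off request. Below is a minimal sketch, assuming m.weibo.cn serves these endpoints to a plain mobile User-Agent without a login session; the screen_name field is an assumption about the contents of the user dict, not something shown in this post:

import requests

BASE_URL = 'https://m.weibo.cn'
UID = '6883966016'

# A mobile User-Agent; m.weibo.cn may reject desktop UAs or require cookies
HEADERS = {'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) '
                         'AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}

# Profile info endpoint (handled by parse() in the spider below)
profile = requests.get('%s/profile/info?uid=%s' % (BASE_URL, UID), headers=HEADERS).json()
print(profile['data']['user'].get('screen_name'))  # 'screen_name' is assumed

# Statuses endpoint (handled by parse_status()); containerid is 230413 + UID, pages start at 1
status_url = ('%s/api/container/getIndex?containerid=230413%s'
              '_-_WEIBO_SECOND_PROFILE_WEIBO&page_type=03&page=1' % (BASE_URL, UID))
statuses = requests.get(status_url, headers=HEADERS).json()
print(len(statuses['data']['cards']))

The spider below wires these two endpoints into Scrapy callbacks.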

# -*- coding: utf-8 -*-
import json

import scrapy
from scrapy import Request

from knowsmore.items import WeiboUserItem, WeiboStatusItem

# Crawl configuration: API host and the seed user IDs
WEIBO_USER_CONFIG = {
    'BASE_URL' : 'https://m.weibo.cn',
    'USER_IDS' : ['6883966016']
}

class WeiboUserSpider(scrapy.Spider):

    name = "weibo_user"

    def start_requests(self):
        for uid in WEIBO_USER_CONFIG['USER_IDS']:
            # Profile info request; handled by the default parse() callback
            url = '%s/profile/info?uid=%s' % (WEIBO_USER_CONFIG['BASE_URL'], uid)
            yield Request(url)
            # Statuses pagination: this demo fetches only page 1; extend the
            # range (or paginate until 'cards' comes back empty) for a full crawl
            for i in range(1, 2):
                status_url = '%s/api/container/getIndex?containerid=230413%s_-_WEIBO_SECOND_PROFILE_WEIBO&page_type=03&page=%d' % (WEIBO_USER_CONFIG['BASE_URL'], uid, i)
                yield Request(status_url, callback=self.parse_status)

    # Profile page: https://m.weibo.cn/profile/1784537661
    def parse(self, response):
        # The profile info endpoint returns JSON; all fields sit under 'data'
        user_data = json.loads(response.text)
        yield WeiboUserItem(
            fans_url = user_data['data']['fans'],
            follow_url = user_data['data']['follow'],
            more_url = user_data['data']['more'],
            user = user_data['data']['user']
        )

    # https://m.weibo.cn/api/container/getIndex?containerid=2304131784537661_-_WEIBO_SECOND_PROFILE_WEIBO&page_type=03&page=2
    def parse_status(self, response):
        # Each result page returns a list of status 'cards' under 'data'
        status_data = json.loads(response.text)
        yield WeiboStatusItem(
            cards = status_data['data']['cards']
        )
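
The WeiboUserItem and WeiboStatusItem classes are imported from knowsmore.items but not shown in this post. For reference, here is a minimal sketch matching the fields populated above; the actual definitions in the Knowsmore repo may differ:

import scrapy

class WeiboUserItem(scrapy.Item):
    # URLs and the raw user dict taken from the profile info response
    fans_url = scrapy.Field()
    follow_url = scrapy.Field()
    more_url = scrapy.Field()
    user = scrapy.Field()

class WeiboStatusItem(scrapy.Item):
    # Raw list of status 'cards' from one result page
    cards = scrapy.Field()

With the items defined, the spider runs like any other Scrapy spider, e.g. scrapy crawl weibo_user -o users.json to dump the scraped items to a file.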
