Crawling Weibo

Ajax (Asynchronous JavaScript and XML)
In the Request Headers, x-requested-with: XMLHttpRequest marks a request as an Ajax request.
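As a small illustration of the header check described above, here is a minimal sketch. The helper name `is_ajax_request` is hypothetical and not part of any library. Note that the header is a convention set by the client, so its absence does not prove that a request is not Ajax.

```python
def is_ajax_request(headers):
    """Return True if the headers carry X-Requested-With: XMLHttpRequest.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    for name, value in headers.items():
        if name.lower() == 'x-requested-with':
            return value.strip().lower() == 'xmlhttprequest'
    return False


if __name__ == '__main__':
    print(is_ajax_request({'X-Requested-With': 'XMLHttpRequest'}))
    print(is_ajax_request({'User-Agent': 'Mozilla/5.0'}))
```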

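Analysis:

  1. Disable JavaScript in the browser to see which content no longer loads; that content is fetched dynamically.
  2. In the Request Headers, look for x-requested-with: XMLHttpRequest, which marks the request as an Ajax request.
  3. Filter the network requests by XHR and inspect the responses; the content is in JSON format.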
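To make the JSON structure concrete, the sketch below walks a hand-written sample payload. The field names (`data`, `cards`, `mblog`, `scheme`, `text`, and the three counters) are the ones the crawler code in this post reads; the sample values, the example.com URLs, and the helper name `iter_mblogs` are made up for illustration and are not real Weibo data.

```python
# Hypothetical sample shaped like the fields the crawler reads.
sample_response = {
    'data': {
        'cards': [
            {
                'scheme': 'https://example.com/status/1',
                'mblog': {
                    'text': '<span>hello</span>',
                    'source': 'example client',
                    'attitudes_count': 3,
                    'comments_count': 2,
                    'reposts_count': 1,
                },
            },
            # A card without 'mblog' (not a post); it should be skipped.
            {'scheme': 'https://example.com/other'},
        ]
    }
}


def iter_mblogs(payload):
    """Yield (scheme, mblog) pairs for every card that contains a post."""
    data = (payload or {}).get('data') or {}
    cards = data.get('cards') or []
    for card in cards:
        mblog = card.get('mblog')
        if mblog:
            yield card.get('scheme'), mblog


if __name__ == '__main__':
    for scheme, mblog in iter_mblogs(sample_response):
        print(scheme, mblog['attitudes_count'])
```

Pairing each post with the `scheme` of its own card keeps the link and the post content in sync.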


Inspecting the request shows that the method is GET and that the three parameters type, value, and containerid are fixed.
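The three fixed parameters can be turned into a request URL as sketched below. In this example the containerid happens to be the string 107603 followed by the uid; treating that as a general rule is an assumption, so the prefix is exposed as an argument. The helper name `build_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(uid, containerid_prefix='107603'):
    """Build the getIndex URL for a user id.

    Assumption: containerid = prefix + uid, as observed for the one
    account used in this post. Other accounts may differ.
    """
    params = {
        'type': 'uid',
        'value': uid,
        'containerid': containerid_prefix + uid,
    }
    return BASE_URL + urlencode(params)


if __name__ == '__main__':
    print(build_url('2556696984'))
```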
from urllib.parse import urlencode

import requests
from pymongo import MongoClient  # needed by the MongoDB code at the bottom
from pyquery import PyQuery

base_url = 'https://m.weibo.cn/api/container/getIndex?'
headers = {
    'Referer': 'https://m.weibo.cn/u/2556696984',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/'
                  '78.0.3904.70 Mobile Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

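def get_page():
    """Request the container API and return the decoded JSON, or None on failure."""
    # The three query parameters observed in the XHR request.
    params = {
        'type': 'uid',
        'value': '2556696984',
        'containerid': '1076032556696984',
    }
    url = base_url + urlencode(params)

    try:
        response = requests.get(url=url, headers=headers)
        if response.status_code == 200:
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)
    return None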

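def parse_page(data):
    """Yield one dict per post found in the response's data.cards list."""
    if not data:
        return
    cards = (data.get('data') or {}).get('cards') or []

    for card in cards:
        mblog = card.get('mblog')
        if not mblog:
            continue  # skip cards that do not contain a post
        yield {
            # Use the current card's link, not a fixed index into the list.
            'link': card.get('scheme'),
            # The text field contains HTML; PyQuery strips the tags.
            'text': PyQuery(mblog.get('text')).text(),
            'source': mblog.get('source'),
            'likes': mblog.get('attitudes_count'),
            'comments': mblog.get('comments_count'),
            'reposts': mblog.get('reposts_count'),
        }


def savetomongo(collection, result):
    """Insert one post into the given MongoDB collection."""
    # insert_one replaces the deprecated Collection.insert.
    if collection.insert_one(result).acknowledged:
        print('SAVE SUCCESS!')


if __name__ == '__main__':
    # Connect before the loop so that every result is saved, not only the last one.
    client = MongoClient()
    db = client['weibo']
    collection = db['weibo1']

    for result in parse_page(get_page()):
        print(result)
        savetomongo(collection, result)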
