Querying Data from People's Daily Online (人民网)

Method 1: Web search

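  • Inspect the search page's network traffic to find the URL that the search request is sent to.
  • The endpoint is http://search.people.cn/search-platform/front/search, called with a POST request.
  • With custom request headers added, the response data is easy to retrieve.
  • Problem: the site's anti-scraping detection is fairly strict, so the client IP is easily blocked.
  • Solutions:
    1. Add a random delay between requests;
    2. Use proxy IPs;
    3. Switch to a different endpoint and fetch the data through another channel, such as the mini program or the app.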
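
The first two mitigations can be sketched as a small retry helper. This is only an illustration under assumptions: the helper names (random_delay, pick_proxy, fetch_with_retry), the delay range, and the retry count are hypothetical choices, not something the site documents, and the actual HTTP call is passed in as a callable so the sketch stays independent of any particular request.

```python
import random
import time
from typing import Callable, Optional, Sequence


def random_delay(low: float = 1.0, high: float = 3.0) -> float:
    """Sleep for a random time between low and high seconds; return the delay used."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay


def pick_proxy(pool: Sequence[str]) -> Optional[dict]:
    """Build a requests-style proxies dict from one randomly chosen address."""
    if not pool:
        return None  # no proxy configured: connect directly
    address = random.choice(pool)
    return {"http": address, "https": address}


def fetch_with_retry(send: Callable[[Optional[dict]], object],
                     pool: Sequence[str] = (),
                     attempts: int = 3,
                     low: float = 1.0,
                     high: float = 3.0):
    """Call send(proxies) up to `attempts` times, pausing randomly before each try."""
    last_error = None
    for _ in range(attempts):
        random_delay(low, high)
        try:
            # A fresh proxy is drawn from the pool for every attempt.
            return send(pick_proxy(pool))
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

With requests, send could be something like lambda proxies: requests.post(url, json=params, headers=headers, proxies=proxies, timeout=3), where url, params, and headers are the values used in the script in the next section.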

Using a proxy IP

# -*- coding: utf-8 -*-
import requests
import random
import time


# Pool of User-Agent strings; one is chosen at random for each run.
USER_AGENT = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.4094.1 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36 Edg/96.0.1054.34",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
]

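query_key = "抗疫"  # search keyword

# Search endpoint taken from the page's network requests; params is sent as the JSON body.
url = "http://search.people.cn/search-platform/front/search"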
params = {
    "key": query_key,
    "page": 1,
    "limit": 10,
    "hasTitle": True,
    "hasContent": True,
    "isFuzzy": True,
    "type": 0,
    "sortType": 2,
    "startTime": 0,
    "endTime": 0
}
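headers = {
    # Pick a different User-Agent on each run.
    "User-Agent": random.choice(USER_AGENT),
    # The keyword is non-ASCII, so the header value is passed as UTF-8 bytes.
    "Referer": f"http://search.people.cn/s?keyword={query_key}&st=0&_={int(time.time() * 1000)}".encode("utf8")
}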
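# Sample proxy address; it is unlikely to still be live, so substitute a working proxy.
ip_type, ip_port = ("http", "http://27.16.166.120:37927/")
# requests selects the proxy by the scheme of the target URL ("http" here).
proxies = {
    ip_type: ip_port
}
res = requests.post(url, json=params, headers=headers, proxies=proxies, timeout=3)
print(res.text)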
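
To request more than the first page, the same body can be rebuilt for each page number. The sketch below makes several assumptions: build_search_request is a hypothetical helper, it assumes that incrementing the "page" field returns the next page of results, and it percent-encodes the keyword in the Referer with urllib.parse.quote instead of sending raw UTF-8 bytes as the script above does.

```python
import time
from urllib.parse import quote

SEARCH_URL = "http://search.people.cn/search-platform/front/search"


def build_search_request(keyword: str, page: int = 1, limit: int = 10,
                         user_agent: str = "Mozilla/5.0"):
    """Return (json_body, headers) for one page of search results."""
    # Same fields as the script above; only key, page, and limit vary.
    body = {
        "key": keyword, "page": page, "limit": limit,
        "hasTitle": True, "hasContent": True, "isFuzzy": True,
        "type": 0, "sortType": 2, "startTime": 0, "endTime": 0,
    }
    timestamp_ms = int(time.time() * 1000)
    headers = {
        "User-Agent": user_agent,
        # quote() percent-encodes the keyword so the header stays pure ASCII.
        "Referer": f"http://search.people.cn/s?keyword={quote(keyword)}&st=0&_={timestamp_ms}",
    }
    return body, headers
```

Each (body, headers) pair can then be passed to requests.post(SEARCH_URL, json=body, headers=headers, timeout=3), ideally with a pause between pages.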

Method 2: Querying through the app API

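1. Capture the request details

  • Use a packet-capture tool to record the request fields (URL, method, and headers).
  • The endpoint is https://api-app.people.cn/api/v2/articles/searchArticle, called with a GET request.

2. Test

  • Send a request built from the captured details and confirm that data comes back.

3. Fetch the details

  • From the article URLs returned in step 2, fetch and parse the content of each entry.
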
# -*- coding: utf-8 -*-
import requests

# Query-string parameters copied from the captured app request.
params = {
    "pageSize": 20,
    "pageToken": 1,
    "udid": "F297902D-6420-49E7-AC53-B1212B505A1A",
    "pos": 0,
    "deviceOs": "14.8.1",
    "userid": "",
    "clientVersion": "1.8.1",
    "keyWord": "抗疫",
    "type": "",
    "no_ec": 1,
    "pjCode": "rmwapp_2_202011",
    "date": "",
    "cnt": 20,
    "platform": "iOS",
    "deviceModel": "iPhone 11",
    "revert": 0,
    "clientVersionCode": 181,
    "highlighter": "1"
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
}
url = "https://api-app.people.cn/api/v2/articles/searchArticle"

res = requests.get(url, params=params, headers=headers, timeout=3)
print(res.text)
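
For step 3, the layout of the JSON response is not shown here, so the following sketch avoids guessing field names: it walks whatever structure comes back and collects every string that looks like an http(s) URL. The helper name iter_urls and the sample data are made up for illustration; a real response will probably also contain image or cover URLs that need filtering out.

```python
import json
from typing import Any, Iterator


def iter_urls(node: Any) -> Iterator[str]:
    """Yield every string value that looks like an http(s) URL, depth-first."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


# Made-up sample; the real response layout is an unknown here.
sample = json.loads(
    '{"data": [{"title": "a", "link": "https://example.com/1"},'
    ' {"title": "b", "link": "https://example.com/2",'
    ' "cover": "https://example.com/2.jpg"}]}'
)
print(list(iter_urls(sample)))
```

Against the live API this would be called as iter_urls(res.json()), after which each article URL can be requested and parsed in turn.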