Method 1: Web search
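- Inspect the page's network traffic to find the request URL used by the search page: a POST request to http://search.people.cn/search-platform/front/search.
- With custom request headers added, the search data is easy to retrieve.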
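- Problem: the site's anti-scraping checks are fairly strict, so the client IP is easily blocked.
- Workarounds:
  - add a random delay between requests;
  - use proxy IPs;
  - switch endpoints and fetch the data through another channel, such as the mini program or the app.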
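The first workaround, a random delay between requests, can be sketched as follows. This is a minimal illustration and not code from the original post: the helper names `polite_sleep` and `crawl_pages` and the default 1 to 3 second window are assumptions, and `fetch_page` stands in for whatever function actually sends the request.

```python
import random
import time


def polite_sleep(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def crawl_pages(fetch_page, pages, min_seconds=1.0, max_seconds=3.0):
    """Call fetch_page(page) for each page, pausing randomly between calls."""
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            # No pause before the first request, a random pause before the rest.
            polite_sleep(min_seconds, max_seconds)
        results.append(fetch_page(page))
    return results
```

Randomizing the interval, instead of sleeping a fixed amount, makes the request pattern look less mechanical; the right window depends on how strict the site turns out to be.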
Using a proxy IP
import requests
import random
import time
# Pool of User-Agent strings; one is picked at random for each run.
USER_AGENT = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.4094.1 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36 Edg/96.0.1054.34",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
]
query_key = "抗疫"  # search keyword ("fighting the epidemic")
url = "http://search.people.cn/search-platform/front/search"
# JSON body of the search request
params = {
    "key": query_key,
    "page": 1,
    "limit": 10,
    "hasTitle": True,
    "hasContent": True,
    "isFuzzy": True,
    "type": 0,
    "sortType": 2,
    "startTime": 0,
    "endTime": 0
}
headers = {
    "User-Agent": random.choice(USER_AGENT),
    # The keyword is non-ASCII, so the header value is sent as UTF-8 bytes.
    "Referer": f"http://search.people.cn/s?keyword={query_key}&st=0&_={int(time.time()) * 1000}".encode("utf8")
}
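# Sample proxy address; replace it with a proxy that is currently reachable.
ip_type, ip_port = ("http", "http://27.16.166.120:37927/")
proxies = {
    ip_type: ip_port
}
# Send the search as a JSON POST routed through the proxy.
res = requests.post(url, json=params, headers=headers, proxies=proxies, timeout=3)
print(res.text)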
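A single hard-coded proxy tends to fail or get blocked sooner or later, so one possible extension is to try several proxies in turn. The sketch below is an assumption layered on top of the script above, not part of the original: `try_proxies` is a hypothetical helper, and `send` is any callable that performs the request with the `proxies` dict it receives.

```python
import random


def try_proxies(send, proxy_pool, shuffle=True):
    """Call send(proxies) with each proxy until one succeeds; return None if all fail."""
    candidates = list(proxy_pool)
    if shuffle:
        random.shuffle(candidates)  # avoid always hitting the same proxy first
    for address in candidates:
        # Route both schemes through the same proxy address.
        proxies = {"http": address, "https": address}
        try:
            return send(proxies)
        except OSError:
            # Connection errors and timeouts (including requests' exceptions,
            # which derive from OSError) mean this proxy is unusable; try the next.
            continue
    return None
```

With `requests`, `send` could be `lambda proxies: requests.post(url, json=params, headers=headers, proxies=proxies, timeout=3)`. An HTTP error status such as 403 does not raise by itself, so call `raise_for_status()` inside `send` if blocked responses should also trigger a switch to the next proxy.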
Method 2: Querying data through the app
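1. Get the request details
- Use a packet-capture tool to obtain the request fields (request URL, request method, request headers): a GET request to https://api-app.people.cn/api/v2/articles/searchArticle.
2. Test the request
3. Fetch the details
- Using the article URLs returned in step 2, parse the content of each item.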
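For step 3, the layout of the search response is not shown here, so the sketch below avoids assuming one: it walks the decoded JSON and collects every string stored under a given key. The helper name `collect_values` and the key `"url"` are placeholders; check the real response for the field that actually holds each article's address.

```python
def collect_values(node, key):
    """Recursively collect string values stored under `key` in decoded JSON."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                found.append(value)
            else:
                # Keep descending into nested objects and arrays.
                found.extend(collect_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Made-up sample data, only to show the call; it is not a real API response.
sample = {"data": [{"title": "a", "url": "https://example.com/1"},
                   {"extra": {"url": "https://example.com/2"}}]}
detail_urls = collect_values(sample, "url")
```

Each collected URL can then be requested and parsed in turn, ideally with a pause between requests.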
import requests
# Query parameters captured from the app's search request
params = {
    "pageSize": 20,
    "pageToken": 1,
    "udid": "F297902D-6420-49E7-AC53-B1212B505A1A",
    "pos": 0,
    "deviceOs": "14.8.1",
    "userid": "",
    "clientVersion": "1.8.1",
    "keyWord": "抗疫",  # search keyword ("fighting the epidemic")
    "type": "",
    "no_ec": 1,
    "pjCode": "rmwapp_2_202011",
    "date": "",
    "cnt": 20,
    "platform": "iOS",
    "deviceModel": "iPhone 11",
    "revert": 0,
    "clientVersionCode": 181,
    "highlighter": "1"
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
}
url = "https://api-app.people.cn/api/v2/articles/searchArticle"
# Send the search as a GET request with the parameters in the query string.
res = requests.get(url, params=params, headers=headers, timeout=3)
print(res.text)
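The request above fetches a single page. If `pageToken` behaves like a 1-based page number (an assumption, since the captured request only shows the value 1), the parameters for several pages could be generated as below. `iter_page_params` is a hypothetical helper, not part of the app's API.

```python
def iter_page_params(base_params, first_page=1, last_page=3, page_key="pageToken"):
    """Yield one copy of base_params per page, with only the page field changed."""
    for page in range(first_page, last_page + 1):
        page_params = dict(base_params)  # copy so the caller's dict is untouched
        page_params[page_key] = page
        yield page_params
```

Each yielded dict can be passed as `params=` to `requests.get`, stopping once a page comes back empty.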