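Today I came across 新榜 (newrank.cn), a ranking site for popular content and videos on the major platforms. The page is dynamic and loads its data through XHR requests.
In the browser's developer tools, open the Network panel, select the XHR filter, and refresh the page to see the data.
The code is as follows: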
import requests

# Request headers copied from the XHR request shown in the browser
headers = {
    'authority': 'www.newrank.cn',
    'sec-ch-ua': '"Microsoft Edge";v="95", "Chromium";v="95", ";Not A Brand";v="99"',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'content-type': 'application/json;charset=UTF-8',
    'x-requested-with': 'XMLHttpRequest',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4613.0 Safari/537.36 Edg/95.0.997.1',
    'sec-ch-ua-platform': '"Windows"',
    'origin': 'https://www.newrank.cn',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,ko;q=0.5,ja;q=0.4',
}
# Query-string parameters captured from the same browser request
params = (
    ('nonce', '6fddb2680'),
    ('xyz', '5bb76228dc539601e197d148ca2d19a9'),
)
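# JSON request body; Python turns the \u5A31\u4E50 escape into the characters 娱乐
data = '{"numeric":"\u5A31\u4E50","rankDate":"2021-09-07","start":1,"size":50,"rankType":"0","type":"0"}'
# Send the body as UTF-8 bytes; passing the str unchanged is the pitfall discussed after the code
response = requests.post(
    'https://www.newrank.cn/nr/bili/rank/complexMainRank',
    headers=headers,
    params=params,
    data=data.encode('utf-8'),
)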
# NB. Original query string below. It seems impossible to parse and
# reproduce query strings 100% accurately so the one below is given
# in case the reproduced version is not "correct".
# response = requests.post('https://www.newrank.cn/nr/bili/rank/complexMainRank?nonce=6fddb2680&xyz=5bb76228dc539601e197d148ca2d19a9', headers=headers, data=data.encode('utf-8'))
print(response.status_code)
print(response.json())
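The comment in the code worries that the query string rebuilt from params may not match the original. For simple values like these, that can be checked offline. The sketch below uses only the standard library (urllib.parse) rather than requests itself, so it approximates how requests builds the URL and does not reproduce its exact internals. The names base_url, original, full_url, and round_trip are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Values copied from the script above
base_url = "https://www.newrank.cn/nr/bili/rank/complexMainRank"
params = (
    ("nonce", "6fddb2680"),
    ("xyz", "5bb76228dc539601e197d148ca2d19a9"),
)
original = (
    "https://www.newrank.cn/nr/bili/rank/complexMainRank"
    "?nonce=6fddb2680&xyz=5bb76228dc539601e197d148ca2d19a9"
)

# Rebuild the URL from the parameter pairs
full_url = f"{base_url}?{urlencode(params)}"

# Parse the original URL back into (name, value) pairs
round_trip = tuple(parse_qsl(urlsplit(original).query))

print(full_url == original)   # True
print(round_trip == params)   # True
```

Both checks pass because the values contain only letters and digits, so no percent-encoding is involved. Values with spaces, non-ASCII text, or reserved characters are where a rebuilt query string could differ from the captured one.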
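One pitfall in this request is the data argument. Without the encode call, you get a result like this:
This is an encoding problem again. The body here is simply data.
Adding the encode call fixes it.
The result returned: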
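Why the encode call matters can be sketched without sending anything. This assumes the str body ends up in Python's http.client, whose documentation says string bodies are encoded as ISO-8859-1; the exact behavior may differ between versions of requests and urllib3. The names payload, can_encode, and ascii_safe are illustrative and not part of the original script.

```python
import json

# Same fields as the data string in the script, written as a dict
payload = {
    "numeric": "娱乐",
    "rankDate": "2021-09-07",
    "start": 1,
    "size": 50,
    "rankType": "0",
    "type": "0",
}


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Keep the Chinese characters as-is, like the data string in the script
body_text = json.dumps(payload, ensure_ascii=False)

print(can_encode(body_text, "latin-1"))  # False: 娱乐 is outside Latin-1
print(can_encode(body_text, "utf-8"))    # True

# Option 1: encode explicitly, as the script does
utf8_bytes = body_text.encode("utf-8")

# Option 2: let json.dumps escape non-ASCII as \uXXXX (ensure_ascii=True is the default)
ascii_safe = json.dumps(payload)
print(ascii_safe.isascii())              # True

# Both forms decode to the same JSON document
print(json.loads(utf8_bytes.decode("utf-8")) == json.loads(ascii_safe))  # True
```

Building the body with json.dumps also avoids hand-writing the JSON string. With the default ensure_ascii=True the body is plain ASCII, so the choice of encoding no longer matters.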