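猿人学 (Yuanrenxue) Problem 3: how I solved it

Watching the network requests, you can see that every click on "next page" first calls the jssm endpoint and only then calls the page endpoint.

At first glance the two requests seem to have nothing to do with each other: the cookie values look the same and the headers contain nothing special, which did not really match the problem description. So I replayed the request with curl. It succeeded, but what came back was a pile of unreadable code. That felt wrong: this problem is rated easy, so obfuscation should not be showing up here. If you conclude at this point that the returned code is useless, you have already cleared the first step. If you think it matters and want to dig into it, all I can do is wish you luck.

On to the real issue. I spent a long time on this without working out what was going wrong, so I searched Baidu, and there it was: requests does not necessarily send the headers in the order you wrote them. When the order is wrong, the jssm endpoint does not return the cookie, and without that cookie the request to the page endpoint cannot succeed.

Once you know that, the problem is not hard. There are only two things to watch:
1. The order of the headers must not change.
2. The page endpoint must be requested with the cookie returned by the jssm request.

The code is as follows: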
import requests

# One shared session so headers and cookies persist across calls.
session = requests.Session()


def get_page(page):
    # Header order matters for this problem: assign the whole dict to
    # session.headers so requests keeps the headers in this order.
    session.headers = {
        "Host": "match.yuanrenxue.cn",
        "Connection": "keep-alive",
        "Content-Length": "0",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
        "Accept": "*/*",
        "Origin": "https://match.yuanrenxue.cn",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://match.yuanrenxue.cn/match/3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": "Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1714982086; Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1714982087; tk=5148424492688281079; sessionid=zdsrlkbnh84a0vndcaqdrcturj3q5dbj; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1714982105; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1714982111"
    }
    # Call jssm once and reuse the sessionid cookie it returns.
    jssm_cookies = session.post('https://match.yuanrenxue.cn/jssm').cookies.get_dict()
    print(jssm_cookies)
    cookies = {
        "sessionid": jssm_cookies["sessionid"]
    }
    url = "https://match.yuanrenxue.cn/api/match/3"
    params = {
        "page": str(page)
    }
    response = session.get(url, cookies=cookies, params=params)
    data = response.json()
    print(data)
    print(response)
    return data


def main():
    results = []
    for page in range(1, 6):
        res = get_page(page)['data']
        results += [i["value"] for i in res]
    b = max(results, key=results.count)
    print(b)


if __name__ == "__main__":
    main()
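
The header-order point can be checked offline. The following is a minimal sketch and not part of the original solution: `prepared_header_order` is a hypothetical helper name, and the header values are placeholders. It uses `Session.prepare_request` to list the header names requests has prepared, without sending anything. Note that this only shows what requests hands to the transport layer; what finally goes over the wire also depends on urllib3 and http.client.

```python
import requests


def prepared_header_order(session, url, headers=None):
    """Return the header names requests prepares for a POST, in order.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    request = requests.Request("POST", url, headers=headers)
    prepared = session.prepare_request(request)
    return list(prepared.headers.keys())


# Placeholder values; only the order of the keys matters here.
browser_like = {
    "Content-Length": "0",
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
}
url = "https://match.yuanrenxue.cn/jssm"

# Case 1: pass the headers per request. The session's default headers are
# merged in first, so the resulting order differs from the dict above.
default_session = requests.Session()
merged_order = prepared_header_order(default_session, url, headers=browser_like)

# Case 2: replace session.headers outright, as the solution does.
# No defaults remain, so the dict's own order is kept.
fixed_session = requests.Session()
fixed_session.headers = browser_like
fixed_order = prepared_header_order(fixed_session, url)

print(merged_order)
print(fixed_order)
```

In the first case the session defaults come first and the extra keys are appended after them. In the second case the list follows the order of the dict, which is why the solution assigns `session.headers` directly instead of passing `headers=` on each call.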