1. Basic cookie concepts
2. Simulating login with cookies
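"""
Accessing a page that requires login means sending the site's cookie in the request headers.
Take the Zhihu hot list, https://www.zhihu.com/hot, as an example.
Without logging in, the hot list content is not returned. To check this, copy some text from
the page in the browser and search for it (Ctrl+F) in the HTML printed to the program's console:
it is not found. Adding a User-Agent to headers does not help either.
Method 1: add the cookie copied from the https://www.zhihu.com/hot page to headers.
Note that adding only the referer copied from that page (https://www.zhihu.com/)
still does not return the page content.
The cookie is required; once it is present, the referer is optional.
The drawback of method 1 is that the cookie has to be copied by hand every time.
Method 2: use the http.cookiejar module, which avoids the manual copying.
"""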
Method 1:
from urllib import request
url = 'https://www.zhihu.com/hot'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    # Cookie copied from the browser; replace it with your own when it stops working
    'cookie': '_zap=6254847e-05fc-47da-897d-32799a38c440; _xsrf=LMrVYFcjhUeQXgQwrldY9c5AMfmDkHh9; d_c0="AABe0WtVNRKPTuPNiS6nb1l79WQcZ8BVFKQ=|1605596807"; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1605597577,1605598205,1605616975,1605855550; SESSIONID=ac9jlFTsWbmfmjKTeSn6uMsaphHjGeQwHEWWAswMsoh; JOID=V10cA0j4RCKWAYRkbv1CPsJV2VdwmSsS_2vCPBSpJkP5b-gvNrUon8wHiGFk9L4tUVcrRbcGrsjOvtAgJEnd3gg=; osd=W1gRB0P0QS-SCohhY_lJMsdY3Vx8nCYW9GfHMRCiKkb0a-MjM7gslMAChWVv-LsgVVwnQLoCpcTLs9QrKEzQ2gM=; capsion_ticket="2|1:0|10:1605855613|14:capsion_ticket|44:OWUwMjdjYWJlNDJhNDZlZDlmMWVmODg5YTJiMjYxMmU=|a8eccf9506532bc47f26af3714887c4f5b82829ece6297f19a2e9e5fb66e1a04"; z_c0="2|1:0|10:1605855647|4:z_c0|92:Mi4xZG45MUF3QUFBQUFBQUY3UmExVTFFaVlBQUFCZ0FsVk5ucmVrWUFBSXhxNmdEYUo2NDRKVVRuZDM2bkRIY1RXSUJB|93924560630b1f1de724c39f20715724796813992cefa5bf443a44341eb9b547"; unlock_ticket="ADBA8ak6jAomAAAAYAJVTaZwt18cff3tVnbut8NYHQqBUQNImnP8Tg=="; tst=h; tshl=; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1605856062; q_c1=292df6e7c2104f348df3ce2c4656057e|1605856064000|1605856064000; KLBRSID=cdfcc1d45d024a211bb7144f66bda2cf|1605856297|1605855546',
    # Optional once the cookie is present
    'referer': 'https://www.zhihu.com/'
}
# Build the request with the custom headers, then fetch and print the page HTML
req = request.Request(url, headers=headers)
with request.urlopen(req) as resp:
    print(resp.read().decode('utf-8'))
Method 2: see the http.cookiejar module section.
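As a rough sketch of the idea behind method 2, the code below lets a CookieJar collect the cookie from a login response and send it back automatically on the next request, so nothing is copied by hand. It does not talk to Zhihu: DemoHandler, the /login and /hot paths, and the session=demo-token cookie are all hypothetical stand-ins running on a local test server, and a real site's login flow will be more involved.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that requires a login cookie."""

    def do_GET(self):
        if self.path == '/login':
            # Pretend the login succeeded and hand out a session cookie.
            self.send_response(200)
            self.send_header('Set-Cookie', 'session=demo-token; Path=/')
            self.end_headers()
            self.wfile.write(b'logged in')
        elif 'session=demo-token' in self.headers.get('Cookie', ''):
            # The cookie came back, so serve the protected content.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'hot list content')
        else:
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b'login required')

    def log_message(self, *args):
        pass  # keep the console output clean


def fetch_with_cookiejar():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port

    jar = CookieJar()
    opener = request.build_opener(
        request.ProxyHandler({}),          # bypass any system proxy for the local demo
        request.HTTPCookieProcessor(jar),  # store and resend cookies via the jar
    )
    try:
        opener.open(base + '/login').read()          # the jar captures Set-Cookie here
        with opener.open(base + '/hot') as resp:     # the jar adds the Cookie header here
            body = resp.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()
    return body, [cookie.name for cookie in jar]
```

Calling fetch_with_cookiejar() returns the protected page body together with the names of the cookies the jar collected. Against a real site, only the opener part carries over: build the opener once, perform the login request through it, and reuse it for later requests.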
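3. The http.cookiejar module
A basic working knowledge of this part is enough; there is no need to study it in depth.
The http.cookiejar module has four main classes: two base classes and two derived classes.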
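The four classes are most likely CookieJar and FileCookieJar (the base classes) and MozillaCookieJar and LWPCookieJar (the derived classes); that mapping is an assumption based on the standard library, since the notes do not name them here. The sketch below prints how they inherit from one another and saves a cookie to a file and loads it back. The make_cookie and roundtrip helpers, the example.com domain, and the cookie values are made up for illustration.

```python
import os
import tempfile
from http.cookiejar import (Cookie, CookieJar, FileCookieJar,
                            LWPCookieJar, MozillaCookieJar)

# CookieJar keeps cookies in memory; FileCookieJar adds save/load support.
# MozillaCookieJar and LWPCookieJar differ only in the file format they use.
for cls in (CookieJar, FileCookieJar, MozillaCookieJar, LWPCookieJar):
    print(cls.__name__, 'inherits from', cls.__bases__[0].__name__)


def make_cookie(name, value, domain='example.com'):
    """Build a session cookie by hand (hypothetical values, for the demo only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def roundtrip(jar_cls):
    """Save one cookie with the given file-based jar class and load it back."""
    path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
    jar = jar_cls(path)
    jar.set_cookie(make_cookie('session', 'abc123'))
    # Session cookies are marked discard, so they are skipped unless asked for.
    jar.save(ignore_discard=True)

    loaded = jar_cls(path)
    loaded.load(ignore_discard=True)
    return {cookie.name: cookie.value for cookie in loaded}
```

For example, roundtrip(MozillaCookieJar) and roundtrip(LWPCookieJar) should both give back the saved cookie. In practice the jar would be filled through HTTPCookieProcessor rather than by hand.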