Notes on adding request headers, proxies, and cookie information when using the urllib library.
1. Adding request headers (the general case)
Example code:
from urllib import request
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 '
'Safari/537.36',
'Cookie':'BAIDUID=F65C2D9CDA40BD20BFD1304D381773EE:FG=1; BIDUPSID=F65C2D9CDA40BD20BFD1304D381773EE; PSTM=1514819000; BDUSS=gxZ05wNWM4Yn5DdUdRaWtWWUM0cmxZMH4zT1BTU1hDREgyMjZTSWxJY2tWR2xiQVFBQUFBJCQAAAAAAAAAAAEAAABQUkROYmFsYWJhbGHQocaouqIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACTHQVskx0FbND; pgv_pvi=9285178368; pgv_si=s4747706368; BD_HOME=1; ZD_ENTRY=bing; cflag=13%3A3; BD_UPN=12314753; delPer=0; BD_CK_SAM=1; PSINO=7; H_PS_PSSID=1443_21115_20697_29063_28518_29099_28836_28584_26350_20718; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDRCVFR[feWj1Vr5u3D]=mk3SLVN4HKm; COOKIE_SESSION=10_0_0_1_0_1_0_0_0_1_0_0_0_0_0_0_0_0_1558950188%7C2%230_0_1558950188%7C1'
}
url = 'https://www.baidu.com/'
req = request.Request(url,headers=headers)
resp = request.urlopen(req)
print(resp.read())
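
The headers attached to a Request can be checked before anything is sent, which helps when a site rejects a request. The sketch below is a minimal offline illustration, not part of the original notes; the shortened User-Agent string and the Referer header are assumptions chosen for the example. One detail worth knowing: urllib stores header names via str.capitalize(), so 'User-Agent' is looked up as 'User-agent'.

```python
from urllib import request

# Shortened, illustrative User-Agent (an assumption, not the one used above)
headers = {"User-Agent": "Mozilla/5.0 (example)"}
req = request.Request("https://www.baidu.com/", headers=headers)

# Headers can also be added one at a time after the Request is built
req.add_header("Referer", "https://www.baidu.com/")

# urllib normalizes names with str.capitalize(): 'User-Agent' -> 'User-agent'
user_agent = req.get_header("User-agent")
referer = req.get_header("Referer")
has_ua = req.has_header("User-agent")

# header_items() lists every header that would be sent with the request
print(req.header_items())
```

No network access is needed here, since nothing is sent until request.urlopen(req) or opener.open(req) is called.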
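2. A simpler way to handle cookies
- In Python, cookies are usually handled by combining the http.cookiejar module with the HTTPCookieProcessor handler class from urllib.request. The http.cookiejar module mainly provides objects for storing cookies, while HTTPCookieProcessor processes those cookie objects and builds a handler object from them. The main classes in http.cookiejar are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.
Example code: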
from urllib import request
from http.cookiejar import CookieJar
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
url = 'https://www.baidu.com/'
cookiejar = CookieJar()
handler = request.HTTPCookieProcessor(cookiejar)
opener = request.build_opener(handler)
req = request.Request(url, headers=headers)
resp = opener.open(req)
print(resp.getcode())
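
CookieJar keeps cookies in memory only. Of the classes listed above, MozillaCookieJar can also write them to a file and read them back. The following is a minimal offline sketch under stated assumptions: the cookie is built by hand (in real use it would be collected by opener.open(...)), and the file name, domain, cookie name, and value are all hypothetical.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar

# Hypothetical file location inside a fresh temporary directory
cookie_path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

jar = MozillaCookieJar(cookie_path)
# Hand-built session cookie standing in for one received from a server
jar.set_cookie(Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))
# ignore_discard=True also saves session cookies, which are skipped by default
jar.save(ignore_discard=True, ignore_expires=True)

# Load the file into a new jar, as a later run of the script would
loaded = MozillaCookieJar(cookie_path)
loaded.load(ignore_discard=True, ignore_expires=True)
names = [c.name for c in loaded]
values = [c.value for c in loaded]
print(names, values)
```

A loaded jar can be passed to request.HTTPCookieProcessor in the same way as the CookieJar in the example above.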
3. Setting a proxy with the ProxyHandler handler
Example code:
from urllib import request
handler = request.ProxyHandler({"http":"45.125.32.181:3128"})
opener = request.build_opener(handler)
req = request.Request("http://httpbin.org/ip")
resp = opener.open(req)
print(resp.read())
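
A proxy handler and a cookie handler can be passed to the same build_opener call, so one opener covers both needs. The sketch below is one possible arrangement, not the only one; the proxy address 127.0.0.1:3128 and the helper name build_proxy_opener are hypothetical, and the actual request is left commented out because it needs a reachable proxy.

```python
from http.cookiejar import CookieJar
from urllib import request

# Hypothetical proxy address; replace with a proxy that is actually reachable
PROXY = "127.0.0.1:3128"


def build_proxy_opener(proxy=PROXY):
    """Return an opener that uses a proxy and stores cookies (illustrative helper)."""
    cookiejar = CookieJar()
    # One mapping entry per URL scheme; "https" is needed for https:// URLs
    proxy_handler = request.ProxyHandler({"http": proxy, "https": proxy})
    cookie_handler = request.HTTPCookieProcessor(cookiejar)
    opener = request.build_opener(proxy_handler, cookie_handler)
    return opener, cookiejar


opener, cookiejar = build_proxy_opener()
# Inspect which handlers the opener ended up with
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)

# Sending a request through the proxy (requires a working proxy):
# resp = opener.open("http://httpbin.org/ip", timeout=10)
# print(resp.read().decode("utf-8"))
```

Passing a ProxyHandler instance replaces the default one that build_opener would otherwise add, so only the given proxy mapping is used.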