1. GET requests
import requests

if __name__ == '__main__':
    response = requests.get(url='http://www.baidu.com/')
    response.encoding = 'utf-8'  # set the encoding used to decode response.text
    print(response.text)  # the body decoded as text
    print(response.status_code)  # the HTTP status code
    print(response.content)  # the raw bytes; printed with a leading b

    # GET request with parameters
    # There is no need to URL-encode params by hand; requests does it for you
    response = requests.get(url='http://httpbin.org/get', params={'age': 28, 'salary': 20000, 'sex': '男'})
    print(response.text)
Result:
For the page fetched from Baidu, the text and the raw bytes compare as follows:
response.text:
response.text
<title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title>
response.encoding = 'utf-8'
response.text
<title>百度一下,你就知道</title>
response.content:
response.content
<title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>
response.content.decode('utf-8')
<title>百度一下,你就知道</title>
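The difference between the two outputs can be reproduced without any network call. The sketch below assumes the server declared no charset, in which case requests falls back to ISO-8859-1 when decoding text content for response.text. The variable names are illustrative and not part of requests.

```python
# Raw bytes, as they would appear in response.content (UTF-8 encoded HTML).
raw = '<title>百度一下,你就知道</title>'.encode('utf-8')

# Decoding with the wrong codec produces mojibake, like response.text
# before response.encoding is set.
wrong = raw.decode('iso-8859-1')

# Decoding with the right codec recovers the title, like
# response.content.decode('utf-8') or response.text after setting encoding.
right = raw.decode('utf-8')

print(wrong == right)
print(right)
```

The bytes never change; only the codec used to turn them into a str does, which is why setting response.encoding after the request is enough to fix response.text.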
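For the GET request with parameters, httpbin.org echoes the request back as JSON, with the parameters under args and the full encoded URL under url.
As the code shows, requests is much simpler than urllib: at the very least you do not have to encode params by hand. Pass them as a dict through the params argument, and requests encodes them and appends them to the URL automatically.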
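To see the encoding and URL joining without sending anything, one option is to build the request and inspect it before it goes out. This is a minimal sketch using requests.Request and its prepare() method; the request is never sent.

```python
import requests

# Build the request without sending it, so the final URL can be inspected.
request = requests.Request(
    'GET',
    'http://httpbin.org/get',
    params={'age': 28, 'salary': 20000, 'sex': '男'},
)
prepared = request.prepare()

# The dict has been turned into a query string; the non-ASCII value
# is percent-encoded as UTF-8.
print(prepared.url)
```

With urllib the same query string would have to be produced with urllib.parse.urlencode and concatenated onto the URL by hand.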
2. POST requests
import requests

url = 'http://httpbin.org/post'

if __name__ == '__main__':
    params = {'age': 18, 'sex': "男", 'class': 'python'}
    response = requests.post(url=url, data=params)
    response.encoding = 'utf-8'
    print(response.text)
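A POST request carries its parameters in the same dict format as a GET request; only the argument name changes, from params to data.
In the httpbin.org response the parameters appear under form rather than args: unlike GET, POST submits them as form data in the request body, and nothing is appended to the URL.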
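The same inspection trick works for POST. The sketch below prepares the request without sending it and prints the URL, the body, and the Content-Type header that requests sets for a dict passed as data.

```python
import requests

# Prepare (but do not send) a POST request with form data.
prepared = requests.Request(
    'POST',
    'http://httpbin.org/post',
    data={'age': 18, 'sex': '男', 'class': 'python'},
).prepare()

print(prepared.url)                      # the URL carries no query string
print(prepared.body)                     # the form-encoded parameters
print(prepared.headers['Content-Type'])  # the form content type
```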
3. Carrying cookies to simulate a login
import requests
url = 'https://weibo.com/u/7280903412/home?wvr=5'
if __name__ == '__main__':
    # Simulate a login with cookies
    cookie = {
        "Ugrow-G0":"field 1",
        "login_sid_t":"field 2",
        "cross_origin_proto":"SSL",
        "YF-V5-G0":"e8fcb05084037bcfa915f5897007cb4d",
        "WBStorage":"70753a84f86f85ff|undefined",
        "wb_view_log":"1920*10801",
        "_s_tentry":"passport.weibo.com",
        "Apache":"1935966400207.2852.1599121986956",
        "SINAGLOBAL":"1935966400207.2852.1599121986956",
        "ULV":"1599121986973:1:1:1:1935966400207.2852.1599121986956:",
        "SUB":"_2A25yVNoCDeRhGeFM41IY8C3Iyj6IHXVRIEzKrDV8PUNbmtANLXDakW9NQLnaxl9zI-li-BnwywWwuAlJuGoktIOA",
        "SUBP":"0033WrSXqPxfM725Ws9jqgMF55529P9D9W5lJ4NGXA62jjXRNJxW1v-n5JpX5KzhUgL.FoME1h54eheXeKz2dJLoIp7LxKML1KBLBKnLxKqL1hnLBoMNeon71K50Sh2E",
        "SUHB":"00xZiY5mrq8Rkh",
        "ALF":"1630658001",
        "SSOLoginState":"field 3",
        "wvr":"6",
        "YF-Page-G0":"field 4",
        "wb_view_log_7280903412":"1920*10801",
        "webim_unReadCount":"%7B%22time%22%3A1599122017098%2C%22dm_pub_total%22%3A0%2C%22chat_group_client%22%3A118%2C%22chat_group_notice%22%3A0%2C%22allcountNum%22%3A122%2C%22msgbox%22%3A0%7D"
    }
    # These cookies come from the cookie entry under Request Headers in the F12 Network panel
    # Unlike urllib, where the whole cookie is passed as a single str, requests takes the cookies as a dict
    response = requests.get(url=url, cookies=cookie)
    print('cookies only:', response.text)

    # Simulate a login with headers
    # (authority, method, path and scheme are HTTP/2 pseudo-headers copied from the browser; requests does not need them)
    headers = {'authority': 'weibo.com',
               'method': 'GET',
               'path': '/u/7280903412/home?wvr=5',
               'scheme': 'https',
               'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
               'accept-encoding': 'gzip, deflate, br',
               'accept-language': 'zh-CN,zh;q=0.9',
               'cache-control': 'max-age=0',
               'cookie': "cookie string copied from the browser",
               'referer': 'https://weibo.com/u/7280903412/home?wvr=5',
               'sec-fetch-dest': 'document',
               'sec-fetch-mode': 'navigate',
               'sec-fetch-site': 'same-origin',
               'sec-fetch-user': '?1',
               'upgrade-insecure-requests': '1',
               'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36', }
    response = requests.get(url=url, headers=headers)
    print('headers:', response.text)
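The code above simulates a login with cookies. Sending a full set of headers works much the same as in urllib, but passing the cookies on their own is quite different: urllib takes a single str, whereas the cookies argument of requests expects a dict, so a cookie string has to be converted to a dict by hand. For example:
cookie: _EDGE_V=1; PPLState=1;
needs to become the following (the values are written as strings, since cookie values are text):
cookie = {
    '_EDGE_V': '1',
    'PPLState': '1'
}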
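Note that the headers do not have to be complete; include only the parts you need. If only the cookie matters, send only the cookie. Cookies also expire, so after a while they stop working.
These cookies are the ones shown in the browser's Network panel after pressing F12. A request to a particular URL has to carry the cookies that belong to that URL.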
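Converting a long cookie string by hand is tedious, so a small helper can do it. The function below is a hypothetical helper written for this post, not part of requests; it assumes the usual name=value pairs separated by semicolons, as copied from the browser.

```python
def cookie_str_to_dict(cookie_str):
    """Convert 'k1=v1; k2=v2;' into {'k1': 'v1', 'k2': 'v2'}."""
    cookies = {}
    for pair in cookie_str.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece left by a trailing ';'
        # Split on the first '=' only, so values containing '=' stay intact.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies


print(cookie_str_to_dict('_EDGE_V=1; PPLState=1;'))
```

The resulting dict can be passed straight to the cookies argument of requests.get.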
4. Using a session to maintain a conversation
This part currently has some errors and will be filled in later.
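In the meantime, here is a minimal sketch of the idea, which is not the author's code. It uses requests.Session, which keeps cookies and default headers across requests. The cookie name and value are made up for illustration, and the request is only prepared, not sent, so the sketch runs offline.

```python
import requests

session = requests.Session()

# Hypothetical cookie, standing in for one that a login response would set.
session.cookies.set('sessionid', 'abc123')
# Headers set on the session are sent with every request made through it.
session.headers.update({'user-agent': 'Mozilla/5.0'})

# Prepare a request through the session to see what it would carry.
prepared = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/cookies')
)
print(prepared.headers.get('Cookie'))
print(prepared.headers.get('User-Agent'))
```

In real use, session.get(url) and session.post(url, data=...) send the requests, and any cookies the server sets are stored in session.cookies for the following calls.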
5. Using a proxy IP
import requests

url = 'http://httpbin.org/ip'

if __name__ == '__main__':
    # A paid (private/dedicated) proxy needs a username and password
    # Format: proxies={'http': 'http://username:password@61.163.32.88:3128'}
    # The proxy URL should include its scheme (http://)
    response = requests.get(url=url, proxies={'http': 'http://61.163.32.88:3128'}, timeout=20)
    print(response.text)
Compared with urllib, setting up a proxy this way is much simpler.
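When the proxy needs credentials, the proxies dict can be built by a small function. build_proxies below is a hypothetical helper, not part of requests. It percent-encodes the username and password so that characters such as @ do not break the URL, and it assumes the same HTTP proxy is used for both http and https targets.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict in the format requests expects."""
    auth = ''
    if username is not None and password is not None:
        # Percent-encode the credentials so they are safe inside a URL.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f'http://{auth}{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


print(build_proxies('61.163.32.88', 3128))
print(build_proxies('61.163.32.88', 3128, 'username', 'p@ssword'))
```

The result is passed as requests.get(url, proxies=build_proxies(...), timeout=20).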
6. Spoofing the user-agent, and how cookie and accept-language interact
import requests
url = 'https://www.alibabacloud.com/'
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'cache-control': 'max-age=0',
'cookie':"alicloud_deploy_r_s=sg; cna=gWeXF1SX404CARu7dEdBdzsw; JSESSIONID=2S566MC1-QJZIDYPX49502B0FIMXX3-9S2JTMEK-9HN; tmp0=xVgIN3LPTxSEseclRSibqCLmWlsm%2FVYWENrvpNIEU5cfMCm5mmJjGNmqzVKXG71O98TLVqZm4UWzBP3a3cC%2B%2F6X1wRBq1JeruiSVFdkuT4EkJXjksroNFQzy8LuCAomiXcOQZg%2BjMSpznRs%2FNz9UMw%3D%3D; rmStore=amid:43301; aliyun_choice=intl; aliyun_intl_choice=intl; _ga=GA1.2.231707708.1599138092; _gid=GA1.2.1850712582.1599138092; _bl_uid=OekaeenvmLptXLje85be2hR097yk; stc115239=tsa:1852263375:20200903133153|env:1%7C20201004130130%7C20200903133153%7C3%7C1047918:20210903130153|uid:1599138090063.2086293866.0254636.115239.1405385708:20210903130153|srchist:1047918%3A1%3A20201004130130:20210903130153; login_aliyunid_csrf=_csrf_tk_1691999138113342; _uetsid=33f3e44aba456a01ec901fbe529d9b25; _uetvid=871082c217437dee4c17fd9a8c3cb5d9; isg=BGdnT4Tl9C5vDXCiwC4WPSJ39psx7DvO6d-a9DnUUvYdKIXqQb51HnJqSii2wBNG; aliyun_lang=ja",
'if-none-match': 'W/"f38a-+gB+C+WOEWqdm0sxIeLe2rCCKh0"',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3766.400 QQBrowser/10.6.4163.400'}
if __name__ == '__main__':
    # accept-language in the headers selects the language of the returned HTML
    # The cookie can also store a language. Here accept-language does not include ja, but the cookie carries aliyun_lang=ja; the cookie takes priority, so the HTML comes back in Japanese
    # With the cookie removed, accept-language takes effect: 'accept-language': 'zh-CN,zh;q=0.9' returns Chinese (zh), and 'accept-language': 'ja' returns Japanese (ja)
    response = requests.get(url=url, headers=headers)
    response.encoding = response.apparent_encoding
    print(response.text)
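To try the second case from the comments, the cookie has to be dropped and accept-language changed before sending. A small sketch of that step follows; headers_for_language is a hypothetical helper, and the sample headers are shortened stand-ins for the ones above.

```python
def headers_for_language(headers, language):
    """Return a copy of headers with no cookie and the given accept-language."""
    # Drop the cookie (it would override the language) and any existing
    # accept-language entry, whatever the capitalisation of the key.
    result = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ('cookie', 'accept-language')
    }
    result['accept-language'] = language
    return result


sample_headers = {
    'accept-language': 'zh-CN,zh;q=0.9',
    'cookie': 'aliyun_lang=ja',
    'user-agent': 'Mozilla/5.0',
}
print(headers_for_language(sample_headers, 'ja'))
```

The returned dict can be passed as headers to requests.get; the original dict is left unchanged, so both variants can be compared side by side.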