Web Scraping (10): The requests Library

Latest recommended article published on 2022-04-13 09:00:59

Lin_junhan

Categories: python, web scraping

Article link: https://blog.csdn.net/Lin_junhan/article/details/87992939

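Like urllib, the requests library can be used to fetch web page content, but it is considerably more convenient to use. With urllib you first build a request object and then use it to obtain a response; with requests you call get/post and so on directly and get the response back. By creating a session, requests also makes it easy to work with cookies, and it supports proxies and other advanced features.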

Installing requests

pip install requests

Official documentation:

https://requests.readthedocs.io/

Getting a response:

requests.get(url, headers=headers, params=data)

requests.post()....

The response object (assuming r is the response object):

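r.text          the response body as a string
r.content       the response body as bytes
r.encoding      get or set the encoding used to decode r.text
r.status_code   the HTTP status code
r.headers       the response headers
r.url           the URL that was requested (the final URL after any redirects)
r.json()        parse the response body as JSON (this is a method, so it must be called)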
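To try these attributes without depending on an external site, here is a minimal sketch that starts a throwaway HTTP server on the local machine and fetches from it. The handler class DemoHandler, the /demo path, and the JSON payload are made up for illustration; they are not part of the original article.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that always returns a small JSON document."""

    def do_GET(self):
        body = json.dumps({"msg": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/demo"

# {"http": None} bypasses any proxy configured in the environment,
# so the loopback request stays on this machine.
r = requests.get(url, proxies={"http": None}, timeout=5)

print(r.status_code)              # integer status code
print(r.headers["Content-Type"])  # header lookup is case-insensitive
print(r.encoding)                 # taken from the Content-Type charset
print(r.text)                     # body decoded to str
print(r.content)                  # raw body as bytes
print(r.json())                   # body parsed into Python objects
print(r.url)                      # URL that was fetched

server.shutdown()
server.server_close()
```

Note that r.json is called with parentheses, while the other names are plain attributes.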

GET request example:

import requests

url = 'http://www.baidu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)'
                  ' Chrome/63.0.3239.132 Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response)
response.encoding = 'utf-8'
with open(r'C:\Users\m1552\PycharmProjects\newWork\pa_chong\requests.txt', 'w', encoding = 'utf-8') as fp:
	fp.write(response.text)

GET request with parameters example:

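# Leave the query string out of the URL and pass the raw value in params;
# requests percent-encodes it, so '中国' is sent as wd=%E4%B8%AD%E5%9B%BD.
url = 'https://www.baidu.com/s'
data = {
	'wd':'中国',
}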

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)'
                  ' Chrome/63.0.3239.132 Safari/537.36'
}

# the params argument carries the query-string parameters
response = requests.get(url, headers=headers, params=data)
response.encoding = 'utf-8'
print(response.content)

# # Write the result to a file
# with open(r'C:\Users\m1552\PycharmProjects\newWork\pa_chong\baidu.html', 'wb') as fp:
# 	fp.write(response.content)
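To check what URL requests builds from params without sending anything over the network, one option is to construct a Request and call prepare() on it. The sketch below does this for the Baidu search URL used above and also shows why an already percent-encoded value should not be passed through params: it gets encoded a second time.

```python
import requests

base_url = "https://www.baidu.com/s"

# Raw value: requests percent-encodes it as UTF-8 when building the URL.
prepared = requests.Request("GET", base_url, params={"wd": "中国"}).prepare()
print(prepared.url)

# Already-encoded value: each '%' is itself encoded as '%25',
# so the server would receive the literal text "%E4%B8%AD..." instead of "中国".
double = requests.Request(
    "GET", base_url, params={"wd": "%E4%B8%AD%E5%9B%BD"}
).prepare()
print(double.url)
```

Nothing is sent here; prepare() only assembles the request that get() would otherwise send.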

POST request example:

import json
url = 'https://cn.bing.com/ttranslationlookup?&IG=DF5310683E534917A8E04ECAF4BE95D7&IID=translator.5038.4'
data = {
	'from':'en',
	'to':'zh-CHS',
	'text':'computer',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)'
                  ' Chrome/63.0.3239.132 Safari/537.36'
}

response = requests.post(url=url, headers=headers, data=data)

with open(r'C:\Users\m1552\PycharmProjects\newWork\pa_chong\computer.txt', 'w', encoding='utf8') as fp:
	fp.write(json.dumps(response.json()))
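As a rough illustration of what the data argument does, the sketch below prepares a POST request locally and inspects it instead of sending it. The URL https://example.com/translate is a placeholder chosen for this example, not the Bing endpoint; the form fields are the ones from the code above.

```python
import json

import requests

url = "https://example.com/translate"  # placeholder URL, nothing is sent
form = {"from": "en", "to": "zh-CHS", "text": "computer"}

# data=<dict> is sent as an HTML-form-style body.
prepared = requests.Request("POST", url, data=form).prepare()
print(prepared.headers["Content-Type"])
print(prepared.body)

# For comparison, json=<dict> serializes the same dict as a JSON body.
json_prepared = requests.Request("POST", url, json=form).prepare()
print(json_prepared.headers["Content-Type"])
print(json.loads(json_prepared.body))
```

Which of the two a given site expects depends on the site; the browser's developer tools show the Content-Type of the real request.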

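Using cookies:

First create a Session, then send every subsequent request through that session. The session stores the cookies the server sets at login and sends them back automatically on later requests.

Here is a code example: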

import requests

# When a task depends on session state (such as a login), create a session first.
# Every request from here on goes through s: s.get(), s.post()
s = requests.Session()

post_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=20191009575'

fromdata = {
	'email':'15521093428',
	'icode':'',
	'origURL':'http://www.renren.com/home',
	'domain':'renren.com',
	'key_id':'1',
	'captcha_type':'web_login',
	'password':'3c493a9b58fc672420b168275078df37bd0b51e3c424cec1e5928d37d3005a4d',
	'rkey':'2de811fa2fc149ce4704d89fcdb84777',
	'f':'https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D1pi5EUYGZKK9a8cdrLgTUtcEDhxgrPP8IhzTf_vmIsy%26wd%3D%26eqid%3Dd710580c0002a94a000000035c5efa51',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)'
                  ' Chrome/63.0.3239.132 Safari/537.36'
}

response = s.post(url=post_url, headers=headers, data = fromdata)

# print(response.text)

get_url = 'http://www.renren.com/969499615/profile'
response2 = s.get(url = get_url, headers=headers)
print(response2.text)
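The login flow above depends on a live site and a real account, so here is a self-contained sketch of the same idea against a local test server. Everything about the server (the LoginHandler class, the /login and /profile paths, the sessionid cookie, and the form fields) is invented for illustration. It shows that a Session keeps the cookie set by the POST and sends it on the following GET, while a plain requests.get does not.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: POST sets a cookie, GET echoes the Cookie header."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # read and ignore the submitted form
        self.send_response(200)
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = (self.headers.get("Cookie") or "no cookie").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

s = requests.Session()
s.trust_env = False  # ignore proxy settings from the environment for this local demo

s.post(base + "/login", data={"email": "user", "password": "secret"}, timeout=5)
stored = s.cookies.get("sessionid")                       # cookie kept by the session
with_session = s.get(base + "/profile", timeout=5).text   # cookie sent automatically

# A one-off request outside the session carries no cookie.
without_session = requests.get(
    base + "/profile", proxies={"http": None}, timeout=5
).text

print(stored, with_session, without_session)

server.shutdown()
server.server_close()
```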

Using proxies:

# proxies must be defined before the request that uses it
proxies = {
   'http': 'http://<proxy address>'
}
r = requests.post(url=url, headers=headers, proxies=proxies)
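The keys of the proxies dictionary are URL schemes, so an 'http' entry applies only to http:// URLs. The sketch below uses select_proxy from requests.utils, a helper that requests uses when choosing a proxy, to show which proxy would be picked for a given URL without sending any traffic. The proxy address 127.0.0.1:8080 is a made-up placeholder.

```python
import requests
from requests.utils import select_proxy

proxy_url = "http://127.0.0.1:8080"  # placeholder proxy address

# Only an 'http' entry: https:// URLs would not go through the proxy.
http_only = {"http": proxy_url}
print(select_proxy("http://example.com/", http_only))
print(select_proxy("https://example.com/", http_only))

# Entries for both schemes cover http:// and https:// URLs.
both = {"http": proxy_url, "https": proxy_url}
print(select_proxy("https://example.com/", both))

# Proxies can also be set once on a session instead of on every call.
s = requests.Session()
s.proxies.update(both)
print(s.proxies)
```

Setting the proxies on a Session fits well with the cookie example, since the same session object can then carry both the login state and the proxy configuration.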
