Essential for Python Web Scraping: The requests Library in Detail

The requests library in detail

When people talk about Python web scraping, the famous requests library inevitably comes up. Its author is Kenneth Reitz, often described as one of the best Python programmers in the world. requests is designed to be very human-friendly and is a pleasure to use. One limitation is asynchronous support: when I was working with Python a few years ago, requests did not support async requests, and I am not sure whether that has changed. For personal projects, though, requests covers the vast majority of scraping scenarios.

1. GET requests
# GET request
import requests

url = 'http://httpbin.org/get'
data = {
	'name': 'dahlin',
	'age': '22'
}
response = requests.get(url, params=data)
# Print the response type
print(type(response))
# Print the status code
print(response.status_code)
# Print the response body as text
print(response.text)
# Print the type of the response text
print(type(response.text))
# Print the response cookies
print(response.cookies)
# Print the response parsed as JSON
print(response.json())
2. Fetching an HTML page
# Fetch a web page
import re

url = 'https://www.zhihu.com/explore'
headers = {
	'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
	'Cookie': 'name=dahlin|mary;age=22|25',
	'Host': 'www.zhihu.com'
}
response = requests.get(url, headers=headers)
# Extract question titles with a regular expression
pattern = re.compile('explore-feed.*?question_link.*?>(.*?)</a>', re.S)
title = re.findall(pattern, response.text)
print(title)
3. Downloading binary data
# Fetch binary data and write it to disk
url = 'https://www.zhihu.com/favicon.ico'
response = requests.get(url)
with open('favicon.ico', 'wb') as f:
	f.write(response.content)
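
For larger files, here is a minimal sketch of streaming the download instead of holding the whole body in memory (the output filename is just illustrative):

# Stream the response body in chunks rather than loading it all at once
response = requests.get('https://www.zhihu.com/favicon.ico', stream=True)
with open('favicon_stream.ico', 'wb') as f:  # illustrative filename
	for chunk in response.iter_content(chunk_size=8192):
		f.write(chunk)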
4. POST requests
url = 'http://httpbin.org/post'
data = {
	'name': 'dahlin',
	'age': '22'
}
response = requests.post(url, data=data)
print(response.text)
print(response.status_code)
# Print the response headers
print(response.headers)
print(response.cookies)
print(response.url)
# Print the redirect history, if any
print(response.history)
# Exit unless the request returned 200 OK
if response.status_code != requests.codes.ok:
	exit()
print('Request successful')
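
If the endpoint expects a JSON body rather than form data, a minimal sketch using the json parameter (requests sets the Content-Type header for you):

# Send a JSON-encoded body; httpbin echoes it back under the "json" key
response = requests.post('http://httpbin.org/post', json={'name': 'dahlin', 'age': '22'})
print(response.json())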
5. File uploads
files = {
	'file': open('1.jpg', 'rb')
}
response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
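
To control the uploaded filename and content type, a sketch using the tuple form of the files dict (it still assumes a local 1.jpg exists):

# (filename, file object, content type)
files = {
	'file': ('1.jpg', open('1.jpg', 'rb'), 'image/jpeg')
}
response = requests.post('http://httpbin.org/post', files=files)
print(response.status_code)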
6. Working with cookies
# Read the cookies returned by the server
url = 'https://www.lagou.com'
response = requests.get(url)
print(response.cookies)
for key, value in response.cookies.items():
	print(key+'='+value)
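
To send cookies with a request rather than just read them, a minimal sketch using the cookies parameter:

# httpbin echoes back whatever cookies it receives
cookies = {'name': 'dahlin', 'age': '22'}
response = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(response.text)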
7. Maintaining a session
# Keep cookies across requests with a Session
session = requests.Session()
# The first request sets a cookie; the second reads it back from the same session
url = 'http://httpbin.org/cookies/set/number/123456789'
session.get(url)
response = session.get('http://httpbin.org/cookies')
print(response.text)
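
A Session can also carry default headers for every request it sends; a small sketch (the User-Agent string is just a placeholder):

session = requests.Session()
# Headers set on the session are merged into every request it makes
session.headers.update({'User-Agent': 'my-crawler/1.0'})  # placeholder UA
response = session.get('http://httpbin.org/headers')
print(response.text)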
8. SSL certificate verification
# SSL certificate verification
url = 'https://www.12306.cn'
# Skip certificate verification (a warning will be printed)
response = requests.get(url, verify=False)
print(response.status_code)
# Supply a local client certificate instead
response = requests.get(url, cert=('/path/server.crt', '/path/key'))
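
When verify=False is used, urllib3 emits an InsecureRequestWarning; here is a sketch of silencing it if the noise bothers you:

import urllib3

# Suppress the warning that verify=False triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)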
9. Proxy settings and SOCKS proxies

SOCKS support requires installing: pip install 'requests[socks]'

# Proxy settings; note that a dict cannot hold two "http" keys, so pick one of the two forms below
proxies = {
	"http": "http://10.10.1.10:3128",
	# "http": "http://user:password@10.10.1.10:3128/",  # alternative form with credentials
	"https": "http://10.10.1.0:1080"
}

"""
pip install 'requests[socks]'
"""
proxies = {
	"http": "socks5://user:password@10.10.1.10:3128",
}
response = requests.get(url, proxies=proxies)
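
A quick way to confirm which address the target actually sees is to hit an echo endpoint through the proxy (the proxy addresses above are placeholders, so this only runs against a real proxy):

# httpbin returns the origin IP it observed
response = requests.get('http://httpbin.org/ip', proxies=proxies)
print(response.json())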
10. Other settings
# Timeout settings
url = 'https://www.taobao.com'
response = requests.get(url, timeout=1)
# The timeout can be split into connect and read timeouts
response = requests.get(url, timeout=(5, 11))
# Wait forever
response = requests.get(url, timeout=None)
# Basic authentication
from requests.auth import HTTPBasicAuth
response = requests.get(url, auth=HTTPBasicAuth('username', 'password'))
# Shorthand: a plain tuple is treated as basic auth
response = requests.get(url, auth=('username', 'password'))
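
A short sketch of handling a timeout explicitly instead of letting the exception bubble up:

# Catch the timeout rather than crashing the crawler
try:
	response = requests.get('https://www.taobao.com', timeout=1)
except requests.exceptions.Timeout:
	print('request timed out')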

# Prepared Request
from requests import Request, Session

url = 'http://httpbin.org/post'
data = {
	'name': 'dahlin',
}
headers = {
	'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
}
session = Session()
# Build the request object, then prepare and send it through the session
requestobj = Request('POST', url, data=data, headers=headers)
prepped = session.prepare_request(requestobj)
response = session.send(prepped)
print(response.text)
