Basic Usage of Crawler Libraries (3)

Latest recommended article published on 2023-08-21 20:54:13

星空路途

Category: Python爬虫. Tags: python, cookie, post

Article link: https://blog.csdn.net/weixin_43466246/article/details/113312986

The requests library (third-party)

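Requests is written in Python and built on top of urllib3. It is far more convenient than the standard-library urllib, saves a great deal of work, and fully covers typical HTTP testing needs.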

# Install requests
pip install requests

Basic usage

Sending a GET request:

resp=requests.get('http://www.baidu.com')

import requests

# Add headers and query parameters
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
kw={'wd':'中国'}

# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so there is no need to call urlencode()
resp=requests.get('https://www.baidu.com/s',headers=headers,params=kw)

print(resp)

# Inspect the response body
# print(resp.text)    # str, decoded with the encoding requests detected
# print(resp.content.decode('utf-8')) # resp.content is raw bytes; decode it manually

print(resp.url)
print(resp.encoding)
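
To see what `params` does without sending anything over the network, the request can be prepared locally. The snippet below is a small illustration, not part of the original walkthrough: it builds the same query with `requests.Request(...).prepare()` and compares the result with the standard library's `urlencode()`.

```python
import requests
from urllib.parse import urlencode

kw = {'wd': '中国'}

# Build the request locally; prepare() encodes params into the URL
# but does not send anything.
prepared = requests.Request('GET', 'https://www.baidu.com/s', params=kw).prepare()
print(prepared.url)

# The query string matches what urlencode() would produce by hand.
manual_url = 'https://www.baidu.com/s?' + urlencode(kw)
print(prepared.url == manual_url)
```

The non-ASCII value is percent-encoded as UTF-8 bytes, which is why `resp.url` in the example above shows `%E4%B8%AD%E5%9B%BD` instead of the original characters.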

Sending a POST request:

# The main difference is the added data parameter (a dict)
resp=requests.post('http://www.baidu.com',data=data)

# Fill in the username and password in data
import requests
url='https://i.meishi.cc/login.php?redirect=https%3A%2F%2Fwww.meishij.net%2F'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

data={'redirect': 'https://www.meishij.net/',
    'username': '',
    'password': ''}

resp=requests.post(url,headers=headers,data=data)
print(resp.text)
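
As a rough illustration of what the `data` dict turns into on the wire, the request can again be prepared locally without being sent. The credentials below are placeholders invented for this sketch, not real account details.

```python
import requests

url = 'https://i.meishi.cc/login.php?redirect=https%3A%2F%2Fwww.meishij.net%2F'
# Placeholder credentials for illustration only (not a real account).
data = {'redirect': 'https://www.meishij.net/',
        'username': 'demo_user',
        'password': 'demo_pass'}

# prepare() builds the request locally without sending it.
prepared = requests.Request('POST', url, data=data).prepare()

print(prepared.method)                   # HTTP method of the prepared request
print(prepared.headers['Content-Type'])  # content type chosen for a dict payload
print(prepared.body)                     # the form-encoded request body
```

A dict passed as `data` is sent as a form-encoded body, the same format a browser uses when submitting an HTML form.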

Using a proxy

# The main change is the added proxies parameter
import requests

proxy={'http':'123.169.118.136:9999'}

url='http://www.httpbin.org/ip'

resp=requests.get(url,proxies=proxy)
print(resp.text)
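
Free proxies like the one above tend to go offline without notice, so in practice it helps to set a timeout and catch failures. The following is a minimal sketch under that assumption; `build_proxies` and `fetch_ip_via_proxy` are hypothetical helper names introduced here, and the proxy address is simply the one from the example above.

```python
import requests


def build_proxies(host, port):
    """Return a proxies mapping that routes both HTTP and HTTPS traffic."""
    proxy_url = f'http://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


def fetch_ip_via_proxy(proxies, timeout=5):
    """Ask httpbin for the caller's IP through the proxy; return None on failure."""
    try:
        resp = requests.get('http://www.httpbin.org/ip',
                            proxies=proxies, timeout=timeout)
        resp.raise_for_status()
        return resp.text
    except requests.exceptions.RequestException as exc:
        # Covers proxy errors, timeouts, and HTTP error statuses.
        print(f'Request through proxy failed: {exc}')
        return None


proxies = build_proxies('123.169.118.136', 9999)
print(proxies)
# Calling fetch_ip_via_proxy(proxies) performs the actual request.
```

The keys of the proxies dict are matched against the scheme of the target URL, so a dict that only has an `'http'` key leaves `https://` requests unproxied.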

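Logging in with cookies

session: a requests Session keeps cookies between requests, so a login and the requests that follow it share the same cookies.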

# Get cookie information
import requests
resp=requests.get('https://www.baidu.com')
print(resp.cookies)
print(resp.cookies.get_dict())
# Share cookies across requests
post_url='https://i.meishi.cc/login.php?redirect=https%3A%2F%2Fwww.meishij.net%2F'
# Fill in your own account credentials
post_data={'username':'',
    'password':''}
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

# Log in
session=requests.Session()
session.post(post_url,headers=headers,data=post_data)

# Visit the personal page
url='https://i.meishi.cc/cook.php?id=13686422'
resp=session.get(url)
# print(resp.text)
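
The cookie sharing can be observed offline. In this sketch a cookie is placed in the session's jar by hand as a stand-in for what a login response would set; the cookie name, value, and the `example.com` URL are assumptions made up for illustration.

```python
import requests

session = requests.Session()
# Stand-in for the cookie a real login response would set.
session.cookies.set('token', 'abc123', domain='example.com', path='/')

req = requests.Request('GET', 'https://example.com/profile')

# Prepared through the session: the stored cookie is attached.
with_session = session.prepare_request(req)
print(with_session.headers.get('Cookie'))

# Prepared on its own: no cookie is attached.
standalone = req.prepare()
print(standalone.headers.get('Cookie'))

# The jar can be dumped as a plain dict, as with resp.cookies above.
print(session.cookies.get_dict())
```

This is the reason the login and the follow-up request both go through `session`: a bare `requests.get(url)` starts with an empty cookie jar every time.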

Handling untrusted SSL certificates

# Pass verify=False to get()
import requests
url='https://inv-veri.chinatax.gov.cn/'
resp=requests.get(url,verify=False)

print(resp.content.decode('utf-8'))
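
With `verify=False`, urllib3 emits an `InsecureRequestWarning` on every request. One possible way to keep the output clean is sketched below; `fetch_without_verification` is a hypothetical helper name, and skipping verification is only reasonable for sites you already trust.

```python
import requests
import urllib3


def fetch_without_verification(url, timeout=10):
    """GET a page while skipping certificate verification (use with care)."""
    # Silence the warning urllib3 raises for unverified HTTPS requests.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    resp = requests.get(url, verify=False, timeout=timeout)
    return resp.content.decode('utf-8')


# Example call (performs a real network request):
# html = fetch_without_verification('https://inv-veri.chinatax.gov.cn/')
```

If the site's CA certificate is available, `verify` also accepts a path to a CA bundle file, which keeps verification on instead of disabling it.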
