Python Web Scraping, Part 3

xuptwgl

Article link: https://blog.csdn.net/L13259431663/article/details/100943623

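Requests

The requests library is more convenient to use than urllib.

The difference between response.content and response.text:
response.content returns bytes that have not been decoded.
response.text returns a str decoded by requests using an encoding it guesses. The guess can be wrong and produce garbled text, so it is usually better to decode manually with response.content.decode('utf-8'), choosing the encoding the page actually uses.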
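
To see why the choice of encoding matters, here is a minimal sketch that needs no network access. The byte string `raw` is a stand-in for the bytes a response would carry (an assumption for illustration, not output from a real server), and ISO-8859-1 plays the role of a wrong guess.

```python
# Stand-in for the raw bytes of a response body (hypothetical payload).
raw = '中国'.encode('utf-8')

# Decoding with the wrong encoding does not raise an error; it silently
# yields garbled text, which is what a bad encoding guess looks like.
garbled = raw.decode('iso-8859-1')

# Decoding with the encoding the page really uses restores the text.
correct = raw.decode('utf-8')

print(type(raw), type(correct))
print(repr(garbled))
print(correct)
```

With a real response, setting `response.encoding = 'utf-8'` before reading `response.text` has the same effect as decoding the bytes manually.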

Sending a GET request with requests:

# -*- coding: UTF-8 -*-

import requests

url = 'https://www.baidu.com/s'
# print(type(requests.get(url=url).content)) # bytes
# print(requests.get(url=url).content)

# print(type(requests.get(url=url).text))  # str
# print(requests.get(url=url).text)

params = {
    'wd': '中国'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}
response = requests.get(url, params=params, headers=headers)
with open('baidu.html', 'w', encoding='utf-8') as f:
    f.write(response.content.decode('utf-8'))

print(response.url)
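
The `params` dictionary is URL-encoded and appended to the query string, which is why `response.url` differs from `url`. The sketch below builds the same request without sending it, so the final URL can be inspected offline. It assumes the requests library is installed and reuses the `url` and `params` values from the example above.

```python
import requests

url = 'https://www.baidu.com/s'
params = {'wd': '中国'}

# Build and prepare the request; nothing is sent over the network.
prepared = requests.Request('GET', url, params=params).prepare()

# Non-ASCII values are percent-encoded as UTF-8 in the query string.
print(prepared.url)
```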

Sending a POST request with requests:

Sending a POST request is just as simple: call requests.post. If the server returns JSON, response.json() parses the JSON string into a Python object, typically a dict.
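
As a rough illustration of the kind of object response.json() produces, the sketch below parses a JSON string with the standard json module. The sample body is made up for this example and is not a real Lagou response. On a real response, response.json() raises an exception when the body is not valid JSON (for instance when the site returns an anti-scraping HTML page), so guarding the call is worthwhile.

```python
import json

# Hypothetical response body; a real server would return its own fields.
body = '{"success": true, "content": {"pageNo": 1, "keyword": "python"}}'


def parse_json(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


result = parse_json(body)
print(type(result))
print(result['content']['keyword'])
print(parse_json('<html>blocked</html>'))
```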

# -*- coding: UTF-8 -*-

import requests

url = 'https://www.lagou.com/jobs/companyAjax.json?needAddtionalResult=false'
data = {
    'first': 'true',
    'pn': 1,
    'kd': 'python'
}

headers = {
    'Referer': 'https://www.lagou.com/zhaopin/Python/?labelWords=label',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}
response = requests.post(url, data=data, headers=headers)
print(type(response.json()))
print(response.json())
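
To make clear what the `data` argument does, this sketch prepares the same POST without sending it and prints the body that would go on the wire. It assumes the requests library is installed. The URL and form fields are copied from the example above, and no claim is made that the endpoint still responds.

```python
import requests

url = 'https://www.lagou.com/jobs/companyAjax.json?needAddtionalResult=false'
data = {'first': 'true', 'pn': 1, 'kd': 'python'}

# Prepare the POST without sending it.
prepared = requests.Request('POST', url, data=data).prepare()

# A dict passed as data= is form-encoded, and the matching
# Content-Type header is set automatically.
print(prepared.headers['Content-Type'])
print(prepared.body)
```

To send a JSON body instead of a form, requests accepts a `json=` keyword in place of `data=`.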

Using a proxy with requests
Pass the proxies keyword argument to route a request through a proxy.

# -*- coding: UTF-8 -*-

import requests

url = 'http://httpbin.org/ip'

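proxy = {
    # Maps the target URL's scheme to the proxy URL; this entry only
    # applies to http:// targets.
    'http': 'http://1.198.72.156:9999'
}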

response = requests.get(url, proxies=proxy)
print(response.text)
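
The keys of the proxies dictionary are matched against the scheme of the target URL, so a dictionary with only an 'http' key leaves https:// requests unproxied. The sketch below checks this offline with `select_proxy` from `requests.utils`. `build_proxies` is a hypothetical helper written for this example, and the proxy address is the one from the example above, which is unlikely to still be live.

```python
from requests.utils import select_proxy


def build_proxies(address):
    """Hypothetical helper: use one HTTP proxy for both URL schemes."""
    proxy_url = 'http://' + address
    return {'http': proxy_url, 'https': proxy_url}


http_only = {'http': 'http://1.198.72.156:9999'}
both = build_proxies('1.198.72.156:9999')

# select_proxy reports which proxy, if any, would be chosen for a URL.
print(select_proxy('http://httpbin.org/ip', http_only))
print(select_proxy('https://httpbin.org/ip', http_only))  # no 'https' key
print(select_proxy('https://httpbin.org/ip', both))
```

The resulting dictionary is passed the usual way, for example `requests.get(url, proxies=both, timeout=10)`.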

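Handling cookies with requests
To share cookies across multiple requests, use a session.
For example, log in with an initial POST; later requests to pages that require login then reuse the cookies obtained from that login.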

# -*- coding: UTF-8 -*-
import requests

url = 'xxx.xx.xx'

data = {
    'user': 'xxx',
    'password': 'xxx'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}

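# A Session stores cookies set by the server and sends them on later requests.
session = requests.Session()
session.post(url, data=data, headers=headers)

# This request carries the login cookies automatically.
response = session.get('xxx.xxx.xxx')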
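
The cookie sharing can be observed without a real login. In this sketch the cookie is placed in the session by hand as a stand-in for what a server would set after a successful login. The cookie name, value, and the example.com URLs are all assumptions for illustration. The follow-up request is prepared but never sent.

```python
import requests

session = requests.Session()

# Stand-in for a successful login: a real server would set this cookie
# through a Set-Cookie header in its response to session.post(...).
session.cookies.set('sessionid', 'abc123', domain='example.com', path='/')

# Prepare (but do not send) a later request made through the same session.
request = requests.Request('GET', 'http://example.com/profile')
prepared = session.prepare_request(request)

# The session attached its stored cookie to the outgoing request.
print(prepared.headers.get('Cookie'))
```

A plain `requests.get` call has no such memory, which is why the login POST and the later GET both need to go through the same session object.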

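Handling untrusted SSL certificates with requests
For sites whose SSL certificate is not trusted, pass verify=False. This turns off certificate verification, so use it only for hosts you trust.
Example: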

resp = requests.get('url', verify=False)
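
With verification turned off, urllib3 emits an InsecureRequestWarning on every request. A minimal sketch of one way to wrap this up follows. `fetch_insecure` is a hypothetical helper name, and the host in the usage comment is a placeholder, so the call is left commented out.

```python
import requests
import urllib3


def fetch_insecure(url, timeout=10):
    """Hypothetical helper: GET a URL without verifying its certificate."""
    # Silence the InsecureRequestWarning that urllib3 emits when
    # certificate verification is off.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return requests.get(url, verify=False, timeout=timeout)


# Usage (not executed here; the host below is a placeholder):
# resp = fetch_insecure('https://self-signed.example/')
# print(resp.status_code)
```

Where the site's CA certificate is available, passing its file path as the `verify` argument keeps verification on and is the safer choice.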
