A Long-Form Summary of Python Web Scraping: A Collection of requests and selenium Operations

Latest recommended article published on 2024-05-11 11:58:48

梦魇java


Category columns: data analysis, web scraping, python. Article tags: python, web scraping, selenium

Article link: https://blog.csdn.net/MC_XY/article/details/122199178


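The requests module

Preface:

Usually we use Python to write web applications and web APIs that are deployed on a server: clients send requests, and we, as the server, respond with data.

We can also turn the tables and use Python's requests module to imitate a browser, sending requests to other sites and having them respond with data to us.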


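I. Introduction to the requests module

requests can imitate the requests a browser makes. Compared with urllib, which we used earlier, its API is more convenient (in essence it is a wrapper around urllib3).

Characteristic: after requests downloads the content of a page, it does not execute any JavaScript. To get content that is produced by JS, we have to analyse the target site ourselves and then issue new requests.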
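To make the point above concrete, here is a minimal sketch that needs no network access. The HTML, the `/api/items` path and the `example.com` host are all made up for illustration and do not come from any real site. The list on this imaginary page is filled in by JavaScript, so the downloaded HTML contains only an empty container; what a crawler can do is find the data endpoint in the page source and request it separately.

```python
import re
from urllib.parse import urljoin

# Hypothetical page source, as requests would download it: the <ul> is empty
# because the items are only added once a browser runs the script.
page_url = 'https://example.com/list.html'
html = """
<ul id="items"></ul>
<script>
  fetch('/api/items?page=1').then(r => r.json()).then(render);
</script>
"""

# The rendered items are not present in the raw HTML...
has_items = '<li>' in html

# ...but the endpoint that the script calls is, so it can be extracted
match = re.search(r"fetch\('([^']+)'\)", html)
api_url = urljoin(page_url, match.group(1))

print(has_items)  # False
print(api_url)    # https://example.com/api/items?page=1
```

On a real site the follow-up step would be an ordinary requests.get(api_url), usually found by watching the browser's network panel rather than by a regular expression.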

Official site: http://docs.python-requests.org/en/master/

1. Installing the requests module

pip3 install requests

2. Request methods supported by the requests module

The most commonly used are requests.get() and requests.post(). Before studying requests in earnest, it is a good idea to get familiar with the HTTP protocol first: http://www.cnblogs.com/linhaifeng/p/6266327.html

    >>> import requests
    >>> r = requests.get('https://api.github.com/events')
    >>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
    >>> r = requests.put('http://httpbin.org/put', data={'key': 'value'})
    >>> r = requests.delete('http://httpbin.org/delete')
    >>> r = requests.head('http://httpbin.org/get')
    >>> r = requests.options('http://httpbin.org/get')
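To see what one of these helpers would put on the wire without contacting any server, one option is to build the request with requests.Request and call prepare(). This is only a sketch for inspection, reusing the httpbin.org URL from the listing above; nothing is actually sent.

```python
import requests

# Build (but do not send) a POST request with form data
req = requests.Request('POST', 'http://httpbin.org/post', data={'key': 'value'})
prepared = req.prepare()

print(prepared.method)                   # POST
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # key=value
```

The dict passed as data is form-encoded into the request body, which is why the Content-Type header is set automatically.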

II. Sending GET requests with requests

1. Basic GET request

    import requests

    response = requests.get('http://dig.chouti.com/')
    print(response.text)

Checking the response encoding

response.encoding: the encoding that requests will use to decode the returned page data

    import requests

    url = 'https://www.baidu.com/'
    response = requests.get(
        url=url,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
        },
    )
    print(response.encoding)     # encoding inferred for the page
    response.encoding = 'utf-8'  # set the encoding used to decode response.text
    print(response.status_code)
    with open('a.html', 'w', encoding='utf-8') as f:
        f.write(response.text)
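Why set the encoding by hand? As far as I know, when a server sends a text/* Content-Type without a charset, requests falls back to ISO-8859-1, and the text attribute is decoded with whatever the encoding attribute says. The sketch below reproduces the effect with plain bytes and no network access; the sample string is an arbitrary choice.

```python
# Bytes as a server would send them for a UTF-8 encoded page
raw = '百度一下'.encode('utf-8')

# Decoding with the wrong codec does not fail, it silently yields mojibake
wrong = raw.decode('iso-8859-1')

# Decoding with the right codec recovers the original text
right = raw.decode('utf-8')

print(wrong == '百度一下')  # False
print(len(wrong))          # 12 (one character per byte)
print(right)               # 百度一下
```

Setting the encoding to 'utf-8' before reading the text is the requests equivalent of choosing the second decode.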


2. GET requests with parameters

URL encoding

    # URL with parameters, encoded by hand
    from urllib.parse import urlencode
    import requests

    k = input('Enter a keyword: ').strip()
    res = urlencode({'wd': k}, encoding='utf-8')  # URL-encode the query string
    response = requests.get(
        'https://www.baidu.com/s?%s' % res,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
        },
        # params={'wd': k}
    )
    with open('a.html', 'w', encoding='utf-8') as f:
        f.write(response.text)
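For reference, this is roughly what urlencode produces for a keyword that contains a space and Chinese characters. The keyword itself is just an example value standing in for the user's input.

```python
from urllib.parse import urlencode, parse_qs

# Example keyword; in the script above it comes from input()
query = urlencode({'wd': 'python 爬虫'}, encoding='utf-8')
print(query)  # wd=python+%E7%88%AC%E8%99%AB

# parse_qs reverses the encoding, which is handy for checking the result
decoded = parse_qs(query)
print(decoded)  # {'wd': ['python 爬虫']}
```

The space becomes + and each Chinese character becomes three percent-escaped UTF-8 bytes.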

headers: setting the request headers

    # `res` is the URL-encoded query string built in the previous example
    response = requests.get(
        'https://www.baidu.com/s?%s' % res,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
        },
    )
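A short sketch of why the User-Agent header is worth overriding: without it, requests announces itself as python-requests, which a site can easily tell apart from a browser. Nothing is sent here; the request is only prepared so that its headers can be inspected.

```python
import requests

# The User-Agent that requests uses when none is supplied
default_ua = requests.utils.default_user_agent()
print(default_ua)  # python-requests/<installed version>

# The browser User-Agent string used throughout this article
browser_ua = ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36')

# Prepare (but do not send) a request that carries the browser User-Agent
prepared = requests.Request(
    'GET', 'https://www.baidu.com/s', headers={'User-Agent': browser_ua}
).prepare()
print(prepared.headers['User-Agent'])
```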

params: setting the query parameters (the URL encoding of the parameters is handled automatically)

    import requests

    k = input('Enter a keyword: ').strip()
    # res = urlencode({'wd': k}, encoding='utf-8')  # no manual URL encoding needed
    response = requests.get(
        'https://www.baidu.com/s',
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
        },
        params={'wd': k},
    )
    with open('a.html', 'w', encoding='utf-8') as f:
        f.write(response.text)
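To check what params does to the URL without sending anything, the request can be prepared and its url attribute inspected. This is a sketch with an example keyword in place of the user's input.

```python
import requests
from urllib.parse import urlsplit, parse_qs

# Prepare (but do not send) a GET request with query parameters
prepared = requests.Request(
    'GET', 'https://www.baidu.com/s', params={'wd': 'python 爬虫'}
).prepare()

# The final URL, with the parameters appended and percent-encoded
print(prepared.url)

# Decoding the query string gives back the original parameter
query = parse_qs(urlsplit(prepared.url).query)
print(query)  # {'wd': ['python 爬虫']}
```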

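cookies: sending cookie information with the request

    # The keyword argument is lowercase `cookies`; `Cookies=` raises a TypeError.
    # `k` is the keyword read from input() in the previous example.
    response = requests.get(
        'https://www.baidu.com/s',
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
        },
        params={'wd': k},
        cookies={'user_session': 'wGMHFJKgDcmRIVvcA14_Wrt_3xaUyJNsBnPbYzEL6L0bHcfc'},
    )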
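The cookies dict ends up as an ordinary Cookie request header. A minimal way to confirm this offline is to prepare the request and look at its headers; the cookie value below is a placeholder, not a real session, and the keyword argument is spelled in lowercase.

```python
import requests

# Prepare (but do not send) a request that carries one cookie
prepared = requests.Request(
    'GET',
    'https://www.baidu.com/s',
    cookies={'user_session': 'abc123'},  # placeholder value
).prepare()

print(prepared.headers['Cookie'])  # user_session=abc123
```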

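allow_redirects=False stops requests from following the redirect given by the Location header of the response; the default is True, which follows it.

With False, you stay on the current request and get the headers of this response, including the Location address the redirect points to. Otherwise requests follows the redirect, and what you get is the content of the page after the jump.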

    r3 = session.get('https://passport.lagou.com/grantServiceTicket/grant.html',
                     headers={
                         …
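Since the listing above is cut off, here is a separate, self-contained sketch of the same idea. It does not reproduce the lagou.com flow: it starts a throwaway local server (the handler class and the /old and /new paths are made up for this demo) and compares a request that stops at the redirect with one that follows it.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for localhost

# Stop at the redirect and read where it points
stopped = session.get(base + '/old', allow_redirects=False)
# Default behaviour: follow the redirect to the final page
followed = session.get(base + '/old')

print(stopped.status_code, stopped.headers['Location'])  # 302 /new
print(followed.status_code, followed.text)               # 200 final page
print([r.status_code for r in followed.history])         # [302]

server.shutdown()
server.server_close()
```

When the redirect is followed, the intermediate 302 response is still available through the history attribute of the final response.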
