Web scraping: using proxy IPs and sessions

Latest recommended article published on 2024-09-12 17:53:46

大神，起风了

Category: web scraping. Tags: web scraping, using proxy IPs

Article link: https://blog.csdn.net/Light__1024/article/details/88680396

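This article shows how to fetch pages through a proxy IP in a scraper. It stresses choosing a proxy whose protocol matches the target site's, and gives the dictionary format used for the proxies setting. It also explains how to use a Session to log in and then read pages that are only available after login, taking Douban as the example: log in with session.post, then fetch the personal page with session.get.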

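1. Scraping a page through a proxy IP follows the same flow as an ordinary requests.get call; the only difference is the extra proxies parameter.

Proxy IPs can be found at http://www.goubanjia.com/. Pay attention to whether a proxy is http or https, and pick one whose protocol matches the target URL.
proxies is a dictionary that maps the protocol to the proxy address: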

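import requests

# Target page: a Baidu search for "ip"
url = 'https://www.baidu.com/s?wd=ip&ie=utf-8'

# The key is the protocol of the target URL; the value is the proxy address
proxies = {
    "https": "http://218.60.8.99:3129"
}

# Send a browser-like User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}

# Same call as an ordinary requests.get, plus the proxies argument;
# the timeout keeps a dead proxy from hanging the script
response = requests.get(url=url, proxies=proxies, headers=headers, timeout=10)

# Save the returned HTML so it can be opened in a browser
with open('baiduip.html', 'w', encoding='utf-8') as f:
    f.write(response.text)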
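
The rule about matching the proxy's protocol to the target URL can be sketched as a small helper. This is only an illustration: the function name build_proxies is made up for this example, and it assumes the proxy address you pass in really does support the protocol of the target URL.

```python
from urllib.parse import urlsplit


def build_proxies(target_url, proxy_address):
    """Build a proxies dict keyed by the protocol of target_url.

    build_proxies is a hypothetical helper, not part of requests.
    """
    scheme = urlsplit(target_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError("target URL must start with http:// or https://")

    # Give a bare "host:port" address an explicit scheme
    if "://" not in proxy_address:
        proxy_address = "http://" + proxy_address

    # Key = protocol of the target URL, value = proxy address
    return {scheme: proxy_address}
```

The returned dictionary can be passed straight to requests.get as the proxies argument, for example proxies=build_proxies(url, "218.60.8.99:3129").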

2. Use a session to scrape page data that is only visible after logging in
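
As a rough sketch of the idea: a requests.Session keeps the cookies returned by the login request, so a later get on the same session is sent as the logged-in user. The function name login_and_fetch, the URLs and the form field names below are all placeholders invented for this example; the real login URL and fields have to be read from the target site's login request.

```python
def login_and_fetch(session, login_url, form_data, page_url,
                    headers=None, timeout=10):
    """Log in with session.post, then fetch a page with session.get.

    session is expected to be a requests.Session; login_and_fetch
    itself is a hypothetical helper, not part of requests.
    """
    # Step 1: submit the login form; the session stores the returned cookies
    login_response = session.post(login_url, data=form_data,
                                  headers=headers, timeout=timeout)
    login_response.raise_for_status()

    # Step 2: request the protected page on the same session,
    # so the stored cookies are sent along automatically
    page_response = session.get(page_url, headers=headers, timeout=timeout)
    page_response.raise_for_status()
    return page_response.text


# Example call (makes real network requests, so it is left commented out).
# The URLs and form fields are placeholders, not a real site's values.
# import requests
# with requests.Session() as session:
#     html = login_and_fetch(
#         session,
#         "https://example.com/login",
#         {"username": "your-name", "password": "your-password"},
#         "https://example.com/profile",
#     )
```

Taking the session as a parameter keeps the two steps visible in one place: if the page were fetched with a plain requests.get instead, the login cookies would not be sent.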