Python crawler: proxies and related caveats

clover猪猪

Article link: https://blog.csdn.net/weixin_41576911/article/details/79106704

Free proxy list: http://www.goubanjia.com/free/index.shtml

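```python
import random

import requests
from bs4 import BeautifulSoup
# Alt+Enter in the IDE auto-imports a missing package

# Proxy source: http://cn-proxy.com/
proxy_list = (
    'http://117.177.250.151:8081',
    'http://111.85.219.250:3129',
    'http://122.70.183.138:8118',
)
proxy_ip = random.choice(proxy_list)  # pick one proxy IP at random
# requests selects the proxy by the target URL's scheme; the URL below is https,
# so without an 'https' key the request would not go through this proxy
proxies = {'http': proxy_ip, 'https': proxy_ip}
# Proxy example from the official requests docs. requests does not pick among
# these at random: it uses the entry whose key matches the URL's scheme.
# import requests
# proxies = {
#   "http": "http://10.10.1.10:3128",
#   "https": "http://10.10.1.10:1080",
# }
# requests.get("http://example.org", proxies=proxies)

url = 'https://knewone.com/things/categories/dian-nao/bi-ji-ben-dian-nao'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

# a timeout keeps a dead proxy from hanging the script indefinitely
wb_data = requests.get(url, headers=headers, proxies=proxies, timeout=10)
if wb_data.status_code != 404:
    # parse the response body; 'html.parser' is Python's built-in parser
    soup = BeautifulSoup(wb_data.text, 'html.parser')
```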
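On the question of whether requests chooses among the proxies in the dict at random: it does not. As far as I understand its behaviour, requests looks the proxy up by the target URL (apart from proxies it may pick up from environment variables such as HTTPS_PROXY). The sketch below is a stdlib re-implementation modeled on `requests.utils.select_proxy`; the name `select_proxy_like_requests` is hypothetical, and the key order is my reading of that function, so check it against your installed version.

```python
from urllib.parse import urlsplit


def select_proxy_like_requests(url, proxies):
    """Return the proxy that would be used for `url`, or None.

    Simplified model (assumption, based on requests.utils.select_proxy):
    the most specific key wins, in the order
    'scheme://host', 'scheme', 'all://host', 'all'.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    keys = [
        f"{parts.scheme}://{parts.hostname}",
        parts.scheme,
        f"all://{parts.hostname}",
        "all",
    ]
    for key in keys:
        if key in proxies:
            return proxies[key]
    return None


url = "https://knewone.com/things/categories/dian-nao/bi-ji-ben-dian-nao"
only_http = {"http": "http://117.177.250.151:8081"}
both = {"http": "http://117.177.250.151:8081", "https": "http://117.177.250.151:8081"}
print(select_proxy_like_requests(url, only_http))  # None: no 'https' entry
print(select_proxy_like_requests(url, both))       # http://117.177.250.151:8081
```

This is why a dict with only an `'http'` key has no effect on an `https://` URL: the lookup simply finds nothing.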

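The candidate proxy IPs are kept in a list (a tuple in the code), and the chosen proxy is passed to requests as a dict; random.choice picks a proxy at random when it is used.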
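Calling random.choice once at the top of the script, as in the code above, fixes a single proxy for the whole run. To rotate, pick again for every request. A minimal sketch; `PROXY_POOL` and `random_proxies` are hypothetical names, and the pool just reuses the sample addresses from the code above:

```python
import random

# Hypothetical pool: the sample proxies from the code above
PROXY_POOL = [
    "http://117.177.250.151:8081",
    "http://111.85.219.250:3129",
    "http://122.70.183.138:8118",
]


def random_proxies(pool):
    """Pick one proxy at random and build a requests-style `proxies` dict.

    The same proxy is registered for both schemes so that http:// and
    https:// URLs are routed through it.
    """
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}


for _ in range(3):
    # a fresh pick for every request, e.g.
    # requests.get(url, headers=headers, proxies=random_proxies(PROXY_POOL), timeout=10)
    print(random_proxies(PROXY_POOL))
```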

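When parsing a large number of pages, check each page's status code first. Note that the code does not distinguish between a dead proxy IP and a page that cannot be opened, so some data may be lost.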
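One way to separate the two cases: a request through a dead proxy usually fails with an exception, while a reachable page returns a status code (404 included). The sketch below assumes that split. It retries with other proxies on an exception and hands back any response so the caller can judge the page itself. `fetch_with_proxy_rotation` is a hypothetical helper; `fetch` is any callable taking `(url, proxies=...)`, so it can be tested without network access.

```python
import random


def fetch_with_proxy_rotation(url, pool, fetch, proxy_errors=(OSError,), max_tries=3):
    """Try up to `max_tries` proxies from `pool` in random order.

    Assumption: an exception from `fetch` means the proxy (or connection)
    failed, so the next proxy is tried. A response of any status code means
    the proxy worked, and the caller can inspect the page, e.g. a 404.
    Returns (response or None, list of proxies that failed).
    """
    candidates = list(pool)
    random.shuffle(candidates)
    failed = []
    for proxy in candidates[:max_tries]:
        try:
            response = fetch(url, proxies={"http": proxy, "https": proxy})
        except proxy_errors:
            failed.append(proxy)  # this proxy looks dead; try another one
            continue
        return response, failed
    return None, failed  # no proxy got through: the page was never reached
```

With requests, something like `fetch=lambda u, proxies: requests.get(u, headers=headers, proxies=proxies, timeout=10)` and `proxy_errors=(requests.exceptions.ProxyError, requests.exceptions.ConnectTimeout)` would fit. A `None` response then means no proxy got through, while `resp.status_code == 404` means the page itself is missing, and the failed list shows which proxies to drop from the pool.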
