# Free proxy listings: http://www.goubanjia.com/free/index.shtml
import random
import requests
from bs4 import BeautifulSoup
# IDE shortcut to auto-import missing packages: Alt+Enter
# http://cn-proxy.com/
proxy_list = [
    'http://117.177.250.151:8081',
    'http://111.85.219.250:3129',
    'http://122.70.183.138:8118',
]
proxy_ip = random.choice(proxy_list)  # pick a proxy IP at random
proxies = {'http': proxy_ip}
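The closing notes mention storing proxies in a dict as well as a list. A minimal sketch of a dict-based pool, keyed by scheme, might look like this (the pool layout and name `proxy_pool` are my own illustration, reusing the sample IPs above, not something from the original):

```python
import random

# Hypothetical dict-based pool: scheme -> candidate proxy URLs
proxy_pool = {
    'http': [
        'http://117.177.250.151:8081',
        'http://111.85.219.250:3129',
    ],
    'https': [
        'http://122.70.183.138:8118',
    ],
}

# Build a requests-style proxies mapping by picking one candidate per scheme
proxies = {scheme: random.choice(candidates)
           for scheme, candidates in proxy_pool.items()}
print(proxies)
```

One advantage of the dict form is that HTTP and HTTPS traffic can be routed through different proxies, matching the shape `requests` expects for its `proxies` argument.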
# Proxy example from the official requests docs; unclear whether it rotates proxies when used
# import requests
# proxies = {
# "http": "http://10.10.1.10:3128",
# "https": "http://10.10.1.10:1080",
# }
# requests.get("http://example.org", proxies=proxies)
url = 'https://knewone.com/things/categories/dian-nao/bi-ji-ben-dian-nao'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
wb_data = requests.get(url, headers=headers, proxies=proxies)
if wb_data.status_code == 404:
    pass  # page not found; skip it
else:
    soup = BeautifulSoup(wb_data.text, 'lxml')
Proxy IPs can be stored in either a list or a dict, and random.choice can pick one at random when needed.
When parsing many pages, check each response's status code first. The code above does not distinguish between an invalid proxy IP and a page that simply fails to load, so some data may be lost.
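To address the gap mentioned above (telling a dead proxy apart from an unreachable page), one option is to pre-check each proxy against a known-good URL before crawling. This is a hedged sketch, not the original author's method; the function name `pick_working_proxy` and the httpbin test URL are my own assumptions:

```python
import random
import requests

def pick_working_proxy(candidates, test_url='http://httpbin.org/ip', timeout=5):
    """Return the first proxy that can fetch test_url, or None if all fail."""
    candidates = list(candidates)
    random.shuffle(candidates)  # avoid always hammering the same proxy first
    for proxy_ip in candidates:
        try:
            resp = requests.get(test_url,
                                proxies={'http': proxy_ip},
                                timeout=timeout)
            if resp.status_code == 200:
                return proxy_ip  # proxy is alive and returned a good page
        except requests.exceptions.RequestException:
            continue  # proxy refused, timed out, or otherwise failed
    return None  # every candidate failed
```

A RequestException here points at the proxy itself, while a non-200 status code from the real target URL points at the page, so the two failure modes can be logged and retried separately.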