Getting Free Proxies with Python

Latest recommended article published on 2023-09-14 17:08:16

阳光彩虹小白狗


Article link: https://blog.csdn.net/y471992509/article/details/79970218


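1. Requirements

A crawler that scrapes pages will sometimes get its IP banned. Routing requests through proxies avoids that problem, and when only a few proxies are needed, free ones are enough. This small program collects them.

2. Program Description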

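Program structure?
There is none! The whole program lives in a single file.
Approach
- Fetch all the proxies listed on several free proxy sites
  Each site needs its own scraping method. For example, Kuaidaili: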
```
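import time

import requests
from bs4 import BeautifulSoup

def proxy_kuai(self):
    """Scrape the free proxies listed on Kuaidaili."""
    base_url = 'https://www.kuaidaili.com/free/inha/{}/'
    for i in range(1, 5):
        time.sleep(1)  # pause between pages so the site is not hammered
        res = requests.get(base_url.format(i), headers=self.headers)
        soup = BeautifulSoup(res.text, 'lxml')
        trs = soup.find_all('tr')
        for tr in trs[1:]:  # skip the table header row
            tds = tr.find_all('td')
            if len(tds) < 4:
                continue  # not a proxy row
            # Queue items look like {'http': 'ip:port'} or {'https': 'ip:port'}
            scheme = tds[3].text.strip().lower()
            self.queue.put({scheme: tds[0].text.strip() + ':' + tds[1].text.strip()})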
```
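- The code above puts each result into a queue, because the proxies are verified later with multiple threads; Python's built-in queue module is used for this.
- Finally, check which of the collected proxies actually work. Here each proxy is tested against http://checkip.amazonaws.com with requests.get(url, proxies=proxies, timeout=2.01). requests.get raises an exception on connection failures and timeouts (a 4xx or 5xx status only raises if raise_for_status() is called on the response), so the call is wrapped in try-except to catch the failure and discard that proxy.
- Proxies that pass verification can be stored in whatever format you like; plain txt is used here.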
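The verification step described in the list above can be sketched roughly as follows. This is a minimal sketch, not the author's go_verify method (which is not shown in the post): the name is_proxy_alive and the `get` parameter are hypothetical, introduced so the function can be tried without a network. Passing requests.get as `get` gives the behaviour described in the text.

```python
CHECK_URL = 'http://checkip.amazonaws.com'  # the check URL named in the text


def is_proxy_alive(proxy, get, url=CHECK_URL, timeout=2.01):
    """Return True if a request routed through `proxy` succeeds.

    `proxy` is a dict such as {'http': '1.2.3.4:8080'}, like the queue items.
    `get` is any callable with the calling convention of requests.get
    (hypothetical parameter, added here only for illustration).
    """
    try:
        res = get(url, proxies=proxy, timeout=timeout)
        # A 4xx/5xx response does not raise by itself; raise_for_status() does.
        res.raise_for_status()
    except Exception:
        # With requests, this could be narrowed to requests.RequestException.
        return False
    return True
```

With requests installed, a call would look like is_proxy_alive({'http': '1.2.3.4:8080'}, requests.get); the address here is only a placeholder.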
Partial code demo

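```
import threading
import time

def run(self):
    """Verify the collected proxies and save the working ones to files."""
    start_ips = time.time()
    print('------Fetching all free proxy addresses-------')
    self.get_ips()
    end_ips = time.time()
    print('------Fetching done, took {:.2f}s-------'.format(end_ips - start_ips))
    print('------Verifying proxies-------')
    start_verf = time.time()
    threads = []
    while not self.queue.empty():
        # Drop finished threads by rebuilding the list; removing items from a
        # list while iterating over it skips elements.
        threads = [t for t in threads if t.is_alive()]
        # Stop filling slots once the queue is drained, otherwise get() blocks forever.
        while len(threads) < self.max_threads and not self.queue.empty():
            proxy = self.queue.get()
            thread = threading.Thread(target=self.go_verify, args=(proxy,), daemon=True)
            thread.start()
            threads.append(thread)
        time.sleep(0.01)  # yield briefly instead of spinning while all slots are busy
    for thread in threads:
        thread.join()  # wait for the last checks before writing the results
    end_verf = time.time()
    print('------Verification done, took {:.2f}s-------'.format(end_verf - start_verf))
    with open('http.txt', 'w') as f:
        for proxy in self.http:
            f.write(proxy['http'] + '\n')
    with open('https.txt', 'w') as f:
        for proxy in self.https:
            f.write(proxy['https'] + '\n')
    print('------Saved to {} and {}------'.format('http.txt', 'https.txt'))
```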
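As an alternative to managing threads by hand, the same verify-then-save flow could be written with the standard library's ThreadPoolExecutor. This is a sketch of a different technique from the one the program uses; verify_all, save_proxies, and the `check` parameter are hypothetical names, and `check` stands in for whatever function decides whether a proxy works.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_all(proxies, check, max_threads=20):
    """Run check(proxy) on every proxy in a thread pool; return those that pass."""
    proxies = list(proxies)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map keeps the input order; leaving the with-block waits for all workers.
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]


def save_proxies(proxies, scheme, path):
    """Write the 'ip:port' value of every proxy of the given scheme, one per line."""
    with open(path, 'w') as f:
        for proxy in proxies:
            if scheme in proxy:
                f.write(proxy[scheme] + '\n')
```

The pool takes care of limiting concurrency and of waiting for the last checks, so nothing is written to disk before verification has finished.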

Screenshots
Running state

Run result
