Crawler--09: Anti-Crawling Mechanisms
1. UA-based anti-crawling
- Rotating the user agent is one of the most important counter-anti-crawling strategies.
- user-agent: the User-Agent header that identifies the client making the request
- The fake_useragent module generates random real-world user-agent strings.
- Install the fake_useragent module:

```
pip install fake_useragent
```

```python
from fake_useragent import UserAgent

ua = UserAgent()
s = ua.random  # a random user-agent string, different on each call
print(s)
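```

In practice, the random user agent is passed to requests through the headers parameter. A minimal sketch, using httpbin.org/ip (which also appears later in these notes) purely as a test target:

```python
import requests
from fake_useragent import UserAgent

# Send the request with a freshly randomized User-Agent header.
headers = {'user-agent': UserAgent().random}
res = requests.get('http://www.httpbin.org/ip', headers=headers)
print(res.status_code)
```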
2. IP-based anti-crawling
1) Related websites
- a site that returns your current IP address
- a site for checking your current IP address
- a reasonably good IP proxy platform
2) IP blocking
- Some websites track how many times an IP visits within a given time window; if the frequency is too high, the current IP gets banned.
- Solution: use proxy IPs.
- The requests module has a proxies parameter for setting a proxy IP (see the examples below).
- The free IPs offered on proxy sites are mostly unreliable.
- Paid proxy IPs: a usage example follows further below.
- On Windows, run ipconfig in cmd to view the machine's internal (LAN) IP address.
- Check your outgoing IP: use the IPIP website to see the public IP you browse with.
- Recommended proxy website
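The first script makes 20 requests, each through a proxy picked at random from a hand-collected list of free IPs; http://www.httpbin.org/ip echoes back the IP it sees, so the output shows whether the proxy actually took effect: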
```python
import random
import requests

# Hand-collected free proxy IPs in 'host:port' form (free proxies expire quickly).
ips = [
    '223.240.244.48:23564',
    '121.233.226.191:5412',
    '114.100.3.87:766',
    '58.219.59.76:5412',
    '180.113.10.47:894',
    '27.40.111.110:36410',
    '42.56.3.242:766',
    '180.113.12.163:5412',
    '113.237.243.46:3617',
    '58.219.59.129:36410',
    '223.240.242.44:5412',
    '117.60.239.133:5412',
    '114.97.199.48:3617',
    '163.179.204.157:3617',
    '180.125.97.143:894',
    '60.174.190.152:23564',
    '49.86.177.230:36410',
    '182.101.237.158:5412',
    '114.98.139.136:23564',
    '114.225.241.237:23564',
]

url = 'http://www.httpbin.org/ip'
for i in range(20):
    try:
        # Short timeout so dead proxies are skipped quickly.
        ip = random.choice(ips)
        res = requests.get(url, proxies={'http': 'http://' + ip}, timeout=0.5)
        print(res.text)
    except Exception as e:
        print('Error!', e)
```
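Paid proxies usually require authentication. With requests, the credentials are embedded in the proxy URL in user:password@host:port form; the account below is the one from the original notes: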
```python
import requests

url = 'http://www.httpbin.org/ip'
# Authenticated paid proxy: credentials embedded as user:password@host:port.
proxies = {
    'http': 'http://1550023517:[email protected]:16817',
    'https': 'http://1550023517:[email protected]:16817'
}
result = requests.get(url, proxies=proxies)
print(result.text)
```
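A more systematic approach is to wrap the provider's extraction API in a small proxy-pool class that downloads a batch of proxies and weeds out the dead ones. The class below uses the Kuaidaili (kdlapi.com) extraction API from the notes and tests each proxy against Baidu: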
```python
import requests

class ProxyPool():
    def __init__(self):
        # Extraction API of the paid provider (kdlapi.com); returns one proxy per line.
        self.proxy_url = 'http://dev.kdlapi.com/api/getproxy/?orderid=992045485987175&num=100&protocol=2&method=1&an_ha=1&sep=2'
        self.test_url = 'https://www.baidu.com/'
        self.headers = {
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
        }

    def get_proxy_pool(self):
        # Fetch the raw proxy list and test each entry.
        html = requests.get(url=self.proxy_url, headers=self.headers).text
        proxy_list = html.split('\n')
        for proxy in proxy_list:
            self.test_proxy(proxy)

    def test_proxy(self, proxy):
        proxies = {
            'http': 'http://{}'.format(proxy),
            'https': 'https://{}'.format(proxy)
        }
        # The original notes break off here; the rest is a plausible reconstruction:
        # request the test URL through the proxy and keep it only if that succeeds.
        try:
            res = requests.get(self.test_url, proxies=proxies, headers=self.headers, timeout=5)
            if res.status_code == 200:
                print('usable proxy:', proxy)
        except Exception:
            print('unusable proxy:', proxy)
```
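Assuming the order ID in proxy_url is still valid, the pool can then be exercised with:

```python
if __name__ == '__main__':
    ProxyPool().get_proxy_pool()
```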