Python: Countering Anti-Scraping Measures

MaoziShan

Article link: https://blog.csdn.net/MaoziYa/article/details/106658607

Building a random User-Agent with fake_useragent

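from fake_useragent import UserAgent

# verify_ssl=False comes from the original post and targets older
# fake_useragent releases; newer releases may not accept this argument.
ua = UserAgent(verify_ssl=False)

def get_header():
    # ua.random yields a randomly chosen browser User-Agent string on each access
    return {
        'User-Agent': ua.random
    }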
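If fake_useragent is unavailable or fails to load its data, the same kind of header can be built from a small hand-maintained list. The following is a minimal sketch under that assumption; the name get_fallback_header and the strings in FALLBACK_USER_AGENTS are illustrative and do not come from the original post.

```python
import random

# Hypothetical fallback pool; these strings are illustrative and should be
# refreshed from real browsers before use.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def get_fallback_header(rng=random):
    """Return a headers dict with a User-Agent picked at random from the pool."""
    return {"User-Agent": rng.choice(FALLBACK_USER_AGENTS)}
```

The returned dict has the same shape as the one from get_header(), so it can be passed to requests.get(..., headers=...) in the same way.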

Using a proxy pool

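import requests

# Set up a proxy pool first; see https://github.com/Python3WebSpider/ProxyPool

def get_proxy():
    # The pool's /random endpoint returns one proxy as plain text ("host:port")
    proxypool_url = 'http://127.0.0.1:5555/random'
    proxy = 'http://' + requests.get(proxypool_url, timeout=5).text.strip()
    # Map both schemes; with only 'http', requests to https:// URLs skip the proxy
    return {'http': proxy, 'https': proxy}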
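The pool's reply is used as-is, so an empty reply or an error page would end up inside the proxy URL. A minimal sketch of a guard is shown here, assuming the pool answers with plain "host:port" text; build_proxies and the pattern _PROXY_RE are hypothetical names introduced for illustration.

```python
import re

# Assumed plain-text format returned by the pool: "host:port"
_PROXY_RE = re.compile(r"^[A-Za-z0-9.\-]+:\d{1,5}$")


def build_proxies(raw):
    """Turn the pool's raw "host:port" text into a requests-style proxies dict.

    Returns None when the text does not look like a proxy address, so the
    caller can decide whether to retry or to go without a proxy.
    """
    candidate = raw.strip()
    if not _PROXY_RE.match(candidate):
        return None
    proxy_url = "http://" + candidate
    return {"http": proxy_url, "https": proxy_url}
```

The text returned by the pool can be passed straight into build_proxies; a None result signals that the request should not be sent through that proxy.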

Using time.sleep()

import time

# Combined with a suitable sleep, the methods above rarely fail

time.sleep(0.1)  # in seconds
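A fixed 0.1 s pause produces very regular request timing. One possible variation, not taken from the original post, is to add a random extra delay on top of the base value. In the sketch below, polite_sleep and its default values are assumptions chosen for illustration.

```python
import random
import time


def polite_sleep(base=0.1, jitter=0.4, sleep=time.sleep, rng=random):
    """Sleep for `base` seconds plus a random extra of up to `jitter` seconds.

    Returns the delay actually used, which is handy for logging.
    """
    delay = base + rng.uniform(0, jitter)
    sleep(delay)
    return delay
```

Calling polite_sleep() before each request takes the place of the bare time.sleep(0.1); the sleep and rng parameters exist only so the function can be exercised without real waiting.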

Putting it together

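import time

import requests

url = 'https://m.weibo.cn/'
time.sleep(0.1)  # pause briefly before the request
# get_header() and get_proxy() are the helpers defined above
resp = requests.get(url, headers=get_header(), proxies=get_proxy(), timeout=10)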

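Known issue

Bulk scraping of Weibo still triggers HTTP 418 responses. The measures above only reduce how often 418 occurs; they do not eliminate it.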
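One common way to cope with a response like 418 is to wait and retry with a growing delay. This is a swapped-in technique (exponential backoff), not something the original post uses, and there is no guarantee it avoids 418 on Weibo. The helper fetch_with_backoff and its parameters are hypothetical.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a response whose status is not 418.

    `fetch` is any zero-argument callable returning an object with a
    `status_code` attribute (for example a lambda wrapping requests.get).
    The wait doubles after each 418: base_delay, 2 * base_delay, and so on.
    Returns the last response, which may still be a 418.
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 418:
            return response
        # Wait only if another attempt will follow
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With the helpers from this post it could be called as fetch_with_backoff(lambda: requests.get(url, headers=get_header(), proxies=get_proxy())), so that every attempt draws a fresh User-Agent and proxy.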
