Notes on Problems Encountered When Scraping by Simulating a Browser

Yoooung～

Last modified 2022-05-10 11:47:14

Category: python    Tags: python, web scraping

First published 2022-05-10 11:32:38

Article link: https://blog.csdn.net/m0_54797890/article/details/124683832

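Recently my lab asked me to scrape some paper data from the Chemistry Europe website. I ran into quite a few problems along the way, so I am recording them here.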

Unresolved problem

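For the page https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/cctc.202101625, whenever I use requests to fetch the paper data, the server returns a 503 error no matter how I set the headers and cookies, and I cannot tell what anti-scraping measure is responsible. I am posting my code below in the hope that someone more experienced can explain it.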

import requests
from hyper.contrib import HTTP20Adapter  # HTTP/2 adapter for requests (hyper is no longer maintained)

url = 'https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/cctc.202101625'

session = requests.Session()
# Send every request whose URL starts with this prefix over HTTP/2
session.mount(url, HTTP20Adapter())

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
    'sec-ch-ua-platform': '"macOS"',
    # HTTP/2 pseudo-headers: an HTTP/2 client normally builds these from the
    # request itself, so setting them by hand is redundant at best
    ':authority': 'chemistry-europe.onlinelibrary.wiley.com',
    ':method': 'GET',
    ':path': '/doi/full/10.1002/cctc.202101625',
    ':scheme': 'https',
    'cache-control': 'max-age=0',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
}

# The original snippet is cut off above; this request is an assumed minimal
# completion that prints the status code (reported as 503)
response = session.get(url, headers=headers)
print(response.status_code)
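One way to narrow down the 503 is to check who actually sent it. Below is a minimal sketch, assuming (not verified for this page) that the site sits behind Cloudflare's bot protection: the hypothetical helper `looks_like_cdn_challenge` flags a 403 or 503 response that carries Cloudflare's `cf-ray` header or `Server: cloudflare`. If it returns True, the block most likely comes from a JavaScript challenge page, and header or cookie tweaks in requests usually cannot get past it because requests does not execute JavaScript; driving a real browser (for example with Selenium or Playwright) is the usual next step.

```python
from typing import Mapping

# Assumption: the blocking layer is Cloudflare, whose responses carry a
# "cf-ray" header and usually "Server: cloudflare".
CHALLENGE_STATUSES = {403, 503}


def looks_like_cdn_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if the response looks like a Cloudflare block."""
    # HTTP header names are case-insensitive, so normalise them first
    lowered = {name.lower(): value for name, value in headers.items()}
    from_cloudflare = (
        "cf-ray" in lowered
        or lowered.get("server", "").lower() == "cloudflare"
    )
    return status in CHALLENGE_STATUSES and from_cloudflare


if __name__ == "__main__":
    # Made-up sample headers, only to show the call shape
    sample = {"Server": "cloudflare", "Content-Type": "text/html"}
    print(looks_like_cdn_challenge(503, sample))
```

For a requests `Response` object `r`, call it as `looks_like_cdn_challenge(r.status_code, r.headers)`; `r.headers` is already a case-insensitive mapping, so it can be passed directly.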
