A General-Purpose Crawler Framework with requests

明天不早检

Last modified 2023-02-28 20:20:52

Tags: crawler, python

First published 2023-02-26 20:56:42

Article link: https://blog.csdn.net/ydz386909516/article/details/129231383

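This article shows how to build a general-purpose crawler framework with the requests library: import the get_url.py module, then call request_get_text() to get a page's text as a string, or request_get_content() to get its raw bytes.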

A general-purpose framework for fetching web pages with requests.get()

Usage:

1. Copy the code and save it as get_url.py

2. In a new .py file, import get_url

3. r = get_url.request_get_text('https://******') # returns a string

r = get_url.request_get_content('https://******') # returns bytes
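The steps above can be sketched from the caller's side. This is only a minimal sketch and not part of the original code: save_html and save_binary are hypothetical helper names, and the fetch function is passed in as a parameter so the example runs without network access. In real use you would pass get_url.request_get_text and get_url.request_get_content, assuming they return a str and bytes as the comments in step 3 state.

```python
from pathlib import Path


def save_html(fetch_text, url, path):
    """Fetch a page as a string and write it to `path` as UTF-8.

    `fetch_text` stands in for get_url.request_get_text (assumed to return str).
    """
    html = fetch_text(url)
    Path(path).write_text(html, encoding="utf-8")
    return len(html)  # number of characters written


def save_binary(fetch_content, url, path):
    """Fetch a resource as bytes (e.g. an image) and write it unchanged.

    `fetch_content` stands in for get_url.request_get_content (assumed to return bytes).
    """
    data = fetch_content(url)
    Path(path).write_bytes(data)
    return len(data)  # number of bytes written


# With the real module, the calls might look like this (URL left elided as in the post):
#   import get_url
#   save_html(get_url.request_get_text, 'https://******', 'page.html')
#   save_binary(get_url.request_get_content, 'https://******', 'image.jpg')
```

The string variant suits HTML that will be parsed afterwards, while the bytes variant suits images, PDFs, and other files that must not be decoded. The code for get_url.py itself follows.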

import requests
import random


def header():
    # Candidate request headers, each carrying a different browser User-Agent string.
    headers_list = [
        {
            'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'
        }, {
            'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36'
        }, {
            'user-agent': 'Mozilla/5.0 (Linux; Android 10; SM-G981B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Mobile Safari/537.36'
        }, {
            'user-agent': 'Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1'
        }, {
            'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'
        }, {
            'user-agent': 'Mozilla/5.0 (Linux; Android) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.109 Safari/537.36 CrKey/1.54.248666'
        }, {
            'user-agent': 'Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.188 Safari/537.36 CrKey/1.54