Scraping the Film and TV Category on Xigua Video

Latest recommended article published on 2025-02-21 01:41:47

《落神》

Categories: Crawlers, Automation Tools
Tags: scrapy, python

Article link: https://blog.csdn.net/zuo199606184810/article/details/88535325

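Summary: The author shares their experience scraping Xigua Video with Python 3 and the Scrapy framework combined with Selenium, stressing that every site has a different anti-scraping strategy, so crawler rules need constant adjustment. The post notes that no data could be fetched without a browser driver, so the corresponding setup was done in pipelines.py, items.py, and middlewares.py. The code has few comments but scrapes successfully, and the post reminds readers to configure proxy IPs in real use so that their IP does not get banned.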

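I haven't updated this blog in a long time. Today I helped a friend scrape Xigua Video, and I hope this gives some pointers to anyone who enjoys writing crawlers in Python. Corrections from more experienced readers are welcome.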

Every website has at least some anti-scraping measures and keeps adding new ones, so the crawling rules for any given site never stay fixed for long.
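
One common response to IP-based blocking is to route requests through a pool of proxies. The block below is only a minimal sketch of that idea as a Scrapy downloader middleware, not the code from this post. The class name `RandomProxyMiddleware` and the `PROXY_LIST` setting are hypothetical names chosen for illustration. The sketch relies on Scrapy's built-in proxy support reading `request.meta['proxy']`.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware sketch: attach a random proxy to each request."""

    def __init__(self, proxies):
        # Keep our own copy so later changes to the caller's list do not leak in.
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, not a built-in Scrapy one.
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        # Scrapy's built-in proxy handling picks up request.meta['proxy'].
        if self.proxies:
            request.meta['proxy'] = random.choice(self.proxies)
        # Returning None lets the request continue through the remaining middlewares.
        return None
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in settings.py and `PROXY_LIST` filled with proxy URLs you are allowed to use. With an empty list the middleware leaves requests untouched.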

I will keep this post updated as well!

This crawler uses Python 3 + Scrapy + Selenium. Enough talk; here is the code.

Core logic, xigua.py:

# -*- coding: utf-8 -*-
import scrapy
import json
import xlwt
import datetime

from ..items import XiguaspiderItem

# https://www.ixigua.com/api/pc/feed/?min_behot_time=0&category=subv_xg_movie&utm_source=toutiao&widen=1&tadrequire=true&as=A1356CB8354CB7B&cp=5C85ECBB27BB0E1&_signature=ojnZNhAa.ssIFpm2yASWDqI52S
# https://www.ixigua.com/api/pc/feed/?max_behot_time=1552274468&category=subv_xg_movie&utm_source=toutiao&widen=1&tadrequire=true&as=A1157C68A5DD8CE&cp=5C85EDA82C2E5E1&_signature=ojnZNhAa.ssIFpm2yAQSOKI52S

"""
Xigua Video: xigua
"""

class XiguaSpider(scrapy.Spider):
    name = 'xigua'
    allowed_domains = ['ixigua.com']
    start_urls = ['https://www.ixigua.com/api/pc/feed/?min_behot_time=0&category=subv_xg_movie&utm_source=toutiao&widen=1&tadrequire=true&as=A1153CD8459DA0F&cp=5C85ED8AE0BF1E1&_signature=ojnZNhAa.ssIFpm2yASWDqI52S']
    doc_url = 'https://www.ixigua.com/api/pc/feed/?max_behot_time={}&category=subv_xg_movie&utm_source=toutiao&widen=1&tadrequire=true&as=A185AC288847AA0&cp=5C88172A3A40AE1&_signature=YaQQuxAbPTDLi1A75tbnUmGkEK'
    base_url = 'https://www.ixigua.com'

    # custom_settings = ...  (the listing is cut off at this point in the source)
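
The listing breaks off before the parsing logic, but `doc_url` already shows how paging is meant to work: its `{}` placeholder takes a `max_behot_time` timestamp. The snippet below is a rough sketch of filling in that template and inspecting the resulting query string. The helper names `next_page_url` and `query_params` are made up for illustration. The `as`, `cp`, and `_signature` values are the sample tokens from the listing and may well be rejected by the site today. Where the next timestamp comes from in the response is not shown in the source, so it is passed in by hand here.

```python
from urllib.parse import urlsplit, parse_qs

# Template copied from the spider's doc_url attribute; the tokens are sample
# values from the listing, not something this sketch knows how to regenerate.
DOC_URL = ('https://www.ixigua.com/api/pc/feed/?max_behot_time={}'
           '&category=subv_xg_movie&utm_source=toutiao&widen=1&tadrequire=true'
           '&as=A185AC288847AA0&cp=5C88172A3A40AE1'
           '&_signature=YaQQuxAbPTDLi1A75tbnUmGkEK')


def next_page_url(max_behot_time):
    """Fill the paging timestamp into the feed URL template."""
    return DOC_URL.format(int(max_behot_time))


def query_params(url):
    """Return the query string of a URL as a flat dict of first values."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = next_page_url(1552274468)  # timestamp taken from the commented sample URL
params = query_params(url)
print(params['max_behot_time'], params['category'])  # 1552274468 subv_xg_movie
```

Inside the spider, a URL built this way would typically be handed to `scrapy.Request` with the same callback, so each page schedules the next one.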
