Python Web Crawler Hands-On Project: Scraping Honor of Kings Hero Information with the Scrapy Framework

Latest recommended article published on 2024-04-19 19:46:20

小昀小杭

Article tags: python

Article link: https://blog.csdn.net/weixin_50267049/article/details/109649389

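This article shows how to use Python's Scrapy framework in a hands-on web-crawling project: scraping hero information from the game Honor of Kings (王者荣耀). The crawling logic is defined in wzry.py, and pipelines.py processes the scraped data.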
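
The spider in wzry.py has to turn each relative hero link found on the list page into an absolute detail-page URL before it can request it. As a minimal sketch of that step outside Scrapy, the standard library's `urljoin` can resolve such links against the list page URL, which is the same kind of resolution that `response.urljoin` and `response.follow` perform. The sample hrefs and the helper name `resolve_detail_urls` are assumptions made for illustration; they are not taken from the site or from the original project.

```python
from urllib.parse import urljoin

# URL of the hero list page, as used in the spider's start_urls.
LIST_PAGE_URL = "https://pvp.qq.com/web201605/herolist.shtml"

# Hypothetical relative links; the real values come from the page's <a href> attributes.
sample_hrefs = ["herodetail/105.shtml", "herodetail/106.shtml"]


def resolve_detail_urls(page_url, hrefs):
    """Resolve each relative href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


detail_urls = resolve_detail_urls(LIST_PAGE_URL, sample_hrefs)
for url in detail_urls:
    print(url)
```

Resolving against the page URL rather than concatenating onto a hard-coded base string keeps working if the site ever emits absolute or root-relative links.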

wzry.py

import scrapy

from LearnScrapy.items import HeroItem


class WzrySpider(scrapy.Spider):
    name = 'wzry'
    allowed_domains = ['pvp.qq.com']
    start_urls = ['https://pvp.qq.com/web201605/herolist.shtml']

    hero_detail_base_url = "https://pvp.qq.com/web201605/"

    def parse(self, response):
        # Collect the relative detail-page link of every hero on the list page.
        hero_list = response.xpath("//div[contains(@class, 'herolist-content')]/ul[contains(@class, 'herolist')]/li/a/@href").getall()
        # print(hero_list)
        # Equivalent ways to issue one request per hero, kept for reference:
        # for hero_detail in hero_list:
        #     yield scrapy.Request(url=self.hero_detail_base_url + hero_detail, callback=self.parse_hero_detail, meta={"msg": "ok"})
        #     yield scrapy.Request(url=response.urljoin(hero_detail), callback=self.parse_hero_detail, meta={"msg": "ok"})
        #     yield response.follow(url=hero_detail, callback=self.parse_hero_detail, meta={"msg": "ok"})

        requests = response.follow_all(urls=hero_list, callback=self.parse_hero_detail, meta={"msg": "ok"}