Scraping ZOL Phone Details with a Scrapy Spider

Latest recommended article published on 2020-12-05 06:49:43

程序员lamed

Tags: Python, programming, beginners, programmers

Article link: https://blog.csdn.net/weixin_45342712/article/details/95331010

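This article shows how to use the Scrapy framework to scrape phone details from Zhongguancun Online (ZOL). The crawl goes from the phone store list page to each phone's detail page, and then to that phone's extended details page. The author shares the spider code and welcomes suggestions for improvement.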

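I recently needed a batch of phone data for testing, so I scraped the phone specifications listed on ZOL. I am sharing the code here and would appreciate any suggestions for improving it.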

ZOL phone information
Scraping phone information from ZOL takes three steps:

Phone store list page -> single phone detail page -> that phone's extended details page
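
The three steps above amount to building the list-page URLs and then resolving the links found on each page against the page they came from. The following is a minimal sketch using only the standard library, not the author's spider. The list URL pattern is taken from the spider code in this article; the helper names `list_page_urls` and `follow`, and the two hrefs, are hypothetical placeholders for whatever the CSS selectors would extract from the real pages.

```python
from urllib.parse import urljoin

# URL pattern of the phone store list pages (taken from the spider's start_requests).
LIST_URL = "http://detail.zol.com.cn/cell_phone_index/subcate57_list_{}.html"


def list_page_urls(last_page):
    """Build the list-page URLs for pages 1 through last_page (hypothetical helper)."""
    return [LIST_URL.format(page) for page in range(1, last_page + 1)]


def follow(base_url, hrefs):
    """Resolve hrefs found on base_url into absolute URLs (hypothetical helper)."""
    return [urljoin(base_url, href) for href in hrefs]


# Step 1: a list page.
list_url = list_page_urls(1)[0]

# Step 2: detail-page links found on the list page (the href is made up).
detail_urls = follow(list_url, ["/cell_phone/index0000001.shtml"])

# Step 3: the extended-details link found on the detail page (the href is made up).
param_urls = follow(detail_urls[0], ["/0000/0000001/param.shtml"])

print(list_url)
print(detail_urls[0])
print(param_urls[0])
```

In a Scrapy spider each step would typically be its own callback that yields a `scrapy.Request` for the next step, with `urljoin` (or `response.urljoin`) turning relative hrefs into absolute URLs.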

Spider code

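# -*- coding: gbk -*-
import scrapy
from urllib.parse import urljoin


class PhoneSpider(scrapy.Spider):
    # No crawl rules are defined, so a plain Spider is enough here;
    # CrawlSpider is only needed when following links through rules.
    name = "phone"
    allowed_domains = ["detail.zol.com.cn"]

    def start_requests(self):
        # Request list pages 1 through 30 of the phone store.
        for page in range(1, 31):
            url = 'http://detail.zol.com.cn/cell_phone_index/subcate57_list_{}.html'.format(page)
            yield scrapy.Request(url, callback=self.parse, dont_filter=True)

    def parse(self, response):  # phone store list page
        # Select the container that holds the phone entries on the list page.
        phone_plane = response.css('div.pic-mode-box')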
        phone_