Scraping Qiushibaike jokes with Scrapy, part 2 (downloading user avatars)

Latest recommended article published 2021-07-30 22:13:29

还是那片西瓜吗

Category: Scrapy crawler framework

Article link: https://blog.csdn.net/qq_37377136/article/details/107239874

This post picks up where the previous post in this series left off.

1. Update the code

vim ITtest.py

import os
import urllib.request   # "import urllib" alone does not guarantee urllib.request is loaded

import scrapy
from qiushi.items import QiushiItem   # QiushiItem class from the qiushi project's items.py
from scrapy.http.response.html import HtmlResponse   # HtmlResponse class (not used below)
from scrapy.selector.unified import SelectorList   # SelectorList class (not used below)


class IttestSpider(scrapy.Spider):
    name = 'ITtest'
    allowed_domains = ['www.qiushibaike.com']
    start_urls = ['https://www.qiushibaike.com/text/page/1/']
    bash_domain = "https://www.qiushibaike.com"

    def parse(self, response):
        # Create the img folder (one level above the working directory) if it is missing
        path_dir = os.path.join(os.path.dirname(os.getcwd()), 'img')
        os.makedirs(path_dir, exist_ok=True)

        body = response.xpath('//div[@class="col1 old-style-col1"]/div')
        for duanzhi in body:
            touxiang = duanzhi.xpath('.//div//@src').get()
            neirong = duanzhi.xpath('.//div[@class="content"]//text()').getall()
            neirong = "".join(neirong).strip()
            # default='' avoids calling strip() on None when a block has no <h2>
            zuozhe = duanzhi.xpath('.//div//h2/text()').get(default='').strip()
            item = QiushiItem(头像=touxiang, 作者=zuozhe, 内容=neirong)

            if zuozhe and touxiang:
                print(zuozhe, touxiang)
                file_path = os.path.join(path_dir, zuozhe + '.jpg')
                if not os.path.exists(file_path):
                    print(file_path)
                    # urlretrieve downloads the remote data and creates file_path itself,
                    # so no os.mknod call is needed beforehand. The 'http:' prefix assumes
                    # the src attribute is protocol-relative (starts with '//').
                    urllib.request.urlretrieve('http:' + touxiang, file_path)
            yield item

        # Follow the last pagination link until there is no next page
        next_url = response.xpath("//ul[@class='pagination']/li[last()]/a/@href").get()
        if next_url:
            yield scrapy.Request(self.bash_domain + next_url, callback=self.parse)
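
The spider builds the avatar URL by prefixing `http:` and uses the author name directly as the file name, so a name containing `/` or other special characters could produce an invalid path. A minimal sketch of two helpers that would make those steps more forgiving is given here. The names `avatar_url` and `safe_filename` are hypothetical and do not appear in the original spider, and the sample avatar URL is made up for illustration.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "https://www.qiushibaike.com/text/page/1/"  # start URL taken from the spider


def avatar_url(page_url, src):
    """Resolve a protocol-relative or relative src against the page URL."""
    return urljoin(page_url, src)


def safe_filename(name, ext=".jpg"):
    """Replace characters that are unsafe in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name.strip())
    return (cleaned or "unknown") + ext


if __name__ == "__main__":
    # Hypothetical src value, only to show the call shape
    print(avatar_url(PAGE_URL, "//pic.example.com/avatar/1.jpg"))
    print(safe_filename("some/author name"))
```

Note that `urljoin` inherits the scheme of the page (`https` here), whereas the spider hard-codes `http:`. Inside a spider, `response.urljoin(src)` is the usual shortcut for the same resolution.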

2. Run the crawler again

scrapy  crawl ITtest


3. Check the scraped data
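
Besides looking at the folder by hand, the result can be checked with a few lines of Python. The sketch below counts the `.jpg` files in the img folder and reports any that are zero bytes, which is what a failed download can leave behind. `summarize_images` is a hypothetical helper name, and the folder location is assumed to be the same one the spider writes to.

```python
import os


def summarize_images(img_dir):
    """Return (count, empty_files) for the .jpg files in img_dir."""
    names = sorted(n for n in os.listdir(img_dir) if n.lower().endswith(".jpg"))
    empty = [n for n in names if os.path.getsize(os.path.join(img_dir, n)) == 0]
    return len(names), empty


if __name__ == "__main__":
    # Assumed location: the img folder one level above the working directory
    img_dir = os.path.join(os.path.dirname(os.getcwd()), "img")
    if os.path.isdir(img_dir):
        total, empty = summarize_images(img_dir)
        print(f"{total} images, {len(empty)} empty: {empty}")
```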

4. Zip the images and transfer them to a Windows machine

zip -r img.zip img/
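
If the `zip` command is not installed, roughly the same archive can be produced with Python's standard `zipfile` module. This is a sketch rather than an exact equivalent of `zip -r` (for example, empty directories are not stored). `zip_directory` is a hypothetical helper name.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into zip_path, keeping the top-level folder name."""
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent so entries look like img/<file>
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


if __name__ == "__main__":
    if os.path.isdir("img"):
        print(zip_directory("img", "img.zip"))
```

Run it from the directory that contains `img/`, the same place the `zip` command would be run from.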


View the img files
