Scraping Maoyan Movies (Yielding Items, Reducing File-Stream I/O)

import scrapy

class MaoyanSpider(scrapy.Spider):
    name = 'maoyan'
    allowed_domains = ['maoyan.com']
    start_urls = ['https://maoyan.com/films?showType=3']

    def parse(self, response):
        names = response.xpath("//div[@class='channel-detail movie-item-title']/a/text()").extract()
        # no extract() here -- we still need to run further XPath on these nodes
        scores_div = response.xpath("//div[@class='channel-detail channel-detail-orange']")
        scores = []
        for score in scores_div:
            # score is the current node; string(.) gathers all of its text
            # print(score)
            scores.append(score.xpath('string(.)').extract_first())
        for name, score in zip(names, scores):
            # print(name, ":", score)
            # yield can only push dicts or Item objects
            yield {"name": name, "score": score}

yield can only push dictionaries or Item objects. So far we have been pushing dicts; how do we push an Item instead?


First, open items.py (the source file Scrapy generated for the project):

import scrapy
class MyspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

We change it to define the fields we need:

import scrapy

class MovieItem(scrapy.Item):
    # fields for one movie entry
    name = scrapy.Field()
    score = scrapy.Field()
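
A quick note on how an Item behaves: it works much like a dict, except that only the declared fields may be assigned (anything else raises a KeyError), and it converts cleanly to a plain dict, which we will rely on later when serializing to JSON. A minimal sketch with made-up values:

from MySpider.items import MovieItem

item = MovieItem()
item['name'] = '霸王别姬'    # assign declared fields like dict keys
item['score'] = '9.5'
print(item['name'])          # read back the same way
print(dict(item))            # {'name': '霸王别姬', 'score': '9.5'}
# item['year'] = 1993        # would raise KeyError: 'year' is not a declared field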

Now we use this MovieItem in our yield.
Note that yield item is also indented, inside the for loop:

import scrapy
from MySpider.items import MovieItem

class MaoyanSpider(scrapy.Spider):
    name = 'maoyan'
    allowed_domains = ['maoyan.com']
    start_urls = ['https://maoyan.com/films?showType=3']

    def parse(self, response):
        names = response.xpath("//div[@class='channel-detail movie-item-title']/a/text()").extract()
        # no extract() here -- we still need to run further XPath on these nodes
        scores_div = response.xpath("//div[@class='channel-detail channel-detail-orange']")
        scores = []
        for score in scores_div:
            # score is the current node; string(.) gathers all of its text
            scores.append(score.xpath('string(.)').extract_first())
        for name, score in zip(names, scores):
            # create a fresh Item each time so every yielded object is independent
            item = MovieItem()
            item['name'] = name
            item['score'] = score
            yield item

After the spider pushes items, how do we write them to a file?
Go into the pipelines file (pipelines.py):

import json

class MyspiderPipeline(object):
    def process_item(self, item, spider):
        # Items are not directly JSON-serializable; convert to a dict first
        with open('movie.txt', "a", encoding='utf-8') as f:
            f.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
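
One step the walkthrough skips: a pipeline only runs if it is registered in settings.py. Assuming the project is named MySpider (as the spider's import suggests), the registration looks like this; 300 is just a priority between 0 and 1000:

# settings.py
ITEM_PIPELINES = {
    'MySpider.pipelines.MyspiderPipeline': 300,
}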

The problem: this opens and closes the file once for every single item, and that much file-stream churn is wasteful.

So Scrapy provides its own solution, hooks that let us open the file once per crawl:

import json

class MyspiderPipeline(object):

    # called once when the spider starts: open the file a single time
    def open_spider(self, spider):
        self.file = open('movie.txt', "a", encoding='utf-8')

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    # called once when the spider closes: release the file handle
    def close_spider(self, spider):
        self.file.close()
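
Because each item is written as one JSON object per line (JSON Lines), reading the results back after a crawl is straightforward. A quick verification sketch:

import json

# read movie.txt back, one JSON object per line
with open('movie.txt', encoding='utf-8') as f:
    movies = [json.loads(line) for line in f]
print(len(movies), movies[:3])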