Scraping Douban with Scrapy

items.py (one field for each piece of data scraped per movie):

import scrapy


class Douban1Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    names = scrapy.Field()
    actors = scrapy.Field()
    scores = scrapy.Field()
    webs = scrapy.Field()

spiders/douban.py

import scrapy
from douban1.items import Douban1Item


class DoubanSpider(scrapy.Spider):
    name = 'douban'
    allowed_domains = ['movie.douban.com']
    start_urls = ['https://movie.douban.com/chart']
    # Per-spider settings: every request is sent with this User-Agent
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36',
    }

    def parse(self, response):
        # One <tr class="item"> per movie on the chart page
        rows = response.xpath('//div[@class="indent"]//tr[@class="item"]')
        for movie in rows:
            item = Douban1Item()
            # The leading "." keeps each query inside the current row;
            # a bare "//" would search the whole page on every iteration.
            item['names'] = movie.xpath(".//a[@class='nbg']/@title").get()
            item['actors'] = movie.xpath(".//p[@class='pl']/text()").get()
            item['scores'] = movie.xpath(".//span[@class='rating_nums']/text()").get()
            item['webs'] = movie.xpath(".//a[@class='nbg']/@href").get()
            # Yield one item per movie row
            yield item
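To see why the leading `.` in the row queries matters, here is a stdlib-only sketch of the same scoping idea. It uses `xml.etree.ElementTree` instead of Scrapy's lxml-based selectors (an assumption made only so it runs without Scrapy installed), and the markup is a hypothetical, simplified stand-in for the chart page, not Douban's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two movie rows, each with one title link
PAGE = """
<div class="indent">
  <table>
    <tr class="item"><td><a class="nbg" title="Film A" href="/a"/></td></tr>
    <tr class="item"><td><a class="nbg" title="Film B" href="/b"/></td></tr>
  </table>
</div>
""".strip()

root = ET.fromstring(PAGE)
rows = root.findall(".//tr[@class='item']")


def titles_page_wide(root):
    # Searches from the document root: every title on the page matches,
    # which is the effect of an un-dotted "//a..." query inside the loop.
    return [a.get("title") for a in root.findall(".//a[@class='nbg']")]


def titles_in_row(row):
    # Searches only below this <tr>, like ".//a..." on a row selector.
    return [a.get("title") for a in row.findall(".//a[@class='nbg']")]


for row in rows:
    print(titles_page_wide(root), titles_in_row(row))
# ['Film A', 'Film B'] ['Film A']
# ['Film A', 'Film B'] ['Film B']
```

The page-wide query returns the same full list for every row, so each item would carry every movie's data; the row-scoped query yields exactly one title per row.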





pipelines.py
Writing each item to a txt file:

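class Douban1Pipeline(object):
    def process_item(self, item, spider):
        # Append each movie's fields to douban.txt; utf-8 keeps Chinese titles intact
        with open('douban.txt', 'a', encoding='utf-8') as fp:
            fp.write(str(item['names']) + '\n\n')
            fp.write(str(item['scores']) + '\n\n')
            fp.write(str(item['actors']) + '\n\n')
            fp.write(str(item['webs']) + '\n\n')
        # Return the item so any later pipelines still receive it
        return item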
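A pipeline class is not used until it is enabled through Scrapy's `ITEM_PIPELINES` setting in the project's settings.py. A minimal sketch, assuming the default layout created by `scrapy startproject douban1`, so the class path is `douban1.pipelines.Douban1Pipeline`; the value 300 is only an ordering priority (lower numbers run first, conventionally in the range 0 to 1000).

```python
# settings.py (excerpt)
# Enable the pipeline; the number sets its order relative to other pipelines
ITEM_PIPELINES = {
    'douban1.pipelines.Douban1Pipeline': 300,
}
```

With the pipeline enabled, run the spider from the project directory with `scrapy crawl douban`.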

pipelines.py
Alternatively, writing to a JSON file (use one of the two Douban1Pipeline classes, not both):

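import json


class Douban1Pipeline(object):
    def open_spider(self, spider):
        # Open the output file once when the crawl starts
        self.file = open('douban.json', 'a', encoding='utf-8')

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps Chinese text readable
        content = json.dumps(dict(item), ensure_ascii=False) + '\n'
        self.file.write(content)
        return item

    def close_spider(self, spider):
        # Close the file once when the crawl ends
        self.file.close()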
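Because the JSON pipeline does not depend on Scrapy itself, its logic can be checked without a crawl by calling its hooks by hand in the order Scrapy would. The sketch below is a standalone copy of that pipeline with one assumption added: a hypothetical `path` argument so it can write to a temporary file. Plain dicts stand in for Douban1Item objects, and the sample values are made up.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Standalone copy of the JSON pipeline; `path` is a hypothetical addition."""

    def __init__(self, path):
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, 'a', encoding='utf-8')

    def process_item(self, item, spider):
        # One JSON object per line, Chinese characters written as-is
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        self.file.close()


def read_json_lines(path):
    # Each non-empty line parses back into one dict
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample items; open, process each item, then close
items = [
    {'names': '电影甲', 'scores': '8.5'},
    {'names': '电影乙', 'scores': '7.9'},
]
path = os.path.join(tempfile.mkdtemp(), 'douban.json')
pipeline = JsonLinesPipeline(path)
pipeline.open_spider(spider=None)
for it in items:
    pipeline.process_item(it, spider=None)
pipeline.close_spider(spider=None)

print(read_json_lines(path))
```

Writing one object per line (rather than separating objects with tabs) means each line can be parsed on its own with `json.loads`.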
