Scrapy: passing data between callbacks with meta, exporting with the -o option instead of a pipeline

1.  Spider code:

# -*- coding: utf-8 -*-
import scrapy
from tencent1.items import Tencent1Item


class Mytest1Spider(scrapy.Spider):
    name = 'tc1'
    start_urls = ['https://hr.tencent.com/position.php?lid=&tid=&keywords=python&start=0#a/']

    def parse(self, response):
        # each job posting is a table row with class "even" or "odd"
        tr = response.xpath("//tr[@class='even']|//tr[@class='odd']")
        for i in tr:
            # create a fresh item per row; a single shared item would be
            # overwritten before the detail callbacks ever run
            item = Tencent1Item()
            item['job_name'] = i.xpath('./td[1]/a/text()').extract_first()
            item['job_type'] = i.xpath('./td[2]/text()').extract_first()
            item['job_num'] = i.xpath('./td[3]/text()').extract_first()
            item['job_place'] = i.xpath('./td[4]/text()').extract_first()
            item['job_time'] = i.xpath('./td[5]/text()').extract_first()
            # follow the detail page, carrying the half-filled item in meta
            url1 = i.xpath('./td[1]/a/@href').extract_first()
            url1 = 'https://hr.tencent.com/{}'.format(url1)
            yield scrapy.Request(url=url1, meta={'job_item': item}, callback=self.parse_detail)
        # # next-page URL
        # url_next = response.xpath('//a[@id = "next"]/@href').extract_first()
        # if '50' in url_next:
        #     return
        # url_next = 'https://hr.tencent.com/{}'.format(url_next)
        # yield scrapy.Request(url_next)

    def parse_detail(self, response):
        # retrieve the item attached in parse() and fill in the last field
        item = response.meta['job_item']
        data = response.xpath('//ul[@class="squareli"]/li/text()').extract()
        item['job_detail'] = '\n'.join(data)
        return item
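
The meta hand-off above can be sketched without Scrapy at all. This is hypothetical stand-in code (plain dicts play the role of scrapy.Request and Response objects): the point is that meta carries the very same item object from parse() to parse_detail(), where the remaining field is filled in.

```python
def parse(rows):
    # yields one "request" per row, with the half-filled item attached in meta
    for row in rows:
        item = {'job_name': row['name']}
        yield {'url': row['url'], 'meta': {'job_item': item}}

def parse_detail(response):
    # the same dict object attached in parse() comes back out of meta
    item = response['meta']['job_item']
    item['job_detail'] = response['body']
    return item

requests = list(parse([{'name': 'Python Engineer', 'url': '/position_detail.php?id=1'}]))
fake_response = {'meta': requests[0]['meta'], 'body': 'Responsibilities: ...'}
item = parse_detail(fake_response)
print(item)  # {'job_name': 'Python Engineer', 'job_detail': 'Responsibilities: ...'}
```

Because meta holds a reference, not a copy, mutating the item in the detail callback is enough; the finished item is then returned (or yielded) to Scrapy's exporter.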

2.  Items code:

import scrapy


class Tencent1Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    job_name = scrapy.Field()
    job_type = scrapy.Field()
    job_num = scrapy.Field()
    job_place = scrapy.Field()
    job_time = scrapy.Field()
    job_detail = scrapy.Field()

3.  Command (job.jl is the output filename):

scrapy crawl tc1 -o job.jl

Here tc1 is the spider name defined above; the .jl extension selects Scrapy's JSON Lines export, so no item pipeline is needed.

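The .jl file produced by -o is JSON Lines: one JSON object per item, one item per line. A minimal sketch of writing and reading that format, using made-up sample items rather than real scrape results:

```python
import json

# sample items standing in for what the spider would yield (made-up data)
items = [
    {'job_name': 'Python Engineer', 'job_place': 'Shenzhen'},
    {'job_name': 'Data Engineer', 'job_place': 'Beijing'},
]

# the JSON Lines exporter writes one JSON object per line
with open('job.jl', 'w', encoding='utf-8') as f:
    for item in items:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')

# reading it back: parse each line independently
with open('job.jl', encoding='utf-8') as f:
    loaded = [json.loads(line) for line in f]

print(loaded[1]['job_name'])  # Data Engineer
```

The line-per-object layout is why .jl suits a spider: items can be appended as they arrive, and a crash never corrupts anything already written, unlike a single JSON array.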
Reposted from: https://www.cnblogs.com/cxhzy/p/10356773.html
