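I'm new to Python. While learning to write crawlers, I wanted to define a custom Item and reference it from a spider. The custom Item is defined as follows: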
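# tutorial/items/MyItem.py (path implied by the import used in the spider)
import scrapy


class MyItem(scrapy.Item):
    # A single declared field; the trailing `pass` was redundant.
    title = scrapy.Field()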
It is used in the spider file like this:
# -*- coding: utf-8 -*-
import scrapy

# Import the class from its module file, not from the package.
from tutorial.items.MyItem import MyItem


class MySpider(scrapy.Spider):
    name = 'myitem.demo'
    allowed_domains = ['toscrape.com']

    def start_requests(self):
        yield scrapy.Request('http://toscrape.com/tag/humor/', self.parse)

    def parse(self, response):
        for h1 in response.xpath('//h1').getall():
            # print_item() is not defined on the MyItem shown above,
            # so yield the item itself.
            yield MyItem(title=h1)
        # Follow every link on the page and parse it the same way.
        for href in response.xpath('//a/@href').getall():
            yield scrapy.Request(response.urljoin(href), self.parse)
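Note that the import path has to go all the way down to the module file name: tutorial.items.MyItem is the module (MyItem.py), and the MyItem class is then imported from that module.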
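
To illustrate why the path has to name the module file, here is a minimal sketch that rebuilds the same tutorial/items/MyItem.py layout in a temporary directory and compares the two import forms. It makes some assumptions: a plain class stands in for scrapy.Item so that nothing needs to be installed, and the helper names build_demo_package and import_kinds are made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create tutorial/items/MyItem.py under root (hypothetical demo layout)."""
    items_dir = root / "tutorial" / "items"
    items_dir.mkdir(parents=True)
    (root / "tutorial" / "__init__.py").write_text("")
    (items_dir / "__init__.py").write_text("")
    # Assumption: a plain class stands in for scrapy.Item here.
    (items_dir / "MyItem.py").write_text("class MyItem:\n    pass\n")


def import_kinds() -> tuple:
    """Return the type names produced by the two import forms."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Stops at the package: the name MyItem is bound to the module.
            from tutorial.items import MyItem as from_package
            # Goes down to the module file: the name MyItem is the class.
            from tutorial.items.MyItem import MyItem as from_module
            return (type(from_package).__name__, type(from_module).__name__)
        finally:
            # Undo the path change and forget the demo modules.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == "tutorial" or n.startswith("tutorial.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(import_kinds())
```

With the first form, MyItem(title=...) would try to call a module and fail with a TypeError, which is why the spider imports from tutorial.items.MyItem. An alternative, if shorter imports are preferred, is to re-export the class in tutorial/items/__init__.py.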