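Using regular expressions in XPath
Scrapy selectors support the EXSLT regular-expression namespace (re:), so a regex can be used inside an XPath predicate:
[re:match(text(), 'your regex here')]
add_xpath('bookstock', r"//*[re:match(text(), 'stock \([0-9].')]/text()")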
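Two-direction crawling
Horizontal: from one index page to the next index page (horizontal crawling)
Vertical: from an index page to a detail page, where the item is extracted (vertical crawling)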
Code
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request
from ..items import ToscrapebookItem
from scrapy.loader import ItemLoader
class BooksSpider(scrapy.Spider):
    name = 'books2'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        book_urls = response