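Parsing the scraped data with xpath() is fairly simple; the troublesome parts are following URLs from page to page and recursing through them. That cost me a lot of time. Douban is so much nicer, with its clean, regular URLs. Amazon's URLs, by contrast, are a mess... though maybe I just don't understand URLs well enough yet.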
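To make the link-following problem concrete, here is a minimal sketch using only the standard library rather than Scrapy. The helper names (normalize, crawl, get_links) and the example.com URLs are made up for illustration and are not part of the project below. It resolves each href against the page it was found on, drops the #fragment, and visits every distinct URL only once.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag


def normalize(base_url, href):
    """Resolve a possibly relative href against its page and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    return absolute


def crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first, following each distinct URL at most once.

    get_links is a hypothetical callable: given a URL, it returns the raw
    href strings found on that page (for example, the result of an XPath query).
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            target = normalize(url, href)
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return visited


# Tiny in-memory "site" standing in for real HTTP responses (made-up URLs).
SITE = {
    "http://example.com/list?page=1": ["/list?page=2", "item/1#reviews"],
    "http://example.com/list?page=2": ["/list?page=1", "/item/2"],
}

pages = crawl("http://example.com/list?page=1", lambda u: SITE.get(u, []))
for page in pages:
    print(page)
```

Scrapy's scheduler already filters duplicate requests by default, so inside a spider the part that usually needs care is the normalization step, i.e. turning whatever href the page contains into one absolute URL.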
amazon
├── amazon
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── items.py
│   ├── items.pyc
│   ├── msic
│   │   ├── __init__.py
│   │   └── pad_urls.py
│   ├── pipelines.py
│   ├── settings.py
│   ├── settings.pyc
│   └── spiders
│       ├── __init__.py
│       ├── __init__.pyc
│       ├── pad_spider.py
│       └── pad_spider.pyc
├── pad.xml
└── scrapy.cfg
(1) items.py
from scrapy import Item, Field


class PadItem(Item):
    # Fields scraped for each pad listing
    sno = Field()
    price = Field()
(2) pad_spider.py
# -*- coding: utf-8 -*-
from scrapy import Spider, Selector
from scrapy.http import Request
from amazon