I feel the code is mostly done, but it just won't scrape any data. I want to use it to crawl product sales counts and prices from Tmall. Calling any Python experts: if you can get my spider working, all 1000 points are yours.
items.py
import scrapy

class no1item(scrapy.Item):
    name = scrapy.Field()   # product title
    count = scrapy.Field()  # sales count
    price = scrapy.Field()  # price
Main spider
import scrapy
from scrapy.spiders import CrawlSpider, Rule  # scrapy.contrib.* was renamed in Scrapy 1.0
from scrapy.linkextractors import LinkExtractor
from no1.items import no1item

class no1spiderSpider(CrawlSpider):
    name = 'no1'
    allowed_domains = ['tmall.com']
    start_urls = ['https://detail.tmall.com/item.htm?spm=a1z10.5-b.w4011-8594517117.128.6PROcq&id=520949195765&rn=23ce85bd58df59ed673bcac957de356a&abbucket=15&sku_properties=-1:-1']
    download_delay = 10
    # A single rule with a callback. The original had two rules matching the
    # same pattern; CrawlSpider applies only the first matching rule to each
    # link, so the rule without a callback won and parse_item never ran.
    # The scheme also has to allow https, which Tmall uses.
    rules = (
        Rule(LinkExtractor(allow=(r'https?://detail\.tmall\.com/item\.htm\?id=\d+&rn=\w+&abbucket=\d+',),
                           deny=(r'subsection\.php',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        item = no1item()
        # 'nanme' was a typo for 'name', and the title is plain text rather
        # than "name: <digits>", so the old .re() pattern matched nothing.
        item['name'] = response.xpath('//title/text()').extract()
        # Note: Tmall fills the price and sales count in with JavaScript, so
        # these XPaths can come back empty on the raw HTML even when they
        # look right in the browser's inspector.
        item['price'] = response.xpath("//*[@id='J_PromoPrice']/dd/div/span/text()").extract()
        item['count'] = response.xpath("//*[@id='J_DetailMeta']/div[1]/div[1]/div/ul/li[1]/div/span[2]/text()").extract()
        return item
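One thing worth verifying in the `allow` patterns: Tmall product links use `https`, and a pattern anchored to plain `http://` matches none of them (LinkExtractor applies each pattern as a regex search over the URL). A quick stdlib check, using a simplified example URL that follows the shape the rule expects:

```python
import re

# Pattern shape from the spider's allow= rule; the URL below is a made-up
# example following the same shape, not one taken from a real crawl.
pattern_http = r'http://detail\.tmall\.com/item\.htm\?id=\d+&rn=\w+&abbucket=\d+'
pattern_https = r'https?://detail\.tmall\.com/item\.htm\?id=\d+&rn=\w+&abbucket=\d+'
url = 'https://detail.tmall.com/item.htm?id=520949195765&rn=23ce85bd58df59ed673bcac957de356a&abbucket=15'

print(re.search(pattern_http, url))   # None: the https link never matches
print(re.search(pattern_https, url))  # matches once the scheme allows https
```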
There is also a pipeline after this, but I hit the character limit. It should still be possible to generate a JSON file without a pipeline, right?
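It is: Scrapy's built-in feed export writes JSON without any pipeline, e.g. `scrapy crawl no1 -o items.json`. For reference, here is a minimal JSON-lines pipeline sketch (the class name and output filename are hypothetical, not from the original post):

```python
import json

class JsonWriterPipeline:
    """Minimal sketch: write each scraped item as one JSON line.
    Enable it via ITEM_PIPELINES in settings.py if you prefer a
    pipeline over the `scrapy crawl no1 -o items.json` feed export."""

    def open_spider(self, spider):
        # one file per crawl; JSON lines keep appends cheap
        self.file = open('items.jl', 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # dict() works for both plain dicts and scrapy.Item instances
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item
```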