Scraping Tmall with Python: crawling Tmall product pages with Scrapy

Latest recommended article published 2021-03-10 23:31:50

weixin_39668890


Tags: scraping Tmall with Python


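I think the code is nearly finished, but it returns no data. I want to use it to scrape the sales counts and prices of Tmall products. Could a Python expert help me get this crawler working? All 1000 points go to whoever does.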

items

import scrapy


class no1item(scrapy.Item):
    name = scrapy.Field()
    count = scrapy.Field()
    price = scrapy.Field()

Main spider

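import scrapy
# scrapy.contrib was removed from current Scrapy releases; use these paths instead.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

from no1.items import no1item


class no1spiderSpider(CrawlSpider):
    name = 'no1'
    allowed_domains = ['tmall.com']
    start_urls = ['https://detail.tmall.com/item.htm?spm=a1z10.5-b.w4011-8594517117.128.6PROcq&id=520949195765&rn=23ce85bd58df59ed673bcac957de356a&abbucket=15&sku_properties=-1:-1']
    download_delay = 10

    rules = (
        # One rule only: with two rules sharing a pattern, the first one (which has
        # no callback) claims every link, so parse_item is never called.
        # The pattern accepts http or https and an id anywhere in the query string.
        Rule(LinkExtractor(allow=(r'detail\.tmall\.com/item\.htm\?(?:.*&)?id=\d+', ),
                           deny=(r'subsection\.php', )),
             callback='parse_item', follow=True),
    )

    def parse_start_url(self, response, **kwargs):
        # CrawlSpider does not run rule callbacks on the start URL itself.
        return self.parse_item(response)

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        item = no1item()
        # The key was misspelled 'nanme', which raises KeyError on a scrapy.Item.
        # The regex r'name: (\d+)' is dropped because a page title is unlikely to match it.
        item['name'] = response.xpath('//title/text()').extract()
        # Tmall may fill in price and sales count with JavaScript; if so, these
        # XPaths return empty lists when run against the raw HTML.
        item['price'] = response.xpath("//*[@id='J_PromoPrice']/dd/div/span/text()").extract()
        item['count'] = response.xpath("//*[@id='J_DetailMeta']/div[1]/div[1]/div/ul/li[1]/div/span[2]/text()").extract()
        return item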
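One quick check when a CrawlSpider yields nothing is whether the link-extractor pattern accepts the URLs at all. A pattern that requires `http://` and `?id=` immediately after `item.htm` does not match the start URL here, which uses `https://` and puts `spm=` first. The sketch below tries both a strict and a relaxed pattern with Python's `re` module, outside Scrapy. The relaxed pattern and the `matches` helper are illustrative assumptions, not anything defined by Tmall or Scrapy, and the sketch assumes that `allow` patterns are applied as a regex search over the absolute URL.

```python
import re

# The start URL used by the spider.
START_URL = (
    "https://detail.tmall.com/item.htm?spm=a1z10.5-b.w4011-8594517117.128.6PROcq"
    "&id=520949195765&rn=23ce85bd58df59ed673bcac957de356a&abbucket=15"
    "&sku_properties=-1:-1"
)

# Strict pattern: plain http, and "id" must be the first query parameter.
STRICT = r"http://detail.tmall.com/item.htm\?id=\d+&rn=\w+&abbucket=\d+"

# Relaxed pattern (hypothetical): any scheme, "id" anywhere in the query string.
RELAXED = r"detail\.tmall\.com/item\.htm\?(?:.*&)?id=\d+"


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


strict_ok = matches(STRICT, START_URL)
relaxed_ok = matches(RELAXED, START_URL)
print("strict:", strict_ok, "relaxed:", relaxed_ok)
```

Running it shows the strict pattern rejecting the start URL and the relaxed one accepting it, which is one plausible reason no item pages reach `parse_item`.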

There is also a pipeline, omitted here because of the character limit. A JSON file can still be generated without a pipeline, right?
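On the JSON question: Scrapy's built-in feed exports write scraped items to a file without any pipeline, for example by running `scrapy crawl no1 -o items.json`. The same thing can be configured in the project settings. The fragment below is a minimal sketch that assumes Scrapy 2.1 or later (where the `FEEDS` setting is available); the file name `items.json` is a placeholder.

```python
# settings.py fragment (assumes Scrapy 2.1+; "items.json" is a placeholder path)
FEEDS = {
    "items.json": {
        "format": "json",    # use the built-in JSON exporter
        "encoding": "utf8",  # keep Chinese product names readable in the output
        "indent": 4,         # pretty-print for easier inspection
    },
}
```

If the file comes out empty, the feed export itself is usually not the problem; it only writes whatever items the spider callbacks actually return.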
