# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
class WxSpider(CrawlSpider):
    """Crawl WeChat mini-program articles from wxapp-union.com.

    Follows the paginated list pages under ``portal.php?mod=list&catid=1``
    and hands each article detail page (``article-<id>-1.html``) to
    :meth:`parse_item`.
    """

    name = 'wx'
    allowed_domains = ['wxapp-union.com']
    start_urls = ['http://www.wxapp-union.com/portal.php?mod=list&catid=1&page=1']

    rules = (
        # List (pagination) pages: follow to discover article links, no callback.
        # Dots escaped so '.' only matches a literal dot, not any character.
        Rule(
            LinkExtractor(allow=r'http://www\.wxapp-union\.com/portal\.php\?mod=list&catid=1&page=\d+'),
            follow=True,
        ),
        # Article detail pages: extract the item via parse_item.
        Rule(
            LinkExtractor(allow=r'http://www\.wxapp-union\.com/article-\d+-1\.html'),
            callback='parse_item',
        ),
    )

    def parse_item(self, response):
        """Extract one article from a detail page.

        Field extraction is still a stub (template placeholders below);
        fill in real XPaths for the fields you need.

        :param response: the article detail page response.
        :returns: a plain dict item (empty until fields are filled in).
        """
        item = {}
        # TODO: populate real fields, e.g.:
        # item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
        # item['title'] = response.xpath('//h1/text()').get()
        # Return the item so the engine actually collects it — the original
        # template fell off the end of the function and discarded it.
        return item