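Scrapy Framework (Part 2)

Selector

Selector is built on top of lxml.

1. Usage

Import the class directly and instantiate it. Selector supports several ways of parsing, including CSS expressions, regular expressions (re), and XPath.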

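from scrapy import Selector

# Sample HTML document to parse
content = "<html><head><title>My html</title></head><body><h3>Hello world!</h3></body></html>"
selector = Selector(text=content)
# get() returns the first match as a string (or None if nothing matches)
print(selector.xpath("//title/text()").get())
# getall() returns every match as a list of strings
print(selector.xpath("//title/text()").getall())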

2. Scraping Taobao special-market product categories with XPath

Related post: "Crawler reports Forbidden by robots.txt" (菜鸡瞎讲, CSDN blog)
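If the "Forbidden by robots.txt" message comes from Scrapy's own robots.txt handling, one common way to deal with it is the `ROBOTSTXT_OBEY` setting in the project's settings.py. This is a sketch under that assumption; it is not taken from the linked post.

```python
# settings.py (fragment)
# Scrapy's project template enables this setting by default.
# Setting it to False stops Scrapy from checking robots.txt before each request.
ROBOTSTXT_OBEY = False
```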

import scrapy


class TaobaoSpider(scrapy.Spider):
    name = 'taobao'
    # allowed_domains takes bare domain names, not URL paths
    allowed_domains = ['huodong.taobao.com']
    start_urls = ['http://huodong.taobao.com/wow/tbhome/act/special-markets/']

    def parse(self, response):
        # Each <dl class="market-list"> holds one category and its links
        titles = response.xpath("//dl[@class='market-list']")
        for title in titles:
            # Category name
            print(title.xpath("./dt/text()").get())
            print("*"*50)
            # All links under the current category
            alist = title.xpath(".//a")
            for a in alist:
                print(a.xpath("./@href").get(), end=":")
                print(a.xpath("./span[@class='market-list-title']/text()").get())
   

3. Scraping with CSS selectors

import scrapy


class TaobaoSpider(scrapy.Spider):
    name = 'taobao'
    # allowed_domains takes bare domain names, not URL paths
    allowed_domains = ['huodong.taobao.com']
    start_urls = ['http://huodong.taobao.com/wow/tbhome/act/special-markets/']

    def parse(self, response):
        # Select every <dl> whose class list contains "market-list"
        titles = response.css("dl.market-list")
        for title in titles:
            # ::text selects the text node of the matched element
            print(title.css("dt::text").get())
            print("*"*50)
            alist = title.css("a")
            for a in alist:
                # ::attr(href) selects the value of the href attribute
                print(a.css("::attr(href)").get(), end=":")
                print(a.css("span.market-list-title::text").get())
     

 
