// versus ./ and .// in XPath

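When locating elements with XPath you can write either // or ./ at the start of the expression. An expression starting with // selects every matching element in the whole page document, whereas ./ selects among the direct children of the current node. The .// form also restricts the search to the current node, but covers all of its descendants.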

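The problem:

When I use XPath to select every div with class = "quote", I get 10 of them. But when I take one of them, say the first, and try to extract text from inside it, for example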

>>> quotes = response.xpath('//div[@class = "quote"]')
>>> quotes[0].xpath('//span[1]/text()').extract()
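the result contains the text of matching tags from the whole page, not just the text inside this one div, which is what I wanted (details below).
Only when I select relative to the current node with ./ do I get a single result:
>>> quotes[0].xpath('./span[1]/text()').extract()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']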

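My question: I know // selects every matching tag, but I explicitly ran the query on this one div (quotes[0]), and printing it confirms that it holds only that div and nothing else. So why does

>>> quotes[0].xpath('//span[1]/text()').extract()

(the // form) return every matching result on the entire page?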

Doing the same extraction with CSS selectors does not have this problem.

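Later I found the explanation in the Scrapy documentation, in the section on working with relative XPaths: when selectors are nested, an XPath that starts with / is absolute to the document, not relative to the selector it is called from. quotes[0] still belongs to the full document tree, so //span[1] searches from the document root. (//span[1] also matches every span that is the first span child of its parent, which is why the output contains stray entries such as '→' and '❤'.)
In short: use the .// or ./ form.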

Details:

The target page is a list of div elements with class "quote". Here is the first one; every following div has the same structure:
    <div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world" />
            <a class="tag" href="/tag/change/page/1/">change</a>            
            <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>           
            <a class="tag" href="/tag/thinking/page/1/">thinking</a>            
            <a class="tag" href="/tag/world/page/1/">world</a>
        </div>
    </div>

Selecting every div with XPath returns a SelectorList instance holding the matches:
>>> quotes = response.xpath('//div[@class = "quote"]')
>>> quotes
[<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>]
>>>
>>> len(quotes)
10
>>> quotes[0]  # a single result
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>
>>> print(quotes[0].extract())  # printing it also shows only this one div
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world">
            <a class="tag" href="/tag/change/page/1/">change</a>
            <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
            <a class="tag" href="/tag/thinking/page/1/">thinking</a>
            <a class="tag" href="/tag/world/page/1/">world</a>
        </div>
    </div>
    
# So why does extracting the text of span[1] inside quotes[0] return matches from the whole page?
# If any experienced reader sees this, please explain it to me.
>>> quotes[0].xpath('//span[1]/text()').extract()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 
'“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 
'“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”',
 "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
 '“Try not to become a man of success. Rather become a man of value.”', '“It is better to be hated for what you are than to be loved for what you are not.”',
 "“I have not failed. I've just found 10,000 ways that won't work.”", "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”", 
 '“A day without sunshine is like, you know, night.”', '→', '\n            ', '\n            ', '❤']


# With ./ (relative to the current node) only this node's text is extracted. The .// form works too. Both search along a relative path instead of the whole document.
>>> quotes[0].xpath('./span[1]/text()').extract()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']
>>> quotes[1].xpath('./span[1]/text()').extract()
['“It is our choices, Harry, that show what we truly are, far more than our abilities.”']
>>>