// versus ./ and .// in XPath

Latest recommended article published on 2024-05-05 09:02:59

qq_41422774

Category: python. Tags: python, xpath

Article link: https://blog.csdn.net/qq_41422774/article/details/99303400

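When locating elements with XPath you can write a path that starts with // or with ./. A path that starts with // matches every qualifying element in the entire document, while ./ selects among the children of the current node. The .// form likewise limits the search to the current node, but looks through all of its descendants.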
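To make the difference between ./ and .// concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree. The markup is a simplified, hypothetical stand-in for the page (the author names are placeholders), and ElementTree supports only a subset of XPath, so treat it as an illustration rather than as what Scrapy does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (well-formed XML so the standard library can parse it).
DOC = """
<body>
  <div class="quote">
    <span class="text">first quote</span>
    <span>by <small class="author">Author A</small></span>
  </div>
  <div class="quote">
    <span class="text">second quote</span>
    <span>by <small class="author">Author B</small></span>
  </div>
</body>
"""

root = ET.fromstring(DOC)
first_quote = root.findall("./div[@class='quote']")[0]

# ./ looks only at the direct children of the current node.
child_spans = first_quote.findall("./span")
# <small> is a grandchild of the div, so ./ does not reach it ...
direct_small = first_quote.findall("./small")
# ... while .// searches every descendant of the current node.
nested_small = first_quote.findall(".//small")

print(len(child_spans))                 # 2
print(len(direct_small))                # 0
print([e.text for e in nested_small])   # ['Author A']
```

Both forms stay inside the first div; they differ only in how deep they look.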

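The problem:

When I use XPath to select every div with class="quote", I get 10 of them (listed below). But when I take one of them, say the first, and try to extract text from inside it:

>>> quotes = response.xpath('//div[@class = "quote"]')
>>> quotes[0].xpath('//span[1]/text()').extract()

the result contains the text of every matching tag on the page rather than only the text inside this one div (the full output is shown further down). Only when I select relative to the current node with ./ do I get a single result:

>>> quotes[0].xpath('./span[1]/text()').extract()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']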

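What puzzled me: granted, // selects every matching tag, but I explicitly asked to search only under this one div (quotes[0]), and printing it confirms that it contains just this div and nothing else. So why does

>>> quotes[0].xpath('//span[1]/text()').extract()

(the // form) return every matching result from the whole page?

Doing the same extraction with a CSS selector does not have this problem.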

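Later I found the explanation in the Scrapy documentation, in the part about nested selectors: an XPath that starts with / is absolute to the document, not relative to the Selector it is called on.
In other words, use one of the two relative forms, .// or ./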
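The same behaviour can be reproduced outside Scrapy. The sketch below assumes the lxml package is installed (Scrapy's selectors are built on lxml through the parsel library) and uses made-up markup rather than the real page, so it only illustrates the XPath semantics.

```python
from lxml import html

# Hypothetical two-quote page; the real page has ten such divs.
PAGE = """
<html><body>
  <div class="quote"><span class="text">quote one</span><span>by A</span></div>
  <div class="quote"><span class="text">quote two</span><span>by B</span></div>
</body></html>
"""

tree = html.fromstring(PAGE)
quotes = tree.xpath('//div[@class="quote"]')
first = quotes[0]

# A leading // starts at the document root, even though it is called on `first`.
absolute = first.xpath('//span[1]/text()')
# A leading . anchors the path to `first` itself.
relative_child = first.xpath('./span[1]/text()')
relative_desc = first.xpath('.//span[1]/text()')

print(absolute)        # ['quote one', 'quote two']
print(relative_child)  # ['quote one']
print(relative_desc)   # ['quote one']
```

The node the call is made on is only the starting context; a path that begins with / or // leaves that context and goes back to the document root.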

In detail:

[Figure: overall structure of the target page]

The first div; every following div has the same structure:
    <div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world" />
            <a class="tag" href="/tag/change/page/1/">change</a>
            <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
            <a class="tag" href="/tag/thinking/page/1/">thinking</a>
            <a class="tag" href="/tag/world/page/1/">world</a>
        </div>
    </div>

Selecting the divs with XPath returns a SelectorList instance containing every match:
>>> quotes = response.xpath('//div[@class = "quote"]')
>>> quotes
[<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>,
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>, 
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>]
>>>
>>> len(quotes)
10
>>> quotes[0]  # a single result, as expected
<Selector xpath='//div[@class = "quote"]' data='<div class="quote" itemscope itemtype...'>
>>> print(quotes[0].extract())  # printing it also shows only this one div
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world">
            <a class="tag" href="/tag/change/page/1/">change</a>
            <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
            <a class="tag" href="/tag/thinking/page/1/">thinking</a>
            <a class="tag" href="/tag/world/page/1/">world</a>
        </div>
    </div>
    
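# So why does extracting the text of span[1] inside quotes[0] return text from the whole page?
# If a more experienced reader happens to see this, please explain it to me.
>>> quotes[0].xpath('//span[1]/text()').extract()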
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 
'“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 
'“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”',
 "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
 '“Try not to become a man of success. Rather become a man of value.”', '“It is better to be hated for what you are than to be loved for what you are not.”',
 "“I have not failed. I've just found 10,000 ways that won't work.”", "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”", 
 '“A day without sunshine is like, you know, night.”', '→', '\n            ', '\n            ', '❤']


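# Extracting with ./ (relative to the current node) returns only this node's text; .// works here as well. Both search along a relative path instead of searching the whole document.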
>>> quotes[0].xpath('./span[1]/text()').extract()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']
>>> quotes[1].xpath('./span[1]/text()').extract()
['“It is our choices, Harry, that show what we truly are, far more than our abilities.”']
>>>
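In practice the relative forms usually appear inside a loop over the quote blocks. The following is a minimal sketch of such a loop, written against the standard library's ElementTree and simplified, hypothetical markup so that it runs without Scrapy. In a spider, the same relative paths would be passed to each quote's .xpath(...) call; the field names text, author and tags are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the quote structure shown earlier.
DOC = """
<body>
  <div class="quote">
    <span class="text">quote one</span>
    <span>by <small class="author">Author A</small></span>
    <div class="tags"><a class="tag">change</a><a class="tag">world</a></div>
  </div>
  <div class="quote">
    <span class="text">quote two</span>
    <span>by <small class="author">Author B</small></span>
    <div class="tags"><a class="tag">choices</a></div>
  </div>
</body>
"""

root = ET.fromstring(DOC)
items = []
for quote in root.findall(".//div[@class='quote']"):
    # Every path starts with . so it is resolved inside the current quote only.
    items.append({
        "text": quote.find("./span[@class='text']").text,
        "author": quote.find(".//small[@class='author']").text,
        "tags": [a.text for a in quote.findall(".//a[@class='tag']")],
    })

for item in items:
    print(item)
```

Because each path is anchored with a leading dot, every item contains only the data of its own div.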
