Three ways to get tag attributes (including text) with selector.css and selector.xpath in the Scrapy framework (Scrapy 1.6)

 

        from scrapy.selector import Selector

        text = '''<ul>
        <li class="toctree-l1"><a class="reference internal" href="intro/overview.html">Scrapy at a glance</a></li>
        <li class="toctree-l1"><a class="reference internal" href="intro/install.html">Installation guide</a></li>
        <li class="toctree-l1"><a class="reference internal" href="intro/tutorial.html">Scrapy Tutorial</a></li>
        <li class="toctree-l1"><a class="reference internal" href="intro/examples.html">Examples</a></li>
        </ul>
        <p class="caption"><span class="caption-text">Basic concepts</span></p>
        <ul>
        <li class="toctree-l1"><a class="reference internal" href="topics/commands.html">Command line tool</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/spiders.html">Spiders</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/selectors.html">Selectors</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/items.html">Items</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/loaders.html">Item Loaders</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/shell.html">Scrapy shell</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/item-pipeline.html">Item Pipeline</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/feed-exports.html">Feed exports</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/request-response.html">Requests and Responses</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/link-extractors.html">Link Extractors</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/settings.html">Settings</a></li>
        <li class="toctree-l1"><a class="reference internal" href="topics/exceptions.html">Exceptions</a></li>
        </ul>

        '''

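        sel = Selector(text=text)

        ul = sel.css("ul")

        # CSS: the ::text pseudo-element returns the text of each tag
        link_text = ul.css('li a::text').getall()

        # CSS: the ::attr(name) pseudo-element returns an attribute value
        href = ul.css('li a::attr(href)').getall()

        # XPath: tag/@name returns an attribute value
        href_xpath = ul.xpath('./li/a/@href').getall()

        # Python: .attrib on each selector is a dict of that tag's attributes,
        # so this builds a list of attribute dicts
        python_attr = [a.attrib for a in ul.css('li a')]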

 
