A Brief Look at XPath: Removing Tags, Removing Tag Attributes, and Converting Back to a String

Author: 爱敲代码的Joker

Categories: Python, Web Scraping. Tags: html, xpath

Article link: https://blog.csdn.net/weixin_44606644/article/details/108778569

Removing specific tags with XPath

# Steps:
#		1. Match the target tags
#		2. Remove each matched tag through its parent node
scripts = html.xpath('//script')
for s in scripts:
    s.getparent().remove(s)
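
One thing to watch for with the loop above: in lxml, the text that follows an element (its `tail`) belongs to that element, so `remove()` discards it too. For `script` tags this is usually only whitespace, but for inline tags it can be real content. Below is a minimal sketch of a tail-preserving removal. The helper name `remove_keep_tail` and the sample markup are assumptions made for illustration, not part of lxml.

```python
from lxml import etree

# Hypothetical sample page: the text " world" is the tail of <script>.
SAMPLE = "<html><body><div>Hello<script>var x = 1;</script> world</div></body></html>"


def remove_keep_tail(element):
    """Detach element from its parent but keep the text that follows it."""
    parent = element.getparent()
    if parent is None:
        return  # the root element has no parent to detach from
    tail = element.tail or ""
    previous = element.getprevious()
    if previous is not None:
        # Hand the tail over to the preceding sibling.
        previous.tail = (previous.tail or "") + tail
    else:
        # No preceding sibling: the tail becomes part of the parent's text.
        parent.text = (parent.text or "") + tail
    parent.remove(element)


root = etree.HTML(SAMPLE)
for script in root.xpath("//script"):
    remove_keep_tail(script)

remaining_text = root.xpath("string(//div)")
print(remaining_text)
```

With a plain `s.getparent().remove(s)`, the " world" part of the sample would be lost along with the script element.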

Removing specific tag attributes with XPath

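# Steps:
#		1. Match the target tag
#		2. Remove the attributes with strip_attributes

# strip_attributes is a function in lxml's etree module that deletes attributes from tags. Its signature and docstring are: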
def strip_attributes(tree_or_element, *attribute_names): # real signature unknown; restored from __doc__
    """
    strip_attributes(tree_or_element, *attribute_names)
    
        Delete all attributes with the provided attribute names from an
        Element (or ElementTree) and its descendants.
    
        Attribute names can contain wildcards as in `_Element.iter`.
    
        Example usage::
    
            strip_attributes(root_element,
                             'simpleattr',
                             '{http://some/ns}attrname',
                             '{http://other/ns}*')
    """
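    pass

# Example:
# Remove the href attribute from the author element (an a tag)
user = html.xpath('//*[@class="authorName"]')
etree.strip_attributes(user[0], "href")
# Remove all non-namespaced attributes from the a tag and its descendants
etree.strip_attributes(user[0], "{}*")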
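
As the docstring says, `strip_attributes` works on the element and all of its descendants, which is easy to overlook. The sketch below shows that behaviour on an inline sample and, for contrast, edits a single element through its `attrib` mapping. The sample markup and variable names are assumptions for illustration.

```python
from lxml import etree

# Hypothetical markup: the matched element is a wrapper around the link.
SAMPLE = (
    '<div class="authorName" id="box">'
    '<a href="/user/1" title="profile">Joker</a>'
    "</div>"
)

root = etree.HTML(SAMPLE)
author = root.xpath('//*[@class="authorName"]')[0]
link = author.xpath("./a")[0]

# Attribute names are passed as separate positional arguments.
# The call walks `author` and every descendant, so the nested <a> is affected.
etree.strip_attributes(author, "href", "title")
print(dict(link.attrib))

# To change only one element, edit its attrib mapping directly.
author.attrib.pop("id", None)
print(dict(author.attrib))
```

If only the matched element should lose an attribute, the `attrib.pop` form is the safer choice, since it never touches child elements.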

Replacing tag attribute values with XPath

# Replace the value of an attribute on specific tags
# Find the img tags
imgs = html.xpath('//*[@class="contentMedia contentPadding"]/div/div/img')
for i in imgs:
    # Replace the value of the src attribute
    i.attrib['src'] = "replacement value"
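
A common practical use of this pattern is turning relative image paths into absolute URLs. The following is a minimal sketch using `Element.get` and `Element.set`, which are equivalent to reading and writing `attrib`. The base URL and the sample markup are made up for the example.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical base URL and markup, used only for this illustration.
BASE_URL = "https://example.com/articles/"
SAMPLE = '<div><img src="img/a.png"><img src="/static/b.png"><img alt="no src"></div>'

root = etree.HTML(SAMPLE)

# The [@src] predicate skips img tags that have no src attribute at all.
for img in root.xpath("//img[@src]"):
    img.set("src", urljoin(BASE_URL, img.get("src")))

sources = [str(value) for value in root.xpath("//img/@src")]
print(sources)
```

Filtering with `[@src]` in the XPath expression avoids passing `None` to `urljoin` for tags that lack the attribute.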

Converting a page parsed by etree back to a string

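import requests
from lxml import etree

# url is assumed to hold the address of the page to fetch
html_1 = requests.get(url).content.decode()
html = etree.HTML(html_1)
# Convert back to a string with the tostring method
html_str = etree.tostring(html, encoding="utf-8").decode("utf-8")
print(html_str)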
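
By default `etree.tostring` serializes as XML, so empty elements come out self-closed, which may not be what you want when writing HTML back out. The sketch below compares the default output with `method="html"`, and uses `encoding="unicode"` to get a `str` directly instead of decoding bytes. It works on an inline sample rather than a fetched page, and the sample markup is an assumption.

```python
from lxml import etree

# Hypothetical sample with a void element (<br>) and an empty <div>.
SAMPLE = "<html><body><p>line one<br>line two</p><div></div></body></html>"
root = etree.HTML(SAMPLE)

# Default XML serialization: empty elements are written as self-closing tags.
as_xml = etree.tostring(root, encoding="unicode")

# HTML serialization: void elements stay unclosed, empty elements keep both tags.
as_html = etree.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

Browsers can misread a self-closed `<div/>` as an opening tag, so `method="html"` is usually the better fit when the result will be rendered as HTML.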

I will update this post from time to time with more lesser-known XPath techniques. Thanks for reading!
