Fixing garbled Chinese text when using etree XPath

Latest recommended article published on 2023-10-13 18:49:18

Memory_and_Dream

Category: Python web crawlers

Article link: https://blog.csdn.net/Memory_and_Dream/article/details/108316833

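For reasons I never tracked down, I ran into a page where the Chinese text returned by etree XPath queries came out garbled. What finally fixed it was passing an HTMLParser with an explicit encoding to the parse call. The code is as follows: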

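    from lxml import etree

    # Originally a @staticmethod on the crawler class; shown here as a
    # plain function so it runs on its own.
    def getXpath(xpath, content):
        # Decode the page as UTF-8 explicitly instead of letting the
        # parser guess the encoding.
        hparser = etree.HTMLParser(encoding='utf-8')
        tree = etree.HTML(content, hparser)

        out = []
        for result in tree.xpath(xpath):
            if isinstance(result, (str, bytes)):
                # Text and attribute results are already strings.
                out.append(result)
            else:
                # Element results are serialized back to markup (bytes).
                out.append(etree.tostring(result))
        return out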
