Element div at 0x35686c0

Latest recommended article published on 2024-06-09 11:01:50

yxx_an



Tags: python, html, web scraping

Article link: https://blog.csdn.net/weixin_57178733/article/details/127026387


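When an ordinary Python crawler uses XPath to fetch content, the result is sometimes just the node itself (an lxml Element object, which prints as something like <Element div at 0x35686c0>) rather than the node's content. This is most noticeable when saving the result to a file. The node has to be processed before it is written out.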

For example:

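import urllib.request
from lxml import etree

# url and headers are assumed to be defined earlier in the script
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)

content = response.read().decode('utf-8')
print(content)

tree = etree.HTML(content)
# xpath() returns a list of Element objects; take the first match
tagneed = tree.xpath('//div[@class="content"]')[0]
print(type(tagneed))  # <class 'lxml.etree._Element'>
# Use tostring() to serialize the node into its HTML markup
result = etree.tostring(tagneed, encoding='utf-8').decode()
with open('pengcheng.html', 'w', encoding='utf-8') as fp:
    fp.write(result)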