53-selenium: getting all the text under every child node (combining selenium's xpath with etree)

ystraw_ah

Category: python    Tags: selenium, python

Article link: https://blog.csdn.net/qq_39451578/article/details/104142215

For example, suppose we need to extract the text "1年前项目发起" ("project launched 1 year ago") from a page.

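First, the element is located with selenium, and then its content has to be extracted. Although selenium also locates elements with xpath, it cannot return text through xpath's text() or string() functions: a selenium xpath lookup must resolve to an element, so an expression that ends in text() or string() raises an error instead of returning a string. The approach here is therefore to locate the element with xpath in selenium, and then extract the text with etree. Note that building the etree object requires the call below. (Called on an element found through driver, get_attribute('innerHTML') returns all of the HTML under the column element as a string. It is not an etree object, so it has to be passed to etree.HTML first.)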

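from selenium.webdriver.common.by import By

# Selenium 4 syntax; older releases wrote driver.find_element_by_class_name('column')
driver.find_element(By.CLASS_NAME, 'column').get_attribute('innerHTML')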
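To make the difference between text() and string(.) concrete, here is a minimal sketch that needs only lxml. The HTML fragment is made up for illustration and simply stands in for whatever get_attribute('innerHTML') returns on a real page.

```python
from lxml import etree

# Hypothetical fragment, standing in for the string returned by get_attribute('innerHTML')
inner_html = '<div class="item">Started <span>1 year ago</span> by <a href="#">someone</a></div>'

# etree.HTML parses the string and wraps the fragment in <html><body>
root = etree.HTML(inner_html)
div = root.xpath('//div[@class="item"]')[0]

# text() selects only the text nodes that are direct children of the div
direct_text = div.xpath('text()')

# string(.) concatenates the text of the div and all of its descendants
all_text = div.xpath('string(.)')

print(direct_text)
print(all_text)
```

text() misses the text inside the nested span and a elements, which is why string(.) is the better fit when the goal is everything under a node.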

Here is a snippet:

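# Project start time: locate the parent <div> of the <span> whose text contains "项目发起"
# (assumes browser is a WebDriver instance and that By and etree are already imported)
startTime = browser.find_element(By.XPATH, '//span[contains(text(),"项目发起")]/parent::div')
# Parse the element's inner HTML with lxml, then take the text of all descendants
content = etree.HTML(startTime.get_attribute('innerHTML')).xpath('string(.)').replace('\n', ',').replace(' ', '')
print(content)
# The text can also be read directly from the element:
print(startTime.get_attribute('outerText'))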
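One thing to be aware of in the snippet above is that replace(' ', '') removes every space, including the spaces inside the text itself. As a possible alternative, the sketch below strips each text node separately and joins the non-empty pieces. The helper name element_text and the sample markup are hypothetical and not part of the original post.

```python
from lxml import etree

def element_text(inner_html, sep=','):
    """Return the text of every node in the fragment, stripped and joined by sep."""
    root = etree.HTML(inner_html)
    # itertext() yields each text node (including tails) in document order
    pieces = (piece.strip() for piece in root.itertext())
    # Drop pieces that were whitespace only, keep spaces inside real text
    return sep.join(piece for piece in pieces if piece)

# Hypothetical markup resembling an element's innerHTML
sample = '\n  <span>1 year ago</span>\n  <span>project started</span>\n'
result = element_text(sample)
print(result)
```

This keeps "1 year ago" readable while still collapsing the line breaks and indentation between tags.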

Here is a complete example:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from lxml import etree

# Run Chrome in headless mode (no browser window)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Selenium 4 takes options=; older releases used chrome_options=
driver = webdriver.Chrome(options=chrome_options)

url = 'https://example.com'  # placeholder, replace with the page you want to scrape
driver.get(url)
# get_attribute('innerHTML') returns all of the HTML under the column element as a string, not an etree object
column = driver.find_element(By.CLASS_NAME, 'column').get_attribute('innerHTML')
html = etree.HTML(column)   # parse the string into an lxml element tree
texts = html.xpath('//li[@class="first_f"]//div[@class="msg"]/a[2]/text()')
# The result is a list of strings, for example ['content']
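Because xpath() with text() always returns a list, it can help to handle the "one match" and "no match" cases explicitly. The following is a small sketch on made-up HTML; the markup and the helper first_or_none are assumptions for illustration, only the xpath expression is taken from the example above.

```python
from lxml import etree

# Hypothetical markup, standing in for the innerHTML of the column element
column = '''
<ul>
  <li class="first_f"><div class="msg"><a href="/u/1">alice</a><a href="/t/9">first post</a></div></li>
  <li><div class="msg"><a href="/u/2">bob</a><a href="/t/10">second post</a></div></li>
</ul>
'''
html = etree.HTML(column)

# A single match still comes back wrapped in a list
matches = html.xpath('//li[@class="first_f"]//div[@class="msg"]/a[2]/text()')

# No match gives an empty list, so indexing [0] directly would raise IndexError
missing = html.xpath('//li[@class="no_such_class"]//a/text()')

def first_or_none(results):
    """Return the first xpath result, or None when nothing matched."""
    return results[0] if results else None

print(first_or_none(matches))
print(first_or_none(missing))
```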

To get the HTML of an element rather than its text:
details = html.xpath('//li[@id="{}"]//div[@class="reply_c"]/p'.format(reply_id))[0]
details = etree.tostring(details, encoding="utf-8", pretty_print=True, method="html")  # the element's HTML, as bytes
details = details.decode('utf-8') if details else None  # convert to a str
