The get() Method for HTML Tags

镇长1998

Article link: https://blog.csdn.net/weixin_41514525/article/details/87131401

1. With lxml, when you need a tag's attributes, you can select the whole tag first and then call get() on it to read the attribute.

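import requests
from lxml import etree

# Download the page and parse it into an element tree
response = requests.get('https://www.w3cschool.cn/')
html = etree.HTML(response.text)

# Select every <a> element nested inside an <li>
links = html.xpath('//li//a')
for link in links:
    print(link.get('href'))  # get() returns the value of this <a> tag's href attribute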

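The advantage of this approach is that when you need several attributes from the same tag, you select the tag once and call get() once per attribute.

If the attribute passed to get() does not exist on the tag, get() returns None instead of raising an exception, which is convenient.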
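The two points above can be sketched as follows. This is a minimal illustration, not the author's code: the markup in SNIPPET and the helper name extract_links are made up for the example, and the import falls back to the standard library's xml.etree.ElementTree (whose Element.get() works the same way) so the sketch still runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this article
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; Element.get() behaves the same

# Hypothetical markup, invented for this illustration
SNIPPET = (
    '<ul>'
    '<li><a href="/python" title="Python">Python</a></li>'
    '<li><a href="/html">HTML</a></li>'
    '</ul>'
)


def extract_links(markup):
    """Return one dict per <a> element, reading several attributes with get()."""
    root = etree.fromstring(markup)
    rows = []
    for a in root.iter('a'):
        rows.append({
            'href': a.get('href'),
            'title': a.get('title'),              # None when the attribute is missing
            'target': a.get('target', '_self'),   # optional second argument is a fallback value
        })
    return rows


rows = extract_links(SNIPPET)
for row in rows:
    print(row)
```

The second link has no title attribute, so its 'title' entry is None rather than an error. Passing a second argument to get() is one way to substitute a default when None is not wanted.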

etree.tostring(element) serializes an element, so you can print the full markup of each tag.

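for element in html.xpath('//li//a'):  # html is the tree parsed in the example above
    print(etree.tostring(element, encoding='unicode'))  # serialized markup of this tag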
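One detail worth knowing is that tostring() returns bytes by default and returns a str only when called with encoding='unicode'. The sketch below shows both on a made-up snippet (the markup is hypothetical). As an assumption for portability, it falls back to the standard library's ElementTree, which behaves the same way here, when lxml is unavailable.

```python
try:
    from lxml import etree  # the library used in this article
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same tostring() options

# Hypothetical markup, invented for this illustration
root = etree.fromstring('<ul><li><a href="/python">Python</a></li></ul>')
link = root.find('.//a')  # first <a> element anywhere below the root

raw = etree.tostring(link)                       # bytes by default
text = etree.tostring(link, encoding='unicode')  # str when encoding='unicode'

print(type(raw).__name__, raw)
print(type(text).__name__, text)
```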

2. In Selenium WebDriver

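from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://music.163.com/')

# The playlist content is inside an iframe, so switch into it first.
# (Selenium 4 syntax; older releases used driver.switch_to_frame and find_elements_by_xpath.)
driver.switch_to.frame('contentFrame')
htmllist = driver.find_elements(By.XPATH, '//p[@class="dec"]//a')

songlist_list = []
songlist_list_name = []

for item in htmllist:
    songlist_list.insert(0, item.get_attribute('href'))  # get_attribute() reads a tag attribute
    songlist_list_name.insert(0, item.get_attribute('textContent').strip())  # the tag's text content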

In Selenium WebDriver, tag attributes are read with get_attribute(). The elements it returns have no get() method, so the lxml approach above does not work here.
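The loop above can be wrapped in a small helper that works on any objects exposing get_attribute(). This is only a sketch: collect_links and FakeElement are hypothetical names introduced here, and FakeElement is a stand-in for a real Selenium element so that the logic can be tried without launching a browser. Like lxml's get(), Selenium's get_attribute() returns None for a missing attribute, so the helper guards the strip() call.

```python
def collect_links(elements):
    """Return (href, text) pairs from objects that expose get_attribute()."""
    pairs = []
    for element in elements:
        href = element.get_attribute('href')
        # textContent may be None on some elements, so fall back to an empty string
        text = (element.get_attribute('textContent') or '').strip()
        pairs.append((href, text))
    return pairs


class FakeElement:
    """Hypothetical stand-in for a Selenium element, used only for this demo."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Mirrors Selenium's behaviour of returning None for a missing attribute
        return self._attributes.get(name)


fake_elements = [
    FakeElement({'href': 'https://example.com/a', 'textContent': '  Song A \n'}),
    FakeElement({'textContent': 'No link'}),
]
pairs = collect_links(fake_elements)
print(pairs)
```

With a real driver, the list returned by driver.find_elements(...) could be passed to collect_links in place of fake_elements.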