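For Bilibili technical video series with many episodes, you can parse the playlist elements with XPath and print them out as a table of contents. Reference code is given below.
Features:
1. Parse the XML elements and print one line per episode;
2. Show each episode's duration and the cumulative playing time;
3. Each line starts with placeholder checkboxes that you can tick off yourself as a progress reminder.
Extracting the XML:
Manually save the playlist element to a text file, e.g. bilibili_video_lst.xml.
Saving just the <div class="cur-list"> ...</div> element is enough.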
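To make the expected shape of the saved file concrete, here is a minimal sketch of a fragment and how titles and durations can be pulled out of it. The nesting shown is an assumption inferred from the XPath expressions in the reference script (`//div/ul/li`, `a/@title`, `a/div/div[@class="duration"]`), not a copy of Bilibili's actual markup, and the titles are made up. The sketch also swaps in the standard library's `xml.etree.ElementTree` instead of lxml, so it only works when the saved fragment is well-formed XML; `extract_entries` and `SAMPLE_FRAGMENT` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: only the nesting that the XPath
# expressions rely on is kept. A real saved element will carry more
# attributes and child nodes.
SAMPLE_FRAGMENT = """
<div class="cur-list">
  <ul>
    <li><a title="day01_01 intro"><div><div class="duration">16:41</div></div></a></li>
    <li><a title="day01_02 concepts"><div><div class="duration">49:19</div></div></a></li>
  </ul>
</div>
"""


def extract_entries(fragment):
    """Return (title, duration) pairs from a saved cur-list fragment."""
    # The root element is the <div class="cur-list"> itself.
    root = ET.fromstring(fragment.strip())
    entries = []
    for li in root.findall("./ul/li"):
        link = li.find("a")
        duration = li.find("a/div/div[@class='duration']")
        entries.append((link.get("title"), duration.text.strip()))
    return entries


if __name__ == "__main__":
    for title, duration in extract_entries(SAMPLE_FRAGMENT):
        print(duration, title)
```

If the saved file is not strictly well-formed (unclosed tags, bare `&` characters), a lenient HTML parser such as lxml's `etree.HTML`, which the reference script uses, is likely the safer choice.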
from lxml import etree


def get_content():
    # Path to the manually saved <div class="cur-list"> element.
    video_xml_path = "my_useful_tools/data/bilibili_video_lst.xml"
    with open(video_xml_path, encoding="utf-8") as f:
        xml_str = f.read()

    total_sec = 0  # cumulative playing time in seconds
    xml_root_data = etree.HTML(xml_str)
    for li_xml_data in xml_root_data.xpath('//div/ul/li'):
        title = li_xml_data.xpath("a/@title")[0]
        durtime = li_xml_data.xpath('a/div/div[@class="duration"]/text()')[0]

        # Durations are "MM:SS" or "HH:MM:SS"; pad missing leading fields with 0.
        parts = [int(i) for i in durtime.split(":")]
        hour, mins, sec = [0] * (3 - len(parts)) + parts
        total_sec += hour * 3600 + mins * 60 + sec

        # Cumulative time formatted as HH:MM (leftover seconds are dropped).
        all_hour, all_min = divmod(total_sec // 60, 60)
        iter_durtime = str(all_hour).zfill(2) + ":" + str(all_min).zfill(2)
        print("[_] [_] [_]", iter_durtime, durtime + '\t' + title)


if __name__ == '__main__':
    get_content()
    print('THE END.')
Sample output:
[_] [_] [_] 00:16 16:41 day01_01.离线批计算与实时流失计算的理解(核心要点:数据流的哲学观)
[_] [_] [_] 01:06 49:19 day01_02.flink的基本概念、运行时架构及重要特性
[_] [_] [_] 02:00 54:03 day01_03.flink入门程序编写示例(从socket数据源做wordcount)
[_] [_] [_] 02:18 18:29 day01_04.flink入门程序编写示例(scala版本的wordcount)
[_] [_] [_] 02:47 28:59 day02_01.flink入门编程-批计算编程示例
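The cumulative column in the sample output can be checked by hand: each duration is converted to seconds, added to a running total, and the total is printed as HH:MM with the leftover seconds dropped. The following is a minimal sketch of that arithmetic using the five durations listed above; `to_seconds` and `running_totals` are hypothetical helper names, not part of the original script.

```python
def to_seconds(duration):
    """Convert "MM:SS" or "HH:MM:SS" to a number of seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def running_totals(durations):
    """Yield the cumulative playing time after each episode as "HH:MM"."""
    total = 0
    for duration in durations:
        total += to_seconds(duration)
        hours, minutes = divmod(total // 60, 60)  # leftover seconds are dropped
        yield f"{hours:02d}:{minutes:02d}"


if __name__ == "__main__":
    episode_durations = ["16:41", "49:19", "54:03", "18:29", "28:59"]
    print(list(running_totals(episode_durations)))
    # ['00:16', '01:06', '02:00', '02:18', '02:47']
```

Keeping a single total in seconds avoids handling the seconds-to-minutes and minutes-to-hours carries separately.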