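Lately I have gotten thoroughly tired of listening to music while driving, so I wanted a few xiangsheng (crosstalk) recordings to take the edge off. The music platforms carry very little crosstalk, though, so I had to go looking for it online myself. Below is the very first version of the code. It is quite crude: the download-page links were all obtained by manually copying the page source and cutting them out with a regular expression.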
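import time
from pathlib import Path

import requests
from requests_html import HTMLSession
from selenium import webdriver
from selenium.webdriver.common.by import By

# Download pages to process, one URL per recording.
xs_list = [
    "https://www.pingshu365.com/down/321313.html",
]
download_dir = Path(r"E:\pydownload")
driver = webdriver.Chrome()
lenxs = len(xs_list)
a = 0  # number of pages processed so far
session = HTMLSession()


def xssSession(https):
    # Early experiment with requests_html; not called by the main flow.
    r = session.get(https)
    print(r.html)


def openDriver(https):
    driver.get(https)


def openXss(http):
    global a
    a += 1
    print(a / lenxs)  # progress as a fraction of the list
    openDriver(http)
    time.sleep(1)
    handle = driver.current_window_handle  # handle of the current tab
    txt = driver.find_element(By.XPATH, '/html/body/div[6]/div[1]/div[3]/font').text
    # The page heading ends with "- 下载" ("- download"); keep only the title.
    filename = txt.split('- 下载')[0]
    print(filename)
    driver.find_element(By.ID, 'clickina').click()
    link = None
    handles = driver.window_handles  # handles of all open tabs
    for newHand in handles:  # iterate over the tabs
        if newHand != handle:  # pick out the newly opened tab
            driver.switch_to.window(newHand)  # switch to the new tab
            link = driver.current_url  # its URL is the audio link
            print(link)
            driver.close()
            driver.switch_to.window(handle)  # return to the original tab
    if link is None:
        print(f'no download tab opened for {http}')
        return
    myfile = requests.get(link)
    with open(download_dir / f'{filename}.mp3', 'wb') as f:
        f.write(myfile.content)


for xss in xs_list:
    openXss(xss)
driver.quit()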
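The links in xs_list were, as mentioned at the start, cut out of copied page source with a regular expression. A minimal sketch of that step might look like the following. It assumes the links follow the `/down/<digits>.html` shape of the one URL shown in xs_list; the function name `extract_down_links`, the `base` parameter, and the sample HTML (including the second id) are made up for illustration and are not taken from the real site.

```python
import re

# Assumed link shape, modeled on the single URL that appears in xs_list.
DOWN_LINK = re.compile(r'/down/(\d+)\.html')


def extract_down_links(page_source, base="https://www.pingshu365.com"):
    """Return absolute download-page URLs in order of first appearance."""
    seen = set()
    links = []
    for match in DOWN_LINK.finditer(page_source):
        url = f"{base}/down/{match.group(1)}.html"
        if url not in seen:  # skip links repeated on the page
            seen.add(url)
            links.append(url)
    return links


# Hypothetical page fragment, only to show the call.
sample = (
    '<a href="/down/321313.html">1</a>'
    '<a href="/down/321314.html">2</a>'
    '<a href="/down/321313.html">1</a>'
)
print(extract_down_links(sample))
```

Keeping the first-appearance order matters here, since the order of xs_list is the only ordering information the download script has.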
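The Selenium script automatically opens each download page in xs_list and then downloads the audio. The earliest version could not get the name of the recording at all; after some thought I realized I could simply grab the text from the page and use it as the title. One problem remains: when downloading a pingshu (serialized storytelling) series, the episodes do not play back in order. The titles themselves carry no sequence number, so the player sorts them arbitrarily.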
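One possible way around the ordering problem is to prefix each file name with a zero-padded sequence number, so that sorting by name matches the playback order. The sketch below assumes xs_list is already in the intended order; the helper `numbered_filename` is hypothetical and not part of the original script. It also replaces characters that Windows does not allow in file names, which is an extra precaution rather than something the original code did.

```python
import re


def numbered_filename(index, total, title, ext=".mp3"):
    """Prefix title with a zero-padded index so name order equals play order."""
    # Pad to the number of digits in total, with a minimum width of 2.
    width = max(2, len(str(total)))
    # Replace characters that are not allowed in Windows file names.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return f"{index:0{width}d} - {safe_title}{ext}"


print(numbered_filename(3, 120, "Episode title"))
```

Inside openXss, the counter `a` and the list length `lenxs` could serve as `index` and `total`, for example `numbered_filename(a, lenxs, filename)`.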