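Preface
I recently needed to download Bilibili videos in bulk, rename each downloaded file to its BV id, and finally upload everything to a server. At first I did all of this by hand, downloading through a Tampermonkey (油猴) script. That is fine for a few dozen videos, but beyond that it gets tedious and unproductive, which is where the idea of writing a script came from.
At first I had to search Bilibili for a keyword, open the videos one after another and download them, then track down each downloaded file and rename it to its BV id. It was far too slow, and once many videos had piled up, going back to find the BV id for each one was a chore in itself. So I decided to write Python scripts.
The scripts here improve efficiency a lot, but they do not automate the whole job, since the downloading still goes through Bilibili唧唧. (唧唧 is really handy. The whole thing could be made fully automatic, but I don't think that's necessary; it's already efficient enough~)
If you have better suggestions, please tell me in the comments.
Finally, please give this post a like~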
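Overview
A brief outline of the workflow:
(1) A crawler collects a batch of video BV ids and saves them to a txt file, one video per line in the form: BV id, tab, title.
(2) Copy the BV ids one after another; 唧唧 then downloads each video automatically.
(3) A script renames all the downloaded videos in one go.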
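To make the txt format from step (1) concrete, here is a minimal sketch that reads such lines back into (BV id, title) pairs. The function name `parse_bv_lines` and the sample BV ids and titles are made up for illustration; the only thing taken from the scripts in this post is the "BV id, tab, title" layout.

```python
def parse_bv_lines(text):
    """Parse 'BV id<TAB>title' lines into a list of (bv, title) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        bv, sep, title = line.partition("\t")
        if not sep or not bv:
            continue  # skip lines without a tab or without a BV id
        pairs.append((bv, title))
    return pairs


# Hypothetical sample content, not real crawl output
sample = "BV1xx411c7mD\tFirst video\n\nBV1yy411c7mE\tSecond video\n"
for bv, title in parse_bv_lines(sample):
    print(bv, title)
```

Lines that have no BV id (for example when extraction failed for a link) are simply skipped, so they never reach the download or rename steps.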
Main text
1. Dependencies
- requests
- lxml
2. Code
Explanations are given in the code comments.
<span style="color:#000000"><code class="language-python">'''
author: Ericam
description: crawls Bilibili search results for video links
'''
import re
import time

import requests
from lxml import etree

# Browser-like request headers. Defined at module level so getHref can always see them.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}


def getHref(url, page):
    '''
    Parse one search-result page.
    Yields (url, title) for every video found on it.
    '''
    try:
        req = requests.get(url, timeout=5, headers=HEADERS)
        html = req.text
        data = etree.HTML(html)
        # The result list sits at a different depth on page 1:
        # page 1: //*[@id="all-list"]/div[1]/div[2]/ul[@class="video-list"]/li
        # others: //*[@id="all-list"]/div[1]/ul[@class="video-list"]/li
        if page == 1:
            pattern = '//*[@id="all-list"]/div[1]/div[2]/ul[contains(@class,"video-list")]/li'
        else:
            pattern = '//*[@id="all-list"]/div[1]/ul[contains(@class,"video-list")]/li'
        vurlList = data.xpath(pattern)
        for li in vurlList:
            vurl = li.xpath(".//a/attribute::href")[0]
            title = li.xpath(".//a/attribute::title")[0]
            yield vurl, title
    except Exception:
        print('Failed to crawl page %d' % page)
        print('Unfortunately, an unknown error happened. Waiting 3 seconds.')
        time.sleep(3)


def getBv(href):
    '''Extract the BV id from a video url; returns '' if there is none.'''
    # Match "BV" plus the letters/digits after it, so urls without a "?query" part also work
    data = re.search(r'BV[0-9A-Za-z]+', href)
    if data is None:
        return ''
    return data.group(0)


if __name__ == "__main__":
    hrefList = []
    titleList = []
    # Number of pages to crawl; adjust as needed. This code was tested on pages 1-2.
    for i in range(1, 3):
        # Change the word after keyword= to search for something else
        url = "https://search.bilibili.com/all?keyword=歪嘴战神&page={0}".format(i)
        for vurl, title in getHref(url, i):
            hrefList.append(vurl)
            titleList.append(title)
        print("Finished crawling page {0}".format(i))
        time.sleep(2)
    print("--------------------------- extracting BV ids -----------------------------")
    for i in range(len(hrefList)):
        hrefList[i] = getBv(hrefList[i])
    with open("bv.txt", 'w', encoding='utf-8') as f:
        for i in range(len(hrefList)):
            f.write(hrefList[i] + "\t" + titleList[i] + "\n")
    print("Crawl finished")
</code></span>
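The BV extraction step is easy to try on its own, without touching the network. Below is a minimal sketch, assuming a BV id is the text `BV` followed by ASCII letters and digits. Unlike a pattern that is anchored on a trailing `?`, it also handles links that carry no query string. The helper name `extract_bv` and the example links are made up for illustration.

```python
import re

# Assumption: a BV id is "BV" followed by ASCII letters and digits
BV_PATTERN = re.compile(r"BV[0-9A-Za-z]+")


def extract_bv(href):
    """Return the first BV id found in href, or '' if there is none."""
    match = BV_PATTERN.search(href)
    return match.group(0) if match else ""


# Hypothetical example links
examples = [
    "//www.bilibili.com/video/BV1xx411c7mD?from=search",
    "//www.bilibili.com/video/BV1xx411c7mD",
    "//www.bilibili.com/video/av170001",
]
for href in examples:
    print(repr(extract_bv(href)))
```

Returning an empty string for links without a BV id keeps the behaviour easy to check before anything is written to the txt file.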
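3. Crawl results
The download tool used here is 唧唧, a very handy little utility.
All we need to do is keep the list of BV ids we just crawled to one side, copy the BV ids one after another, and let 唧唧 do the downloading.
唧唧 names the downloaded files after the video titles, which is what the renaming script below relies on.
Why rename them? Once the collection grows into the thousands or tens of thousands, titles become harder and harder to manage and to deduplicate, so you are very likely to keep downloading the same videos again.
On Bilibili, the BV id is each video's "ID card" (its primary key), so naming files by it makes later management and deduplication easy.
Code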
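If the copying itself gets tiring, it could in principle be scripted as well. The following is only a sketch under assumptions: it takes a `copy` callable that puts text on the clipboard (for example `pyperclip.copy` from the third-party pyperclip package), and it assumes 唧唧 picks up each copied BV id within the pause you choose. The function name `feed_bvs` and the sample ids are made up; here a plain list stands in for the clipboard so the example runs anywhere.

```python
import time


def feed_bvs(bvs, copy, pause=0.0):
    """Copy each non-empty BV id in turn, pausing between copies.

    Returns the number of BV ids that were copied.
    """
    count = 0
    for bv in bvs:
        if not bv:
            continue  # skip entries where no BV id was extracted
        copy(bv)
        count += 1
        time.sleep(pause)  # give the downloader time to notice the clipboard
    return count


# A list's append method stands in for a real clipboard function here
copied = []
feed_bvs(["BV1xx411c7mD", "", "BV1yy411c7mE"], copy=copied.append)
print(copied)
```

How long the pause needs to be, and whether the downloader queues ids that arrive quickly, is something you would have to test yourself.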
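Because the BV id works as a primary key, deduplication comes down to remembering which ids have been seen. A minimal sketch, assuming the crawl results are held as (BV id, title) pairs; the function name `dedupe_by_bv` and the sample data are made up for illustration.

```python
def dedupe_by_bv(pairs):
    """Keep only the first (bv, title) pair for each BV id, preserving order."""
    seen = set()
    unique = []
    for bv, title in pairs:
        if bv in seen:
            continue  # same video already recorded, possibly under another title
        seen.add(bv)
        unique.append((bv, title))
    return unique


# Hypothetical crawl results: the same video shows up twice
crawled = [
    ("BV1xx411c7mD", "First video"),
    ("BV1yy411c7mE", "Second video"),
    ("BV1xx411c7mD", "First video (re-upload title)"),
]
print(dedupe_by_bv(crawled))
```

The same check by title alone would miss the third entry, which is why the BV id is the better key.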
<span style="color:#000000"><code class="language-python">'''
author: Ericam
description: renames downloaded Bilibili videos to their BV ids
'''
import os
import difflib

if __name__ == '__main__':
    bvpath = os.path.join("D:/", "Coding", "python", "Python爬虫")
    os.chdir(bvpath)
    d = {}
    # bvdownload.txt holds one "BV id, tab, title" pair per line.
    # If the crawler collected thousands of entries but 唧唧 has only downloaded a few hundred,
    # copy the downloaded BV ids and titles into bvdownload.txt to rename just those videos.
    with open("bvdownload.txt", 'r', encoding='utf-8') as f:
        lines = f.readlines()
    for val in lines:
        data = val.strip("\n").split("\t", 1)
        if len(data) != 2 or not data[0]:
            continue  # skip blank or malformed lines
        bv, title = data
        d[title] = bv
    # Folder where the videos are stored
    path = 'F:/bilibili视频/'
    os.chdir(path)
    videoList = os.listdir()
    # Fuzzy-match each title against the file names
    for key in d:
        video = difflib.get_close_matches(key, videoList, 1, cutoff=0.3)
        if len(video) == 0:
            continue
        video = video[0]
        target = d[key] + ".mp4"
        if video == target or not os.path.isfile(video):
            continue  # already renamed, or not a regular file
        if os.path.isfile(target):
            # A file with this BV id already exists, so the match is a duplicate: delete it
            os.remove(video)
        else:
            os.rename(video, target)
        videoList.remove(video)  # this file is handled; don't match it again
    print("Renaming finished!")
</code></span>
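Since fuzzy matching can pick the wrong file, it may be worth previewing the renames before touching the disk. Here is a minimal dry-run sketch: it only computes (old name, new name) pairs and never renames or deletes anything. The function name `plan_renames`, the sample titles and file names, and the cutoff of 0.6 (difflib's default, stricter than the 0.3 used in the script) are my own choices for illustration.

```python
import difflib
import os


def plan_renames(title_to_bv, filenames, cutoff=0.6):
    """Return a list of (old_name, new_name) pairs without touching the disk."""
    remaining = list(filenames)  # work on a copy so the caller's list is untouched
    plan = []
    for title, bv in title_to_bv.items():
        matches = difflib.get_close_matches(title, remaining, n=1, cutoff=cutoff)
        if not matches:
            continue  # no file name is similar enough to this title
        old = matches[0]
        ext = os.path.splitext(old)[1]  # keep the file's own extension
        plan.append((old, bv + ext))
        remaining.remove(old)  # each file can be claimed by one title only
    return plan


# Hypothetical titles and file names
titles = {"funny cat video": "BV1xx411c7mD", "qqqq": "BV1yy411c7mE"}
files = ["funny cat video.mp4", "cooking pasta tutorial.mp4"]
for old, new in plan_renames(titles, files):
    print(old, "->", new)
```

Reading through the printed plan first makes a bad match visible while it is still harmless; a lower cutoff finds more matches but also more wrong ones.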
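Result demo
After renaming, each matched video in the folder is named by its BV id plus .mp4.
Closing remarks
I hope you can apply these ideas to your own tasks and boost your productivity.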