Bulk-Downloading Bilibili Videos with a Python Crawler

Preface

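I recently needed to download Bilibili videos in bulk, rename each downloaded video to its BV number, and then upload everything to a server. At first I did all of this by hand, downloading through a Tampermonkey (油猴) userscript. That is tolerable for a few dozen videos, but beyond that it becomes tedious and unproductive, which is how the idea of writing a script came about.
Originally I searched Bilibili for a keyword, opened the videos one by one to download them, and then had to locate each downloaded file again to rename it to its BV number. That was far too slow: once many videos had been downloaded, going back to look up the BV number for each one was a chore in itself. So I decided to write a Python script.
The script greatly improves efficiency, but it does not fully automate the job, since it still relies on Bilibili唧唧 (Jiji). (唧唧 really is handy. The whole process could be made fully automatic, but I didn't think it was necessary; it is already efficient enough~)
If you have any better suggestions, please let me know in the comments.
And finally, give this post a like~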


Overview

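Here is a brief outline of the workflow:
(1) Use the crawler to collect a batch of video BV numbers and save them to a txt file.
(2) Copy the BV numbers one after another; 唧唧 then downloads each video automatically.
(3) Run a script to rename all the downloaded videos in one go.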

Main Content

The crawler

1. Dependencies

  • requests
  • lxml

2. Code
Explanations are given in the code comments.

<span style="color:#000000"><code class="language-python">'''
author: Ericam
description: crawl Bilibili search results and collect video links
'''
import requests
import re
from lxml import etree
import time

# Request headers that imitate a desktop browser.
# Defined at module level so getHref() also works when this file is imported.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}

'''
Parse one page of search results.
Yields the URL and the title of every video found on the page.
'''
def getHref(url, page):
    try:
        req = requests.get(url, timeout=5, headers=HEADERS)
        html = req.text
        data = etree.HTML(html)
        '''
        page-1://*[@id="all-list"]/div[1]/div[2]/ul[@class="video-list"]/li
        other://*[@id="all-list"]/div[1]/ul[@class="video-list"]/li
        '''
        pattern = '//*[@id="all-list"]/div[1]/div[2]/ul[contains(@class,"video-list")]/li' if page == 1 else '//*[@id="all-list"]/div[1]/ul[contains(@class,"video-list")]/li'
        vurlList = data.xpath(pattern)
        for li in vurlList:
            vurl = li.xpath(".//a/attribute::href")[0]
            title = li.xpath(".//a/attribute::title")[0]
            yield vurl, title
    except Exception as e:
        print('Failed to crawl page %d: %s' % (page, e))
        print('Unfortunately, an unknown error happened. Waiting 3 seconds')
        time.sleep(3)

'''
Extract the BV number from a video URL with a regular expression.
'''
def getBv(href):
    # Match "BV" plus the letters and digits that follow it, so that
    # URLs without a "?" query string are handled as well
    pattern = re.compile(r'BV[0-9A-Za-z]+')
    data = re.search(pattern, href)
    if data is None:
        return ''
    return data.group(0)

if __name__ == "__main__":

    hrefList = []
    titleList = []
    # Number of pages to crawl; adjust as needed. This run covers pages 1-2
    for i in range(1, 3):
        url = "https://search.bilibili.com/all?keyword=歪嘴战神&page={0}".format(i)  # replace the text after "keyword=" with your own search term
        l = getHref(url, i)
        for vurl, title in l:
            hrefList.append(vurl)
            titleList.append(title)
        print("Finished crawling page {0}".format(i))
        time.sleep(2)

    print("--------------------------- Extracting BV numbers -----------------------------")
    for i in range(len(hrefList)):
        hrefList[i] = getBv(hrefList[i])
    # One line per video: BV number, a tab, then the title
    with open("bv.txt", 'w', encoding='utf-8') as f:
        for i in range(len(hrefList)):
            f.write(hrefList[i] + "\t" + titleList[i] + "\n")
    print("Crawl finished")
</code></span>
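The BV-extraction step is easy to check in isolation. The following is a minimal sketch rather than the author's exact code: `extract_bv` is a hypothetical helper, the sample URLs are made up for illustration, and it assumes that a BV number is "BV" followed by a run of ASCII letters and digits.

```python
import re

# Assumption: a BV number is "BV" followed by ASCII letters and digits.
BV_PATTERN = re.compile(r"BV[0-9A-Za-z]+")


def extract_bv(href):
    """Return the first BV number found in href, or '' if there is none."""
    match = BV_PATTERN.search(href)
    return match.group(0) if match else ""


# Made-up sample links: with a query string, without one, and a non-video link.
samples = [
    "//www.bilibili.com/video/BV1xx411c7mD?from=search",
    "//www.bilibili.com/video/BV1xx411c7mD",
    "//www.bilibili.com/read/cv123456",
]
for href in samples:
    print(repr(extract_bv(href)))
```

Matching on the BV number itself, instead of on the `?` that may follow it, means a link without a query string still yields a result.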

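3. Crawl results
The crawler writes its results to bv.txt. Each line holds one video's BV number and title, separated by a tab.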
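Since bv.txt is written as one BV number, a tab, and a title per line, it can be read back with a few lines of code. This is a small sketch under that assumption; `parse_bv_lines` is a hypothetical helper name, and the sample rows are invented.

```python
def parse_bv_lines(lines):
    """Map BV number -> title from an iterable of tab-separated lines."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        bv, sep, title = line.partition("\t")
        # Skip blank lines, lines without a tab, and rows with an empty BV
        if not sep or not bv:
            continue
        result[bv] = title
    return result


# Invented sample rows, including one with an empty BV and one blank line
sample = ["BV1aa\tFirst video\n", "\tNo BV here\n", "\n", "BV1bb\tSecond video\n"]
print(parse_bv_lines(sample))
```

To use it on the real file, pass an open file object, for example `parse_bv_lines(open("bv.txt", encoding="utf-8"))`.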

 

Downloading videos with 唧唧

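唧唧 (Jiji) is a very handy little download tool.
We only need to keep the crawled list open to one side and copy the BV numbers one after another; 唧唧 then takes care of the downloads.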

 

Renaming the videos

唧唧 saves each downloaded video under a file name based on the video's title.

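Why rename them at all? Once the collection grows to thousands or tens of thousands of videos, title-based names become harder and harder to manage and make deduplication difficult, so the same video is very likely to be downloaded again and again.
On Bilibili, the BV number is each video's "ID card" (its primary key), so naming the files by BV number makes them easier to manage later and also makes deduplication straightforward.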
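Treating the BV number as a primary key makes deduplication a simple set lookup. The sketch below is an illustration, not part of the original scripts: `pending_bvs` is a hypothetical helper, and it assumes that finished videos are stored as `<BV number>.mp4` in a single directory.

```python
import os


def pending_bvs(wanted, video_dir):
    """Return the BV numbers in `wanted` that have no '<BV>.mp4' file yet.

    Order is preserved and repeated BV numbers are returned only once.
    """
    done = {
        os.path.splitext(name)[0]
        for name in os.listdir(video_dir)
        if name.endswith(".mp4")
    }
    seen = set()
    pending = []
    for bv in wanted:
        if bv in done or bv in seen:
            continue
        seen.add(bv)
        pending.append(bv)
    return pending
```

Calling `pending_bvs(list_of_crawled_bvs, "F:/bilibili视频/")` before a download session would then list only the videos that are still missing.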

Code

<span style="color:#000000"><code class="language-python">'''
author: Ericam
description: rename downloaded Bilibili videos to their BV numbers
'''
import os
import difflib

if __name__ == '__main__':

    bvpath = os.path.join("D:/", "Coding", "python", "Python爬虫")
    os.chdir(bvpath)
    d = {}
    '''
    bvdownload.txt holds BV numbers and titles.
    If the crawler collected thousands of entries but 唧唧 has only downloaded
    a few hundred of them so far, copy the BV numbers and titles of the
    downloaded videos into bvdownload.txt to rename just those videos.
    '''
    with open("bvdownload.txt", 'r', encoding='utf-8') as f:
        lines = f.readlines()
        for val in lines:
            val = val.strip("\n")
            data = val.split("\t")
            # Skip blank or malformed lines instead of raising IndexError
            if len(data) < 2 or not data[0]:
                continue
            bv = data[0]
            title = data[1]
            d[title] = bv

    # Directory that holds the downloaded videos
    path = 'F:/bilibili视频/'
    os.chdir(path)
    videoList = os.listdir()

    # Fuzzy-match each title against the file names
    for key in d:
        video = difflib.get_close_matches(key, videoList, 1, cutoff=0.3)
        if len(video) == 0:
            continue
        video = video[0]
        target = d[key] + ".mp4"
        # The match is already named after its BV number: leave it alone,
        # otherwise the check below would delete the only copy
        if video == target:
            continue
        # If <BV>.mp4 already exists, the matched file is a duplicate: delete it
        if os.path.isfile(target) and os.path.isfile(video):
            os.remove(video)
        else:
            if os.path.isfile(video):
                os.rename(video, target)
    print("Renaming finished!")

</code></span>
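Because the rename script deletes and renames files directly, it can help to preview the fuzzy matches first. The following is a minimal sketch, not part of the original script: `plan_renames` is a hypothetical helper that applies the same `difflib.get_close_matches` call (with the same `cutoff=0.3` default) to plain lists of names and returns the planned renames without touching the disk. The sample titles and BV numbers are made up.

```python
import difflib


def plan_renames(title_to_bv, filenames, cutoff=0.3):
    """Return (old_name, new_name) pairs; nothing on disk is modified."""
    remaining = list(filenames)
    plan = []
    for title, bv in title_to_bv.items():
        matches = difflib.get_close_matches(title, remaining, n=1, cutoff=cutoff)
        if not matches:
            continue
        old_name = matches[0]
        remaining.remove(old_name)  # each file is matched at most once
        plan.append((old_name, bv + ".mp4"))
    return plan


# Made-up titles and file names for illustration
titles = {
    "Python crawler tutorial part 1": "BV1aa",
    "Cooking pasta at home": "BV1bb",
}
files = ["Cooking pasta at home - HD.mp4", "Python crawler tutorial part 1.mp4"]
for old_name, new_name in plan_renames(titles, files):
    print(old_name, "->", new_name)
```

A low cutoff such as 0.3 tolerates the extra characters a downloader may add to a file name, but it also raises the chance of a wrong match, so reviewing the printed plan before renaming is a cheap safeguard.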

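Result

After the script finishes, every matched video in the folder is named after its BV number, in the form BV number plus ".mp4".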

Closing remarks

I hope you can apply the same idea to your own tasks and boost your productivity.
