20200806_Scraping Bilibili Videos with Python

Latest recommended article published 2024-08-29 15:14:11

ucaslilong

Category: Web scraping. Tags: python

Article link: https://blog.csdn.net/ucas_lilong/article/details/107852502

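This post shows how to scrape Bilibili videos with Python. It covers the scraping program itself, a workaround for slow ffmpeg installation on CentOS, and an example of merging the audio and video files with ffmpeg.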

Scraping Bilibili Videos with Python

This post draws mainly on four blog posts, by lancely, 温欣爸比, 纯洁的微笑, and 码农家园.

1. The Python scraping program

The complete program follows. It is taken from 码农家园's post; for background on the queue it uses, see 纯洁的微笑's post.
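As a quick aside before the full listing, the queue idea can be sketched in isolation. This is a minimal illustration of a bounded `queue.Queue` shared between a producer thread and a consumer thread, not the code from the referenced posts. The names `task_queue`, `SENTINEL`, `producer`, `consumer`, and `run` are hypothetical, and the "download" is only simulated with a string.

```python
import threading
from queue import Queue

# Hypothetical names for illustration only; not part of the original program.
task_queue = Queue(3)   # bounded queue: put() blocks once 3 items are waiting
SENTINEL = None         # marker telling the consumer that no more work is coming
results = []


def producer(urls):
    """Push each URL onto the queue, then signal the end of the work."""
    for url in urls:
        task_queue.put(url)      # blocks while the queue is full
    task_queue.put(SENTINEL)


def consumer():
    """Take URLs off the queue until the sentinel arrives."""
    while True:
        url = task_queue.get()   # blocks while the queue is empty
        if url is SENTINEL:
            break
        # A real scraper would download here; this sketch only records the URL.
        results.append(f"downloaded {url}")


def run(urls):
    """Run one producer and one consumer thread and return the collected results."""
    results.clear()
    threads = [
        threading.Thread(target=producer, args=(urls,)),
        threading.Thread(target=consumer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(results)


print(run(["video-1", "video-2", "video-3", "video-4"]))
```

Because the queue is bounded, a fast producer cannot run arbitrarily far ahead of a slow consumer, which is presumably why the program below creates its queue with a maximum size of 100. With a single consumer the items come out in the order they were put in.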

import json
import os
import re
import threading
import time
from queue import Queue

import requests
from lxml import etree

# Request headers sent with every request; the Referer points at the Bilibili home page
headers = {
    'Connection': 'keep-alive',
    'Referer': 'https://www.bilibili.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
}

video_queue = Queue(100)

def single_data(url):
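    resp = requests.get(url, headers=headers)
    html = etree.HTML(resp.text)
    # The video title is stored in the title attribute of the page's <h1>
    title = html.xpath('//div[@id="viewbox_report"]/h1/@title')[0]
    print('Downloading:', title)
    # The stream metadata is embedded in the page as JSON assigned to __playinfo__
    data = re.search(r'__playinfo__=(.*?)</script><script>', resp.text).group(1)
    data = json.loads(data)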
    try:
        time = data['data']['dash']['duration']