Installing faster-whisper on Ubuntu, a speech-recognition example based on faster-whisper, and generating SRT subtitle files


Preface

The previous post, on batch-downloading audio and video collections from a certain video site, covered how to download those files. This one records how to do speech recognition with faster-whisper. It skips the theory and covers three parts:
1) Installing faster-whisper
2) Speech recognition with faster-whisper
3) Converting the output to SRT subtitle files


Part 1: Installing faster-whisper

1. Install Docker and nvidia-docker

See the earlier post on installing the NVIDIA driver, Docker, and nvidia-docker on Ubuntu 20.04.

2. Pull the base image

docker pull nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

3. Start the container

docker run -itd --name=faster-whispter-demo --net=host --gpus all --shm-size=16g nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash
docker exec -it faster-whispter-demo bash

4. Create a user in the container and install Anaconda

The container runs as root, and files created in mapped folders then need permission changes on the host, which is tedious. Adding a user is optional, but install the basic packages as needed.

# Inside the container
# Add a user named dl
useradd -ms /bin/bash dl
# Set dl's password
passwd dl
New password: 
Retype new password: 
passwd: password updated successfully
# Fix permissions on /home/dl
chmod -R o+wrx /home/dl

# Install some basic packages
apt-get update
apt-get install vim
apt-get install sudo

# Grant dl sudo rights
chmod +wrx /etc/sudoers
vi /etc/sudoers
# Below the root entry, add an identical line for dl (copy root's line)

# Switch to the dl user
su dl
sudo apt-get install wget
sudo apt-get install git

# Download Anaconda (or download it in advance and copy it into the container)
cd /home/dl/
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
chmod +x ./Anaconda3-2023.09-0-Linux-x86_64.sh
./Anaconda3-2023.09-0-Linux-x86_64.sh
### Prompts during the Anaconda install: ENTER -> yes -> ENTER -> yes
Do you accept the license terms? [yes|no]
[no] >>> yes
Anaconda3 will now be installed into this location:
/home/dl/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/dl/anaconda3] >>> 

installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
   run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
### Anaconda installed successfully

# Make conda take effect in the current shell
source ~/.bashrc
# Test jupyter notebook
jupyter notebook
# Copy the printed link into a browser to check that it works

Part 2: Speech recognition with faster-whisper

1. Add the cuda and nvidia paths to dl's environment variables

# Still inside the container; switch back to root
exit
# Loosen permissions
chmod -R o+wrx /usr/local/
echo $PATH
#/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
echo $LD_LIBRARY_PATH
#/usr/local/nvidia/lib:/usr/local/nvidia/lib64
# Switch to the dl user and add root's PATH and LD_LIBRARY_PATH above to dl's environment
su dl
vi ~/.bashrc
export PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
# Apply the changes
source ~/.bashrc
# Install nvidia-cublas-cu11 and nvidia-cudnn-cu11
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple nvidia-cublas-cu11 nvidia-cudnn-cu11
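Once the cuBLAS/cuDNN wheels are installed, the CTranslate2 runtime behind faster-whisper must be able to find their shared libraries at load time. As an alternative to hardcoding paths in `~/.bashrc`, a stdlib-only sketch (the helper name `nvidia_lib_dirs` is my own, not part of any library) that locates the wheel library directories so you can prepend them to `LD_LIBRARY_PATH`:

```python
import importlib.util


def nvidia_lib_dirs(packages=("nvidia.cublas.lib", "nvidia.cudnn.lib")):
    """Return the lib directories of pip-installed NVIDIA wheels that are present."""
    dirs = []
    for pkg in packages:
        try:
            spec = importlib.util.find_spec(pkg)
        except ModuleNotFoundError:
            spec = None  # parent package (e.g. nvidia) not installed at all
        if spec is not None and spec.submodule_search_locations:
            dirs.extend(spec.submodule_search_locations)
    return dirs


if __name__ == "__main__":
    # Print a colon-separated string suitable for:
    #   export LD_LIBRARY_PATH=$(python find_nvidia_libs.py):$LD_LIBRARY_PATH
    print(":".join(nvidia_lib_dirs()))
```

If the wheels are not installed, the function simply returns an empty list rather than failing.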

2. Install faster-whisper

# Upstream is https://github.com/SYSTRAN/faster-whisper; GitHub is not reachable here, so a Gitee mirror is used
git clone https://gitee.com/loocen/faster-whisper
cd faster-whisper/
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python setup.py install

3. Download a model

The official models require access to huggingface.co, which is not reachable here, so a mirror is used instead; the files must be downloaded manually.

# Taking the large-v3 model as an example: create a directory for the model under the faster-whisper folder
mkdir -p /home/dl/faster-whisper/model/faster-whisper-large-v3
cd /home/dl/faster-whisper/model/faster-whisper-large-v3
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/model.bin?download=true -O model.bin
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/README.md?download=true -O README.md
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/config.json?download=true -O config.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/preprocessor_config.json?download=true -O preprocessor_config.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/tokenizer.json?download=true -O tokenizer.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/vocabulary.json?download=true -O vocabulary.json

### Mirror URLs for each model size
large-v3: https://hf-mirror.com/Systran/faster-whisper-large-v3/tree/main
large-v2: https://hf-mirror.com/guillaumekln/faster-whisper-large-v2/tree/main
large-v1: https://hf-mirror.com/guillaumekln/faster-whisper-large-v1/tree/main
medium: https://hf-mirror.com/guillaumekln/faster-whisper-medium/tree/main
small: https://hf-mirror.com/guillaumekln/faster-whisper-small/tree/main
base: https://hf-mirror.com/guillaumekln/faster-whisper-base/tree/main
tiny: https://hf-mirror.com/guillaumekln/faster-whisper-tiny/tree/main
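The per-file wget commands above follow one URL pattern, so they can be generated instead of typed by hand. A small sketch (the function name `mirror_urls` is my own; the file list mirrors the wget commands above) that builds the hf-mirror download URLs for a given repository:

```python
# Files needed to run a converted faster-whisper model
# (the same list as the wget commands above)
MODEL_FILES = [
    "model.bin", "README.md", "config.json",
    "preprocessor_config.json", "tokenizer.json", "vocabulary.json",
]


def mirror_urls(repo, files=MODEL_FILES, host="https://hf-mirror.com"):
    """Build a download URL for each model file on the mirror host."""
    return [f"{host}/{repo}/resolve/main/{name}?download=true" for name in files]


# Feed the result to wget/curl, e.g.:
# for url in mirror_urls("Systran/faster-whisper-large-v3"):
#     subprocess.run(["wget", url, "-O", url.split("/")[-1].split("?")[0]])
```

Swapping the repo argument (e.g. `guillaumekln/faster-whisper-medium`) covers the other sizes listed above.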

4. Start Jupyter Notebook to verify the installation

cd /home/dl
jupyter notebook
# Copy the link into a browser
# Create a new notebook and pick the Python 3 kernel

Paste the code below into the notebook to test:

from faster_whisper import WhisperModel, decode_audio

# model_size = "large-v3"
model_path='/home/dl/faster-whisper/model/faster-whisper-large-v3'

# Run on GPU (compute_type="float32" here; "float16" or "int8_float16" also work on supported GPUs)
model = WhisperModel(model_path, device="cuda", compute_type="float32")

# test1
audio_path='/home/dl/faster-whisper/tests/data/stereo_diarization.wav'

left, right = decode_audio(audio_path, split_stereo=True)

segments, _ = model.transcribe(left)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == (
    "He began a confused complaint against the wizard, "
    "who had vanished behind the curtain on the left."
)
print(transcription)
segments, _ = model.transcribe(right)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == "The horizon seems extremely distant."

# test2
audio='/home/dl/faster-whisper/tests/data/jfk.flac'
segments, info = model.transcribe(audio, beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Part 3: Converting to SRT subtitle files

# Open another terminal and enter the container
docker exec -it faster-whispter-demo bash
su dl
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pysubs2
mkdir /home/dl/faster-whisper/mp3

On the host, open another terminal and run a command like the following, replacing containerid with the container ID of faster-whispter-demo, to copy the MP3 files you want to transcribe into the container:

docker cp mp3 containerid:/home/dl/faster-whisper/mp3

Paste the code below into the notebook from Part 2, Step 4 and run it; it produces .srt, .docx, and .txt files. One problem remains unsolved, punctuation correction and paragraph segmentation, which I will look into later.

import math
from docx import Document
import pysubs2
from dataclasses import dataclass
import os

@dataclass
class DownloadInfo:
    base_url: str
    max_episod: int
    save_dir: str
#zhuan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ct4y1871B',99,'专/2022-专')
#shiwu_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ET411375o',93,'专/2022-实')
#xiangguan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1pZ4y1e77d',84,'专/2022-相')
#fachongci_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1E8411W7JD',50,'专/2022-法冲刺')
shiwu_2023 = DownloadInfo('https://www.xxxxxxx.com/video/BV1Kc411L7dQ',1,'mp3')
current_down = shiwu_2023
base_dir = '/home/dl/faster-whisper/'
for audio_i in range(1,current_down.max_episod+1):
    audio= os.path.join(base_dir,current_down.save_dir,"{}.mp3".format(audio_i))
    print(audio)
    segments, info = model.transcribe(audio, beam_size=5)
    srt_path = os.path.join(base_dir,current_down.save_dir,"{}.srt".format(audio_i))
    doc_path = os.path.join(base_dir,current_down.save_dir,"{}.docx".format(audio_i))
    txt_path = os.path.join(base_dir,current_down.save_dir,"{}.txt".format(audio_i))
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    with open(srt_path,'w') as f:
        for segment in segments:
            sentence_timestamp = []
            for p_time in [segment.start, segment.end]:
                m,s = divmod(p_time, 60)
                h,m = divmod(m, 60)
                ms, s = math.modf(s)
                sentence_timestamp.append((int(h), int(m), int(s), int(ms * 1000)))
            line = '{}\n{:0>2d}:{:0>2d}:{:0>2d},{:0>3d} --> {:0>2d}:{:0>2d}:{:0>2d},{:0>3d}\n{}\n\n'.format(
                    segment.id,sentence_timestamp[0][0],sentence_timestamp[0][1],sentence_timestamp[0][2],sentence_timestamp[0][3],sentence_timestamp[1][0],sentence_timestamp[1][1],sentence_timestamp[1][2],sentence_timestamp[1][3],segment.text)
#             print(line)
            f.write(line)
            # print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

    doc_zhuan = Document()    
    subtitles = pysubs2.load(srt_path)

    save_text = ''
    for sub in subtitles:
        sub_text = sub.text
        if sub_text.find(',') == -1:
            sub_text = sub.text+','
        save_text = save_text + sub_text
#     print(save_text)

    doc_zhuan.add_paragraph(save_text)
    doc_zhuan.save(doc_path)
    
    with open(txt_path,'w') as f_txt:
        f_txt.write(save_text)
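The divmod/modf timestamp logic inside the loop above can be pulled out into a small helper. A sketch (the function name `srt_timestamp` is my own) equivalent to the inline conversion:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int((seconds - int(seconds)) * 1000)  # truncate, matching the int(ms * 1000) above
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(h, m, s, ms)
```

With this, each cue line becomes `'{} --> {}'.format(srt_timestamp(segment.start), srt_timestamp(segment.end))`, replacing the per-segment divmod block.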