Installing faster-whisper on Ubuntu, a speech-recognition example based on faster-whisper, and generating SRT subtitle files


Preface

The previous post, on batch-downloading audio and video collections from a certain video site, covered how to download those files. This one records how to do speech recognition with faster-whisper. It skips the theory and covers three parts:
1) Installing faster-whisper
2) Speech recognition with faster-whisper
3) Converting the output to SRT subtitle files


Part 1: Installing faster-whisper

1. Install Docker and nvidia-docker

See the earlier post on installing the NVIDIA driver, Docker, and nvidia-docker on Ubuntu 20.04.

2. Pull the base image

docker pull nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

3. Start the container

docker run -itd --name=faster-whispter-demo --net=host --gpus all --shm-size=16g nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash
docker exec -it faster-whispter-demo bash

4. Create a user in the container and install Anaconda

The container runs as root, and files created in mapped folders then need permission changes on the host, which is tedious. Adding a user is optional, but install the basic packages as needed.

# Inside the container
# Add a user named dl
useradd -ms /bin/bash dl
# Set dl's password
passwd dl
New password: 
Retype new password: 
passwd: password updated successfully
# Fix permissions on /home/dl
chmod -R o+wrx /home/dl

# Install some basic packages
apt-get update
apt-get install vim
apt-get install sudo

# Grant dl sudo rights
chmod +wrx /etc/sudoers
vi /etc/sudoers
# Below the root entry, add an identical line for dl (copy root's line)

# Switch to the dl user
su dl
sudo apt-get install wget
sudo apt-get install git

# Download Anaconda (or download it in advance and copy it into the container)
cd /home/dl/
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
chmod +x ./Anaconda3-2023.09-0-Linux-x86_64.sh
./Anaconda3-2023.09-0-Linux-x86_64.sh
### Prompts during the Anaconda install: ENTER -> yes -> ENTER -> yes
Do you accept the license terms? [yes|no]
[no] >>> yes
Anaconda3 will now be installed into this location:
/home/dl/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/dl/anaconda3] >>> 

installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
   run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
### Anaconda installed successfully

# Make conda take effect in the current shell
source ~/.bashrc
# Test jupyter notebook
jupyter notebook
# Copy the printed link into a browser to check that it works

Part 2: Speech recognition with faster-whisper

1. Add the cuda and nvidia paths to dl's environment variables

# Still inside the container; switch back to root
exit
# Loosen permissions
chmod -R o+wrx /usr/local/
echo $PATH
#/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
echo $LD_LIBRARY_PATH
#/usr/local/nvidia/lib:/usr/local/nvidia/lib64
# Switch to the dl user and add root's PATH and LD_LIBRARY_PATH above to dl's environment
su dl
vi ~/.bashrc
export PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
# Apply the changes
source ~/.bashrc
# Install nvidia-cublas-cu11 and nvidia-cudnn-cu11
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple nvidia-cublas-cu11 nvidia-cudnn-cu11
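Once the cuBLAS/cuDNN wheels are installed, the CTranslate2 runtime behind faster-whisper must be able to find their shared libraries at load time. As an alternative to hardcoding paths in `~/.bashrc`, a stdlib-only sketch (the helper name `nvidia_lib_dirs` is my own, not part of any library) that locates the wheel library directories so you can prepend them to `LD_LIBRARY_PATH`:

```python
import importlib.util


def nvidia_lib_dirs(packages=("nvidia.cublas.lib", "nvidia.cudnn.lib")):
    """Return the lib directories of pip-installed NVIDIA wheels that are present."""
    dirs = []
    for pkg in packages:
        try:
            spec = importlib.util.find_spec(pkg)
        except ModuleNotFoundError:
            spec = None  # parent package (e.g. nvidia) not installed at all
        if spec is not None and spec.submodule_search_locations:
            dirs.extend(spec.submodule_search_locations)
    return dirs


if __name__ == "__main__":
    # Print a colon-separated string suitable for:
    #   export LD_LIBRARY_PATH=$(python find_nvidia_libs.py):$LD_LIBRARY_PATH
    print(":".join(nvidia_lib_dirs()))
```

If the wheels are not installed, the function simply returns an empty list rather than failing.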

2. Install faster-whisper

# Upstream is https://github.com/SYSTRAN/faster-whisper; GitHub is not reachable here, so a Gitee mirror is used
git clone https://gitee.com/loocen/faster-whisper
cd faster-whisper/
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python setup.py install

3. Download a model

The official models require access to huggingface.co, which is not reachable here, so a mirror is used instead; the files must be downloaded manually.

# Taking the large-v3 model as an example: create a directory for the model under the faster-whisper folder
mkdir -p /home/dl/faster-whisper/model/faster-whisper-large-v3
cd /home/dl/faster-whisper/model/faster-whisper-large-v3
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/model.bin?download=true -O model.bin
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/README.md?download=true -O README.md
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/config.json?download=true -O config.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/preprocessor_config.json?download=true -O preprocessor_config.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/tokenizer.json?download=true -O tokenizer.json
wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/vocabulary.json?download=true -O vocabulary.json

### Mirror URLs for each model size
large-v3: https://hf-mirror.com/Systran/faster-whisper-large-v3/tree/main
large-v2: https://hf-mirror.com/guillaumekln/faster-whisper-large-v2/tree/main
large-v1: https://hf-mirror.com/guillaumekln/faster-whisper-large-v1/tree/main
medium: https://hf-mirror.com/guillaumekln/faster-whisper-medium/tree/main
small: https://hf-mirror.com/guillaumekln/faster-whisper-small/tree/main
base: https://hf-mirror.com/guillaumekln/faster-whisper-base/tree/main
tiny: https://hf-mirror.com/guillaumekln/faster-whisper-tiny/tree/main
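The per-file wget commands above follow one URL pattern, so they can be generated instead of typed by hand. A small sketch (the function name `mirror_urls` is my own; the file list mirrors the wget commands above) that builds the hf-mirror download URLs for a given repository:

```python
# Files needed to run a converted faster-whisper model
# (the same list as the wget commands above)
MODEL_FILES = [
    "model.bin", "README.md", "config.json",
    "preprocessor_config.json", "tokenizer.json", "vocabulary.json",
]


def mirror_urls(repo, files=MODEL_FILES, host="https://hf-mirror.com"):
    """Build a download URL for each model file on the mirror host."""
    return [f"{host}/{repo}/resolve/main/{name}?download=true" for name in files]


# Feed the result to wget/curl, e.g.:
# for url in mirror_urls("Systran/faster-whisper-large-v3"):
#     subprocess.run(["wget", url, "-O", url.split("/")[-1].split("?")[0]])
```

Swapping the repo argument (e.g. `guillaumekln/faster-whisper-medium`) covers the other sizes listed above.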

4. Start Jupyter Notebook to verify the installation

cd /home/dl
jupyter notebook
# Copy the link into a browser
# Create a new notebook and pick the Python 3 kernel

Paste the code below into the notebook to test:

from faster_whisper import WhisperModel, decode_audio

# model_size = "large-v3"
model_path='/home/dl/faster-whisper/model/faster-whisper-large-v3'

# Run on GPU (compute_type="float32" here; "float16" or "int8_float16" also work on supported GPUs)
model = WhisperModel(model_path, device="cuda", compute_type="float32")

# test1
audio_path='/home/dl/faster-whisper/tests/data/stereo_diarization.wav'

left, right = decode_audio(audio_path, split_stereo=True)

segments, _ = model.transcribe(left)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == (
    "He began a confused complaint against the wizard, "
    "who had vanished behind the curtain on the left."
)
print(transcription)
segments, _ = model.transcribe(right)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == "The horizon seems extremely distant."

# test2
audio='/home/dl/faster-whisper/tests/data/jfk.flac'
segments, info = model.transcribe(audio, beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Part 3: Converting to SRT subtitle files

# Open another terminal and enter the container
docker exec -it faster-whispter-demo bash
su dl
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pysubs2
mkdir /home/dl/faster-whisper/mp3

On the host, open another terminal and run a command like the following, replacing containerid with the container ID of faster-whispter-demo, to copy the MP3 files you want to transcribe into the container:

docker cp mp3 containerid:/home/dl/faster-whisper/mp3

Paste the code below into the notebook from Part 2, Step 4 and run it; it produces .srt, .docx, and .txt files. One problem remains unsolved, punctuation correction and paragraph segmentation, which I will look into later.

import math
from docx import Document
import pysubs2
from dataclasses import dataclass
import os

@dataclass
class DownloadInfo:
    base_url: str
    max_episod: int
    save_dir: str
#zhuan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ct4y1871B',99,'专/2022-专')
#shiwu_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ET411375o',93,'专/2022-实')
#xiangguan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1pZ4y1e77d',84,'专/2022-相')
#fachongci_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1E8411W7JD',50,'专/2022-法冲刺')
shiwu_2023 = DownloadInfo('https://www.xxxxxxx.com/video/BV1Kc411L7dQ',1,'mp3')
current_down = shiwu_2023
base_dir = '/home/dl/faster-whisper/'
for audio_i in range(1,current_down.max_episod+1):
    audio= os.path.join(base_dir,current_down.save_dir,"{}.mp3".format(audio_i))
    print(audio)
    segments, info = model.transcribe(audio, beam_size=5)
    srt_path = os.path.join(base_dir,current_down.save_dir,"{}.srt".format(audio_i))
    doc_path = os.path.join(base_dir,current_down.save_dir,"{}.docx".format(audio_i))
    txt_path = os.path.join(base_dir,current_down.save_dir,"{}.txt".format(audio_i))
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    with open(srt_path,'w') as f:
        for segment in segments:
            sentence_timestamp = []
            for p_time in [segment.start, segment.end]:
                m,s = divmod(p_time, 60)
                h,m = divmod(m, 60)
                ms, s = math.modf(s)
                sentence_timestamp.append((int(h), int(m), int(s), int(ms * 1000)))
            line = '{}\n{:0>2d}:{:0>2d}:{:0>2d},{:0>3d} --> {:0>2d}:{:0>2d}:{:0>2d},{:0>3d}\n{}\n\n'.format(
                    segment.id,sentence_timestamp[0][0],sentence_timestamp[0][1],sentence_timestamp[0][2],sentence_timestamp[0][3],sentence_timestamp[1][0],sentence_timestamp[1][1],sentence_timestamp[1][2],sentence_timestamp[1][3],segment.text)
#             print(line)
            f.write(line)
            # print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

    doc_zhuan = Document()    
    subtitles = pysubs2.load(srt_path)

    save_text = ''
    for sub in subtitles:
        sub_text = sub.text
        if sub_text.find(',') == -1:
            sub_text = sub.text+','
        save_text = save_text + sub_text
#     print(save_text)

    doc_zhuan.add_paragraph(save_text)
    doc_zhuan.save(doc_path)
    
    with open(txt_path,'w') as f_txt:
        f_txt.write(save_text)
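The divmod/modf timestamp logic inside the loop above can be pulled out into a small helper. A sketch (the function name `srt_timestamp` is my own) equivalent to the inline conversion:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int((seconds - int(seconds)) * 1000)  # truncate, matching the int(ms * 1000) above
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(h, m, s, ms)
```

With this, each cue line becomes `'{} --> {}'.format(srt_timestamp(segment.start), srt_timestamp(segment.end))`, replacing the per-segment divmod block.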