Deploying Faster Whisper on Jetson

Whisper

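Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.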

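For a test, I used ChatTTS to generate a speech clip of the following text: 四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。 (Sichuan cuisine is indeed famous for being spicy, but there are non-spicy options too, such as sweet-water noodles, Lai tangyuan, egg pancakes, and ye'erba. These snacks are mild, sweet without being cloying, and also very popular.)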

$ pip install -U openai-whisper
$ sudo apt update && sudo apt install ffmpeg
$ pip install setuptools-rust

$ whisper ../audio.wav --model tiny
100%|█████████████████████████████████████| 72.1M/72.1M [00:36<00:00, 2.08MiB/s]
/home/jetson/.local/lib/python3.8/site-packages/whisper/__init__.py:146: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
/home/jetson/.local/lib/python3.8/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: Chinese
[00:00.000 --> 00:03.680] 四川美時確實以辣文明 但以有不辣的選擇
[00:03.680 --> 00:07.200] 比如潛水面 賴湯圓 再轟高夜熱八等
[00:07.200 --> 00:11.560] 這些小市口維溫和 然後甜而不膩也很受歡迎

That run was on the CPU 😂; the GPU never even broke a sweat.
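The FP16 warning in the log shows that the CLI fell back to the CPU. A minimal sketch of the same transcription through the Python API, picking the GPU when PyTorch can see one, might look like the following. It assumes a CUDA-enabled PyTorch build is installed; `pick_device` and `transcribe_file` are hypothetical helper names, and the `initial_prompt` nudge toward Simplified Chinese is a common trick rather than a guaranteed fix.

```python
def pick_device(cuda_available: bool):
    """Return (device, fp16) for Whisper; FP16 is only used on the GPU."""
    return ("cuda", True) if cuda_available else ("cpu", False)


def transcribe_file(path: str, model_name: str = "tiny"):
    """Transcribe a Chinese audio file with openai-whisper."""
    # Heavy imports are done lazily so pick_device stays usable on
    # machines without PyTorch or openai-whisper installed.
    import torch
    import whisper

    device, fp16 = pick_device(torch.cuda.is_available())
    model = whisper.load_model(model_name, device=device)
    # language="zh" skips language detection; the prompt is an assumption
    # meant to bias the output toward Simplified Chinese.
    return model.transcribe(
        path,
        language="zh",
        fp16=fp16,
        initial_prompt="以下是普通话的句子。",
    )


# Example call (not executed here because it needs the model and audio):
# result = transcribe_file("../audio.wav")
# for seg in result["segments"]:
#     print("[%.2fs -> %.2fs] %s" % (seg["start"], seg["end"], seg["text"]))
```

On the command line, the rough equivalent is adding `--device cuda --language zh` to the `whisper` call.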

Faster Whisper

faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models.

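FunASR has one big problem: its real-time transcription runs on the CPU and is slow, while its GPU path only supports offline speech-to-text and cannot run in real time. I then found faster-whisper, which supports real-time transcription on the GPU and also handles Chinese.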

Installation and usage

pip install faster-whisper

from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Attempting deployment on WSL

  • CUDA: 12.6
  • cuDNN: 9.2

Running it directly fails with "Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory". The cuBLAS and cuDNN libraries are missing; they can be installed as Python packages:

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

It still would not run, though, because:

Version 9+ of nvidia-cudnn-cu12 appears to cause issues due its reliance on cuDNN 9 (Faster-Whisper does not currently support cuDNN 9). Ensure your version of the Python package is for cuDNN 8.

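So I'll just install cuDNN 8, right? I promptly downloaded cuDNN 8 for CUDA 12.x, but every attempt ended up installing cuDNN 9.4. Short of downgrading CUDA, there was no way to get back to cuDNN 8.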
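Before swapping packages around, it can help to check which cuDNN major version the dynamic loader can actually find. The sketch below simply tries to open a shared library by name with `ctypes`. `can_load` and `report` are hypothetical helpers, and the cuDNN 9 file name `libcudnn.so.9` is an assumption (the cuDNN 8 name comes from the error message above).

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


def report(candidates):
    """Map each candidate library name to whether it could be loaded."""
    return {name: can_load(name) for name in candidates}


# The first name is taken from the error message; the second is an
# assumed file name for cuDNN 9.
CANDIDATES = ["libcudnn_ops_infer.so.8", "libcudnn.so.9"]

for name, ok in report(CANDIDATES).items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Run it from the same shell in which LD_LIBRARY_PATH was exported, since the loader reads that variable when the Python process starts.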

Attempting deployment on Jetson

  • CUDA: 11.4
  • cuDNN: 8.6.0

Practically tailor-made! First, try installing the cuDNN Python library:

$ pip3 install faster-whisper -i https://mirrors.aliyun.com/pypi/simple/

# The README thoughtfully reminds us: For all these methods below, keep in mind the above note
# regarding CUDA versions. Depending on your setup, you may need to install the
# CUDA 11 versions of libraries that correspond to the CUDA 12 libraries listed
# in the instructions below.

$ pip install --extra-index-url https://pypi.nvidia.com nvidia-cudnn-cu11
...
The installation of nvidia-cudnn-cu11 for version 9.0.0.312 failed.

      This is a special placeholder package which downloads a real wheel package
      from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
      cannot download the real wheel file to install.

      You might try installing this package via
      $ pip install --extra-index-url https://pypi.nvidia.com nvidia-cudnn-cu11

      Here is some debug information about your platform to include in any bug
      report:

      Python Version: CPython 3.8.10
      Operating System: Linux 5.10.104-tegra
      CPU Architecture: aarch64
      nvidia-smi command not found. Ensure NVIDIA drivers are installed.

It turns out nvidia-cudnn-cu11 has no aarch64 (Arm) build! nvidia-cudnn-cu12 does, though.
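Whether a wheel can be installed on the Jetson comes down to the platform tag at the end of its file name. As a rough illustration, the hypothetical helpers below split a wheel name and compare that tag with a CPU architecture. The file name is the cu12 wheel that pip reports later in this post, and the matching rule is deliberately naive compared with what pip really does.

```python
import platform


def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag, the last dash-separated field of a wheel name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-1]


def wheel_fits_machine(filename: str, machine: str) -> bool:
    """Naive check: 'any' fits everywhere, otherwise the tag must end with the architecture."""
    tag = wheel_platform_tag(filename)
    return tag == "any" or tag.endswith(machine)


cu12 = "nvidia_cudnn_cu12-9.4.0.58-py3-none-manylinux2014_aarch64.whl"
print("this machine:", platform.machine())
print("wheel tag:", wheel_platform_tag(cu12))
print("fits aarch64:", wheel_fits_machine(cu12, "aarch64"))
```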

What now, install CUDA 12.2? The Jetson's system is installed by offline flashing, and JetPack 6 does indeed support CUDA 12.2 together with cuDNN 8.

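I was already getting ready to buy a new SSD and reflash, but that is a lot of hassle: setting up a virtual machine with the flashing SDK, opening the case to move the jumper, and reconfiguring the SSH network connection. And above all, it costs money!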


I had to at least try. Refusing to give up, I installed the CUDA 12 build of the cuDNN Python library anyway:

pip install --extra-index-url https://pypi.nvidia.com nvidia-cudnn-cu12
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting nvidia-cudnn-cu12
  Downloading nvidia_cudnn_cu12-9.4.0.58-py3-none-manylinux2014_aarch64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12 (from nvidia-cudnn-cu12)
  Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.6.1.4-py3-none-manylinux2014_aarch64.whl (376.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 376.7/376.7 MB 12.9 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-9.4.0.58-py3-none-manylinux2014_aarch64.whl (572.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 572.7/572.7 MB 1.1 MB/s eta 0:00:00
Installing collected packages: nvidia-cublas-cu12, nvidia-cudnn-cu12
Successfully installed nvidia-cublas-cu12-12.6.1.4 nvidia-cudnn-cu12-9.4.0.58

Run the demo:

test.py
preprocessor_config.json: 100%|████████████████████████████████| 340/340 [00:00<00:00, 118kB/s]
config.json: 100%|█████████████████████████████████████████████| 2.39k/2.39k [00:00<00:00, 1.03MB/s]
vocabulary.json: 100%|█████████████████████████████████████████| 1.07M/1.07M [00:00<00:00, 1.13MB/s]
tokenizer.json: 100%|██████████████████████████████████████████| 2.48M/2.48M [00:01<00:00, 2.14MB/s]
model.bin: 100%|███████████████████████████████████████████████| 3.09G/3.09G [03:18<00:00, 9.89MB/s]
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
  File "/home/jetson/.local/lib/python3.8/site-packages/faster_whisper/transcribe.py", line 145, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: This CTranslate2 package was not compiled with CUDA support

Holy 🤬, what is it this time? A search turns up the issue "This CTranslate2 package was not compiled with CUDA support #1306". Skipping the discussion there and going by the note in the faster-whisper repository:

Note: Latest versions of ctranslate2 support CUDA 12 only. For CUDA 11, the current workaround is downgrading to the 3.24.0 version of ctranslate2 (This can be done with pip install --force-reinstall ctranslate2==3.24.0 or specifying the version in a requirements.txt).

CUDA 11 causing trouble again. The note says to downgrade:

$ pip install --force-reinstall ctranslate2==3.24.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mediapipe 0.8.4 requires opencv-contrib-python, which is not installed.
onnx-graphsurgeon 0.3.12 requires onnx, which is not installed.
d2l 0.17.6 requires numpy==1.21.5, but you have numpy 1.24.4 which is incompatible.
d2l 0.17.6 requires requests==2.25.1, but you have requests 2.32.3 which is incompatible.
faster-whisper 1.0.3 requires ctranslate2<5,>=4.0, but you have ctranslate2 3.24.0 which is incompatible.

Ugh! 😶

Let me try building a CUDA-enabled version myself: https://opennmt.net/CTranslate2/installation.html#compile-the-c-library

$ pip3 uninstall ctranslate2 whisper-ctranslate2
$ git clone --recursive https://github.com/OpenNMT/CTranslate2.git
$ cd CTranslate2
$ mkdir build && cd build
$ cmake ..
...
CMake Error at CMakeLists.txt:294 (message):
  Intel OpenMP runtime libiomp5 not found

-- Configuring incomplete, errors occurred!

Where did Intel come from? Some digging shows: By default, the library is compiled with the Intel MKL backend which should be installed separately. See the Build options to select or add another backend. So change the options and skip Intel's stack:

# Let me show you what a single take looks like. Watch closely, I'll only do this once:
$ cmake .. -DOPENMP_RUNTIME=COMP -DWITH_MKL=OFF -DWITH_CUDA=ON -DWITH_CUDNN=ON
$ make -j32
$ sudo make install
$ sudo ldconfig
$ cd ../python
$ pip install -r install_requirements.txt
$ python setup.py bdist_wheel
$ pip install dist/*.whl


Hooray, time to celebrate!
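A quick sanity check after installing a self-built wheel is to ask CTranslate2 how many CUDA devices it sees, and to fall back to the CPU otherwise. This is only a sketch: `choose_backend` and `detect_backend` are hypothetical helpers, and the compute types are simply the ones used in the snippets in this post.

```python
def choose_backend(cuda_device_count: int):
    """Pick (device, compute_type) from the number of visible CUDA devices."""
    if cuda_device_count > 0:
        return "cuda", "int8_float16"
    return "cpu", "int8"


def detect_backend():
    """Query CTranslate2 lazily; treat a missing package as 'no GPU'."""
    try:
        import ctranslate2
        count = ctranslate2.get_cuda_device_count()
    except ImportError:
        count = 0
    return choose_backend(count)


device, compute_type = detect_backend()
print(f"device={device} compute_type={compute_type}")
# The result can then be passed on, for example:
# from faster_whisper import WhisperModel
# model = WhisperModel("large-v3", device=device, compute_type=compute_type)
```

If this prints device=cpu on the Jetson, the Python package being imported is most likely still a build without CUDA support.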

Timestamps

from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")



segments, _ = model.transcribe("audio.wav", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
[0.00s -> 0.24s] 四
[0.24s -> 0.44s] 川
[0.44s -> 0.58s] 美
[0.58s -> 0.78s] 食
[0.78s -> 1.10s] 确
..
[9.72s -> 9.96s] 腻
[9.96s -> 10.42s] 也
[10.42s -> 10.68s] 很
[10.68s -> 10.82s] 受
[10.82s -> 11.04s] 欢
[11.04s -> 11.22s] 迎
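Word-level timestamps like these are easy to turn into subtitles. As an illustration only, the sketch below groups words into cues of a fixed maximum length and prints them in SRT form. `srt_time` and `words_to_srt` are hypothetical helpers, and the words are plain `(start, end, text)` tuples copied from the output above; a real script would build them as `(w.start, w.end, w.word)` from `segment.words`.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """Group (start, end, text) tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][0], chunk[-1][1]
        # Chinese needs no separator between words.
        text = "".join(w[2] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


sample = [(0.00, 0.24, "四"), (0.24, 0.44, "川"), (0.44, 0.58, "美"),
          (0.58, 0.78, "食"), (0.78, 1.10, "确")]
print(words_to_srt(sample))
```

Grouping by a fixed word count is the simplest possible choice; splitting on punctuation or on pauses between words would give more natural cues.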

Real-time transcription

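whisper_streaming provides Whisper real-time streaming for long speech-to-text transcription and translation. Its authors describe it as follows: Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models, but it is not designed for real-time transcription. In this paper, we build on top of Whisper and create Whisper-Streaming, an implementation of real-time speech transcription and translation for Whisper-like models. Whisper-Streaming uses a local agreement policy with self-adaptive latency to enable streaming transcription. We show that Whisper-Streaming achieves high quality and 3.3 seconds of latency on an unsegmented long-form speech transcription test set, and we demonstrate its robustness and practical usability as a component in a live transcription service at a multilingual conference.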

$ git clone git@github.com:ufal/whisper_streaming.git
$ cd whisper_streaming
$ python3 whisper_online.py ../audio.wav --language zh --min-chunk-size 1
INFO    Audio duration is: 11.68 seconds
INFO    Loading Whisper large-v2 model for zh...
INFO    done. It took 14.19 seconds.
DEBUG   PROMPT:
DEBUG   CONTEXT:
DEBUG   transcribing 1.00 seconds from 0.00
DEBUG   >>>>COMPLETE NOW: (None, None, '')
DEBUG   INCOMPLETE: (0.0, 0.98, '四川美食群')
DEBUG   len of buffer now: 1.00
DEBUG   ## last processed 1.00 s, now is 5.30, the latency is 4.29
DEBUG   PROMPT:
DEBUG   CONTEXT:
DEBUG   transcribing 5.30 seconds from 0.00
DEBUG   >>>>COMPLETE NOW: (0.0, 0.88, '四川美食')
DEBUG   INCOMPLETE: (0.88, 5.26, '确实以辣为名,但也有不辣的选择,比如甜水面赖淘宝。')
DEBUG   len of buffer now: 5.30
11643.5227 0 880 四川美食
11643.5227 0 880 四川美食
DEBUG   ## last processed 5.30 s, now is 11.64, the latency is 6.35
DEBUG   PROMPT:
DEBUG   CONTEXT: 四川美食
DEBUG   transcribing 11.64 seconds from 0.00
DEBUG   >>>>COMPLETE NOW: (None, None, '')
DEBUG   INCOMPLETE: (0.88, 11.24, '確實以辣聞名,但也有不辣的選擇,比如甜水麵、瀨湯圓、炸烘糕 、葉子粑等,這些小吃口味溫和,然後甜而不膩,也很受歡迎。')
DEBUG   len of buffer now: 11.64
DEBUG   ## last processed 11.64 s, now is 21.61, the latency is 9.96
DEBUG   PROMPT:
DEBUG   CONTEXT: 四川美食
DEBUG   transcribing 11.68 seconds from 0.00
DEBUG   >>>>COMPLETE NOW: (None, None, '')
DEBUG   INCOMPLETE: (0.88, 11.32, '确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕 叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。')
DEBUG   len of buffer now: 11.68
DEBUG   ## last processed 21.61 s, now is 31.53, the latency is 9.92
DEBUG   last, noncommited: (0.88, 11.32, '确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。')
31528.1091 880 11320 确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。
31528.1091 880 11320 确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。
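The COMPLETE / INCOMPLETE split in this log is the local agreement policy at work: text is committed only once consecutive hypotheses agree on it. A much simplified sketch, working on raw characters rather than the timestamped words the real tool uses, is shown below. `common_prefix` and `LocalAgreement` are hypothetical names, and the second hypothesis string is stitched together from the log purely for illustration.

```python
def common_prefix(a: str, b: str) -> str:
    """Longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


class LocalAgreement:
    """Commit only the text that two consecutive hypotheses agree on."""

    def __init__(self):
        self.committed = ""
        self.previous = ""

    def update(self, hypothesis: str):
        """Feed the latest full hypothesis; return (newly_committed, pending)."""
        agreed = common_prefix(self.previous, hypothesis)
        newly = agreed[len(self.committed):]
        # Never shrink what was already committed.
        if len(agreed) > len(self.committed):
            self.committed = agreed
        self.previous = hypothesis
        return newly, hypothesis[len(self.committed):]


policy = LocalAgreement()
first = policy.update("四川美食群")
second = policy.update("四川美食确实以辣为名")
print(first)
print(second)
```

The first update commits nothing because there is no earlier hypothesis to agree with; the second commits the shared prefix 四川美食 and leaves the rest pending, which mirrors the first two rounds of the log.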

Note: to change the model quantization:

# this worked fast and reliably on NVIDIA L40
# model = WhisperModel(model_size_or_path, device="cuda", compute_type="float16", download_root=cache_dir)

# or run on GPU with INT8
# tested: the transcripts were different, probably worse than with FP16, and it was slightly (appx 20%) slower
model = WhisperModel(model_size_or_path, device="cuda", compute_type="int8_float16")
Author: 言京谅