Whisper
Whisper 是一种通用语音识别模型。它是在大量不同音频数据集上进行训练的,也是一个多任务模型,可以执行多语言语音识别、语音翻译和语言识别。
测试,用Chattts生成一段语音:四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。
$ pip install -U openai-whisper
$ sudo apt update && sudo apt install ffmpeg
$ pip install setuptools-rust
$ whisper ../audio.wav --model tiny
100%|█████████████████████████████████████| 72.1M/72.1M [00:36<00:00, 2.08MiB/s]
/home/jetson/.local/lib/python3.8/site-packages/whisper/__init__.py:146: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(fp, map_location=device)
/home/jetson/.local/lib/python3.8/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: Chinese
[00:00.000 --> 00:03.680] 四川美時確實以辣文明 但以有不辣的選擇
[00:03.680 --> 00:07.200] 比如潛水面 賴湯圓 再轟高夜熱八等
[00:07.200 --> 00:11.560] 這些小市口維溫和 然後甜而不膩也很受歡迎
这个是CPU运行的😂,GPU都没带喘的。
Faster Whisper
fast-whisper 是使用 CTranslate2 重新实现 OpenAI 的 Whisper 模型,CTranslate2 是 Transformer 模型的快速推理引擎。
Funasr有个大问题,它的实时转录是CPU的,很慢,GPU的支持离线语音转文字,但又不能实时。找到了一个faster-whisper可以支持实时GPU转录,也支持中文。
安装使用
pip install faster-whisper
from faster_whisper import WhisperModel
model_size = "large-v3"
# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu