Whisper v3 Speech-to-Text

Latest recommended article published on 2024-07-01 20:39:26

格瑞Lxf

Tags: whisper, speech recognition, artificial intelligence

Article link: https://blog.csdn.net/China_boy007/article/details/136541126

Original GitHub repository: openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision (github.com)

1. Setting up the environment

Python 3.8 to 3.11 and a basic GPU build of PyTorch are recommended. The whisper library can be installed directly with pip:

pip install -U openai-whisper

Alternatively, pull and install the latest commit from GitHub together with its Python dependencies (recommended):

pip install git+https://github.com/openai/whisper.git

If the GitHub repository is updated later, you can pull the update with:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
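
Before going further it can help to confirm that the environment matches the recommendations above. The following is only a minimal sketch: the helper names (`python_version_supported`, `installed`, `environment_report`) are my own and not part of whisper, and the only third-party call used is `torch.cuda.is_available()`.

```python
import importlib.util
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True if the interpreter is in the 3.8 to 3.11 range suggested above."""
    return (3, 8) <= tuple(version_info[:2]) <= (3, 11)


def installed(package_name):
    """Return True if the top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def environment_report():
    """Collect a few facts about the current environment in a plain dict."""
    report = {
        "python_ok": python_version_supported(),
        "torch_installed": installed("torch"),
        "whisper_installed": installed("whisper"),
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without torch

        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Calling `environment_report()` in a Python shell returns the dict; if `cuda_available` is False, whisper should still run on the CPU, only more slowly.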

You also need to install the ffmpeg tool on your system (it is used to load audio):

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

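On Linux the commands above are all you need. On Windows you first have to install a package manager from its official site:

https://chocolatey.org/ or https://scoop.sh/

Taking Scoop as an example:

Run the following two commands in PowerShell; after that you can install ffmpeg with scoop: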

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression
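
Since whisper relies on ffmpeg to load audio, it is worth checking that the executable is actually visible on PATH after installing it. This is a small sketch using only the standard library; the function names are hypothetical helpers of mine.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line printed by `ffmpeg -version`, or None if unavailable."""
    path = find_ffmpeg()
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None
```

If `find_ffmpeg()` returns None right after installing, opening a new terminal so that the updated PATH is picked up is usually the first thing to try.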

The available models are listed below. I recommend going straight to the large v3 model released in November 2023.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x
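
If your GPU cannot hold the large model, the table above can be turned into a simple lookup. The sketch below picks the largest size whose approximate VRAM requirement fits. `MODEL_TABLE` and `pick_model` are illustrative names of my own, and the numbers are the rough figures from the table, not measured values.

```python
# (size name, approximate VRAM in GB, has an English-only ".en" variant),
# copied from the table above and ordered from smallest to largest.
MODEL_TABLE = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("large", 10, False),
]


def pick_model(vram_gb, english_only=False):
    """Return the largest model name that fits in vram_gb, or None if none fits.

    When english_only is set, the ".en" variant is returned where one exists;
    sizes without an English-only variant fall back to the multilingual name.
    """
    choice = None
    for name, required_gb, has_english_variant in MODEL_TABLE:
        if required_gb <= vram_gb:
            if english_only and has_english_variant:
                choice = name + ".en"
            else:
                choice = name
    return choice
```

For example, `pick_model(6)` gives `"medium"`, and the returned string can be passed to `whisper.load_model`.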

Quick start:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
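
Besides `"text"`, the result dict also carries a `"segments"` list whose entries include `start`, `end` (in seconds) and `text`. As a rough sketch of how those could be printed with timestamps, the helpers below (`format_timestamp`, `format_segments` and `transcribe_with_timestamps` are names I made up) build on the quick-start code; `"audio.mp3"` is just the placeholder file name from above.

```python
def format_timestamp(seconds):
    """Format a time given in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, ms_rest = divmod(total_ms, 60_000)
    secs, millis = divmod(ms_rest, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(segments):
    """Turn whisper-style segment dicts into '[start --> end] text' lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


def transcribe_with_timestamps(audio_path, model_name="base"):
    """Transcribe a file and return one timestamped line per segment."""
    import whisper  # imported here so the helpers above work without whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])
```

With whisper and ffmpeg installed, `print("\n".join(transcribe_with_timestamps("audio.mp3")))` prints one line per segment.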

For other usage, see the original GitHub repository.

Model selection:

_MODELS = {
    "tiny.en": "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt",
    "tiny": "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt",
    "base.en": "https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt",
    "base": "https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt",
    "small.en": "https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt",
    "small": "https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt",
    "medium.en": "https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt",
    "medium": "https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt",
    "large-v1": "https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt",
    "large-v2": "https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt",
    "large-v3": "https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt",
    "large": "https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt",
}
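
The long hexadecimal path segment in each URL appears to be the SHA-256 checksum of the checkpoint file, which is handy if you download a model by hand (for example on a machine with a poor connection) and want to check it. The sketch below assumes exactly that URL layout; `expected_sha256`, `file_sha256` and `checkpoint_matches` are hypothetical helper names, not whisper functions.

```python
import hashlib


def expected_sha256(url):
    """Return the second-to-last path segment of the URL, assumed to be the checksum."""
    return url.split("/")[-2]


def file_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, url):
    """Return True if the file's checksum equals the one embedded in the URL."""
    return file_sha256(path) == expected_sha256(url)
```

A mismatch usually means the download was interrupted, in which case deleting the file and downloading it again is the simplest fix.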

To use large-v3, call load_model("large-v3"); the model is downloaded automatically into the whisper folder under .cache.

On my machine that is C:\Users\lxf\.cache\whisper
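
To see which checkpoints have already been downloaded, you can simply list that folder. The sketch below assumes the default location is the `whisper` folder under `~/.cache` (or under `XDG_CACHE_HOME` if that variable is set), which matches what I see on my machine; `default_cache_dir` and `cached_models` are helper names of my own.

```python
import os


def default_cache_dir():
    """Return the assumed default whisper cache folder for the current user."""
    cache_root = os.getenv(
        "XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache")
    )
    return os.path.join(cache_root, "whisper")


def cached_models(cache_dir=None):
    """Return the sorted names of the .pt checkpoints found in the cache folder."""
    cache_dir = cache_dir or default_cache_dir()
    if not os.path.isdir(cache_dir):
        return []
    return sorted(
        name[:-3] for name in os.listdir(cache_dir) if name.endswith(".pt")
    )
```

If the system drive is short on space, `load_model` also accepts a `download_root` argument, so the checkpoints can be stored in another folder.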
