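Preface
Speech recognition has long been a core technology in artificial intelligence, and the arrival of large models has changed it qualitatively. It comes up in almost every discussion about AI.
Among open-source speech recognition tools, OpenAI Whisper is currently the dominant one, and its results reportedly surpass those of many commercial products. So how well does OpenAI Whisper support Chinese? Today we run a simple test.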
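1. Preparing the materials
Because the focus today is Chinese recognition, I prepared a few distinctive audio clips. The files are listed below:
- Grade 1 text "Little Frog" (standard Mandarin): 1.mp3 (532K)
Content:
河水清清天气晴,小小青蛙大眼睛。保护禾苗吃害虫,做了不少好事情。请你爱护小青蛙,好让禾苗不生病。
- Three Character Classic, plain recitation 11 (standard Mandarin): 2.mp3 (533K)
Content:
读史者,考实录。通古今,若亲目。昔仲尼,师项橐。古圣贤,尚勤学。赵中令,读鲁论。彼既仕,学且勤。披蒲编,削竹简。彼无书,且知勉。头悬梁,锥刺股。彼不教,自勤苦。如囊萤,如映雪。家虽贫,学不辍。如负薪,如挂角。身虽劳,犹苦卓。
- A Cantonese clip (far from Mandarin): 3.mp3 (306K)
Content:
广式粤语和港式粤语作为粤语地区最有代表性的两种,到底有没有区别?那它们又是不是相通的呢?接下来,我就用它们当中比较独特的表达随机采访了几位路人,看下他们对广式粤语和港式粤语的态度是怎么样的呢?
- A pingshu (storytelling) excerpt by Li Bobo (Sichuanese, fairly close to Mandarin): 4.mp3 (1.4M)
The content is fairly long, so we will look at the recognition result later.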
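2. Setting up the Whisper environment
OpenAI Whisper is currently the most popular open-source speech recognition project. Project page: https://github.com/openai/whisper. As the name suggests, it was open-sourced by OpenAI and relies mainly on large-model training. It supports 99 languages, with a particularly low error rate for English. Whisper comes in five model sizes: tiny, base, small, medium, and large.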
Model | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
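As a rough illustration of how the table can guide model choice, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The numbers are copied from the table; the function name `largest_model_for` is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_for(vram_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # Dicts keep insertion order, so the last fitting entry is the largest model.
    return fitting[-1] if fitting else None


print(largest_model_for(4))    # small
print(largest_model_for(12))   # large
```

The figures are approximate, so treat the result as a starting point rather than a guarantee that the model will load.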
Whisper's error rates are shown in the figure below:
Now let's look at installation. Installing Whisper requires a Python environment with the following:
- Python 3.9.9+
- pip 24.0+
- ffmpeg
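The requirements above can be checked with a short script. This is a minimal sketch: `check_environment` is a hypothetical helper, it only tests that the tools are present (the pip version is not verified), and it assumes ffmpeg should be reachable through PATH.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 9, 9)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info[:3] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    # pip is checked as an importable module because the executable name varies.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    # Whisper calls the ffmpeg executable to decode audio files.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


for problem in check_environment():
    print("missing:", problem)
```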
First check that your machine meets these requirements. If it does, run the commands below.
Step 1: install whisper
pip install -U openai-whisper
Output similar to the following indicates a successful installation:
Building wheels for collected packages: openai-whisper
Building wheel for openai-whisper (pyproject.toml) ... done
Created wheel for openai-whisper: filename=openai_whisper-20231117-py3-none-any.whl size=801358 sha256=9c53589d5935329764df742678ccdf63238285771a946ef7157912e71a623bb3
Stored in directory: /root/.cache/pip/wheels/0f/3e/0a/683df97c94e7b6f0818ba78f0177ebe638c30d192bdd39f399
Successfully built openai-whisper
Step 2: install ffmpeg
The installation method differs by operating system. Here are the commands for several systems:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
On CentOS 7, installing ffmpeg takes a few extra steps:
Import the Nux Dextop repository:
sudo rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
sudo rpm -Uvh http://li.nux.ro/download/nux/dextop/el7/x86_64/nux-dextop-release-0-1.el7.nux.noarch.rpm
Install:
sudo yum update -y
sudo yum install ffmpeg -y
After installation, verify ffmpeg:
ffmpeg -help
3. Testing Whisper
Once installed, Whisper can be used directly from the console:
whisper --help
To run recognition, the command looks like this:
whisper audio.mp3 [options]
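Common options:
--task
Selects the mode. The default, --task transcribe, transcribes the audio; --task translate switches to translation, which currently only targets English.
--model
Selects the model. The default is --model small (run whisper --help to confirm the default for your installed version). Whisper also has English-only models, named by appending .en, which are faster.
--language
Sets the spoken language. By default Whisper uses the first 30 seconds of audio to detect it, but it is better to specify it explicitly, for example --language Chinese.
--device
Selects the compute device. By default Whisper uses CUDA when it is available and falls back to the CPU otherwise; --device cuda selects the GPU, cpu the CPU, and mps Apple M1 chips.
--output_format
Output format of the result (txt, vtt, srt, tsv, json, all). Default: all.
--output_dir
Directory where the results are written.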
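To show how these options combine, here is a small sketch that assembles a `whisper` command line as an argument list. `build_whisper_command` is a hypothetical helper; it only uses the flags described above and does not run anything by itself.

```python
import shlex


def build_whisper_command(audio_path, model="small", language=None,
                          task="transcribe", output_format="txt",
                          output_dir=None, device=None):
    """Build the argument list for one whisper CLI invocation."""
    cmd = ["whisper", audio_path, "--model", model,
           "--task", task, "--output_format", output_format]
    # Optional flags are added only when a value was given.
    if language:
        cmd += ["--language", language]
    if output_dir:
        cmd += ["--output_dir", output_dir]
    if device:
        cmd += ["--device", device]
    return cmd


cmd = build_whisper_command("1.mp3", model="tiny", language="Chinese",
                            output_dir="out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Whisper is installed; building a list rather than a string avoids shell quoting problems with unusual file names.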
Besides the console, Whisper can also be used from Python. Sample code:
# coding=utf-8
import whisper

if __name__ == '__main__':
    # Load the tiny model.
    model = whisper.load_model("tiny")
    # Load the audio and pad or trim it to the 30-second window the model expects.
    audio = whisper.load_audio("1.mp3")
    audio = whisper.pad_or_trim(audio)
    # Compute the log-Mel spectrogram on the same device as the model.
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    # Detect the spoken language; probs holds the per-language probabilities.
    _, probs = model.detect_language(mel)
    # Decode with default options and print the recognized text.
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    print(result.text)
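Note that `pad_or_trim` limits the low-level example to a 30-second window. For longer clips, Whisper's higher-level `transcribe` method processes the whole file. The sketch below wraps it in a hypothetical helper, `transcribe_chinese`; fixing the language to `zh` and passing `fp16=False` (for CPU-only machines) are choices made for this illustration, not settings taken from the original tests.

```python
def transcribe_chinese(model, audio_path):
    """Transcribe a whole file, skipping automatic language detection."""
    # language="zh" pins the language; fp16=False keeps decoding in FP32,
    # which is what CPU-only machines need.
    result = model.transcribe(audio_path, language="zh", fp16=False)
    return result["text"].strip()


def main():
    # Imported here so the helper above can be reused without loading Whisper.
    import whisper

    model = whisper.load_model("tiny")
    print(transcribe_chinese(model, "1.mp3"))
```

Call `main()` to run it against 1.mp3; the helper accepts any already loaded model, so the same function can be reused across model sizes.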
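You can of course also test directly from the console. I collected the test results in the figure below:
Here I output plain txt directly. With the vtt format, the output also carries the corresponding timestamps, similar to the following: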
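To make the timestamped layout concrete, the sketch below renders a list of segments as WebVTT-style cues. It assumes segments shaped like the entries Whisper returns from `transcribe` (dicts with `start`, `end`, and `text`); the helper names and the sample timestamps are made up for illustration, and the exact layout of Whisper's own .vtt files may differ slightly.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments):
    """Render segments as a WebVTT document: a header, then one cue each."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical timings; only the text comes from the first test clip.
sample = [{"start": 0.0, "end": 3.5, "text": " 河水清清天气晴"}]
print(segments_to_vtt(sample))
```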
I wrote a shell script to automate the tests so that you can run them yourself:
#!/bin/bash
# Arrays are a bash feature, so this script must run under bash rather than plain sh.
suffixes=("mp3")
models=("tiny" "base" "small" "medium" "large")
# models=("tiny" "base")

# Run every model on every audio file with the given suffix.
find_audio(){
    suffix=$1
    for file in ./*."$suffix"; do
        if [ -f "$file" ]; then
            dir=$(basename "$file" | cut -d "." -f 1)
            for model in "${models[@]}"; do
                do_whisper "$file" "$dir" "$model"
            done
        fi
    done
}

# Transcribe one file with one model and append the elapsed time to the result.
# $1 = audio file, $2 = file name without extension, $3 = model name
do_whisper(){
    start_time=$(date +%s)
    whisper "$1" --language Chinese --output_dir "${2}_${3}" --output_format txt --model="$3"
    end_time=$(date +%s)
    time_sec=$((end_time - start_time))
    txt="(elapsed: ${time_sec}s)"
    rs_file="${2}_${3}/${2}.txt"
    echo "$txt" >> "$rs_file"
}

# Append one CSV row per audio file, with one quoted column per model.
do_report(){
    suffix=$1
    for file in ./*."$suffix"; do
        if [ -f "$file" ]; then
            txt_rs=$(basename "$file")
            dir=$(basename "$file" | cut -d "." -f 1)
            for model in "${models[@]}"; do
                rs_whisper_file="${dir}_${model}/${dir}.txt"
                # Flatten the transcript onto a single line.
                rs_whisper_file_txt=$(tr -d '\r\n' < "$rs_whisper_file")
                txt_rs="${txt_rs},\"${rs_whisper_file_txt}\""
            done
            echo "$txt_rs" >> report.csv
        fi
    done
}

for suffix in "${suffixes[@]}"; do
    find_audio "$suffix"
done

# Write the CSV header once, then add the rows for each suffix.
models_strs=$(printf ",%s" "${models[@]}")
models_strs=${models_strs:1}
echo "audio,$models_strs" > report.csv
for suffix in "${suffixes[@]}"; do
    do_report "$suffix"
done
You can modify the parameters in it to run your own tests.
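For readers who prefer Python, the reporting step of the script can be sketched with the standard `csv` module, which also takes care of quoting commas and quotes inside transcripts. The sketch assumes the same directory layout the script produces (`<name>_<model>/<name>.txt` next to each audio file) and file names with a single dot; `build_report` is a hypothetical helper.

```python
import csv
from pathlib import Path


def build_report(audio_dir, models, suffix="mp3", report_name="report.csv"):
    """Write one CSV row per audio file with one transcript column per model."""
    audio_dir = Path(audio_dir)
    report_path = audio_dir / report_name
    with report_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["audio", *models])
        for audio in sorted(audio_dir.glob(f"*.{suffix}")):
            row = [audio.name]
            for model in models:
                txt = audio_dir / f"{audio.stem}_{model}" / f"{audio.stem}.txt"
                # A missing result becomes an empty cell instead of an error.
                text = txt.read_text(encoding="utf-8") if txt.is_file() else ""
                # Flatten the transcript onto a single line.
                row.append("".join(text.splitlines()))
            writer.writerow(row)
    return report_path
```

For example, `build_report(".", ["tiny", "base"])` would collect the results for the two smallest models in the current directory.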
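Summary
The tests above show that recognition of standard Mandarin is already quite successful. What surprised me most is how low the error rate was for Cantonese: it essentially translated the speech. Sichuanese is phonetically close to Mandarin, but some of its vocabulary differs a lot, so the error rate there was still high.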
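The comparison above is by eye. One common way to put a number on it is the character error rate (CER): the edit distance between the reference text and the recognized text, divided by the reference length. The sketch below is one possible implementation, not the method used for the tests in this article; the punctuation set to ignore is an assumption, and the sample hypothesis is made up rather than real Whisper output.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis, ignore="，。？！、,.?! \n"):
    """CER = edit distance / reference length, ignoring punctuation and spaces."""
    ref = "".join(ch for ch in reference if ch not in ignore)
    hyp = "".join(ch for ch in hypothesis if ch not in ignore)
    if not ref:
        raise ValueError("reference text is empty")
    return edit_distance(ref, hyp) / len(ref)


# Made-up hypothesis with two wrong characters out of seven.
print(f"{char_error_rate('河水清清天气晴', '河水青青天气晴'):.2%}")  # 28.57%
```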
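Overall, for an open-source product, Whisper's Chinese support is already quite good, and it even beat some domestic commercial products: I ran the same Cantonese clip through the platforms of several large vendors, and most of them could not recognize it. You can use my script to test more dialects or other kinds of audio.
If you have a GPU, try it there and see how it performs.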
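As a starting point for a GPU run, the sketch below chooses between `cuda` and `cpu` depending on what PyTorch reports. `pick_device` is a hypothetical helper, and falling back to the CPU when PyTorch is missing is a choice made for this illustration.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so assume CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The result can be passed on as `whisper.load_model("large", device=pick_device())` or as the `--device` option on the command line.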
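OpenAI Whisper's approach feels like brute force working wonders: it trains a large model that covers most languages, and in doing so it has upended traditional speech recognition technology. I expect even better models to arrive soon. Looking at Whisper's model download logic, there already appear to be large-v1, large-v2, and large-v3 variants, but because the models are large and I have no suitable test environment, I leave those for you to try. The download logic lives in the source file python3.12/site-packages/whisper/__init__.py.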
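Rather than reading the source, the installed package can also be asked which model names it knows through `whisper.available_models()`. The wrapper below is a hypothetical helper that returns an empty list when Whisper is not installed; which names appear depends on the installed version.

```python
def list_whisper_models(prefix=""):
    """Return the model names known to the installed Whisper package."""
    try:
        import whisper
    except ImportError:
        # Whisper is not installed, so there is nothing to list.
        return []
    return [name for name in whisper.available_models() if name.startswith(prefix)]


# Show only the large variants, if any are available.
print(list_whisper_models("large"))
```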
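Other resources
If setting up the environment or using the script feels too complicated, you can try a graphical front end for Whisper. Here are two such tools:
WhisperDesktop
Download: https://github.com/Const-me/Whisper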
Buzz