System: Windows 10
Development environment: Thonny 3.3.13
Language: Python 3.7.9
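My advisor asked me to write a program that does offline speech recognition and logs the results. After digging through CSDN, and being a newbie who only knows Python, I concluded that PocketSphinx was about the only option that looked dependable.
This post builds on the CSDN blog post "pyhton基于PocketSphinx实现简单语音识别" by 疯人忠. After following that tutorial step by step, I found the recognition accuracy was extremely low.
After a lot of searching, I found the likely cause: the tutorial turns the .lm file into .lm.bin by simply renaming it.
I then converted the file with the SphinxBase tool mentioned in
https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst
and the recognition rate reached 95% (on a small vocabulary of 14 words):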
sphinx_lm_convert -i mylm.lm -o mylm.lm.bin
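Since a renamed file and a properly converted file carry the same name, it can help to check which one you actually have. Below is a minimal sketch, assuming that an ARPA-format .lm file is plain text containing a \data\ header and that the converted binary does not contain that marker near its start. The helper names (looks_like_arpa_text, build_convert_command, convert_lm) are my own and not part of SphinxBase; the check is a heuristic, not a format validator.

```python
import shutil
import subprocess


def looks_like_arpa_text(path):
    """Heuristic: ARPA language models are text files with a \\data\\ header."""
    with open(path, "rb") as f:
        head = f.read(4096)  # the header sits near the top of the file
    return b"\\data\\" in head


def build_convert_command(src, dst):
    # Same command line as in the post, expressed as an argument list.
    return ["sphinx_lm_convert", "-i", str(src), "-o", str(dst)]


def convert_lm(src, dst):
    """Run sphinx_lm_convert and complain if the output still looks like text."""
    if shutil.which("sphinx_lm_convert") is None:
        raise FileNotFoundError("sphinx_lm_convert is not on PATH")
    subprocess.run(build_convert_command(src, dst), check=True)
    if looks_like_arpa_text(dst):
        raise ValueError(str(dst) + " still looks like ARPA text")
```

Calling looks_like_arpa_text("mylm.lm.bin") on a file that was only renamed should return True, which is a hint that the conversion step was skipped.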
Capture audio with PyAudio:
import pyaudio
import wave
import speech_recognition as speech

def record_and_recog(wave_out_path):
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000  # the sample rate must be 16000!
    TIME = 3  # recording length in seconds
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    wf = wave.open(wave_out_path, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    # read the microphone chunk by chunk and write each chunk to the WAV file
    for _ in range(0, int(RATE / CHUNK * TIME)):
        data = stream.read(CHUNK)
        wf.writeframes(data)
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf.close()
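Because the 16000 Hz requirement is easy to get wrong, a quick check of the saved file can save some debugging. Here is a small sketch using only the standard wave module; check_wav_format is a hypothetical helper of mine, and the defaults simply mirror the recording parameters above (16 kHz, mono, 16-bit samples).

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between the WAV header and the expected format."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append("sample rate is %d, expected %d" % (wf.getframerate(), rate))
        if wf.getnchannels() != channels:
            problems.append("channel count is %d, expected %d" % (wf.getnchannels(), channels))
        if wf.getsampwidth() != sample_width:
            problems.append("sample width is %d bytes, expected %d" % (wf.getsampwidth(), sample_width))
    return problems
```

An empty list means the header matches; anything else lists what differs before the file is handed to the recognizer.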
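I also used
    recognizer = speech.Recognizer()  # the Recognizer instance used for all calls below
    with speech.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
for simple noise handling (this calibrates the recognizer's energy threshold to the ambient noise level).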
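Then run recognition on the recording:
    # recognizer is the same speech.Recognizer() instance that was calibrated above
    audio = speech.AudioFile(wave_out_path)
    with audio as source:
        audio = recognizer.record(source)
    # Recognize the speech with Sphinx
    try:
        resultstr = recognizer.recognize_sphinx(audio, language='zh-CN')
        if " " not in resultstr and resultstr != "":
            resultstr = "Sphinx recognition result: " + resultstr
        elif resultstr == "":
            resultstr = "Recognition error? Empty result"
        else:
            resultstr = "Recognition error? Multiple words: " + resultstr
    except speech.UnknownValueError:
        # raised when Sphinx produces no hypothesis at all
        resultstr = "Recognition error? Empty result"
    except speech.RequestError as e:
        resultstr = "Warning! {0}".format(e)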
Success!
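The if/elif/else check on the recognizer output can be pulled into a small pure function, which makes it testable without a microphone. This is only a sketch of that idea: classify_result and its labels ("ok", "empty", "multi") are names I made up, and the rule assumes, as the post does, that every entry in the small vocabulary is a single word with no spaces.

```python
def classify_result(text):
    """Classify a raw recognizer string and build the message to log."""
    if text == "":
        return "empty", "Recognition error? Empty result"
    if " " in text:
        # a space means Sphinx returned more than one vocabulary word
        return "multi", "Recognition error? Multiple words: " + text
    return "ok", "Sphinx recognition result: " + text
```

Inside the try block, label, resultstr = classify_result(recognizer.recognize_sphinx(audio, language='zh-CN')) would then replace the inline branches.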