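Project description:
On Windows, speech_recognition records audio and saves it as a 16 kHz WAV file. ffmpeg then converts the WAV file to a PCM file, which is uploaded to the Baidu speech service; the service returns the recognized text. pyttsx3 adds a simple spoken-interaction feature.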
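The WAV-to-PCM step described above can be sketched roughly as follows. This is a minimal sketch rather than the project's actual code: the function names `build_ffmpeg_command` and `wav_to_pcm` are hypothetical, and the mono, 16-bit little-endian output format is an assumption chosen to match the 16 kHz WAV mentioned above. It also assumes ffmpeg can be found on the PATH.

```python
import subprocess


def build_ffmpeg_command(wav_path, pcm_path, rate=16000):
    """Return the ffmpeg argument list for a WAV -> raw PCM conversion.

    Hypothetical helper: the output format (mono, 16-bit, 16 kHz by
    default) is an assumption, not taken from the original project.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", wav_path,          # input WAV file
        "-acodec", "pcm_s16le",  # 16-bit little-endian samples
        "-f", "s16le",           # raw PCM output, no file header
        "-ac", "1",              # one channel (mono)
        "-ar", str(rate),        # sample rate in Hz
        pcm_path,
    ]


def wav_to_pcm(wav_path, pcm_path, rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(wav_path, pcm_path, rate), check=True)
```

Calling `wav_to_pcm("2.wav", "2.pcm")` would then produce the PCM file to upload. Building the command as a list, rather than as a single string, avoids quoting problems with Windows paths that contain spaces.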
Required modules:
speech_recognition, pyttsx3, pyaudio, wave, aip, ffmpeg
Module installation:
- speech_recognition: https://pypi.org/project/SpeechRecognition/
- pyttsx3: https://blog.csdn.net/dss_dssssd/article/details/82693742
- pyaudio: https://pypi.org/project/PyAudio/
- aip: https://ai.baidu.com/docs#/ASR-Online-Python-SDK/top
- ffmpeg (on Windows): note that it must be added to the system environment variables, not the per-user PATH
  https://blog.csdn.net/zhuiqiuk/article/details/72834385
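Before running the project, it may help to confirm that ffmpeg is actually reachable from Python. The following is a small sketch using only the standard library; the helper names `find_ffmpeg` and `describe_ffmpeg` are hypothetical. Note that `shutil.which` searches the PATH of the current process, so it can tell you whether ffmpeg is found, but not whether the entry came from the system or the user variables.

```python
import shutil


def find_ffmpeg(name="ffmpeg"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def describe_ffmpeg(name="ffmpeg"):
    """Return a short human-readable status message (hypothetical helper)."""
    path = find_ffmpeg(name)
    if path is None:
        return name + " was not found on PATH"
    return name + " found at " + path
```

Printing `describe_ffmpeg()` in a newly opened terminal is a quick way to check the setup, since changes to environment variables usually only apply to processes started afterwards.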
The code is as follows:
import speech_recognition as sr
import pyttsx3
import pyaudio
import wave
from aip import AipSpeech
import os

# Read a WAV file and play it back
def read_wav():
    CHUNK = 1024
    # Test audio file
    wf = wave.open('./2.wav', 'rb')
    # Read the first chunk of frames
    data = wf.readframes(CHUNK)
    p = pyaudio.PyAudio()
    # Stream parameters taken from the WAV header
    FORMAT = p.get_format_from_width(wf.getsampwidth())
    CHANNELS = wf.getnchannels()
    RATE = wf.getframerate()