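Today I picked up a job to turn audio into text subtitles with Python. My first thought was IBM's speech recognition service. I applied for the service on IBM Cloud and set about calling it through the Python speech_recognition package, only to find that the new version of IBM Cloud has changed the service credentials from username + password to API key + URL.
The official docs do give a way to transcribe a local audio file by calling the API with curl: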
curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @{path_to_file}audio-file.flac "{url}/v1/recognize"
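A side note on the -u "apikey:{apikey}" part: curl's -u flag sends HTTP Basic auth, meaning an Authorization header that carries the base64 of user:password. Here the user name is the literal string apikey and the password is the key itself. Below is a minimal sketch of how that header value is built. The helper name basic_auth_header and the key "my-secret-key" are made up for illustration and are not part of IBM's docs.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value used by HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


# "apikey" is the fixed user name; the key is a placeholder, not a real credential.
header = basic_auth_header("apikey", "my-secret-key")
print(header)
```

Passing a (user, password) tuple as auth= to requests produces the same kind of header, so the API key never has to be encoded by hand.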
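I tested the curl command and it works. Now for the scary part: I need this functionality in Python code, not as a curl command. So, here is the converted Python code: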
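import requests

# Same header that curl sends with --header
headers = {
    'Content-Type': 'audio/flac',
}
# Read the audio as raw bytes, the equivalent of curl's --data-binary
with open('audio-file.flac', 'rb') as f:
    data = f.read()
# auth=('apikey', <key>) is HTTP Basic auth, the equivalent of curl's -u "apikey:{apikey}"
r = requests.post('https://gateway-wdc.watsonplatform.net/speech-to-text/api/v1/recognize', headers=headers, data=data, auth=('apikey', '***************************'))
print(r.text)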
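Since the goal is subtitles, the raw JSON in r.text still has to be reduced to plain text. The service returns a results list, where each entry has an alternatives list with a transcript field. Here is a rough sketch of pulling those out. The function name extract_transcripts and the sample payload are invented for illustration; a real response will have different text and may carry more fields.

```python
import json


def extract_transcripts(response_text: str) -> list:
    """Return the top transcript of each result segment, in order."""
    payload = json.loads(response_text)
    transcripts = []
    for result in payload.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the service's best guess for this segment.
            transcripts.append(alternatives[0]["transcript"].strip())
    return transcripts


# Made-up payload in the shape described above, standing in for r.text.
sample = """
{
  "results": [
    {"alternatives": [{"confidence": 0.94, "transcript": "hello world "}], "final": true},
    {"alternatives": [{"confidence": 0.88, "transcript": "second segment "}], "final": true}
  ],
  "result_index": 0
}
"""

for line in extract_transcripts(sample):
    print(line)
```

Keeping one string per segment, rather than joining everything, makes it easier to turn each segment into its own subtitle line later.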
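When I tested it, it worked, and it doesn't even use the package... it turns out that requesting the endpoint directly is enough. Finally, for reference, here is the code for the original (old-version) way of calling the service: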
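import speech_recognition as sr

# Load the local WAV file and read all of it into an AudioData object
harvard = sr.AudioFile('23.wav')
r = sr.Recognizer()
with harvard as source:
    audio = r.record(source)
print(type(audio))

# Old-style service credentials (username + password)
IBM_USERNAME = '************************'
IBM_PASSWORD = '************************'
# recognize_ibm is the IBM backend (recognize_google takes no username/password);
# this signature is the one from older SpeechRecognition releases
text = r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, language='zh-CN')
print(text)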
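One difference between the two versions: the old code asks for zh-CN on a .wav file, while the direct request sends FLAC and never says which language to use. With the direct API, the language is chosen through the model query parameter and the audio format through the Content-Type header. Below is a sketch that builds the arguments for requests.post with both set. The helper name build_recognize_kwargs, the extension table, and the example URL are my own assumptions. Whether zh-CN_BroadbandModel is available depends on the service instance, so check its model list first.

```python
import os

# Assumed mapping from file extension to the Content-Type header value.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".ogg": "audio/ogg",
}


def build_recognize_kwargs(base_url, apikey, audio_path, model="zh-CN_BroadbandModel"):
    """Build keyword arguments for requests.post (everything except the audio data)."""
    ext = os.path.splitext(audio_path)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"unsupported audio extension: {ext!r}")
    return {
        "url": base_url.rstrip("/") + "/v1/recognize",
        "params": {"model": model},  # selects the language model
        "headers": {"Content-Type": CONTENT_TYPES[ext]},
        "auth": ("apikey", apikey),  # HTTP Basic auth, user name is literally "apikey"
    }


# Placeholder URL and key; substitute the values from your own service credentials.
kwargs = build_recognize_kwargs("https://example.invalid/speech-to-text/api", "YOUR_APIKEY", "23.wav")
print(kwargs)

# Sending the request would then look like this (not run here):
# import requests
# with open("23.wav", "rb") as f:
#     r = requests.post(data=f, **kwargs)
```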