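1. On Windows there are plenty of ready-made tools; a Baidu search turns up a whole pile of them, but the results are mediocre. The one I would recommend is "录音啦".
2. Open-source tools: autosub and archtime (for adding subtitles to video). autosub depends on Google's speech recognition API; from mainland China, even after getting past the firewall, the connection is unstable and the transcription may fail (with a 400 MB video it never succeeded once).
3. The SpeechRecognition Python package, using its IBM speech recognition backend
Extract the audio from the video as WAV with ffmpeg: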
ffmpeg -i 01-20170326.mp4 -f wav -ar 16000 01.wav
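The extraction step can also be driven from Python, which is handy when several videos need the same treatment. The following is a minimal sketch, assuming ffmpeg is on the PATH; the helper names `build_extract_cmd` and `extract_audio` are hypothetical and not part of any library.

```python
import subprocess


def build_extract_cmd(video_path, wav_path, sample_rate=16000):
    """Return the ffmpeg argument list that extracts audio as WAV.

    Mirrors the command above: -f wav selects the container,
    -ar sets the output sample rate in Hz.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-f", "wav",
        "-ar", str(sample_rate),
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_extract_cmd(video_path, wav_path, sample_rate), check=True)
```

Calling `extract_audio("01-20170326.mp4", "01.wav")` would run the same command as the line above.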
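import speech_recognition as sr

# Path to the WAV file produced by ffmpeg above
path = '/media/john/本地磁盘/TDDOWNLOAD/vnpy_video/01.wav'
r = sr.Recognizer()
# AudioFile is the current name; WavFile is an older alias for it
with sr.AudioFile(path) as source:
    audio = r.record(source)  # read the whole file into memory
# IBM credentials (placeholders)
IBM_USERNAME = 'xxxxx'
IBM_PASSWORD = 'xxxx'
text = r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, language='zh-CN')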
(If the audio is too long, the request may fail with an error.)
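One way around the length problem is to send the recording in shorter pieces. The sketch below is only an illustration under a few assumptions: `chunk_offsets`, `wav_duration` and `transcribe_in_chunks` are hypothetical helper names, the 60-second default is arbitrary, and `recognize` is a callback you supply (for example one that wraps `r.recognize_ibm`). It relies on the `offset` and `duration` arguments of `Recognizer.record`; chunk boundaries are approximate, so a word that straddles a boundary may be cut.

```python
import wave


def chunk_offsets(total_seconds, chunk_seconds=60.0):
    """Return (offset, duration) pairs that cover total_seconds."""
    chunk_seconds = float(chunk_seconds)
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    offset = 0.0
    while offset < total_seconds:
        chunks.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return chunks


def wav_duration(path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def transcribe_in_chunks(path, recognize, chunk_seconds=60.0):
    """Transcribe a WAV file piece by piece.

    recognize(recognizer, audio) is a caller-supplied function that
    returns the text for one chunk.
    """
    import speech_recognition as sr  # third-party; imported only when used

    recognizer = sr.Recognizer()
    texts = []
    for offset, duration in chunk_offsets(wav_duration(path), chunk_seconds):
        # Reopen the file so that offset is measured from the start
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        texts.append(recognize(recognizer, audio))
    return texts
```

A callback for the IBM backend could look like `lambda r, a: r.recognize_ibm(a, username=IBM_USERNAME, password=IBM_PASSWORD, language='zh-CN')`.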
Speed up the audio by a constant factor: ffmpeg -i 01.wav -filter:a "atempo=2.0" -vn 01_s.wav
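Some ffmpeg versions only accept atempo values between 0.5 and 2.0 per filter instance, so larger factors are usually obtained by chaining several atempo stages. The sketch below builds such a chain; `atempo_chain` is a hypothetical helper name, and the 0.5 to 2.0 limit is an assumption about the ffmpeg build in use.

```python
def atempo_chain(speed):
    """Split a speed factor into atempo stages, each within [0.5, 2.0]."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    stages = []
    remaining = float(speed)
    # Peel off factors of 2.0 while the remainder is still too fast
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    # Peel off factors of 0.5 while the remainder is still too slow
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join("atempo={:g}".format(stage) for stage in stages)


def build_speedup_cmd(src, dst, speed):
    """Return the ffmpeg argument list for an audio-only speed change."""
    return ["ffmpeg", "-i", src, "-filter:a", atempo_chain(speed), "-vn", dst]
```

For a 4x speed-up the filter string becomes two chained 2x stages instead of a single out-of-range value.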
Remove silence from the audio: ffmpeg -i 01_s.wav -af silenceremove=1:0:-50dB:-1:0:-50dB 01_sb.wav
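The positional form of silenceremove is hard to read, and the order of its positional options has differed between ffmpeg releases. A sketch that spells the options out by name is shown below. It assumes the six values in the command above are meant as start_periods, start_duration, start_threshold, stop_periods, stop_duration and stop_threshold; the helper names are hypothetical.

```python
def silenceremove_filter(threshold_db=-50, start_periods=1, start_duration=0,
                         stop_periods=-1, stop_duration=0):
    """Build a silenceremove filter string using named options.

    start_periods=1 trims silence at the beginning; stop_periods=-1
    removes silent stretches throughout the rest of the audio.
    """
    return (
        "silenceremove="
        "start_periods={sp}:start_duration={sd}:start_threshold={th}dB:"
        "stop_periods={tp}:stop_duration={td}:stop_threshold={th}dB"
    ).format(sp=start_periods, sd=start_duration, th=threshold_db,
             tp=stop_periods, td=stop_duration)


def build_silence_cmd(src, dst, threshold_db=-50):
    """Return the ffmpeg argument list that strips silence from src."""
    return ["ffmpeg", "-i", src, "-af", silenceremove_filter(threshold_db), dst]
```

Raising the threshold (for example to -40 dB) treats more of the quiet background as silence, at the risk of clipping soft speech.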
Concatenate multiple audio segments:
ffmpeg -i 03_sb.wav -i 04_sb.wav -filter_complex '[0:0] [1:0] concat=n=2:v=0:a=1 [a]' -map [a] cat_03_04.wav
ffmpeg -i 03_sb.wav -i 04_sb.wav -i 06_sb.wav -filter_complex '[0:0] [1:0] [2:0] concat=n=3:v=0:a=1 [a]' -map [a] cat_03_04_06.wav
ffmpeg -i 01_sb.wav -i 02_sb.wav -i 05_sb.wav -i 06_sb.wav -filter_complex '[0:0] [1:0] [2:0] [3:0] concat=n=4:v=0:a=1 [a]' -map [a] cat_01_02_05_06.wav
Precise cutting of the audio (here, keeping the first 7200 seconds): ffmpeg -i cat_03_04_06.wav -ss 0 -t 7200 -codec copy cat_03_04_06_cat.wav
References: https://blog.csdn.net/huplion/article/details/80839944
https://blog.csdn.net/qq_42156420/article/details/81122018
https://github.com/watson-developer-cloud/python-sdk
https://pypi.org/project/SpeechRecognition/
https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/speech_to_text_v1.py
https://blog.csdn.net/tang20120235/article/details/49762421#
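4. Speech recognition APIs from large Chinese companies such as iFlytek (讯飞) and Baidu
Development reference: https://blog.csdn.net/yuanlulu/article/details/81947880
iFlytek Tingjian (讯飞听见): https://www.iflyrec.com/html/addMachineOrder.html (gives the best results on long recordings, but it is a paid service)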