Real-Time Video Subtitle Generation Based on Whisper Speech Recognition (II): Online Real-Time Subtitles

Ephemeroptera

Last modified 2024-04-14 19:13:28


Categories: Whistream, Algorithm Design, Tool Development. Tags: whisper, opencv, speech recognition, artificial intelligence

First published 2024-04-09 01:45:54

Article link: https://blog.csdn.net/Ephemeroptera/article/details/137530332


Whistream

Online subtitle generation based on Whisper

1. Installation

pip install whishow
pip install faster-whisper

2. Usage

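Online subtitles are generated by several threads working together: whishow handles the stream processing and playback, and faster-whisper handles the speech recognition.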
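
As a rough illustration of that multi-threaded interaction, the sketch below uses only the Python standard library: one thread produces audio chunks into a queue and another consumes them and appends "recognized" text to a shared list. It is not how whishow or faster-whisper are implemented. The names reader, recognizer, and run_pipeline, the dummy chunk data, and the placeholder text are all hypothetical and exist only for this example.

```python
import queue
import threading


def reader(audio_q, stop, n_chunks):
    """Producer: push dummy audio chunks, then a sentinel (hypothetical data)."""
    for i in range(n_chunks):
        if stop.is_set():
            break
        audio_q.put((i, [0.0] * 4))  # (chunk index, placeholder samples)
    audio_q.put(None)  # sentinel: no more data is coming


def recognizer(audio_q, results, stop):
    """Consumer: stands in for a speech recognizer; no real model is called."""
    while not stop.is_set():
        try:
            item = audio_q.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing yet, check the stop flag again
        if item is None:
            break
        index, _samples = item
        results.append((index, f"text for chunk {index}"))


def run_pipeline(n_chunks=5):
    """Run producer and consumer threads to completion and return the results."""
    stop = threading.Event()          # shared stop flag
    audio_q = queue.Queue(maxsize=8)  # bounded queue between the two threads
    results = []                      # written only by the recognizer thread
    threads = [
        threading.Thread(target=reader, args=(audio_q, stop, n_chunks)),
        threading.Thread(target=recognizer, args=(audio_q, results, stop)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for index, text in run_pipeline(3):
        print(index, text)
```

The full script that follows has the same overall shape, with real audio and video queues, a real recognizer, and a third thread for playback.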

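import threading
import time

# WhisperModel is imported from whisper_api, not from faster_whisper directly;
# the script relies on its init_state, read_data, and asr_results members.
from whisper_api import WhisperModel
from whishow import STREAM
from whishow import PLAY

# Input source: a live stream URL or a local file. Keep only one active.
# url = "rtmp://mobliestream.c3tv.com:554/live/goodtv.sdp"
url = "test.mp4"
language = "zh"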

# Initialize the stream reader, named stm.
stm = STREAM()
stm.init_state(url=url,
               cache_size=10*60,
               video_frame_quality=50)

# Initialize the Whisper model and connect it to the audio stream of stm.
asr = WhisperModel(model_size_or_path="tiny",
                   device="cpu",
                   compute_type="int8",
                   download_root="./models",
                   local_files_only=False)
asr.init_state(Q_audio_asr=stm.Q_audio_asr,
               read_size=16000)

# Initialize the whishow player and connect it to the audio/video streams
# of stm and to the ASR results.
ply = PLAY()
ply.init_state(chunk_size=1,
               video_frame_shift=10,
               audio_fps=stm.AUDIO_FPS,
               video_fps=stm.VIDEO_FPS,
               Q_audio_play=stm.Q_audio_play,
               Q_video_play=stm.Q_video_play,
               asr_results=asr.asr_results)


# Launch the stream reader with both playback and ASR output enabled.
def process1():
    stm.read(is_play=True,
             is_asr=True)

# Run recognition; modify these settings as needed.
def recognition():
    asr.transcribe(language=language,
                   task="transcribe",
                   condition_on_previous_text=True)

# Launch the ASR: one thread reads audio data, the other transcribes it.
def process2():
    p1 = threading.Thread(target=asr.read_data, args=())
    p2 = threading.Thread(target=recognition, args=())
    p1.start()
    p2.start()
    p1.join()
    p2.join()


# Launch the player. Playback waits until stm.at reaches `delay`,
# so that the ASR has a head start.
def process3():
    delay = 60
    while stm.at < delay:
        print("wait for asr preprocess ..")
        time.sleep(1)
    ply.run()

# Press Esc to exit (uses the third-party `keyboard` package).
def engine():
    import keyboard
    while True:
        if keyboard.is_pressed('esc'):
            print("exit ..")
            break
        time.sleep(0.1)
    # Signal every component to stop.
    stm.running = False
    asr.running = False
    ply.running = False

if __name__ == "__main__":

    p0 = threading.Thread(target=engine,args=())
    p1 = threading.Thread(target=process1,args=())
    p2 = threading.Thread(target=process2,args=())
    p3 = threading.Thread(target=process3,args=())

    p0.start()
    p1.start()
    p2.start()
    p3.start()
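
The player has to pair each video frame with whatever text the recognizer has produced for that moment, which is presumably why playback starts only after a delay. The sketch below shows one way such a lookup could work. It is not whishow's actual logic: the result format (start, end, text in seconds, sorted by start time) and the function name subtitle_at are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical result format: (start_seconds, end_seconds, text), sorted by start.
Segment = Tuple[float, float, str]


def subtitle_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the text of the segment covering playback time t, or None."""
    starts = [seg[0] for seg in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        return segments[i][2]
    return None  # t falls before the first segment or in a gap


if __name__ == "__main__":
    demo = [(0.0, 2.5, "hello"), (2.5, 4.0, "world"), (6.0, 7.0, "again")]
    for t in (1.0, 2.5, 5.0, 6.5):
        print(t, subtitle_at(demo, t))
```

If recognition falls behind playback, a lookup like this returns None for the current time, so starting the player late gives the recognizer time to fill in results first.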