Setting up real-time speech transcription with whisper.cpp on the Jetson AGX Orin

1: Download the C++ port of Whisper

whisper.cpp

Build with WHISPER_CUDA=1 make -j

Errors

A: The default GPU architecture is not supported on this platform. Modify the Makefile, after checking which compute architectures (ARCH_FLAG) the local nvcc supports:

nvcc fatal   : Value 'all' is not defined for option 'gpu-architecture'
make: *** [Makefile:290: ggml-cuda.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:287: ggml-cuda/getrows.o] Error 1
nvcc fatal   : Value 'all' is not defined for option 'gpu-architecture'
nvcc fatal   : Value 'all' is not defined for option 'gpu-architecture'
make: *** [Makefile:287: ggml-cuda/diagmask.o] Error 1
make: *** [Makefile:287: ggml-cuda/mmvq.o] Error 1
nvcc fatal   : Value 'all' is not defined for option 'gpu-architecture'
make: *** [Makefile:287: ggml-cuda/quantize.o] Error 1
nvidia@ubuntu:~/TTS/whisper.cpp$ nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
The fallback value `all` (used when nvcc < 11.6) is rejected by the Jetson toolchain, so change it to compute_87 (Orin is compute capability 8.7):

ifdef WHISPER_CUDA
	ifeq ($(shell expr $(NVCC_VERSION) \>= 11.6), 1)
		CUDA_ARCH_FLAG ?= native
	else
		CUDA_ARCH_FLAG ?= compute_87
	endif
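The arch choice can also be derived automatically from the `nvcc --list-gpu-arch` output shown above; a small sketch (the sample list below stands in for the real nvcc output):

```shell
# Sketch: choose the newest architecture from the nvcc list as CUDA_ARCH_FLAG.
# On the board, replace the sample list with: ARCHS=$(nvcc --list-gpu-arch)
ARCHS="compute_80
compute_86
compute_87"
# Sort numerically on the part after the underscore and keep the last entry.
CUDA_ARCH_FLAG=$(printf '%s\n' "$ARCHS" | sort -t_ -k2 -n | tail -n1)
echo "$CUDA_ARCH_FLAG"   # compute_87
```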

B: A second error:

CFLAGS   += -mcpu=native
make: CFLAGS: Command not found

Fix: comment out the block starting at line 339 of the Makefile:

# ifneq ($(filter aarch64%,$(UNAME_M)),)
# 	CFLAGS   += -mcpu=native
# 	CXXFLAGS += -mcpu=native
# endif

With these changes the build succeeds. The stream example does real-time transcription; a larger model gives better results.

Downloading models with the script from the GitHub repo fails with errors; the models can be downloaded from these links instead:

https://huggingface.co/ggerganov/whisper.cpp
https://ggml.ggerganov.com
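A sketch of fetching a model directly from the Hugging Face repo, assuming its files follow the pattern ggml-&lt;name&gt;.bin under resolve/main (this path layout is an assumption; verify it on the repo page if a download 404s):

```shell
# Build the assumed download URL for a given model name.
MODEL=base
URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${MODEL}.bin"
echo "$URL"
# Then download it into the models directory, e.g.:
# curl -L -o "models/ggml-${MODEL}.bin" "$URL"
```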
WHISPER_CUDA=1 make -j

WHISPER_CUDA=1 make stream -j

./main -m models/ggml-model-whisper-base.bin -f ../news.wav -l Chinese

./stream -m ./models/ggml-model-whisper-base.bin -t 6 --step 0 --length 15000 -vth 1 -l chinese -c 1
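The timestamped lines that main and stream print can be post-processed; a minimal awk sketch that converts one line into start/end seconds plus text (the sample line is illustrative):

```shell
# Split a whisper.cpp output line on the brackets, then convert
# each hh:mm:ss.mmm timestamp to seconds.
line='[00:02:54.200 --> 00:02:58.840]  some text'
seg=$(echo "$line" | awk -F'[][]' '{
  split($2, p, " --> ")
  split(p[1], a, ":"); start = a[1]*3600 + a[2]*60 + a[3]
  split(p[2], b, ":"); stop  = b[1]*3600 + b[2]*60 + b[3]
  sub(/^ +/, "", $3)                      # drop the padding after "]"
  printf "%.2f %.2f %s", start, stop, $3
}')
echo "$seg"   # 174.20 178.84 some text
```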

Many example applications live in the examples directory.

1: Audio for transcription must be converted to 16 kHz mono 16-bit PCM:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
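As a sanity check on the converted file: 16 kHz mono 16-bit PCM is 16000 samples/s × 2 bytes = 32000 bytes of audio per second, so the raw size of a clip of known length is easy to estimate:

```shell
# Expected sample bytes for a 195.9 s clip in 16 kHz mono s16le format.
bytes=$(awk 'BEGIN { printf "%d", 195.9 * 16000 * 2 }')
echo "$bytes"   # 6268800 bytes (~6 MB), plus the 44-byte WAV header
```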

2: Transcription results

nvidia@ubuntu:~/TTS/whisper.cpp$ time ./main -m models/ggml-model-whisper-base.bin -f ../news.wav -l Chinese -pp -ps
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-model-whisper-base.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Orin, compute capability 8.7, VMM: yes
whisper_model_load:    CUDA0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  132.07 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB

system_info: n_threads = 4 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0

main: processing '../news.wav' (3134182 samples, 195.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = chinese, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:04.720]  [_BEG_]早啊 新聞來了[_TT_236]

[00:02:51.080 --> 00:02:54.200]  [_BEG_]吉林東部遼寧東北部新疆伊莉河谷[_TT_156]
[00:02:54.200 --> 00:02:58.840]  貴州南部雲南東部廣西西部等地部份地區有中道大雨[_TT_388]
[00:02:58.840 --> 00:03:04.120]  其中貴州西南部雲南東北部廣西西北部等地局部地區有暴雨[_TT_652]
[00:03:04.120 --> 00:03:08.320]  感謝關注央視新聞[_TT_862]
[00:03:08.320 --> 00:03:11.160]  更多資訊可以下載央視新聞客戶專[_TT_1004]
[00:03:11.160 --> 00:03:12.520]  我們明天早上見[_TT_1072]
whisper_print_progress_callback: progress =  98%
[00:03:12.520 --> 00:03:15.520]  [_BEG_]祝祝祝祝祝祝祝祝祝祝祝祝祝祝[_TT_150]
whisper_print_progress_callback: progress =  99%


whisper_print_timings:     load time =   221.28 ms
whisper_print_timings:     fallbacks =   0 p /   1 h
whisper_print_timings:      mel time =   167.07 ms
whisper_print_timings:   sample time =  3298.24 ms /  5411 runs (    0.61 ms per run)
whisper_print_timings:   encode time =  1261.24 ms /     8 runs (  157.66 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =  5198.60 ms /  5377 runs (    0.97 ms per run)
whisper_print_timings:   prompt time =   111.11 ms /  1260 runs (    0.09 ms per run)
whisper_print_timings:    total time = 10290.02 ms

real    0m10.412s
user    0m6.978s
sys     0m0.526s
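From the timings above, the base model transcribes well faster than real time on the Orin:

```shell
# Real-time factor: 195.9 s of audio processed in ~10.29 s total.
rtf=$(awk 'BEGIN { printf "%.1f", 195.9 / 10.29 }')
echo "${rtf}x faster than real time with the base model"
```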
