Whisper-AT Project Tutorial

Latest recommended article published on 2024-09-12 00:53:58

郭蔷意Ward

Article link: https://blog.csdn.net/gitblog_00564/article/details/141345683

whisper-at: Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Project URL: https://gitcode.com/gh_mirrors/wh/whisper-at

1. Project Directory Structure

The Whisper-AT project is laid out roughly as follows:

whisper-at/
├── README.md
├── requirements.txt
├── setup.py
├── whisper_at/
│   ├── __init__.py
│   ├── model.py
│   ├── utils.py
│   └── ...
├── examples/
│   ├── example1.py
│   ├── example2.py
│   └── ...
├── tests/
│   ├── test_model.py
│   ├── test_utils.py
│   └── ...
└── ...

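Directory overview

README.md: project documentation.
requirements.txt: project dependency list.
setup.py: installation script.
whisper_at/: core code directory, containing the model, utility, and other modules.
- __init__.py: package initialization file.
- model.py: model definitions.
- utils.py: utility functions.
examples/: example code directory, containing several usage examples.
tests/: test code directory, containing several test scripts.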

2. Project Startup File

The project is started mainly through the example scripts in the examples/ directory. A typical startup file looks like this:

# examples/example1.py

import whisper_at as whisper

# Time resolution of the audio tagging output
audio_tagging_time_resolution = 10

# Load the model
model = whisper.load_model("large-v1")

# Transcribe the audio file
result = model.transcribe("audio.mp3", at_time_res=audio_tagging_time_resolution)

# Print the ASR result
print(result["text"])

# Print the audio tagging result
audio_tag_result = whisper.parse_at_label(result, language='follow_asr', top_k=5, p_threshold=-1, include_class_list=list(range(527)))
print(audio_tag_result)

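Startup file walkthrough

import whisper_at as whisper: imports the Whisper-AT module under the name whisper.
audio_tagging_time_resolution = 10: sets the time resolution of the audio tagging output.
model = whisper.load_model("large-v1"): loads the pretrained model.
result = model.transcribe("audio.mp3", at_time_res=audio_tagging_time_resolution): transcribes the audio file and returns the result, which is later passed to parse_at_label.
print(result["text"]): prints the ASR transcript.
audio_tag_result = whisper.parse_at_label(...): parses the audio tagging result, which is then printed.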
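
The audio tagging result can be post-processed to make it easier to read. The following is a minimal sketch, assuming that parse_at_label returns a list of dicts, each with a "time" entry (start and end in seconds) and an "audio tags" list of (label, score) pairs. That structure, the helper name format_audio_tags, and the sample values are assumptions for illustration, not output produced by the model.

```python
from typing import Any


def format_audio_tags(segments: list[dict[str, Any]]) -> list[str]:
    """Turn tagging segments into one readable line per time window."""
    lines = []
    for seg in segments:
        start, end = seg["time"]["start"], seg["time"]["end"]
        # Each tag is assumed to be a (label, score) pair.
        tags = ", ".join(f"{label} ({score:.2f})" for label, score in seg["audio tags"])
        lines.append(f"[{start}s - {end}s] {tags}")
    return lines


# Hand-written sample in the assumed format (hypothetical values).
sample = [
    {"time": {"start": 0, "end": 10}, "audio tags": [("Music", 1.82), ("Speech", 0.93)]},
]

for line in format_audio_tags(sample):
    print(line)
```

If the real return value differs from the assumed structure, only the key lookups inside the loop need to change.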

3. Project Configuration Files

The main configuration files of the project are requirements.txt and setup.py.

requirements.txt

The requirements.txt file lists the packages the project needs at runtime:

numba
numpy
torch
tqdm
more-itertools
tiktoken==0.3.3
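
In this list only tiktoken is pinned to an exact version; the other entries accept any version. As a minimal sketch, the snippet below splits lines of this form into a package name and an optional pinned version. parse_requirement is a hypothetical helper that handles only the plain "name" and "name==version" forms shown above, not the full requirement syntax.

```python
from typing import Optional

# The dependency list from above, embedded as text so the sketch is self-contained.
REQUIREMENTS = """\
numba
numpy
torch
tqdm
more-itertools
tiktoken==0.3.3
"""


def parse_requirement(line: str) -> tuple[str, Optional[str]]:
    """Split 'name' or 'name==version' into (name, version or None)."""
    name, sep, version = line.strip().partition("==")
    return name, (version if sep else None)


parsed = [parse_requirement(line) for line in REQUIREMENTS.splitlines() if line.strip()]
pinned = {name: version for name, version in parsed if version is not None}
print(pinned)
```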

setup.py

The setup.py file is used to install and package the project:

from setuptools import setup, find_packages

setup(
    name='whisper-at',
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'numba',
        'numpy',
        'torch',
        'tqdm',
        'more-itertools',
        'tiktoken==0.3.3'
    ],
    entry_points={
        'console_scripts': [
            'whisper-at=whisper_at.cli:main',
        ],
    },
)
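
The console_scripts entry maps the whisper-at command to a function named main in the module whisper_at.cli. The sketch below shows the general shape such a function can take; it is not the project's actual CLI. The argument names (audio, --model, --at-time-res) are hypothetical, and the sketch only parses arguments instead of loading a model.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical options mirroring the example script."""
    parser = argparse.ArgumentParser(prog="whisper-at")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument("--model", default="large-v1", help="model name to load")
    parser.add_argument("--at-time-res", type=float, default=10.0,
                        help="audio tagging time resolution")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point a console script could call; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # A real implementation would load the model and transcribe the file here.
    print(f"would transcribe {args.audio} with {args.model}, at_time_res={args.at_time_res}")
    return 0


# Simulate running: whisper-at audio.mp3
main(["audio.mp3"])
```

Passing argv explicitly keeps the function easy to call from tests; when argv is None, argparse falls back to the real command-line arguments.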