OpenAI documentation reference: Speech to text (Beta)

Learn how to turn audio into text

Introduction

The speech to text API provides two endpoints, transcriptions and translations, based on our state-of-the-art open-source large-v2 Whisper model. They can be used to:

  • Transcribe audio into whatever language the audio is in.
  • Translate and transcribe the audio into English.

File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
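
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name check_upload is hypothetical, and it assumes that 25 MB means 25 * 1024 * 1024 bytes and that the file type can be judged from the file extension alone.

```python
import os

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# Input file types listed in the text above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_upload(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []

    # Judge the type from the extension only (the content is not inspected).
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")

    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_UPLOAD_BYTES} byte limit")

    return problems
```

A file that fails only the size check is a candidate for splitting or compression, as described under "Longer inputs".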

Quickstart

Transcriptions

The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription of the audio. We currently support multiple input and output file formats.

Transcribe audio
python
# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

# Open the audio file in binary mode; the with block closes it after the request
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

By default, the response type will be JSON with the raw text included.

{ "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. …" }

To set additional parameters in a curl request, you can add more --form lines with the relevant options. For example, to set the output format to text, add the response_format line:

...
--form file=@openai.mp3 \
--form model=whisper-1 \
--form response_format=text
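
In Python, the same options would plausibly be passed as keyword arguments. The sketch below assumes that the v0.27 library forwards extra keyword arguments of openai.Audio.transcribe as form fields; the helper names build_options and transcribe_file are hypothetical, and the set of response formats is an assumption taken from the API reference rather than from this guide.

```python
# Assumption: response formats accepted by the endpoint.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_options(response_format="json", prompt=None, language=None):
    """Collect optional request parameters, skipping the ones left unset."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    options = {"response_format": response_format}
    if prompt is not None:
        options["prompt"] = prompt
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(path, **kwargs):
    """Send one file to the transcriptions endpoint with the given options."""
    import openai  # imported here so build_options works without the package

    with open(path, "rb") as audio_file:
        return openai.Audio.transcribe("whisper-1", audio_file, **build_options(**kwargs))
```

For example, transcribe_file("/path/to/file/audio.mp3", response_format="text") would mirror the --form lines above.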

Translations

The translations API takes as input an audio file in any of the supported languages and transcribes the audio into English, translating it if necessary. This differs from our /Transcriptions endpoint since the output is not in the original input language and is instead translated to English text.

Translate audio
python
# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

# Open the audio file in binary mode; the with block closes it after the request
with open("/path/to/file/german.mp3", "rb") as audio_file:
    transcript = openai.Audio.translate("whisper-1", audio_file)

In this case, the input audio was German and the output text looks like:

Hello, my name is Wolfgang and I come from Germany. Where are you heading today?

We only support translation into English at this time.

Supported languages

We currently support the following languages through both the transcriptions and translations endpoints:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

While the underlying model was trained on 98 languages, we only list the languages that achieved a word error rate (WER) below 50%. WER is an industry-standard benchmark for speech to text model accuracy. The model will return results for languages not listed above, but the quality will be low.
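
For readers unfamiliar with the metric, WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that common definition with a word-level edit distance; it is not the evaluation code used for the list above, and it does no text normalization (case and punctuation count as differences).

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current

    return previous[-1] / len(ref_words)
```

Because insertions are counted, the value can exceed 1.0 when the output is much longer than the reference.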

Longer inputs

By default, the Whisper API only supports files that are less than 25 MB. If you have an audio file that is larger than that, you will need to break it up into chunks of 25 MB or less or use a compressed audio format. To get the best performance, we suggest that you avoid breaking the audio up mid-sentence, as this may cause some context to be lost.

One way to handle this is to use the PyDub open source Python package to split the audio:

from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

first_10_minutes = song[:ten_minutes]

first_10_minutes.export("good_morning_10.mp3", format="mp3")

OpenAI makes no guarantees about the usability or security of third-party software like PyDub.
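
The PyDub snippet above exports only the first ten minutes. A possible extension that splits a whole file into consecutive chunks is sketched below. The helper names chunk_ranges and split_audio and the output naming scheme are hypothetical, and cutting at fixed times may still land mid-sentence, so this is a starting point rather than a recommended procedure.

```python
def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond pairs that cover the whole duration."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def split_audio(input_path, output_prefix, chunk_ms=10 * 60 * 1000):
    """Export consecutive chunks as MP3 files and return their paths."""
    from pydub import AudioSegment  # imported here so chunk_ranges works without PyDub

    audio = AudioSegment.from_mp3(input_path)
    paths = []
    # len(audio) is the duration in milliseconds, and slicing uses milliseconds too.
    for index, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        path = f"{output_prefix}_{index:03d}.mp3"
        audio[start:end].export(path, format="mp3")
        paths.append(path)
    return paths
```

A 25-minute recording with the default chunk length would yield two 10-minute chunks and one 5-minute chunk. Chunks are sized by duration, not by bytes, so the 25 MB limit still needs to be checked on the exported files.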

Prompting

You can use a prompt to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt, so it will be more likely to use capitalization and punctuation if the prompt does too. However, the current prompting system is much more limited than that of our other language models and provides only limited control over the generated transcript. Here are some examples of how prompting can help in different scenarios:

  1. Prompts can be very helpful for correcting specific words or acronyms that the model often misrecognizes in the audio. For example, the following prompt improves the transcription of the words DALL·E and GPT-3, which were previously written as “DALI” and “GDP 3”.

    The transcript is about OpenAI which makes technology like DALL·E, GPT-3, and ChatGPT with the hope of one day building an AGI system that benefits all of humanity

  2. To preserve the context of a file that was split into segments, you can prompt the model with the transcript of the preceding segment. This will make the transcript more accurate, as the model will use the relevant information from the previous audio. The model will only consider the final 224 tokens of the prompt and ignore anything earlier.

  3. Sometimes the model might skip punctuation in the transcript. You can avoid this by using a simple prompt that includes punctuation:

    Hello, welcome to my lecture.

  4. The model may also leave out common filler words in the audio. If you want to keep the filler words in your transcript, you can use a prompt that contains them:

    Umm, let me think like, hmm… Okay, here’s what I’m, like, thinking.

  5. Some languages can be written in different ways, such as simplified or traditional Chinese. The model might not always use the writing style that you want for your transcript by default. You can improve this by using a prompt in your preferred writing style.
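
The second scenario, carrying context across split segments, can be sketched as a loop that feeds each transcript to the next request as its prompt. The function names here are hypothetical, and whisper_transcribe assumes that the v0.27 library accepts prompt as a keyword argument and returns an object with a "text" field. The prompt is not trimmed, since the model is described as ignoring everything before the final 224 tokens.

```python
def transcribe_in_order(segment_paths, transcribe_segment):
    """Transcribe segments in order, prompting each with the previous transcript."""
    transcripts = []
    previous_text = ""  # the first segment has no preceding context
    for path in segment_paths:
        text = transcribe_segment(path, previous_text)
        transcripts.append(text)
        previous_text = text
    return " ".join(transcripts)


def whisper_transcribe(path, prompt):
    """One request to the transcriptions endpoint, with an optional prompt."""
    import openai  # imported here so the loop above can be tested without the package

    options = {"prompt": prompt} if prompt else {}
    with open(path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file, **options)
    return result["text"]
```

Calling transcribe_in_order(paths, whisper_transcribe) on the ordered list of segment files would return the joined transcript. Passing the request function as an argument keeps the loop easy to test with a stand-in.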
