The Ultimate Guide to Speech Recognition With Python

Have you ever wondered how to add speech recognition to your Python project? If so, then keep reading! It’s easier than you might think.

Far from being a fad, the overwhelming success of speech-enabled products like Amazon Alexa has proven that some degree of speech support will be an essential aspect of household tech for the foreseeable future. If you think about it, the reasons why are pretty obvious. Incorporating speech recognition into your Python application offers a level of interactivity and accessibility that few technologies can match.

The accessibility improvements alone are worth considering. Speech recognition allows the elderly and the physically and visually impaired to interact with state-of-the-art products and services quickly and naturally—no GUI needed!

Best of all, including speech recognition in a Python project is really simple. In this guide, you’ll find out how. You’ll learn:

  • How speech recognition works
  • What packages are available on PyPI
  • How to install and use the SpeechRecognition package, a full-featured and easy-to-use Python speech recognition library

In the end, you’ll apply what you’ve learned to a simple “Guess the Word” game and see how it all comes together.

How Speech Recognition Works – An Overview

Before we get to the nitty-gritty of doing speech recognition in Python, let’s take a moment to talk about how speech recognition works. A full discussion would fill a book, so I won’t bore you with all of the technical details here. In fact, this section is not a prerequisite for the rest of the tutorial. If you’d like to get straight to the point, then feel free to skip ahead.

Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems were limited to a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have come a long way since then. They can recognize speech from multiple speakers and have enormous vocabularies in numerous languages.

The first component of speech recognition is, of course, speech. Speech must be converted from physical sound to an electrical signal with a microphone, and then to digital data with an analog-to-digital converter. Once the audio is digitized, several models can be used to transcribe it to text.

Most modern speech recognition systems rely on what is known as a Hidden Markov Model (HMM). This approach works on the assumption that a speech signal, when viewed on a short enough timescale (say, ten milliseconds), can be reasonably approximated as a stationary process—that is, a process in which statistical properties do not change over time.

In a typical HMM, the speech signal is divided into 10-millisecond fragments. The power spectrum of each fragment, which is essentially a plot of the signal’s power as a function of frequency, is mapped to a vector of real numbers known as cepstral coefficients. The dimension of this vector is usually small—sometimes as low as 10, although more accurate systems may have dimension 32 or more. The final output of the HMM is a sequence of these vectors.
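
The framing and power-spectrum steps can be illustrated with a small sketch. It makes several assumptions that are not from the text: a synthetic 1000 Hz tone sampled at 8000 Hz, non-overlapping frames, and a naive discrete Fourier transform in place of an optimized FFT. The function names are hypothetical. It stops at the power spectrum; a real front end would go on to derive cepstral coefficients from it.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split a list of samples into consecutive, non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Return the power at each non-negative frequency bin (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Assumed test signal: a 1000 Hz sine sampled at 8000 Hz for 30 ms.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(240)]

frames = split_into_frames(tone, sample_rate)   # 10 ms -> 80 samples per frame
spectrum = power_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * sample_rate / len(frames[0])
print(len(frames), peak_hz)
```

With these inputs the strongest bin of the first frame lies at the tone's frequency, which is the kind of per-fragment information the feature vector summarizes.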

To decode the speech into text, groups of vectors are matched to one or more phonemes—a fundamental unit of speech. This calculation requires training, since the sound of a phoneme varies from speaker to speaker, and even varies from one utterance to another by the same speaker. A special algorithm is then applied to determine the most likely word (or words) that produce the given sequence of phonemes.
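
The text does not name the decoding algorithm. One common choice for HMMs is the Viterbi algorithm, which finds the most probable sequence of hidden states for a sequence of observations. The sketch below is a toy under stated assumptions: the two "phoneme" states, the three observation labels, and every probability are made up for illustration, whereas a real recognizer learns them from training data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its probability)."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               state that path was in at step t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, probability


# Hypothetical toy model: two phoneme states, three feature-cluster labels.
states = ("AH", "EE")
start_p = {"AH": 0.6, "EE": 0.4}
trans_p = {"AH": {"AH": 0.7, "EE": 0.3},
           "EE": {"AH": 0.4, "EE": 0.6}}
emit_p = {"AH": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "EE": {"low": 0.1, "mid": 0.3, "high": 0.6}}

path, probability = viterbi(["low", "mid", "high"],
                            states, start_p, trans_p, emit_p)
print(path)  # ['AH', 'AH', 'EE']
```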

One can imagine that this whole process may be computationally expensive. In many modern speech recognition systems, neural networks are used to simplify the speech signal using techniques for feature transformation and dimensionality reduction before HMM recognition. Voice activity detectors (VADs) are also used to reduce an audio signal to only the portions that are likely to contain speech. This prevents the recognizer from wasting time analyzing unnecessary parts of the signal.
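
To make the idea of a voice activity detector concrete, here is a deliberately simple sketch that keeps only frames whose mean energy exceeds a threshold. This is an assumption for illustration, not how any particular recognizer does it: production VADs use far more robust features, and the function names, frame length, and threshold here are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def speech_frames(samples, frame_len, threshold):
    """Return the indices of frames whose energy exceeds the threshold."""
    active = []
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    for index, start in enumerate(starts):
        if frame_energy(samples[start:start + frame_len]) > threshold:
            active.append(index)
    return active


# Assumed signal: silence, a loud burst, then faint background noise.
signal = [0.0] * 80 + [0.5, -0.5] * 40 + [0.01] * 80
print(speech_frames(signal, frame_len=80, threshold=0.01))  # [1]
```

Only the middle frame would be passed on to the recognizer; the silent and near-silent frames are skipped.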

Fortunately, as a Python programmer, you don’t have to worry about any of this. A number of speech recognition services are available for use online through an API, and many of these services offer Python SDKs.

Picking a Python Speech Recognition Package

A handful of packages for speech recognition exist on PyPI, among them wit, apiai, google-cloud-speech, and SpeechRecognition.

Some of these packages—such as wit and apiai—offer built-in features, like natural language processing for identifying a speaker’s intent, which go beyond basic speech recognition. Others, like google-cloud-speech, focus solely on speech-to-text conversion.

There is one package that stands out in terms of ease-of-use: SpeechRecognition.

Recognizing speech requires audio input, and SpeechRecognition makes retrieving this input really easy. Instead of building scripts for accessing microphones and processing audio files from scratch, you can be up and running with SpeechRecognition in just a few minutes.

The SpeechRecognition library acts as a wrapper for several popular speech APIs and is thus extremely flexible. One of these, the Google Web Speech API, supports a default API key that is hard-coded into the SpeechRecognition library. That means you can get off the ground without having to sign up for a service.

The flexibility and ease-of-use of the SpeechRecognition package make it an excellent choice for any Python project. However, support for every feature of each API it wraps is not guaranteed. You will need to spend some time researching the available options to find out if SpeechRecognition will work in your particular case.

So, now that you’re convinced you should try out SpeechRecognition, the next step is getting it installed in your environment.

Installing SpeechRecognition

SpeechRecognition is compatible with Python 2.6, 2.7 and 3.3+, but requires some additional installation steps for Python 2. For this tutorial, I’ll assume you are using Python 3.3+.

You can install SpeechRecognition from a terminal with pip:

$ pip install SpeechRecognition

Once installed, you should verify the installation by opening an interpreter session and typing:
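
A minimal check might look like the sketch below. It assumes the package is imported under the alias sr, which the rest of this guide uses, and it guards the import so the snippet also runs where the package is missing. In an interactive session, the import followed by sr.__version__ is enough.

```python
try:
    import speech_recognition as sr
except ImportError:
    sr = None  # SpeechRecognition is not installed in this environment

# Version string of the installed package, or None if the import failed.
version = sr.__version__ if sr is not None else None
print(version)
```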

Note: The version number you get might vary. Version 3.8.1 was the latest at the time of writing.

Go ahead and keep this session open. You’ll start to work with it in just a bit.

SpeechRecognition will work out of the box if all you need to do is work with existing audio files. Specific use cases, however, require a few dependencies. Notably, the PyAudio package is needed for capturing microphone input.

You’ll see which dependencies you need as you read further. For now, let’s dive in and explore the basics of the package.

The Recognizer Class

All of the magic in SpeechRecognition happens with the Recognizer class.

The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source.

Creating a Recognizer instance is easy. In your current interpreter session, just type:

>>> r = sr.Recognizer()

Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs.
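
One way to see which recognition methods your installed version provides is to inspect the class. This sketch assumes the methods share the recognize_ prefix, as recognize_sphinx() does. The helper name recognition_methods is hypothetical, and the number of methods printed may differ between library versions.

```python
def recognition_methods(cls):
    """Return the sorted attribute names of cls that start with 'recognize_'."""
    return sorted(name for name in dir(cls) if name.startswith("recognize_"))


try:
    import speech_recognition as sr
except ImportError:
    sr = None  # package not installed; nothing to inspect

if sr is not None:
    for name in recognition_methods(sr.Recognizer):
        print(name)
```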

Of the seven, only recognize_sphinx() works offline, using the CMU Sphinx engine. The other six all require an internet connection.

A full discussion of the features and benefits of each API is beyond the scope of this tutorial. Since SpeechRecognition ships with a default API key for the Google Web Speech API, you can get started with it right away. For this reason, we’ll use the Web Speech API in this guide. The other six APIs all require authentication with either an API key or a username/password combination. For more information, consult the SpeechRecognition docs.
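
As a rough sketch of what using the Web Speech API through SpeechRecognition can look like, the helper below transcribes an existing audio file with recognize_google() and the default key. The function name transcribe_file and the file name in the usage comment are hypothetical, and the call needs an internet connection.

```python
def transcribe_file(path):
    """Transcribe an audio file with the Google Web Speech API.

    Returns the transcript, or None if the speech could not be understood.
    """
    # Imported here so the sketch can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # the API could not make sense of the audio
    except sr.RequestError as error:
        raise RuntimeError(f"API request failed: {error}")


# Usage, assuming a hypothetical WAV file in the working directory:
# print(transcribe_file("example.wav"))
```

Returning None for unintelligible audio while raising on request failures keeps "nothing was understood" separate from "the service could not be reached", which is usually worth distinguishing in an application.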
