AI with Python – Speech Recognition


In this chapter, we will learn about speech recognition using AI with Python.


Speech is the most basic means of adult human communication. The basic goal of speech processing is to enable interaction between a human and a machine.

A speech processing system has three main tasks:

  • First, speech recognition, which allows the machine to catch the words, phrases and sentences we speak

  • Second, natural language processing, which allows the machine to understand what we say, and

  • Third, speech synthesis, which allows the machine to speak.

This chapter focuses on speech recognition, the process of understanding the words that human beings speak. Remember that speech signals are captured with a microphone and must then be understood by the system.
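As a rough illustration of what a captured speech signal looks like to a program, the sketch below uses only Python's standard library to write a short synthetic tone to a WAV file and read it back as a list of integer samples. The tone merely stands in for microphone input. The file name `tone.wav`, the 16 kHz sampling rate, and the helper names `write_tone` and `read_samples` are assumptions made for this example and do not come from the chapter.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # samples per second (assumed value for this sketch)


def write_tone(path, freq_hz=440.0, duration_s=0.5, amplitude=0.5):
    """Write a mono, 16-bit sine tone that stands in for recorded speech."""
    n_samples = int(SAMPLE_RATE * duration_s)
    frames = bytearray()
    for n in range(n_samples):
        value = amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        # Scale the [-1, 1] value to a signed 16-bit little-endian integer.
        frames += struct.pack("<h", int(value * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(bytes(frames))


def read_samples(path):
    """Return (sample_rate, samples) for a mono 16-bit WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        raw = wav_file.readframes(n_frames)
    samples = list(struct.unpack("<%dh" % n_frames, raw))
    return rate, samples


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "tone.wav")
    write_tone(path)
    rate, samples = read_samples(path)
    print("sample rate:", rate)
    print("number of samples:", len(samples))
    print("duration in seconds:", len(samples) / rate)
```

A recognizer never sees "words" directly; it starts from a sequence of numbers like `samples` and has to work its way up from there.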

Building a Speech Recognizer

Speech Recognition, or Automatic Speech Recognition (ASR), is the center of attention for AI projects such as robotics. Without ASR, it is hard to imagine a cognitive robot interacting with a human. However, building a speech recognizer is not easy.

Difficulties in developing a speech recognition system

Developing a high-quality speech recognition system is a genuinely difficult problem. The difficulty can be broadly characterized along several dimensions, discussed below:

  • Size of the vocabulary − The size of the vocabulary affects how easy it is to develop an ASR system. Consider the following vocabulary sizes for a better understanding.

    • A small vocabulary consists of 2-100 words, as in a voice-menu system

    • A medium vocabulary consists of several hundred to several thousand words, as in a database-retrieval task

    • A large vocabulary consists of several tens of thousands of words, as in a general dictation task.

    Note that the larger the vocabulary, the harder recognition becomes.


  • Channel characteristics − Channel quality is also an important dimension. For example, human speech has high bandwidth with the full frequency range, while telephone speech has low bandwidth with a limited frequency range. The latter is harder to recognize.

  • Speaking mode − The ease of developing an ASR system also depends on the speaking mode, that is, whether the speech is in isolated-word mode, connected-word mode, or continuous-speech mode. Note that continuous speech is harder to recognize.

  • Speaking style − Speech may be read aloud in a formal style, or it may be spontaneous and conversational in a casual style. The latter is harder to recognize.

  • Speaker dependency − A speech recognizer can be speaker dependent, speaker adaptive, or speaker independent. A speaker-independent system is the hardest to build.

  • Type of noise − Noise is another factor to consider when developing an ASR system. The signal-to-noise ratio may fall in various ranges, depending on whether the acoustic environment has less or more background noise.
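
The signal-to-noise ratio mentioned in the last item is commonly expressed in decibels as `10 * log10(P_signal / P_noise)`, where each `P` is an average power. The following is a minimal sketch of that calculation. It assumes the clean signal and the noise are available as separate lists of samples, which is rarely the case for real recordings, where the noise level has to be estimated. The function names `mean_power` and `snr_db` are illustrative only.

```python
import math


def mean_power(samples):
    """Average power of a signal: the mean of the squared sample values."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    signal_power = mean_power(signal)
    noise_power = mean_power(noise)
    if noise_power == 0:
        return math.inf    # no noise at all
    if signal_power == 0:
        return -math.inf   # no signal at all
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Toy data (assumed): the signal amplitude is 10 times the noise amplitude,
    # so the power ratio is about 100 and the result is close to 20 dB.
    signal = [1.0, -1.0] * 100
    noise = [0.1, -0.1] * 100
    print("SNR in dB:", round(snr_db(signal, noise), 2))
```

A higher value means a cleaner recording. Every factor of 10 in the power ratio adds 10 dB.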
