Recently a project required me to brush up on the fundamentals of speech and audio. The notes below summarize those fundamentals as I understand them:
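1) Sound is a vibration that propagates through the air. The vibration makes the air pressure around the sound wave fluctuate, and our ears sense these pressure fluctuations as sound. The larger the pressure fluctuation, the louder the sound; this attribute is loudness, loosely also called sound intensity or volume. It reflects how much energy the sound carries and depends mainly on the amplitude of the sound wave. Loudness is usually quantified by sound pressure (dyn/cm², or pascals, Pa, in SI units) or by sound intensity (W/cm²). The sound pressure level (SPL) is 20 times the base-10 logarithm of the ratio between the sound pressure and a reference sound pressure (20 µPa in air), and it is expressed in decibels (dB).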
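The decibel relation in item 1) can be sketched in a few lines. This is only an illustration: the function name `spl_db` is made up for this note, and the 20 µPa reference is the usual convention for sound in air, assumed here rather than taken from the text above.

```python
import math

P_REF = 20e-6  # assumed reference sound pressure in Pa (usual convention for air)

def spl_db(p_rms):
    """Sound pressure level in dB for an RMS sound pressure given in Pa."""
    return 20 * math.log10(p_rms / P_REF)

# The reference pressure itself maps to 0 dB; 1 Pa is roughly 94 dB.
print(round(spl_db(P_REF), 2))
print(round(spl_db(1.0), 2))
# Doubling the pressure adds about 6 dB regardless of the starting level.
print(round(spl_db(2.0) - spl_db(1.0), 2))
```

Because the scale is logarithmic, equal pressure ratios always give equal dB differences, which is why the last line does not depend on the starting pressure.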
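2) Vibration frequency is what makes us perceive different pitches: the higher the frequency of the sound wave, the higher the pitch. The fundamental frequency (F0, often loosely called pitch) is the physical correlate of pitch. A change in perceived pitch is proportional to the logarithm of the ratio between the two frequencies: whatever the starting frequency, if two 40 dB pure tones are each raised by one octave, the ear perceives the same change in pitch. In speech, one open/close cycle of the vocal cords is one period of the fundamental; its length depends on the size, properties, and tension of the vocal cords.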
Note: F0 is the frequency at which the vocal cords vibrate in voiced sounds, and it is the main acoustic correlate of tone and intonation. This frequency can be identified in the sound produced, which is quasi-periodic; the pitch period is the fundamental period of the signal (the inverse of the fundamental frequency). The word "pitch" is more often used to refer to how the fundamental frequency is perceived.
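To make the link between pitch period and F0 concrete, here is a minimal sketch that estimates F0 from a synthetic quasi-periodic frame by picking the autocorrelation peak. Autocorrelation is just one common approach, chosen here for brevity. The helper names (`synth_voiced`, `estimate_f0`), the 50 to 500 Hz search range, and the harmonic amplitudes are all assumptions made for illustration.

```python
import math

def synth_voiced(f0, fs, duration_s, amps=(1.0, 0.5, 0.25)):
    """Sum of harmonics of f0: a crude stand-in for a voiced speech frame."""
    n_samples = int(round(fs * duration_s))
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / fs)
            for k, a in enumerate(amps))
        for n in range(n_samples)
    ]

def estimate_f0(x, fs, f_min=50.0, f_max=500.0):
    """Return fs / lag for the lag with the largest autocorrelation."""
    lag_min = int(fs / f_max)  # shortest pitch period considered, in samples
    lag_max = int(fs / f_min)  # longest pitch period considered, in samples

    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return fs / best_lag  # F0 is the inverse of the pitch period

frame = synth_voiced(f0=200.0, fs=8000, duration_s=0.05)
print(estimate_f0(frame, fs=8000))  # should come out close to 200 Hz
```

Real speech is noisier than this test signal, so practical pitch trackers add normalization, voicing decisions, and smoothing on top of this idea.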
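3) Formants (resonance peaks) are the physical correlate of timbre and are usually written F_n. When sound passes through a resonant cavity, the cavity acts as a filter and redistributes energy across frequency bands; the resulting peaks are the formants. They are a key parameter for distinguishing vowels (a property of the vocal tract, independent of the source signal). The same instrument shows the same formants whatever its fundamental frequency. In speech, the first three formants are usually enough to characterize a vowel, whereas complex nasals or consonants need five or more.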
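The formant frequencies are given by F_n = c(2n-1)/(4L), where c is the speed of sound and L is the vocal tract length. Assuming L = 17 cm (with c of about 340 m/s), the resonance frequencies are F_1 = 500 Hz and F_2 = 1500 Hz.
4) Timbre, also called tone color or tone quality, distinguishes sounds that have the same pitch and loudness. Differences in timbre come from different overtones. The sound produced by any instrument, any person, or any other sounding object contains, besides the fundamental, many overtones at other frequencies. These overtones determine the timbre and let us tell apart different instruments and even different speakers. Physically, timbre is determined by the harmonic spectrum and the envelope of the waveform, that is, mainly by harmonic content and dynamic characteristics.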
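As a quick check of the formant formula F_n = c(2n-1)/(4L) from item 3), the sketch below evaluates it for a 17 cm vocal tract. The function name `tube_formants` and the value c = 340 m/s are assumptions for illustration; the formula models the vocal tract as a uniform tube closed at one end, which is only a rough approximation of a neutral vowel.

```python
def tube_formants(length_m=0.17, c=340.0, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = c(2n-1)/(4L)."""
    return [c * (2 * n - 1) / (4 * length_m) for n in range(1, n_formants + 1)]

# For L = 0.17 m this gives roughly 500, 1500 and 2500 Hz.
print([round(f) for f in tube_formants()])

# A shorter vocal tract shifts every formant upward.
print([round(f) for f in tube_formants(length_m=0.14)])
```

The second call illustrates why shorter vocal tracts, such as those of children, tend to have higher formants under this simple model.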
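Note: in music, international standard pitch sets the frequency of A4 to 440 Hz. Overtones are the components of a sound whose frequencies lie above the fundamental; for a harmonic sound they sit at integer multiples of it, so the first overtone is one octave above the fundamental. Each octave up doubles the frequency. A semitone is the smallest distance between two adjacent notes: the octave spanned by the notes C D E F G A B is divided into twelve equal parts, and each part is one semitone. A frequency Freq can be placed on this semitone scale with 69 + 12*log2(Freq/440), which is its MIDI note number (A4 = 69). A distance of two semitones is a whole tone. On keyboard instruments such as the piano or electronic keyboard, C-D, D-E, F-G, G-A, and A-B each have a black key between them, so they are a whole tone apart; E-F and B-C have no black key between them, so they are a semitone apart.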
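The expression 69 + 12*log2(Freq/440) from the note above can be tried out directly. In this sketch the function names are made up for illustration, and the result is treated as a (possibly fractional) MIDI note number with A4 = 440 Hz mapped to 69.

```python
import math

def freq_to_midi(freq_hz, a4_hz=440.0):
    """Position of a frequency on the semitone scale (MIDI note number)."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)

def semitones_between(f1_hz, f2_hz):
    """Signed distance in semitones from f1 to f2."""
    return 12 * math.log2(f2_hz / f1_hz)

print(freq_to_midi(440.0))               # A4
print(freq_to_midi(880.0))               # one octave up: 12 semitones higher
print(round(freq_to_midi(261.63), 2))    # close to middle C
print(semitones_between(400.0, 800.0))   # an octave is 12 semitones
```

Only frequency ratios matter here, so the distance between 400 Hz and 800 Hz is the same twelve semitones as between 440 Hz and 880 Hz.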
In addition, a note represents the pitch and duration of a tone. The term octave denotes the interval between two notes whose frequency ratio is 2:1; notes whose frequencies differ by any power of two are a whole number of octaves apart. For example, if one note has a frequency of 400 Hz, the note one octave above is at 800 Hz, and the note one octave below is at 200 Hz. The human ear tends to hear notes in an octave relation as very similar, because their harmonics are closely related. For this reason, all notes that are one or more octaves apart are grouped into the same pitch class.
Traditionally, a pitch class is denoted by one of the first seven Latin letters, i.e. A, B, C, D, E, F and G. Letter names can be suffixed by an accidental, sharp (♯) or flat (♭). A sharp raises a note by one semitone, and a flat lowers a note by one semitone. For instance, C♯ is one semitone higher than C, and B♭ is one semitone lower than B. A note can also be classified by its duration. In decreasing order of duration there are, in general, the whole note, the half note, and the quarter note. A whole note lasts four beats in 4/4 time. A half note has half the duration of a whole note and twice the duration of a quarter note, so a quarter note lasts one beat. It is also helpful to point out that the onset of a note is its start instant and the offset is its end instant.
A musical scale is a sequence of notes in ascending or descending order. Most scales are octave-repeating, which means their pattern of notes is the same in every octave. For instance, the C major scale is C-D-E-F-G-A-B-C, where the last note C is one octave higher than the first note C. One widely used scale is the chromatic scale, which has twelve notes, each a semitone apart. The chromatic scale corresponds to the white and black keys of the piano keyboard, i.e. C-C♯-D-E♭-E-F-F♯-G-G♯-A-B♭-B-C. Again the last note C is one octave above the first note C.
Various music notation systems are used to denote notes. One formal system, called scientific pitch notation, is widely used in the music world.
It suffixes a note of the chromatic scale with a number that denotes the octave. For example, C3 is the note C one octave below C4 (middle C). Technically, each note corresponds to a fundamental frequency F0. These frequencies are defined relative to the central note, A4 (440 Hz). Let s be the distance in semitones between a note and A4: if the note is above A4, s is positive; otherwise s is negative. In equal temperament the note's frequency is then F0 = 440 × 2^(s/12) Hz.
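The semitone distance s from A4 leads directly to a note's frequency. The sketch below assumes twelve-tone equal temperament, where F0 = 440 × 2^(s/12). The table `NOTE_OFFSETS` and the function `note_to_freq` are hypothetical helpers written for this note; accidentals are typed as ASCII `#` and `b`, and only single-digit octave numbers are handled.

```python
# Semitone offset of each pitch class above C within one octave.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
    "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8,
    "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

def note_to_freq(name, a4_hz=440.0):
    """Frequency of a note in scientific pitch notation, e.g. 'C4' or 'Bb3'."""
    pitch_class, octave = name[:-1], int(name[-1])
    # s = signed semitone distance from A4 (A is offset 9 in octave 4).
    s = (NOTE_OFFSETS[pitch_class] + 12 * octave) - (9 + 12 * 4)
    return a4_hz * 2 ** (s / 12)

print(note_to_freq("A4"))             # the reference note
print(round(note_to_freq("C4"), 2))   # middle C, s = -9
print(round(note_to_freq("C3"), 2))   # one octave below middle C
```

Since C3 is twelve semitones below C4, its frequency is half that of C4, matching the octave rule described earlier.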
5) Chinese is a tone language. Languages that do not use tone to convey lexical or grammatical meaning, such as English, are called non-tonal languages.
6) Intonation mainly expresses meaning or attitude, e.g. anger or wariness (falling intonation, rising intonation).
7) Stress mainly operates at the lexical level, on the syllables of a word.
8) Tone, in linguistics, is the use of pitch to distinguish lexical or grammatical meaning.
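9) A spectrogram shows the spectrum of a speech signal as it changes over time. There are two types:
a) narrow band <---> large sample window: harmonic structure is clearly visible, frequency resolution is high
b) wide band <---> small sample window: the usual choice for speech spectrograms; formant positions are easy to find, but the harmonic structure is hard to see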
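The trade-off between the two spectrogram types follows from the window length: a window of N samples at sampling rate fs gives a frequency resolution of roughly fs/N. The sketch below uses illustrative values that are assumptions, not figures from the text (16 kHz sampling, 40 ms and 5 ms windows, a 120 Hz speaker), and it ignores the extra smearing introduced by the window shape.

```python
def window_resolution(window_ms, fs=16000):
    """Return (window length in samples, approximate frequency resolution in Hz)."""
    n = int(round(window_ms * fs / 1000))
    return n, fs / n

def harmonics_resolved(window_ms, f0_hz, fs=16000):
    """Harmonics are spaced f0 apart; they separate only if resolution < f0."""
    _, delta_f = window_resolution(window_ms, fs)
    return delta_f < f0_hz

for label, window_ms in [("narrow band", 40), ("wide band", 5)]:
    n, delta_f = window_resolution(window_ms)
    print(label, n, "samples,", delta_f, "Hz,",
          "harmonics resolved:", harmonics_resolved(window_ms, f0_hz=120.0))
```

With the long window the 120 Hz harmonic spacing is resolved, so individual harmonics show up as horizontal lines. With the short window the harmonics blur into broad bands that follow the formants, at the benefit of sharper timing.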
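10) Syllable: the vowel is the core of a syllable and consonants mark the boundaries. A vowel can form a syllable on its own or together with consonants. Two adjacent vowel letters that are pronounced as a single vowel count as one syllable. When there is only one consonant between two vowels, it belongs to the following syllable; when there are two, one goes to each side.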
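The splitting rules in item 10) can be sketched as a toy syllabifier. This is a rough illustration that works on spelling rather than on actual pronunciation, so it gets silent letters wrong (for example the final "e" in "make"). The function name `syllabify` is made up, the letter "y" is treated as a consonant, and the handling of three or more consonants between vowels (the first stays, the rest move on) is an assumption the text does not cover.

```python
VOWELS = set("aeiou")

def syllabify(word):
    """Split a word into syllables using the vowel/consonant rules above."""
    word = word.lower()
    # Locate vowel groups; adjacent vowel letters count as one nucleus.
    nuclei = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if len(nuclei) <= 1:
        return [word]
    # Decide where to cut between each pair of neighbouring nuclei.
    cuts = []
    for (_, end_prev), (start_next, _) in zip(nuclei, nuclei[1:]):
        n_consonants = start_next - end_prev
        if n_consonants == 1:
            cuts.append(end_prev)      # single consonant joins the next syllable
        else:
            cuts.append(end_prev + 1)  # first consonant stays, the rest move on
    parts, prev = [], 0
    for cut in cuts:
        parts.append(word[prev:cut])
        prev = cut
    parts.append(word[prev:])
    return parts

print(syllabify("paper"))   # one consonant between the vowels
print(syllabify("basket"))  # two consonants between the vowels
print(syllabify("rain"))    # two vowel letters, one syllable
```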
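11) Phoneme: the smallest unit of speech that distinguishes meaning, classified by its meaning-distinguishing function; it is the smallest abstract unit of human language. Phonemes belong to a specific language; there are no language-independent phonemes.
12) Phone: the smallest unit, or smallest segment of speech, that makes up a syllable. Phones are classified by sound quality; physically they are concrete sounds, transcribed with the International Phonetic Alphabet (IPA).
13) Allophone: phones of different quality that sound similar and are in complementary distribution can be grouped into the same phoneme.
Within one language, the relation between phonemes and phones can be understood as that between a type and its members. A phoneme is a type in the language; the phones that realize it are its members, called allophones, and a typical, frequently used phone is usually chosen to name the phoneme. Besides vowel and consonant phonemes, there are also tonal phonemes, which are "non-segmental" or "suprasegmental" phonemes. English is counted here as having 48 phonemes, 20 vowels and 28 consonants, although the exact number depends on the analysis and the dialect. Phones and allophones are written in square brackets, e.g. [p:] [pʰ]; phonemes are written between slashes, e.g. /p/ /b/.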