About Goertzel


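DTMF (dual-tone multi-frequency) is a form of user signaling between a telephone set and the exchange. Put simply, it is an audio signal made of two different frequencies mixed together, and each such pair represents one value (a key).
DTMF signaling was invented at Bell Labs to allow long-distance calls to be dialed automatically.
The DTMF keypad is a 4×4 matrix: each row corresponds to a low frequency and each column to a high frequency. Pressing a key sends the sum of one low-frequency and one high-frequency sine wave; for example, '1' is 697 Hz plus 1209 Hz. The exchange decodes the frequency pair to determine which key was pressed.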

DTMF keypad (rows: low-group frequencies; columns: high-group frequencies)

          1209 Hz   1336 Hz   1477 Hz   1633 Hz
697 Hz       1         2         3         A
770 Hz       4         5         6         B
852 Hz       7         8         9         C
941 Hz       *         0         #         D

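Special tones (North American)

Busy tone    480 Hz + 620 Hz
Dial tone    350 Hz + 440 Hz

In China, the busy tone is a 450 Hz tone that alternates continuously between 350 ms on and 350 ms off.
The Chinese dial tone is also 450 Hz, but it is a continuous, uninterrupted tone.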
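The Chinese busy tone and dial tone use the same 450 Hz frequency, so a single Goertzel bin cannot distinguish them. The on/off timing must also be checked. The sketch below assumes a hypothetical per-block flag array, where a flag is 1 when a block's 450 Hz power exceeds some threshold, and it classifies the resulting pattern. The 20% tolerance on the 350 ms cadence and the 1 s minimum for a dial tone are arbitrary assumptions.

```c
#include <stddef.h>

/* Hypothetical result classes for a 450 Hz detector. */
typedef enum { TONE_NONE, TONE_DIAL, TONE_BUSY, TONE_UNKNOWN } tone_class;

/* Classify the on/off pattern of a 450 Hz detector.
 * on[i]    : nonzero if block i contained 450 Hz energy above threshold
 * block_ms : duration of one detection block in milliseconds */
tone_class classify_450(const int *on, size_t n, double block_ms)
{
    size_t ones = 0, runs = 0, start = 0, i;
    int busy_like = 1;

    for (i = 0; i < n; i++)
        if (on[i])
            ones++;
    if (ones == 0)
        return TONE_NONE;
    if (ones == n)    /* never switched off: dial tone if long enough (assumed 1 s) */
        return ((double)n * block_ms >= 1000.0) ? TONE_DIAL : TONE_UNKNOWN;

    /* Measure each run of identical flags. The first and last runs may be
     * cut off by the analysis window, so only interior runs are checked. */
    for (i = 1; i <= n; i++) {
        if (i == n || (on[i] != 0) != (on[start] != 0)) {
            if (start > 0 && i < n) {
                double ms = (double)(i - start) * block_ms;
                if (ms < 280.0 || ms > 420.0)    /* 350 ms +/- 20% (assumed) */
                    busy_like = 0;
                runs++;
            }
            start = i;
        }
    }
    return (busy_like && runs >= 2) ? TONE_BUSY : TONE_UNKNOWN;
}
```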

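To detect fixed-frequency signals like these, one option is to use an FFT to transform the audio from the time domain to the frequency domain and then inspect the relevant frequency bins. However, an FFT computes every bin of the spectrum, while tone detection only needs a handful of frequencies, so most of that work is wasted. This approach is therefore not recommended here.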
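A rough operation count makes the trade-off concrete. It is an illustrative estimate, not a benchmark. Assume a radix-2 FFT of length N (a power of two) costs about (N/2)·log2 N complex multiplications, or roughly 2N·log2 N real ones. Assume Goertzel costs about one real multiplication per sample for each of the K frequencies of interest:

```latex
\begin{aligned}
C_{\mathrm{FFT}} &\approx 2N\log_2 N, \qquad C_{\mathrm{Goertzel}} \approx K N,\\
C_{\mathrm{Goertzel}} < C_{\mathrm{FFT}} &\iff K < 2\log_2 N,\\
N = 128,\ K = 8:\quad C_{\mathrm{Goertzel}} &\approx 8 \cdot 128 = 1024, \qquad C_{\mathrm{FFT}} \approx 2 \cdot 128 \cdot 7 = 1792.
\end{aligned}
```

Under this count, evaluating the eight DTMF frequencies with Goertzel is cheaper. Goertzel also processes samples one at a time as they arrive, and it does not require N to be a power of two.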
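The Goertzel algorithm fits this task well. It evaluates the signal's energy at a single chosen frequency, is simple to implement, and runs efficiently. The algorithm is: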

s_prev = 0
s_prev2 = 0
normalized_frequency = target_frequency / sample_rate;
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
  s = x[n] + coeff*s_prev - s_prev2;
  s_prev2 = s_prev;
  s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev*s_prev2;

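With this description, the code is straightforward to write. Below is an open-source C implementation of a DTMF detector based on the Goertzel algorithm: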

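#include <stdio.h>
#include <math.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

#define SAMPLING_RATE       8000    /* samples per second */
#define MAX_BINS            8       /* 4 row + 4 column DTMF frequencies */
#define GOERTZEL_N          92      /* samples per analysis block */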

int         sample_count;
double      q1[ MAX_BINS ];
double      q2[ MAX_BINS ];
double      r[ MAX_BINS ];

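/*
 * coef = 2.0 * cos( (2.0 * PI * k) / (float)GOERTZEL_N );
 * where k = (int) (0.5 + ((float)GOERTZEL_N * target_freq) / SAMPLING_RATE);
 *
 * More simply: coef = 2.0 * cos( (2.0 * PI * target_freq) / SAMPLING_RATE );
 * calc_coeffs() below uses this simpler form, i.e. the exact target
 * frequency rather than the nearest integer bin k.
 */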
double      freqs[ MAX_BINS] = 
{
  697,
  770,
  852,
  941,
  1209,
  1336,
  1477,
  1633
};

double      coefs[ MAX_BINS ] ;


/*----------------------------------------------------------------------------
 *  calc_coeffs
 *----------------------------------------------------------------------------
 * This is where we calculate the correct co-efficients.
 */
void calc_coeffs()
{
  int n;

  for(n = 0; n < MAX_BINS; n++)
  {
    coefs[n] = 2.0 * cos(2.0 * 3.141592654 * freqs[n] / SAMPLING_RATE);
  }
}


/*----------------------------------------------------------------------------
 *  post_testing
 *----------------------------------------------------------------------------
 * This is where we look at the bins and decide if we have a valid signal.
 */
void post_testing()
{
  int         row, col, see_digit;
  int         peak_count, max_index;
  double      maxval, t;
  int         i;
  char *  row_col_ascii_codes[4][4] = {
    {"1", "2", "3", "A"},
    {"4", "5", "6", "B"},
    {"7", "8", "9", "C"},
    {"*", "0", "#", "D"}};


  /* Find the largest in the row group. */
  row = 0;
  maxval = 0.0;
  for ( i=0; i<4; i++ )
  {
    if ( r[i] > maxval )
    {
      maxval = r[i];
      row = i;
    }
  }

  /* Find the largest in the column group. */
  col = 4;
  maxval = 0.0;
  for ( i=4; i<8; i++ )
  {
    if ( r[i] > maxval )
    {
      maxval = r[i];
      col = i;
    }
  }


  /* Check for minimum energy */

  if ( r[row] < 4.0e5 )   /* 2.0e5 ... 1.0e8 no change */
  {
    /* energy not high enough */
  }
  else if ( r[col] < 4.0e5 )
  {
    /* energy not high enough */
  }
  else
  {
    see_digit = TRUE;

    /* Twist check
     * CEPT => twist < 6dB
     * AT&T => forward twist < 4dB and reverse twist < 8dB
     *  -ndB < 10 log10( v1 / v2 ), where v1 < v2
     *  -4dB < 10 log10( v1 / v2 )
     *  -0.4  < log10( v1 / v2 )
     *  0.398 < v1 / v2
     *  0.398 * v2 < v1
     */
    if ( r[col] > r[row] )
    {
      /* Normal twist */
      max_index = col;
      if ( r[row] < (r[col] * 0.398) )    /* twist > 4dB, error */
        see_digit = FALSE;
    }
    else /* if ( r[row] > r[col] ) */
    {
      /* Reverse twist */
      max_index = row;
      if ( r[col] < (r[row] * 0.158) )    /* twist > 8db, error */
        see_digit = FALSE;
    }

    /* Signal to noise test
     * AT&T states that the noise must be 16dB down from the signal.
     * Here we count the number of signals above the threshold and
     * there ought to be only two.
     */
    if ( r[max_index] > 1.0e9 )
      t = r[max_index] * 0.158;
    else
      t = r[max_index] * 0.010;

    peak_count = 0;
    for ( i=0; i<8; i++ )
    {
      if ( r[i] > t )
        peak_count++;
    }
    if ( peak_count > 2 )
      see_digit = FALSE;

    if ( see_digit )
    {
      printf( "%s", row_col_ascii_codes[row][col-4] );
      fflush(stdout);
    }
  }
}


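/*----------------------------------------------------------------------------
 *  goertzel
 *----------------------------------------------------------------------------
 * Feed one input sample (sampled at SAMPLING_RATE). Every GOERTZEL_N samples,
 * the power of each bin is computed into r[], the filter state is reset, and
 * post_testing() decides whether a digit is present.
 * calc_coeffs() must be called once before the first sample.
 */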
void goertzel( int sample )
{
  double      q0;
  int         i;

  sample_count++;
  for ( i=0; i<MAX_BINS; i++ )
  {
    q0 = coefs[i] * q1[i] - q2[i] + sample;
    q2[i] = q1[i];
    q1[i] = q0;
  }

  if (sample_count == GOERTZEL_N)
  {
    for ( i=0; i<MAX_BINS; i++ )
    {
      r[i] = (q1[i] * q1[i]) + (q2[i] * q2[i]) - (coefs[i] * q1[i] * q2[i]);
      q1[i] = 0.0;
      q2[i] = 0.0;
    }
    post_testing();
    sample_count = 0;
  }
}
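As written, post_testing() prints a digit for every GOERTZEL_N-sample block that passes all of its checks. With N = 92 at 8000 Hz a block lasts 11.5 ms, so a key held for 100 ms, for example, is printed several times. One common remedy is debouncing, a technique swapped in here that is not part of the original code. The idea is to report a key only after it has appeared in a minimum number of consecutive blocks, and only once per run. The sketch assumes post_testing() is changed to pass its digit, or 0 for "no digit", to the hypothetical `debounce()` instead of printing it. `need = 2` is an arbitrary choice.

```c
/* Hypothetical debouncer state. */
typedef struct {
    char last;      /* digit seen in the previous block, or 0 for none */
    int  run;       /* consecutive blocks that have shown `last` */
    int  need;      /* blocks required before a digit is reported */
    int  reported;  /* has the current run already been reported? */
} debouncer;

/* Feed one block's decision (a digit character, or 0 for no digit).
 * Returns the digit to emit, or 0 if nothing should be emitted. */
char debounce(debouncer *d, char digit)
{
    if (digit != d->last) {    /* a new run starts */
        d->last = digit;
        d->run = 0;
        d->reported = 0;
    }
    if (digit == 0)
        return 0;
    d->run++;
    if (!d->reported && d->run >= d->need) {
        d->reported = 1;       /* report each run only once */
        return digit;
    }
    return 0;
}
```

With `need = 2`, the block sequence `5 5 5 5 (none) 5 5 1 (none)` yields "55". The single-block `1` is dropped as a glitch.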

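At this point, all you need to do is record the tones of a phone number being dialed, for example with a microphone. Call calc_coeffs() once, then feed the samples to goertzel() at 8000 Hz (SAMPLING_RATE). The dialed digits are printed as they are detected. The Goertzel algorithm can detect busy tones and dial tones in the same way.

The Goertzel algorithm can also be used to decode FSK signals.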

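FSK (frequency-shift keying) is, put simply, a signal built from two different frequencies, A and B. Frequency A represents 1 and frequency B represents 0, which allows binary data to be transmitted. FSK is also widely used in traditional telephony, for example for caller ID.
Because FSK uses two known, fixed frequencies, it is clearly simpler to detect than DTMF: FSK needs only 2 frequency points, whereas DTMF needs 8.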

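Choosing N for the Goertzel algorithm
Choosing N takes some experience and depends on the actual application.
N is mainly influenced by two factors: 1. the sampling rate, and 2. the signal duration. The frequency resolution is roughly sampling_rate / N, and each block spans N / sampling_rate seconds.
In short, if N is too small, the same digit may be reported more than once; if N is too large, the detection results can be wrong. I usually set N to about 5/4 of the signal duration, but this is not a rule, and experiments are needed to settle on a value of N.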
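Here is one quick sanity check to run when trying values of N. It assumes the rounded-bin formula k = (int)(0.5 + N·f/fs) from the comment in the C code above. Given that formula, the eight DTMF frequencies should not fall into the same bin. The original code uses exact-frequency coefficients, so this is only a coarse necessary condition, not a guarantee. `nearest_bin`, `block_ms`, `bin_width_hz` and `dtmf_bins_distinct` are hypothetical helpers.

```c
/* The eight DTMF frequencies (row group, then column group). */
static const double dtmf_hz[8] = { 697, 770, 852, 941, 1209, 1336, 1477, 1633 };

/* Nearest integer bin, using the same rounding as k in the code comment. */
int nearest_bin(double freq, int n, double fs)
{
    return (int)(0.5 + (double)n * freq / fs);
}

/* Duration of one block of n samples, in milliseconds. */
double block_ms(int n, double fs)
{
    return 1000.0 * (double)n / fs;
}

/* Spacing between adjacent bins, in Hz. */
double bin_width_hz(int n, double fs)
{
    return fs / (double)n;
}

/* Returns 1 if all eight DTMF frequencies land in different bins, else 0. */
int dtmf_bins_distinct(int n, double fs)
{
    for (int i = 0; i < 8; i++)
        for (int j = i + 1; j < 8; j++)
            if (nearest_bin(dtmf_hz[i], n, fs) == nearest_bin(dtmf_hz[j], n, fs))
                return 0;
    return 1;
}
```

With fs = 8000 Hz, N = 92 (the value used above) gives 11.5 ms blocks and the distinct bins 8, 9, 10, 11, 14, 15, 17 and 19. N = 40 fails the check because 770 Hz and 852 Hz both round to bin 4.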

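An alternative approach to fixed-frequency detection
I have also used a neural network to detect DTMF and FSK, with good results. It needs slightly more computation than Goertzel and its accuracy is comparable. The hard part is training the network, so I only tried it as a learning exercise.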