数据压缩作业八：MPEG音频编码

最新推荐文章于 2024-07-18 17:31:30 发布

你没事吧.

最新推荐文章于 2024-07-18 17:31:30 发布

阅读量182

点赞数

文章标签：音视频

本文链接：https://blog.csdn.net/m0_49007394/article/details/126085488

版权

MPEG概述

MPEG标准主要有以下五个，MPEG-1、MPEG-2、MPEG-4、MPEG-7及MPEG-21等。该专家组建于1988年，专门负责为CD建立视频和音频标准，而成员都是视频、音频及系统领域的技术专家。他们成功将声音和影像的记录脱离了传统的模拟方式，建立了ISO/IEC11172压缩编码标准，并制定出MPEG-格式，令视听传播方面进入了数字化时代。因此，大家现在泛指的MPEG-X版本，就是由ISO (InternationalOrganization for Standardization) 所制定而发布的视频、音频、数据的压缩标准。

MPEG-1声音的主要性能

输入为PCM信号，采样率为32、44.1或48kHz，输出为32kbps~384kbps，具有三个独立的压缩层次：layerⅠ（编码器最简单）、layerⅡ（编码器复杂度中等）、layerⅢ（编码器最复杂）。

人耳听觉特性

人耳听觉系统大致等效于一个信号通过一组并联的不同中心频率的带通滤波器，中心频率与信号频率相同的滤波器具有最大响应，中心频率偏离信号频率较多的滤波器不会产生响应。在0Hz到20kHz频率范围内由25个重叠的带通滤波器组成滤波器组。

听觉系统中存在一个听觉阈值电平，低于这个电平的声音信号就听不到。

听觉阈值电平是自适应的，也就是说听觉阈值的大小随声音频率的改变而改变。

一个人能否听到这种声音取决于声音的频率以及声音的幅度是否高于这种频率下的听觉阈值

掩蔽效应

一个较弱的声音的听觉感受被另一个较强的声音影响的现象称为人耳的听觉掩蔽效应。掩蔽作用与信号频率和强度有关。掩蔽效应在一定频率范围内不随带宽增大而改变，直至超过某个频率值。

临界频带

临界频带是指当某个纯音被以它为中心频率，且具有一定带宽的连续噪声所掩蔽时，如果该纯音刚好被听到时的功率等于这一频带内的噪声功率，这个带宽为临界频带宽度

通常认为从20Hz到16kHz有25个临界频带，单位为bark，1bark=一个临界频带的宽度

MPEG编码原理

两条线：

一是，PCM码流经多相滤波器组变换为32个子带的频域信号，在实际中可以理解为每32个样点做一次子带分解，连续做12次，此时得到的每个子带中都有12个样点，这样的过程再连续做3次，所以最终每个子带上会有12 × 3 = 36 12\times 3=3612×3=36个样本点。之后利用第二条线的结果对数据进行量化。

二是，对PCM信号进行FFT，输入到心理声学模型中，同时，第一条线中子带分解后，根据各个子带的样本点计算提取相应子带的比例因子，也将其输入到心理声学模型中，由心理声学模型计算以频率为自变量的噪声掩蔽阈值，通过信号掩蔽比SMR确定比例因子选择信息，另外，与目标码率结合确定动态比特分配，决定给子带分配多少量化比特数。

将量化后的子带样本和边信息编码数据以及辅助数据按照规定的帧格式组装成帧比特流输出。

int main (int argc, char **argv)
{
  typedef double SBS[2][3][SCALE_BLOCK][SBLIMIT];
  SBS *sb_sample;
  typedef double JSBS[3][SCALE_BLOCK][SBLIMIT];
  JSBS *j_sample;
  typedef double IN[2][HAN_SIZE];
  IN *win_que;
  typedef unsigned int SUB[2][3][SCALE_BLOCK][SBLIMIT];
  SUB *subband;
 
  frame_info frame;								//头信息、比特分配表、声道数、子带数等信息
  frame_header header;							//头信息的内容
  char original_file_name[MAX_NAME_SIZE];		//输入文件名
  char encoded_file_name[MAX_NAME_SIZE];		//输出文件名
  short **win_buf;
  static short buffer[2][1152];
  static unsigned int bit_alloc[2][SBLIMIT], scfsi[2][SBLIMIT];
  static unsigned int scalar[2][3][SBLIMIT], j_scale[3][SBLIMIT];
  static double smr[2][SBLIMIT], lgmin[2][SBLIMIT], max_sc[2][SBLIMIT];
  // FLOAT snr32[32];
  short sam[2][1344];		/* was [1056]; */
  int model, nch, error_protection;
  static unsigned int crc;
  int sb, ch, adb;
  unsigned long frameBits, sentBits = 0;
  unsigned long num_samples;
  int lg_frame;
  int i;
 
  /* Used to keep the SNR values for the fast/quick psy models */
  static FLOAT smrdef[2][32];					//各个子带
 
  static int psycount = 0;
  extern int minimum;
 
  time_t start_time, end_time;
  int total_time;
 
  sb_sample = (SBS *) mem_alloc (sizeof (SBS), "sb_sample");
  j_sample = (JSBS *) mem_alloc (sizeof (JSBS), "j_sample");
  win_que = (IN *) mem_alloc (sizeof (IN), "Win_que");
  subband = (SUB *) mem_alloc (sizeof (SUB), "subband");
  win_buf = (short **) mem_alloc (sizeof (short *) * 2, "win_buf");
 
  /* clear buffers */
  memset ((char *) buffer, 0, sizeof (buffer));
  memset ((char *) bit_alloc, 0, sizeof (bit_alloc));
  memset ((char *) scalar, 0, sizeof (scalar));
  memset ((char *) j_scale, 0, sizeof (j_scale));
  memset ((char *) scfsi, 0, sizeof (scfsi));
  memset ((char *) smr, 0, sizeof (smr));
  memset ((char *) lgmin, 0, sizeof (lgmin));
  memset ((char *) max_sc, 0, sizeof (max_sc));
  //memset ((char *) snr32, 0, sizeof (snr32));
  memset ((char *) sam, 0, sizeof (sam));
 
  global_init ();									//初始化
  
  header.extension = 0;
  frame.header = &header;
  frame.tab_num = -1;		/* no table loaded */
  frame.alloc = NULL;
  header.version = MPEG_AUDIO_ID;	/* Default: MPEG-1 */
 
  total_time = 0;
 
  time(&start_time);     
 
  programName = argv[0];
  if (argc == 1)		/* no command-line args */
    short_usage ();
  else
    parse_args (argc, argv, &frame, &model, &num_samples, original_file_name,
		encoded_file_name);
  print_config (&frame, &model, original_file_name, encoded_file_name);	//输出配置信息到窗口中
 
  /* this will load the alloc tables and do some other stuff */
  hdr_to_frps (&frame);					//根据头信息来设定其他信息
  nch = frame.nch;
  error_protection = header.error_protection;
 
 
 
  while (get_audio (musicin, buffer, num_samples, nch, &header) > 0) {//获取音频信息
    if (glopts.verbosity > 1)
      if (++frameNum % 10 == 0)
	fprintf (stderr, "[%4u]\r", frameNum);
    fflush (stderr);
    win_buf[0] = &buffer[0][0];
    win_buf[1] = &buffer[1][0];
 
    adb = available_bits (&header, &glopts);		//计算可用比特数
    lg_frame = adb / 8;
    if (header.dab_extension) {
      /* in 24 kHz we always have 4 bytes */
      if (header.sampling_frequency == 1)
	header.dab_extension = 4;
/* You must have one frame in memory if you are in DAB mode                 */
/* in conformity of the norme ETS 300 401 http://www.etsi.org               */
      /* see bitstream.c            */
      if (frameNum == 1)
	minimum = lg_frame + MINIMUM;
      adb -= header.dab_extension * 8 + header.dab_length * 8 + 16;
    }
 
    {
      int gr, bl, ch;
      /* New polyphase filter
	 Combines windowing and filtering. Ricardo Feb'03 */
      for( gr = 0; gr < 3; gr++ )					//每12个样点一组
	for ( bl = 0; bl < 12; bl++ )					//每组12个
	  for ( ch = 0; ch < nch; ch++ )				//声道数次
	    WindowFilterSubband( &buffer[ch][gr * 12 * 32 + 32 * bl], ch,
				 &(*sb_sample)[ch][gr][bl][0] );	//多相滤波器组
    }
 
#ifdef REFERENCECODE
    {
      /* Old code. left here for reference */
      int gr, bl, ch;
      for (gr = 0; gr < 3; gr++)
	for (bl = 0; bl < SCALE_BLOCK; bl++)
	  for (ch = 0; ch < nch; ch++) {
	    window_subband (&win_buf[ch], &(*win_que)[ch][0], ch);
	    filter_subband (&(*win_que)[ch][0], &(*sb_sample)[ch][gr][bl][0]);
	  }
    }
#endif
 
 
#ifdef NEWENCODE
    scalefactor_calc_new(*sb_sample, scalar, nch, frame.sblimit);
    find_sf_max (scalar, &frame, max_sc);
    if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
      /* this way we calculate more mono than we need */
      /* but it is cheap */
      combine_LR_new (*sb_sample, *j_sample, frame.sblimit);
      scalefactor_calc_new (j_sample, &j_scale, 1, frame.sblimit);
    }
#else
    scale_factor_calc (*sb_sample, scalar, nch, frame.sblimit);
    pick_scale (scalar, &frame, max_sc);
    if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
      /* this way we calculate more mono than we need */
      /* but it is cheap */
      combine_LR (*sb_sample, *j_sample, frame.sblimit);
      scale_factor_calc (j_sample, &j_scale, 1, frame.sblimit);
    }
#endif
 
 
	//选择合适的心理声学模型
    if ((glopts.quickmode == TRUE) && (++psycount % glopts.quickcount != 0)) {
      /* We're using quick mode, so we're only calculating the model every
         'quickcount' frames. Otherwise, just copy the old ones across */
      for (ch = 0; ch < nch; ch++) {
	for (sb = 0; sb < SBLIMIT; sb++)
	  smr[ch][sb] = smrdef[ch][sb];
      }
    } else {
      /* calculate the psymodel */
      switch (model) {
      case -1:
	psycho_n1 (smr, nch);
	break;
      case 0:	/* Psy Model A */
	psycho_0 (smr, nch, scalar, (FLOAT) s_freq[header.version][header.sampling_frequency] * 1000);	
	break;
      case 1:
	psycho_1 (buffer, max_sc, smr, &frame);
	break;
      case 2:
	for (ch = 0; ch < nch; ch++) {
	  psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
		     (FLOAT) s_freq[header.version][header.sampling_frequency] *
		     1000, &glopts);
	}
	break;
      case 3:
	/* Modified psy model 1 */
	psycho_3 (buffer, max_sc, smr, &frame, &glopts);
	break;
      case 4:
	/* Modified Psycho Model 2 */
	for (ch = 0; ch < nch; ch++) {
	  psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
		     (FLOAT) s_freq[header.version][header.sampling_frequency] *
		     1000, &glopts);
	}
	break;	
      case 5:
	/* Model 5 comparse model 1 and 3 */
	psycho_1 (buffer, max_sc, smr, &frame);
	fprintf(stdout,"1 ");
	smr_dump(smr,nch);
	psycho_3 (buffer, max_sc, smr, &frame, &glopts);
	fprintf(stdout,"3 ");
	smr_dump(smr,nch);
	break;
      case 6:
	/* Model 6 compares model 2 and 4 */
	for (ch = 0; ch < nch; ch++) 
	  psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
		    (FLOAT) s_freq[header.version][header.sampling_frequency] *
		    1000, &glopts);
	fprintf(stdout,"2 ");
	smr_dump(smr,nch);
	for (ch = 0; ch < nch; ch++) 
	  psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
		     (FLOAT) s_freq[header.version][header.sampling_frequency] *
		     1000, &glopts);
	fprintf(stdout,"4 ");
	smr_dump(smr,nch);
	break;
      case 7:
	fprintf(stdout,"Frame: %i\n",frameNum);
	/* Dump the SMRs for all models */	
	psycho_1 (buffer, max_sc, smr, &frame);
	fprintf(stdout,"1");
	smr_dump(smr, nch);
	psycho_3 (buffer, max_sc, smr, &frame, &glopts);
	fprintf(stdout,"3");
	smr_dump(smr,nch);
	for (ch = 0; ch < nch; ch++) 
	  psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
		    (FLOAT) s_freq[header.version][header.sampling_frequency] *
		    1000, &glopts);
	fprintf(stdout,"2");
	smr_dump(smr,nch);
	for (ch = 0; ch < nch; ch++) 
	  psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
		     (FLOAT) s_freq[header.version][header.sampling_frequency] *
		     1000, &glopts);
	fprintf(stdout,"4");
	smr_dump(smr,nch);
	break;
      case 8:
	/* Compare 0 and 4 */	
	psycho_n1 (smr, nch);
	fprintf(stdout,"0");
	smr_dump(smr,nch);
 
	for (ch = 0; ch < nch; ch++) 
	  psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
		     (FLOAT) s_freq[header.version][header.sampling_frequency] *
		     1000, &glopts);
	fprintf(stdout,"4");
	smr_dump(smr,nch);
	break;
      default:
	fprintf (stderr, "Invalid psy model specification: %i\n", model);
	exit (0);
      }
 
      if (glopts.quickmode == TRUE)
	/* copy the smr values and reuse them later */
	for (ch = 0; ch < nch; ch++) {
	  for (sb = 0; sb < SBLIMIT; sb++)
	    smrdef[ch][sb] = smr[ch][sb];
	}
 
      if (glopts.verbosity > 4) 
	smr_dump(smr, nch);
     
      
 
 
    }
 
#ifdef NEWENCODE
    sf_transmission_pattern (scalar, scfsi, &frame);
    main_bit_allocation_new (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
    //main_bit_allocation (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
 
    if (error_protection)
      CRC_calc (&frame, bit_alloc, scfsi, &crc);
 
    write_header (&frame, &bs);
    //encode_info (&frame, &bs);
    if (error_protection)
      putbits (&bs, crc, 16);
    write_bit_alloc (bit_alloc, &frame, &bs);
    //encode_bit_alloc (bit_alloc, &frame, &bs);
    write_scalefactors(bit_alloc, scfsi, scalar, &frame, &bs);
    //encode_scale (bit_alloc, scfsi, scalar, &frame, &bs);
    subband_quantization_new (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
    			  *subband, &frame);
    //subband_quantization (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
    //	  *subband, &frame);
    write_samples_new(*subband, bit_alloc, &frame, &bs);
    //sample_encoding (*subband, bit_alloc, &frame, &bs);
#else
    transmission_pattern (scalar, scfsi, &frame);
    main_bit_allocation (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
    if (error_protection)
      CRC_calc (&frame, bit_alloc, scfsi, &crc);
    encode_info (&frame, &bs);
    if (error_protection)
      encode_CRC (crc, &bs);
    encode_bit_alloc (bit_alloc, &frame, &bs);
    encode_scale (bit_alloc, scfsi, scalar, &frame, &bs);
    subband_quantization (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
			  *subband, &frame);
    sample_encoding (*subband, bit_alloc, &frame, &bs);
#endif
 
 
    /* If not all the bits were used, write out a stack of zeros */
    for (i = 0; i < adb; i++)
      put1bit (&bs, 0);
    if (header.dab_extension) {
      /* Reserve some bytes for X-PAD in DAB mode */
      putbits (&bs, 0, header.dab_length * 8);
      
      for (i = header.dab_extension - 1; i >= 0; i--) {
	CRC_calcDAB (&frame, bit_alloc, scfsi, scalar, &crc, i);
	/* this crc is for the previous frame in DAB mode  */
	if (bs.buf_byte_idx + lg_frame < bs.buf_size)
	  bs.buf[bs.buf_byte_idx + lg_frame] = crc;
	/* reserved 2 bytes for F-PAD in DAB mode  */
	putbits (&bs, crc, 8);
      }
      putbits (&bs, 0, 16);
    }
 
    frameBits = sstell (&bs) - sentBits;
 
    if (frameBits % 8) {	/* a program failure */
      fprintf (stderr, "Sent %ld bits = %ld slots plus %ld\n", frameBits,
	       frameBits / 8, frameBits % 8);
      fprintf (stderr, "If you are reading this, the program is broken\n");
      fprintf (stderr, "email [mfc at NOTplanckenerg.com] without the NOT\n");
      fprintf (stderr, "with the command line arguments and other info\n");
      exit (0);
    }
 
    sentBits += frameBits;
  }
 
  close_bit_stream_w (&bs);
 
  if ((glopts.verbosity > 1) && (glopts.vbr == TRUE)) {
    int i;
#ifdef NEWENCODE
    extern int vbrstats_new[15];
#else
    extern int vbrstats[15];
#endif
    fprintf (stdout, "VBR stats:\n");
    for (i = 1; i < 15; i++)
      fprintf (stdout, "%4i ", bitrate[header.version][i]);
    fprintf (stdout, "\n");
    for (i = 1; i < 15; i++)
#ifdef NEWENCODE
      fprintf (stdout,"%4i ",vbrstats_new[i]);
#else
      fprintf (stdout, "%4i ", vbrstats[i]);
#endif
    fprintf (stdout, "\n");
  }
 
  fprintf (stderr,
	   "Avg slots/frame = %.3f; b/smp = %.2f; bitrate = %.3f kbps\n",
	   (FLOAT) sentBits / (frameNum * 8),
	   (FLOAT) sentBits / (frameNum * 1152),
	   (FLOAT) sentBits / (frameNum * 1152) *
	   s_freq[header.version][header.sampling_frequency]);
 
  if (fclose (musicin) != 0) {
    fprintf (stderr, "Could not close \"%s\".\n", original_file_name);
    exit (2);
  }
 
  fprintf (stderr, "\nDone\n");
 
  time(&end_time);
  total_time = end_time - start_time;
  printf("total time is %d\n", total_time);
  
  exit (0);
}

实验结果

采样率：44.100000 khz
目标码率：192 kbps
第50帧
所分配比特数：5008
比例因子：
声道0:
子带 0: 10 11 11
子带 1: 18 19 23
子带 2: 17 17 21
子带 3: 22 25 27
子带 4: 30 31 33
子带 5: 24 25 29
子带 6: 22 27 30
子带 7: 19 23 26
子带 8: 42 43 45
子带 9: 29 31 35
子带10: 29 30 31
子带11: 29 29 29
子带12: 21 24 28
子带13: 24 23 27
子带14: 24 26 28
子带15: 23 23 31
子带16: 30 32 33
子带17: 28 29 32
子带18: 26 26 30
子带19: 29 31 35
子带20: 31 32 35
子带21: 29 31 36
子带22: 40 42 44
子带23: 52 54 49
子带24: 52 55 53
子带25: 52 53 55
子带26: 53 52 51
子带27: 56 52 55
子带28: 53 55 53
子带29: 54 55 51
比特分配表：
声道0:
子带 0: 9
子带 1: 8
子带 2: 8
子带 3: 8
子带 4: 6
子带 5: 7
子带 6: 7
子带 7: 7
子带 8: 3
子带 9: 6
子带10: 6
子带11: 6
子带12: 7
子带13: 6
子带14: 6
子带15: 6
子带16: 5
子带17: 5
子带18: 6
子带19: 4
子带20: 4
子带21: 4
子带22: 0
子带23: 0
子带24: 0
子带25: 0
子带26: 0
子带27: 0
子带28: 0
子带29: 0