ffmpeg tutorial (Chinese edition) study notes, part 3

Latest recommended article published on 2024-03-16 09:37:56

郑亚帅

ffmpeg Tutorial 3: Playing Sound
Source code: tutorial03-1.c

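Now we want to play sound. SDL also gives us a way to output audio: the SDL_OpenAudio() function opens the audio device itself. It takes an SDL_AudioSpec struct as its argument, and that struct holds all the information about the audio we are going to output.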

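Before we show how to set this up, let's explain how computers handle audio. Digital audio consists of a long stream of samples. Each sample represents one value of the audio waveform. Sound is recorded at a certain sample rate, which says how fast to play back the samples and is measured in samples per second. For example, 22050 and 44100 samples per second are the rates commonly used for radio and CD, respectively. In addition, most audio has more than one channel, for stereo or surround sound. For example, if the audio is stereo, the samples come two at a time. When we get data from a movie file, we don't know how many samples we will get, but ffmpeg will not give us partial samples, which also means it will not split a stereo sample pair apart.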
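To make the sample and channel arithmetic concrete, here is a minimal sketch in C. The helper names (sample_index, pcm_bytes) are made up for illustration and are not part of ffmpeg or SDL, and the sketch assumes interleaved signed 16-bit samples, which is the layout this tutorial requests from SDL later on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: position of one sample inside an interleaved buffer.
 * Interleaved stereo is laid out as L0 R0 L1 R1 ..., so the samples that
 * belong to the same instant always sit next to each other. */
size_t sample_index(size_t frame, int channel, int channels)
{
    assert(channels > 0 && channel >= 0 && channel < channels);
    return frame * (size_t)channels + (size_t)channel;
}

/* Hypothetical helper: bytes needed for `seconds` of 16-bit audio. */
size_t pcm_bytes(int sample_rate, int channels, int seconds)
{
    assert(sample_rate > 0 && channels > 0 && seconds >= 0);
    return (size_t)sample_rate * (size_t)channels
           * sizeof(int16_t) * (size_t)seconds;
}
```

Under these assumptions, one second of 44100 Hz stereo occupies 44100 × 2 × 2 = 176400 bytes, and the right-channel sample of frame 3 sits at index 7 of the buffer.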

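SDL plays audio like this: you set up your audio options, namely the sample rate (called freq, for frequency, in the SDL struct), the number of channels and so on, and you also set a callback function and some userdata. Once audio playback starts, SDL keeps calling this callback and asks it to fill the audio buffer with a certain number of bytes. After we put this information into the SDL_AudioSpec struct, we call SDL_OpenAudio(), which opens the audio device and hands us back another AudioSpec struct. That second struct holds the settings we will actually be using, because we are not guaranteed to get what we asked for.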

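Setting Up the Audio

Keep all that in mind for the moment, because we don't actually have any information about the audio stream yet. Let's go back to the place in our code where we found the video stream; we can find the audio stream the same way.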

 // Find the first video stream  
 videoStream=-1;  
 audioStream=-1;  
 for(i=0; i<pFormatCtx->nb_streams; i++)  
 {  
   if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO &&videoStream < 0)  
   {  
     videoStream=i;  
   }  
   if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO &&audioStream < 0)  
   {  
     audioStream=i;  
   }  
 }  
 if(videoStream==-1)  
   return -1; // Didn't find a video stream  
 if(audioStream==-1)  
   return -1;  

From here we can get all the information we want from the stream's AVCodecContext, just as we did for the video stream.

 AVCodecContext *aCodecCtx = NULL;  
 aCodecCtx=pFormatCtx->streams[audioStream]->codec;  

The information contained in the codec context is exactly what we need to set up our audio:

 SDL_AudioSpec wanted_spec, spec;  
 // Set audio settings from codec info  
 wanted_spec.freq = aCodecCtx->sample_rate;  
 wanted_spec.format = AUDIO_S16SYS;  
 wanted_spec.channels = aCodecCtx->channels;  
 wanted_spec.silence = 0;  
 wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;  
 wanted_spec.callback = audio_callback;  
 wanted_spec.userdata = aCodecCtx;  
 if(SDL_OpenAudio(&wanted_spec, &spec) < 0)  
 {  
     fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());  
     return -1;  
 }  

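Let's go through these struct members:

· freq: The sample rate, as explained earlier.

· format: Tells SDL what format we will give it. In "S16SYS", the S stands for signed, 16 means each sample is 16 bits long, and SYS means the byte order (endianness) follows the system we are running on. The tutorial assumes this is the format in which the decoder (avcodec_decode_audio4() in the code further down) hands us the audio; a decoder whose aCodecCtx->sample_fmt is something else would need its output converted first.

· channels: The number of audio channels.

· silence: The value that represents silence. Since the samples are signed, 0 is naturally that value.

· samples: The size of the audio buffer we would like SDL to hand us when it asks for more audio. A good value is somewhere between 512 and 8192; ffplay uses 1024.

· callback: Our callback function. We'll discuss it in detail later.

· userdata: The pointer SDL passes to our callback. We'll give the callback the whole codec context; you'll see why later.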
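As a rough illustration of what the samples and freq fields imply together, the sketch below estimates how many bytes each callback has to fill and how often the callback fires. The helper names are hypothetical, and the sketch assumes that samples counts sample frames (one sample per channel) and that the format is signed 16-bit, as requested above; the spec SDL actually returns may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes requested per callback for signed 16-bit
 * audio, assuming `samples` counts sample frames (one per channel). */
int callback_bytes(int samples, int channels)
{
    assert(samples > 0 && channels > 0);
    return samples * channels * (int)sizeof(int16_t);
}

/* Hypothetical helper: approximate time between callbacks, in
 * microseconds (integer division, so the result is rounded down). */
int64_t callback_period_us(int samples, int freq)
{
    assert(samples > 0 && freq > 0);
    return (int64_t)samples * 1000000 / freq;
}
```

With ffplay's value of 1024 samples, stereo audio needs 4096 bytes per callback, and at 44100 Hz the callback would run roughly every 23 ms. A larger buffer means fewer callbacks but more latency.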

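Finally, we open the audio device with SDL_OpenAudio().

If you remember the previous tutorials, we still need to open the audio codec itself. This is straightforward: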

 AVCodec *aCodec = NULL;  
 AVDictionary *audioOptionsDict = NULL;  
 aCodec = avcodec_find_decoder(aCodecCtx->codec_id);  
 if(!aCodec)  
 {  
     fprintf(stderr, "Unsupported codec!\n");  
     return -1;  
 }  
 avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);  

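Queues

There! Now we're ready to start pulling audio information from the stream. But what do we do with it? We will be continuously getting packets from the movie file, while at the same time SDL will be calling the callback function. The solution is to create a global structure where we can put the audio packets we read from the file and from which audio_callback can fetch audio data. So what we are going to do is create a queue of packets. ffmpeg has a structure called AVPacketList to help us with this, which is simply a linked list of packets. Here is our queue structure: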

 typedef struct PacketQueue  
 {  
     AVPacketList *first_pkt, *last_pkt;  
     int nb_packets;  
     int size;  
     SDL_mutex *mutex;  
     SDL_cond *cond;  
 } PacketQueue;  

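First, we should point out that nb_packets (the number of packets in the queue) is not the same as size: size is the total number of bytes, summed from packet->size over all packets in the queue. You'll notice that the struct contains a mutex and a condition variable cond. That's because SDL runs the audio processing in a separate thread. If we don't lock the queue properly, we could corrupt our data. We'll now see how the queue is implemented. Every programmer should know how to build a queue, but we include it here so you can learn the SDL functions.

First we write a function to initialize the queue: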

 void packet_queue_init(PacketQueue *q)  
 {  
     memset(q, 0, sizeof(PacketQueue));  
     q->mutex = SDL_CreateMutex();  
     q->cond = SDL_CreateCond();  
 }  

Then we write a function to put things into the queue:

 int packet_queue_put(PacketQueue *q, AVPacket *pkt)  
 {  
     AVPacketList *pkt1;  
     if(av_dup_packet(pkt) < 0)  
     {  
         return -1;  
     }  
     pkt1 = av_malloc(sizeof(AVPacketList));  
     if (!pkt1)  
         return -1;  
     pkt1->pkt = *pkt;  
     pkt1->next = NULL;  
   
     SDL_LockMutex(q->mutex);  
   
     if (!q->last_pkt)    // queue was empty: the new node is also the first one  
         q->first_pkt = pkt1;  
     else    // otherwise append the node at the tail  
         q->last_pkt->next = pkt1;   
     q->last_pkt = pkt1;  
     q->nb_packets++;  
     q->size += pkt1->pkt.size;  
     SDL_CondSignal(q->cond);  
   
     SDL_UnlockMutex(q->mutex);  
     return 0;  
 }  

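Prototype of av_dup_packet(): int av_dup_packet(AVPacket *pkt);
The data buffer of an AVPacket can be in one of two situations:
1) an independent buffer allocated with av_malloc (unshared buffer);
2) part of another AVPacket or of some other reusable memory (shared buffer). av_dup_packet calls av_malloc, memcpy, memset and similar functions to duplicate the shared buffer of an AVPacket into an independent buffer, and it updates the packet's destructor function pointer (av_destruct_packet in older ffmpeg versions).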
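The idea behind that duplication can be sketched without ffmpeg at all. The DemoPacket struct and demo_dup_packet() below are stand-ins invented for this illustration; they do not mirror the real AVPacket layout or the actual av_dup_packet() implementation, only the "copy borrowed bytes into a private buffer" step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a packet; NOT the real AVPacket layout. */
typedef struct DemoPacket {
    unsigned char *data;
    int size;
    int owns_data;   /* 1 if data was allocated for this packet alone */
} DemoPacket;

/* Hypothetical analogue of av_dup_packet(): if the packet only borrows
 * its bytes, copy them into a private buffer so they stay valid after
 * the lender reuses its memory. Returns 0 on success, -1 on failure. */
int demo_dup_packet(DemoPacket *pkt)
{
    unsigned char *copy;

    assert(pkt != NULL && pkt->size >= 0);
    if (pkt->owns_data)
        return 0;                         /* already independent */
    copy = malloc(pkt->size > 0 ? (size_t)pkt->size : 1);
    if (!copy)
        return -1;
    if (pkt->size > 0)
        memcpy(copy, pkt->data, (size_t)pkt->size);
    pkt->data = copy;
    pkt->owns_data = 1;
    return 0;
}
```

This is why the queue can safely keep a packet around: once the bytes are private, the demuxer overwriting its own buffer on the next read no longer affects the queued copy. The caller becomes responsible for freeing the private buffer.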

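SDL_LockMutex() locks the queue's mutex so we can add to the queue. SDL_CondSignal() then sends a signal through our condition variable to the get function (if it is waiting) to tell it that data is now available. Finally we unlock the mutex so the queue can be accessed again.

Below is the corresponding get function. Notice how SDL_CondWait() makes the function block (that is, wait until there is data in the queue) if we ask it to.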

 int quit = 0;  
 static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)  
 {  
     AVPacketList *pkt1;  
     int ret;  
   
     SDL_LockMutex(q->mutex);  
   
     for(;;)  
     {  
         if(quit)  
         {  
             ret = -1;  
             break;  
         }  
   
         pkt1 = q->first_pkt;  
         if (pkt1)  
         {  
             q->first_pkt = pkt1->next;  
             if (!q->first_pkt)  
                 q->last_pkt = NULL;  
             q->nb_packets--;  
             q->size -= pkt1->pkt.size;  
             *pkt = pkt1->pkt;  
             av_free(pkt1);  
             ret = 1;  
             break;  
         }  
         else if (!block)  
         {  
             ret = 0;  
             break;  
         }  
         else  
         {  
             SDL_CondWait(q->cond, q->mutex);  
         }  
     }  
     SDL_UnlockMutex(q->mutex);  
     return ret;  
 }  

About SDL_CondWait(SDL_cond* cond, SDL_mutex* mutex):

int SDL_CondWait(SDL_cond* cond, SDL_mutex* mutex)

cond: the condition variable to wait on

mutex: the mutex used to coordinate thread access

Returns 0 when it is signaled or a negative error code on failure; call SDL_GetError() for more information.

This function unlocks the specified mutex and waits for another thread to call SDL_CondSignal() or SDL_CondBroadcast() on the condition variable cond. Once the condition variable is signaled, the mutex is re-locked and the function returns.

The mutex must be locked before calling this function.

This function is the equivalent of calling SDL_CondWaitTimeout() with a time length of SDL_MUTEX_MAXWAIT.

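As you can see, we've wrapped the function body in an infinite loop so that we are sure to get data when we ask to block. We avoid spinning forever by using SDL's SDL_CondWait() function. Basically, all CondWait does is wait for a signal from SDL_CondSignal() (or SDL_CondBroadcast()) and then continue. It may look as though we've trapped ourselves inside our mutex: if we kept holding the lock, our put function could never add anything to the queue! However, SDL_CondWait() also unlocks the mutex for us while it waits, and then tries to lock it again once it gets the signal.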
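The same lock, wait and signal pattern can be tried outside SDL. The sketch below is a simplified queue of integers that uses POSIX threads as a stand-in for SDL's mutex and condition variable; the DemoQueue names are hypothetical, and it assumes pthread_cond_wait() plays the role described for SDL_CondWait() (release the mutex while sleeping, re-acquire it before returning).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Node { int value; struct Node *next; } Node;

typedef struct DemoQueue {
    Node *first, *last;
    int count;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} DemoQueue;

void demo_queue_init(DemoQueue *q)
{
    assert(q != NULL);
    q->first = q->last = NULL;
    q->count = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
}

int demo_queue_put(DemoQueue *q, int value)
{
    Node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&q->mutex);
    if (q->last)
        q->last->next = n;          /* append at the tail */
    else
        q->first = n;               /* queue was empty */
    q->last = n;
    q->count++;
    pthread_cond_signal(&q->cond);  /* wake one waiting consumer, if any */
    pthread_mutex_unlock(&q->mutex);
    return 0;
}

/* Returns 1 if a value was taken, 0 if the queue was empty and block == 0. */
int demo_queue_get(DemoQueue *q, int *out, int block)
{
    int ret = 0;
    assert(q != NULL && out != NULL);
    pthread_mutex_lock(&q->mutex);
    for (;;) {
        Node *n = q->first;
        if (n) {
            q->first = n->next;
            if (!q->first)
                q->last = NULL;
            q->count--;
            *out = n->value;
            free(n);
            ret = 1;
            break;
        }
        if (!block)
            break;
        /* Releases the mutex while sleeping and re-acquires it before
         * returning, so a producer can lock the queue in the meantime. */
        pthread_cond_wait(&q->cond, &q->mutex);
    }
    pthread_mutex_unlock(&q->mutex);
    return ret;
}
```

Re-checking the queue in a loop after every wake-up, rather than assuming data is present, also keeps the consumer correct if the wait returns without a matching put.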

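Handling the Unexpected

You'll notice that we have a global variable quit, which we check to make sure the program has not been told to quit (SDL automatically handles TERM signals and the like). Without it, the thread would keep running forever and we would have to kill -9 the program. So when a quit event arrives, we must set the quit flag to 1.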

 main(){  
     ...  
     SDL_PollEvent(&event);  
     switch(event.type){  
         case SDL_QUIT:  
             quit = 1;  
         ...  

Feeding Packets

The only thing left to do for the queue is to feed it packets:

 PacketQueue audioq;  
 int main(int argc, char *argv[])  
 {  
      ......  
     avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);  
   
     // audio_st = pFormatCtx->streams[index]  
     packet_queue_init(&audioq);  
     SDL_PauseAudio(0);  
     // SDL_PauseAudio() pauses or resumes calls to audio_callback: 0 resumes, any other value pauses  

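SDL_PauseAudio() finally starts the audio device. It plays silence if it isn't given enough data right away.

We have our queue set up, so now we're ready to start feeding it packets. Here is our packet-reading loop: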

 // Read packets: decode and display video, queue audio for the callback  
     i=0;  
     while(av_read_frame(pFormatCtx, &packet)>=0)  
     {  
         // Is this a packet from the video stream?  
         if(packet.stream_index==videoStream)  
         {  
             // Decode video frame  
             avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished,&packet);  
   
             // Did we get a video frame?  
             if(frameFinished)  
             {  
                 SDL_LockYUVOverlay(bmp);  
   
                 AVPicture pict;  
                 pict.data[0] = bmp->pixels[0];  
                 pict.data[1] = bmp->pixels[2];  
                 pict.data[2] = bmp->pixels[1];  
   
                 pict.linesize[0] = bmp->pitches[0];  
                 pict.linesize[1] = bmp->pitches[2];  
                 pict.linesize[2] = bmp->pitches[1];  
   
                 // Convert the image into YUV format that SDL uses  
                 sws_scale(sws_ctx,(uint8_t const * const *)pFrame->data,pFrame->linesize,0,  
                                            pCodecCtx->height,pict.data,pict.linesize);  
   
                 SDL_UnlockYUVOverlay(bmp);  
   
                 rect.x = 0;  
                 rect.y = 0;  
                 rect.w = pCodecCtx->width;  
                 rect.h = pCodecCtx->height;  
                 SDL_DisplayYUVOverlay(bmp, &rect);  
             }  
             // Free the video packet whether or not a complete frame came out  
             av_free_packet(&packet);  
         }  
         else if(packet.stream_index==audioStream)  
         {  
             packet_queue_put(&audioq, &packet);  
         }  
         else  
         {  
             av_free_packet(&packet);  
         }  
         // Handle pending SDL events (the packet was already freed or queued above)  
         SDL_PollEvent(&event);  
         switch(event.type)  
         {  
             case SDL_QUIT:  
                 quit = 1;  
                 SDL_Quit();  
                 exit(0);  
                 break;  
             default:  
                 break;  
         }  
   
     }  

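Note that we don't free the packet after putting it in the queue; we'll free it later, after decoding it.

Fetching Packets

Now let's finally write audio_callback to fetch packets from the queue. The callback must have the form void callback(void *userdata, Uint8 *stream, int len), where userdata is the pointer we gave to SDL, stream is the buffer we will write audio data into, and len is the size of that buffer. Here's the code: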

 void audio_callback(void *userdata, Uint8 *stream, int len){  
     AVCodecContext *aCodecCtx=(AVCodecContext *)userdata;  
     int len1, audio_size;  
   
     static uint8_t audio_buf[(MAX_AUDIO_FRAME_SIZE*3)/2];  
     static unsigned int audio_buf_size = 0;  
     static unsigned int audio_buf_index = 0;  
   
     while(len>0)  
     {  
         if(audio_buf_index>=audio_buf_size)  
         {  
             /* We have already sent all our data; get more */  
             audio_size = audio_decode_frame(aCodecCtx, audio_buf, sizeof(audio_buf));  
             if(audio_size<0)  
             {  
                 /* If error, output silence */  
                 audio_buf_size = 1024; // arbitrary?  
                 memset(audio_buf, 0, audio_buf_size);  
             }  
             else  
             {  
                 audio_buf_size = audio_size;  
             }  
             audio_buf_index = 0;  
         }  
         len1 = audio_buf_size - audio_buf_index;  
         if(len1>len)  
             len1 = len;  
         memcpy(stream, (uint8_t *)audio_buf + audio_buf_index, len1);  
         len -= len1;  
         stream += len1;  
         audio_buf_index += len1;  
     }  
 }  

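This is basically a simple loop that pulls data from another function we're about to write, audio_decode_frame(), stores the result in an intermediate buffer, and tries to write len bytes to stream. It fetches more data when we don't have enough yet, and saves any surplus for later (that is what the static variables are for). audio_buf is 1.5 times the size of the largest audio frame ffmpeg will give us, which leaves a comfortable cushion.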
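The carry-over behaviour is easier to see with the decoder replaced by something trivial. In the sketch below, demo_decode() is a fake decoder invented for this illustration that yields four consecutive byte values per call, and demo_fill() has the same loop shape as audio_callback(); neither uses SDL or ffmpeg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CHUNK 4   /* bytes the fake decoder yields per call (assumption) */

/* Fake decoder standing in for audio_decode_frame(): writes DEMO_CHUNK
 * consecutive byte values and returns the number of bytes produced. */
static int demo_decode(uint8_t *buf)
{
    static uint8_t next = 0;
    int i;
    for (i = 0; i < DEMO_CHUNK; i++)
        buf[i] = next++;
    return DEMO_CHUNK;
}

/* Same shape as audio_callback(): fill exactly `len` bytes of `stream`,
 * keeping unread decoded bytes in static storage for the next call. */
void demo_fill(uint8_t *stream, int len)
{
    static uint8_t buf[DEMO_CHUNK];
    static int buf_size = 0, buf_index = 0;

    assert(stream != NULL && len >= 0);
    while (len > 0) {
        int n;
        if (buf_index >= buf_size) {   /* buffer drained: decode more */
            buf_size = demo_decode(buf);
            buf_index = 0;
        }
        n = buf_size - buf_index;      /* decoded bytes still unread */
        if (n > len)
            n = len;
        memcpy(stream, buf + buf_index, (size_t)n);
        len -= n;
        stream += n;
        buf_index += n;
    }
}
```

Asking for 6 bytes and then 5 bytes yields the values 0 to 5 followed by 6 to 10: the two bytes left over from the second decode are served first on the next call, so nothing is dropped or repeated between callbacks.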

Finally Decoding the Audio

Let's look at the real workhorse of the decoder, audio_decode_frame():

 int audio_decode_frame(AVCodecContext *aCodecCtx, uint8_t *audio_buf, int buf_size)  
 {  
     static AVPacket pkt;  
     static uint8_t *audio_pkt_data = NULL;  
     static int audio_pkt_size = 0;  
     static AVFrame frame;  
   
     int len1, data_size = 0;  
   
     for(;;)  
     {  
         while(audio_pkt_size > 0)  
         {  
             int got_frame = 0;  
             len1 = avcodec_decode_audio4(aCodecCtx, &frame, &got_frame, &pkt);  
             if(len1 < 0)  
             {  
                /* if error, skip frame */  
                audio_pkt_size = 0;  
                break;  
             }  
             //audio_pkt_data += len1;  
             audio_pkt_size -= len1;  
             if (got_frame)  
             {  
                 data_size =  
                 av_samples_get_buffer_size  
                 (  
                 NULL,  
                 aCodecCtx->channels,  
                 frame.nb_samples,  
                 aCodecCtx->sample_fmt,  
                 1  
                 );  
                 memcpy(audio_buf, frame.data[0], data_size);  
             }  
             if(data_size <= 0)  
             {  
                 /* No data yet, get more frames */  
                 continue;  
             }  
             /* We have data, return it and come back for more later */  
             return data_size;  
         }  
         if(pkt.data)  
             av_free_packet(&pkt);  
   
         if(quit)  
         {  
            return -1;  
         }  
   
         if(packet_queue_get(&audioq, &pkt, 1) < 0)  
         {  
             return -1;  
         }  
         //audio_pkt_data = pkt.data;  
         audio_pkt_size = pkt.size;  
     }  
 }  

About int avcodec_decode_audio4(AVCodecContext *avctx, AVFrame *frame, int *got_frame_ptr, const AVPacket *avpkt):

/**
* Decode the audio frame of size avpkt->size from avpkt->data into frame.
*
* Some decoders may support multiple frames in a single AVPacket. Such
* decoders would then just decode the first frame and the return value would be
* less than the packet size. In this case, avcodec_decode_audio4 has to be
* called again with an AVPacket containing the remaining data in order to
* decode the second frame, etc... Even if no frames are returned, the packet
* needs to be fed to the decoder with remaining data until it is completely
* consumed or an error occurs.
*
* Some decoders (those marked with CODEC_CAP_DELAY) have a delay between input
* and output. This means that for some packets they will not immediately
* produce decoded output and need to be flushed at the end of decoding to get
* all the decoded data. Flushing is done by calling this function with packets
* with avpkt->data set to NULL and avpkt->size set to 0 until it stops
* returning samples. It is safe to flush even those decoders that are not
* marked with CODEC_CAP_DELAY, then no samples will be returned.
*
* @warning The input buffer, avpkt->data must be FF_INPUT_BUFFER_PADDING_SIZE
*          larger than the actual read bytes because some optimized bitstream
*          readers read 32 or 64 bits at once and could read over the end.
*
* @param      avctx the codec context
* @param[out] frame The AVFrame in which to store decoded audio samples.
*                   The decoder will allocate a buffer for the decoded frame by
*                   calling the AVCodecContext.get_buffer2() callback.
*                   When AVCodecContext.refcounted_frames is set to 1, the frame is
*                   reference counted and the returned reference belongs to the
*                   caller. The caller must release the frame using av_frame_unref()
*                   when the frame is no longer needed. The caller may safely write
*                   to the frame if av_frame_is_writable() returns 1.
*                   When AVCodecContext.refcounted_frames is set to 0, the returned
*                   reference belongs to the decoder and is valid only until the
*                   next call to this function or until closing or flushing the
*                   decoder. The caller may not write to it.
* @param[out] got_frame_ptr Zero if no frame could be decoded, otherwise it is
*                           non-zero. Note that this field being set to zero
*                           does not mean that an error has occurred. For
*                           decoders with CODEC_CAP_DELAY set, no given decode
*                           call is guaranteed to produce a frame.
* @param[in] avpkt The input AVPacket containing the input buffer.
*                   At least avpkt->data and avpkt->size should be set. Some
*                   decoders might also require additional fields to be set.
* @return A negative error code is returned if an error occurred during
*         decoding, otherwise the number of bytes consumed from the input
*         AVPacket is returned.
*/

About int av_samples_get_buffer_size(int *linesize, int nb_channels, int nb_samples, enum AVSampleFormat sample_fmt, int align):

/**
* Get the required buffer size for the given audio parameters.
*
* @param[out] linesize calculated linesize, may be NULL
* @param nb_channels   the number of channels
* @param nb_samples    the number of samples in a single channel
* @param sample_fmt    the sample format
* @param align         buffer size alignment (0 = default, 1 = no alignment)
* @return              required buffer size, or negative error code on failure
*/

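The whole process actually starts toward the end of the function, where we call packet_queue_get(). We take a packet off the queue and save its information. Then, once we have a packet to work with, we call avcodec_decode_audio4(), which works much like its sister function avcodec_decode_video2(), except that one packet may contain more than one audio frame, so you may have to call it several times to get all the data out of the packet. The original tutorial also reminds you to cast audio_buf, because SDL gives us an 8-bit integer buffer pointer while the older avcodec_decode_audio2() API wrote 16-bit integer data; the code above copies from frame.data[0] with memcpy() instead, so no cast is needed there. You should also notice the difference between len1 and data_size: len1 is how many bytes of the packet the decoder consumed, while data_size is the amount of raw audio data returned.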
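The "several frames in one packet" loop can be sketched with a fake decoder. Everything below (DemoPkt, demo_decode_one(), demo_count_frames()) is hypothetical and only mimics the consumed-bytes bookkeeping; it assumes every frame occupies a fixed number of input bytes, which real codecs do not guarantee. It works on a copy of the packet and advances that copy past the bytes already consumed before each further decode call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in packet: a byte range only, NOT a real AVPacket. */
typedef struct DemoPkt { const unsigned char *data; int size; } DemoPkt;

/* Fake decoder: every frame occupies `frame_bytes` bytes of input.
 * Returns the bytes consumed (the role of len1) and sets *got_frame
 * when a whole frame was available. */
static int demo_decode_one(const DemoPkt *pkt, int frame_bytes, int *got_frame)
{
    *got_frame = pkt->size >= frame_bytes;
    return *got_frame ? frame_bytes : pkt->size;  /* swallow a short tail */
}

/* Decode every frame in one packet. `pkt` is passed by value, so the
 * caller's packet is left untouched while this copy is advanced. */
int demo_count_frames(DemoPkt pkt, int frame_bytes)
{
    int frames = 0;
    assert(frame_bytes > 0 && pkt.size >= 0);
    while (pkt.size > 0) {
        int got_frame = 0;
        int used = demo_decode_one(&pkt, frame_bytes, &got_frame);
        if (used < 0)
            break;            /* on error, skip the rest of the packet */
        pkt.data += used;     /* move to the bytes not yet decoded */
        pkt.size -= used;
        if (got_frame)
            frames++;
    }
    return frames;
}
```

A 10-byte packet with 4-byte frames yields two frames and a 2-byte tail that is consumed without producing output, which mirrors the case where got_frame is 0 but the return value is still positive.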

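Once we have some data, we return immediately, and come back later to see whether we still need more data from the queue or are done. If there is still more of the packet to process, we keep it for next time. When we finish a packet, we finally free it. That's it! Our main read loop takes audio from the file and puts it on the queue, audio_callback reads it from the queue and processes it, and finally hands the data to SDL, which in effect acts as our sound card. Let's go ahead and compile: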

gcc ./tutorial03-1.c -o ./tutorial03-1 -lavutil -lavformat -lavcodec -lswscale -lz -lm `sdl-config --cflags --libs` -I /home/Jiakun/ffmpeg_build/include/ -L /home/Jiakun/ffmpeg_build/lib/ -I /usr/include/SDL/

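Hooray! The video still runs as fast as before, but the audio plays at the right speed. Why? Because the audio information carries a sample rate: even though we pump audio data into the buffer as fast as we can, the audio device plays it back at the specified sample rate. We're almost ready to start syncing video and audio, but first we need a little program reorganization. Queuing up audio and playing it from a separate thread worked very well: it made the code more manageable and more modular. Before we start syncing audio and video, we need to make our code easier to deal with. Next time: spawning threads.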