Study Notes on the Chinese Edition of the ffmpeg Tutorial (Part 1)

Latest recommended article published on 2016-08-10 10:57:10

郑亚帅


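I downloaded several PDF copies of the Chinese translation of the ffmpeg tutorial and found many errors while working through them. Some of the code uses variables that the author never defined. Other parts have simply not kept up with ffmpeg, which is under constant development: the material is long out of date, and a number of functions have since been renamed or replaced by others. Having run into so many of these problems, I decided to take notes and summarize them here: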

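FFMPEG is a great library for creating video applications or special-purpose tools. It takes care of nearly all the heavy lifting for you, including decoding, encoding, muxing and demuxing, which makes multimedia applications much easier to write. It is simple, written in C, fast, and able to decode almost any format you are likely to encounter, as well as encode to many formats.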

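The only problem is that documentation is basically nonexistent. There is a single tutorial that covers the basics, plus documentation generated with doxygen. That is why, when I decided to study FFMPEG to find out how audio and video applications work, I also decided to document the process and publish it as a tutorial for beginners.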

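The FFMPEG project ships with a sample program called ffplay. It is a simple player, written in C, that implements complete video playback using ffmpeg. This tutorial starts from an updated version of the original tutorial written by Martin Bohme (I have borrowed some of his material) and, working from Fabrice Bellard's ffplay, develops a working video player. Each tutorial introduces one or two new ideas and explains how to implement them. Each tutorial comes with a C source file that you can download, compile and follow along with. The source files show how the real program works and how all the pieces fit together, including technical details that are not important to the tutorial text itself. By the time we are finished, we will have a working video player in fewer than 1000 lines of code.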

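To build the player we will use SDL to output the audio and video. SDL is an excellent cross-platform multimedia library that is used in MPEG playback software, emulators and many video games. You will need to download and install the SDL development libraries on your system in order to compile the programs in this tutorial.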

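This tutorial is meant for people with a decent programming background. At the very least you should know C and be familiar with concepts such as queues and mutexes. You should have a basic understanding of multimedia, for example what a waveform is, but you do not need to know much, because I explain many of these concepts along the way.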

Update: I fixed some code errors in tutorials 7 and 8 and added the -lavutil flag.

Tutorial 1: Making Screencaps

Source code: tutorial01.c

Overview

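Movie files have a few basic components. First, the file itself is called a container, and the type of container determines where the information in the file is stored. AVI and Quicktime are examples of containers. Next, you have a set of streams; for example, you usually have an audio stream and a video stream. (A "stream" is just a fancy word for a succession of data elements made available over time.) The data elements in a stream are called frames. Each stream is encoded by a particular kind of codec. The codec defines how the actual data is COded and DECoded, hence the name CODEC. DivX and MP3 are examples of codecs. What we read from a stream are packets. A packet is a piece of data that can be decoded into raw frames, which our application can then manipulate. For our purposes, each packet contains a complete frame, or several complete frames in the case of audio.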

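At a very basic level, dealing with video and audio streams is easy:

10 Open video_stream from the file video.avi

20 Read a packet from video_stream into a frame

30 If the frame is not complete, go to 20

40 Do something with the frame

50 Go back to 20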

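Handling multimedia with ffmpeg is pretty much as simple as this program, although in many programs the "do something" step can be very complex. In this tutorial we will open a file, read the video stream inside it, and our "do something" will be to write the frame to a PPM file.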
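The numbered steps can also be written as an ordinary loop. The following is only a sketch with made-up stand-ins: `MockPacket` and `count_video_frames` are hypothetical names invented for illustration and are not part of ffmpeg. It shows the control flow only: skip packets from other streams, keep feeding packets until a frame is complete, then act on the frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a demuxed packet; this is not an FFmpeg type. */
typedef struct {
    int stream_index;     /* which stream the packet belongs to */
    int completes_frame;  /* 1 if decoding this packet finishes a frame */
} MockPacket;

/*
 * Walk over the packets the way the numbered steps describe and
 * return how many complete frames were seen on the chosen stream.
 */
int count_video_frames(const MockPacket *packets, size_t n, int video_stream)
{
    int frames = 0;

    assert(packets != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {           /* step 20: read a packet */
        if (packets[i].stream_index != video_stream)
            continue;                           /* packet from another stream */
        if (!packets[i].completes_frame)
            continue;                           /* step 30: frame not complete */
        ++frames;                               /* step 40: do something */
    }                                           /* step 50: back to step 20 */
    return frames;
}
```

In the real program the "do something" step is where the frame is converted and written to disk.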

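Opening the File

First, let's see how we open a file. With ffmpeg, you first have to initialize the library. (Note that on some systems you may have to use <ffmpeg/avcodec.h> and <ffmpeg/avformat.h> instead.)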

 #include "libavcodec/avcodec.h"  
 #include "libavformat/avformat.h"  
 #include "libswscale/swscale.h"  
 #include <stdio.h>  
 #include <stdlib.h>  /* for exit() */  
 void SaveFrame(AVFrame *pFrame,int width,int height,int iFrame);  
 int main(int argc,char * argv[])  
 {  
     av_register_all();  

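This registers all available file formats and codecs with the library, so that they are used automatically when a file with a matching format or codec is opened. Note that you only need to call av_register_all() once, so we do it here in main(). If you like, you can register only specific file formats and codecs, but there is usually no reason to do so. Now we can actually open the file: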

 AVFormatContext *pFormatCtx;  
 pFormatCtx=avformat_alloc_context();  
 //Make sure a file name was given on the command line  
 if(argc<2)  
      return -1;  
 //Open video file  
 #ifdef _FFMPEG_0_6_  
      if(av_open_input_file(&pFormatCtx,argv[1],NULL,0,NULL))  
 #else  
      if (avformat_open_input(&pFormatCtx,argv[1],NULL,NULL)!=0)  
 #endif  
           return -1;//Couldn't open file  

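We get the file name from the first command-line argument. The av_open_input_file function reads the file header and stores the information in the AVFormatContext structure we give it. Its last three arguments specify the file format, the buffer size and the format parameters; if they are set to NULL or 0, libavformat detects them automatically. (av_open_input_file has since been replaced by avformat_open_input.)

About avformat_open_input: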

 /** 
 * Open an input stream and read the header. The codecs are not opened. 
 * The stream must be closed with avformat_close_input(). 
 * 
 * @param ps Pointer to user-supplied AVFormatContext (allocated by avformat_alloc_context). 
 *           May be a pointer to NULL, in which case an AVFormatContext is allocated by this 
 *           function and written into ps. 
 *           Note that a user-supplied AVFormatContext will be freed on failure. 
 * @param filename Name of the stream to open. 
 * @param fmt If non-NULL, this parameter forces a specific input format. 
 *            Otherwise the format is autodetected. 
 * @param options  A dictionary filled with AVFormatContext and demuxer-private options. 
 *                 On return this parameter will be destroyed and replaced with a dict containing 
 *                 options that were not found. May be NULL. 
 * 
 * @return 0 on success, a negative AVERROR on failure. 
 * 
 * @note If you want to use custom IO, preallocate the format context and set its pb field. 
 */  
 int avformat_open_input(AVFormatContext **ps, const char *filename, AVInputFormat *fmt, AVDictionary **options);  
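The `AVFormatContext **ps` parameter follows a common C convention: the caller passes the address of a pointer, and the function may allocate the object and write it back. The sketch below mimics that convention with plain C only. `MockContext`, `mock_open_input` and `mock_close_input` are hypothetical names invented here, and the behavior is modeled on the comment above rather than taken from the ffmpeg source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context type; it only mimics the calling convention. */
typedef struct {
    char name[64];
} MockContext;

/*
 * If *ps is NULL, a context is allocated here and written back through ps.
 * On failure any context (including a user-supplied one) is freed and
 * *ps is reset to NULL, so the caller must not free it again.
 * Returns 0 on success and a negative value on failure.
 */
int mock_open_input(MockContext **ps, const char *filename)
{
    assert(ps != NULL);
    MockContext *ctx = *ps;

    if (ctx == NULL) {
        ctx = calloc(1, sizeof(*ctx));
        if (ctx == NULL)
            return -1;
    }
    if (filename == NULL || strlen(filename) >= sizeof(ctx->name)) {
        free(ctx);
        *ps = NULL;
        return -1;
    }
    strcpy(ctx->name, filename);
    *ps = ctx;
    return 0;
}

/* Counterpart of the close call: frees the context and clears the pointer. */
void mock_close_input(MockContext **ps)
{
    if (ps == NULL || *ps == NULL)
        return;
    free(*ps);
    *ps = NULL;
}
```

Because the close function takes the address of the pointer and clears it, a second cleanup call on the same pointer becomes a harmless no-op.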

This function only looks at the file header, so next we need to check the stream information in the file:

 //Retrieve stream information  
 #ifdef _FFMPEG_0_6_  
      if (av_find_stream_info(pFormatCtx)<0)  
 #else  
      if(avformat_find_stream_info(pFormatCtx,NULL)<0)  
 #endif  
           return -1;//Couldn't find stream information  

The av_find_stream_info function (since replaced by avformat_find_stream_info) fills pFormatCtx->streams with the proper information. We introduce a handy debugging function to show us what is inside:

 //Dump information about file onto standard error  
  av_dump_format(pFormatCtx,0,argv[1],0);  

Now pFormatCtx->streams is just an array of pointers, of size pFormatCtx->nb_streams, so let's walk through it until we find a video stream.

 int i;  
  AVCodecContext *pCodecCtx;  
  //Find the first video stream  
  int videoStream=-1;  
  printf("pFormatCtx->nb_streams=%d\n",pFormatCtx->nb_streams);  
  for (i = 0; i < pFormatCtx->nb_streams; ++i)  
  {  
       if (pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO)  
       {  
            videoStream=i;  
            break;  
       }  
  }  
  if (videoStream==-1)  
       return -1;//Didn't find a video stream  
  //Get a pointer to the codec context for the video stream  
  pCodecCtx=pFormatCtx->streams[videoStream]->codec;  
  printf("videoStream=%d\n", videoStream);  
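The search itself is a plain linear scan over an array of pointers. As a self-contained illustration that compiles without ffmpeg, here is a sketch using hypothetical stand-in types (`MockStream`, `MockMediaType`, and `find_first_video_stream` are invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the stream array; not FFmpeg types. */
enum MockMediaType { MOCK_TYPE_VIDEO, MOCK_TYPE_AUDIO, MOCK_TYPE_SUBTITLE };

typedef struct {
    enum MockMediaType type;
} MockStream;

/*
 * Scan an array of stream pointers and return the index of the first
 * video stream, or -1 if there is none.
 */
int find_first_video_stream(MockStream *const *streams, unsigned int nb_streams)
{
    assert(streams != NULL || nb_streams == 0);
    for (unsigned int i = 0; i < nb_streams; ++i) {
        if (streams[i]->type == MOCK_TYPE_VIDEO)
            return (int)i;
    }
    return -1;
}
```

Returning -1 for "not found" matches the way videoStream is initialized and checked in the listing above.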

In the source code, pFormatCtx->streams[i]->codec->codec_type has the following type:

 enum AVMediaType {  
     AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA  
     AVMEDIA_TYPE_VIDEO,  
     AVMEDIA_TYPE_AUDIO,  
     AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous  
     AVMEDIA_TYPE_SUBTITLE,  
     AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse  
     AVMEDIA_TYPE_NB  
 };  

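The stream's information about the codec is kept in what we call the "codec context". It contains all the information about the codec that the stream is using, and now we have a pointer to it. But we still have to find the actual codec and open it: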

      AVCodec *pCodec;  
      //Find the decoder for the video stream  
      pCodec=avcodec_find_decoder(pCodecCtx->codec_id);  
      if(pCodec==NULL)  
      {  
           fprintf(stderr,"Unsupported codec!\n");  
           return -1; // Codec not found  
      }  
      //Open codec  
      #ifdef _FFMPEG_0_6_  
           if(avcodec_open(pCodecCtx,pCodec)<0)  
      #else  
           if(avcodec_open2(pCodecCtx,pCodec,NULL)<0)  
      #endif  
           return -1;//Could not open codec  

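Some of you might remember from the old tutorial that there were two other parts to this code: adding CODEC_FLAG_TRUNCATED to pCodecCtx->flags and adding a hack to correct grossly incorrect frame rates. These two fixes are no longer in ffplay.c, so I have to assume they are no longer necessary. There is one more difference to point out now that this code has been removed: pCodecCtx->time_base now holds the frame rate information. time_base is a struct that holds a numerator and a denominator (AVRational). We represent the frame rate as a fraction because many codecs have non-integer frame rates (for example, NTSC uses 29.97 fps).

Storing the Data

Now we need a place to actually store the frame: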
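To make the fraction idea concrete, here is a small sketch. The `Rational` type and both helper functions are hypothetical and written only for this note; in real code ffmpeg's own AVRational and av_q2d() play this role. The values assume the usual NTSC rate of 30000/1001 frames per second, which rounds to the 29.97 quoted above.

```c
#include <assert.h>

/* Hypothetical rational type mirroring the numerator/denominator idea. */
typedef struct {
    int num;
    int den;
} Rational;

/* Convert a rational to a floating-point value; den must be non-zero. */
double rational_to_double(Rational r)
{
    assert(r.den != 0);
    return (double)r.num / (double)r.den;
}

/* The reciprocal turns a time base (seconds per frame) into a frame rate. */
Rational rational_invert(Rational r)
{
    Rational out = { r.den, r.num };
    return out;
}
```

Keeping the rate as two integers avoids the rounding error that would build up if 29.97 were stored as a floating-point number and multiplied over thousands of frames.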

 AVFrame *pFrame;  
 //Allocate video frame  
 pFrame=avcodec_alloc_frame();  

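Since we are planning to output PPM files, which are stored in 24-bit RGB, we have to convert our frame from its native format to RGB. ffmpeg will do these conversions for us. In most projects (including ours) we want to convert the initial frame to a specific format. Let's allocate a frame for the converted image now.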

 //Allocate an AVFrame structure  
 AVFrame *pFrameRGB;  
 pFrameRGB=avcodec_alloc_frame();  
 if (pFrameRGB==NULL)  
      return -1;  

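Even though we have allocated the frame, we still need a place to put the raw data when we convert it. We use avpicture_get_size to get the size we need and then allocate the space manually: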

 uint8_t *buffer;  
 int numBytes;  
 //Determine required buffer size and allocate buffer  
 numBytes=avpicture_get_size(PIX_FMT_RGB24,pCodecCtx->width,pCodecCtx->height);  
 buffer=(uint8_t *)av_malloc(numBytes*sizeof(uint8_t));  
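The buffer above comes from av_malloc, which returns aligned memory. As a rough illustration of what "aligned allocation" means, the sketch below over-allocates with plain malloc and rounds the address up. The function names are hypothetical, and this is not how ffmpeg implements av_malloc; it is only one well-known way to get an aligned block in portable C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate `size` bytes whose address is a multiple of `alignment`
 * (which must be a power of two). The pointer returned by malloc is
 * stored just before the aligned address so it can be freed later.
 */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t extra = alignment + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;                      /* padded size would overflow */

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Leave room for the bookkeeping pointer, then round up. */
    unsigned char *p = (unsigned char *)raw + sizeof(void *);
    size_t misalign = (size_t)((uintptr_t)p & (alignment - 1));
    if (misalign != 0)
        p += alignment - misalign;

    memcpy(p - sizeof(void *), &raw, sizeof raw);
    return p;
}

/* Release a block obtained from aligned_malloc_sketch(); NULL is allowed. */
void aligned_free_sketch(void *ptr)
{
    if (ptr == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof raw);
    free(raw);
}
```

A block allocated this way must be released with the matching free function, just as memory from av_malloc is released with av_free rather than free.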

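av_malloc is ffmpeg's malloc: a thin wrapper around malloc that makes sure the returned memory address is suitably aligned (the exact alignment depends on how the library was built). It will not protect you from memory leaks, double frees or other malloc problems. Now we use avpicture_fill to associate the frame with our newly allocated buffer. About AVPicture: the AVPicture struct is a subset of the AVFrame struct; the beginning of the AVFrame struct is identical to the AVPicture struct.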

 //Assign appropriate parts of buffer to image planes in pFrameRGB  
 //Note that pFrameRGB is an AVFrame,but AVFrame is a superset of AVPicture  
 avpicture_fill((AVPicture *)pFrameRGB,buffer,PIX_FMT_RGB24,pCodecCtx->width,pCodecCtx->height);  

What avpicture_fill() does:

 /** 
 * Setup the picture fields based on the specified image parameters 
 * and the provided image data buffer. 
 * 
 * The picture fields are filled in by using the image data buffer 
 * pointed to by ptr. 
 * 
 * If ptr is NULL, the function will fill only the picture linesize 
 * array and return the required size for the image buffer. 
 * 
 * To allocate an image buffer and fill the picture data in one call, 
 * use avpicture_alloc(). 
 * 
 * @param picture       the picture to be filled in 
 * @param ptr           buffer where the image data is stored, or NULL 
 * @param pix_fmt       the pixel format of the image 
 * @param width         the width of the image in pixels 
 * @param height        the height of the image in pixels 
 * @return the size in bytes required for src, a negative error code 
 * in case of failure 
 * 
 * @see av_image_fill_arrays() 
 */  
 int avpicture_fill(AVPicture *picture, const uint8_t *ptr, enum AVPixelFormat pix_fmt, int width, int height);  
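For packed 24-bit RGB the "fill" step is easy to picture: there is a single image plane, each row holds width × 3 bytes, and the picture simply points into the caller's buffer. The sketch below reproduces that idea with a hypothetical `MockPicture` type and hypothetical function names. It assumes tightly packed rows with no padding and does not check for integer overflow on very large dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture type with the same data/linesize idea; not AVPicture. */
typedef struct {
    uint8_t *data[4];
    int linesize[4];
} MockPicture;

/* Bytes needed for a tightly packed 24-bit RGB image (3 bytes per pixel). */
int rgb24_buffer_size(int width, int height)
{
    if (width <= 0 || height <= 0)
        return -1;
    return width * 3 * height;
}

/*
 * Point the picture at a caller-owned buffer. Packed RGB24 uses a single
 * plane, so only data[0] and linesize[0] are set. With buffer == NULL only
 * the line size is filled in. Returns the required size, or -1 on bad input.
 */
int rgb24_fill(MockPicture *pic, uint8_t *buffer, int width, int height)
{
    assert(pic != NULL);
    int size = rgb24_buffer_size(width, height);
    if (size < 0)
        return -1;
    for (int i = 0; i < 4; ++i) {
        pic->data[i] = NULL;
        pic->linesize[i] = 0;
    }
    pic->data[0] = buffer;
    pic->linesize[0] = width * 3;
    return size;
}
```

The picture does not own the buffer here, which is why the tutorial's cleanup code frees the buffer and the frame separately.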

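Finally, we are ready to read from the stream.

Reading the Data

What we are going to do is read through the entire video stream by reading in packets, decoding them into frames and, once a frame is complete, converting it and saving it.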

 int frameFinished;  
 AVPacket packet;  
 i=0;  
 av_init_packet(&packet);  
 while(av_read_frame(pFormatCtx,&packet)>=0)  
 {  
      //printf("packet.stream_index=%d, packet.size=%d\n", packet.stream_index,packet.size);  
      //Is this a packet from the video stream?  
      if (packet.stream_index==videoStream)  
      {  
           //Decode video frame  
           //avcodec_decode_video(pCodecCtx,pFrame,&frameFinished,packet.data,packet.size);  
           avcodec_decode_video2(pCodecCtx,pFrame,&frameFinished,&packet);  
           printf("frameFinished=%d\n",frameFinished );  
           //Did we get a video frame?  
           if (frameFinished)  
           {  
                //Convert the image from its native format to RGB  
                //img_convert((AVPicture*)pFrameRGB,PIX_FMT_RGB24,(AVPicture*)pFrame,pCodecCtx->pix_fmt,pCodecCtx->width,pCodecCtx->height);  
                static struct SwsContext *img_convert_ctx;  
                //Create the conversion context once and reuse it for later frames  
                if(img_convert_ctx==NULL)  
                     img_convert_ctx=sws_getContext(pCodecCtx->width,pCodecCtx->height,pCodecCtx->pix_fmt,pCodecCtx->width,pCodecCtx->height,  
                                                    PIX_FMT_RGB24,SWS_BICUBIC,NULL,NULL,NULL);  
                if(img_convert_ctx==NULL)  
                {  
                     fprintf(stderr,"Can not initialize the conversion context!\n");  
                     exit(1);  
                }  
                //sws_scale expects an array of plane pointers, hence the pointer-to-pointer cast  
                sws_scale(img_convert_ctx,(const uint8_t *const *)pFrame->data,pFrame->linesize,0,pCodecCtx->height,  
                          pFrameRGB->data,pFrameRGB->linesize);  
                //Save the frame to disk  
                if (++i<=5)  
                     SaveFrame(pFrameRGB,pCodecCtx->width,pCodecCtx->height,i);  
           }  
      }  
      //Free the packet that was allocated by av_read_frame  
      av_free_packet(&packet);  
 }  

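The loop is fairly simple: av_read_frame() reads in a packet and stores it in the AVPacket struct. Note that we have only allocated the packet structure itself; ffmpeg allocates the internal data for us, and packet.data points to it. That data is freed later by av_free_packet(). The function avcodec_decode_video() converts the packet into a frame (the code above calls its successor, avcodec_decode_video2()). However, we might not have all the information we need for a complete frame after decoding a single packet, so avcodec_decode_video() sets the frameFinished flag for us once we have a whole frame. We then use img_convert() to convert the frame from its native format (pCodecCtx->pix_fmt) to RGB (img_convert() has been replaced by sws_scale()). Remember that you can cast an AVFrame pointer to an AVPicture pointer. Finally, we pass the frame together with its height and width to our SaveFrame function.

SwsContext: the context handle required for changing video resolution and converting between color spaces.

About sws_scale():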

 /** 
 * Scale the image slice in srcSlice and put the resulting scaled 
 * slice in the image in dst. A slice is a sequence of consecutive 
 * rows in an image. 
 * 
 * Slices have to be provided in sequential order, either in 
 * top-bottom or bottom-top order. If slices are provided in 
 * non-sequential order the behavior of the function is undefined. 
 * 
 * @param c         the scaling context previously created with 
 *                  sws_getContext() 
 * @param srcSlice  the array containing the pointers to the planes of 
 *                  the source slice 
 * @param srcStride the array containing the strides for each plane of 
 *                  the source image 
 * @param srcSliceY the position in the source image of the slice to 
 *                  process, that is the number (counted starting from 
 *                  zero) in the image of the first row of the slice 
 * @param srcSliceH the height of the source slice, that is the number 
 *                  of rows in the slice 
 * @param dst       the array containing the pointers to the planes of 
 *                  the destination image 
 * @param dstStride the array containing the strides for each plane of 
 *                  the destination image 
 * @return          the height of the output slice 
 */  
 int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[], const int srcStride[], int srcSliceY, int srcSliceH, uint8_t *const dst[], const int dstStride[]);  

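A note on packets

Technically a packet can contain partial frames or other bits of data, but ffmpeg's parser ensures that the packets we get contain either a complete frame or multiple complete frames.

Now all we need to do is make the SaveFrame function write the RGB information to a file in PPM format. We will generate a simple PPM file; trust us, it works.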

 void SaveFrame(AVFrame *pFrame,int width,int height,int iFrame)  
 {  
      FILE *pFile;  
      char szFilename[32];  
      int y;  
      //Open file  
      sprintf(szFilename,"frame%d.ppm",iFrame);  
      pFile=fopen(szFilename,"wb");  
      if(pFile==NULL)  
           return;  
      //Write header  
      fprintf(pFile, "P6\n%d %d\n255\n",width,height );  
      //Write pixel data  
      for (y=0;y<height; ++y)  
           fwrite(pFrame->data[0]+y*pFrame->linesize[0],1,width*3,pFile);  
      //Close file  
      fclose(pFile);  
 }  

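We do a bit of standard file opening and then write the RGB data, one row at a time. A PPM file is simply a file that has the RGB information laid out in one long string. If you know how HTML colors work, it is like laying out the color of each pixel end to end, so #ff0000#ff0000.... would be a red screen. (It is stored in binary and without separators, but you get the idea.) The header records the width and height of the image and the maximum value of the RGB components.

About PPM files: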
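The reason SaveFrame writes row by row is that the rows of a frame may sit further apart in memory than the visible pixels require: linesize[0] can be larger than width*3. A small sketch of the same idea, copying into a tightly packed buffer instead of a file (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `height` rows of `row_bytes` visible bytes each from a source whose
 * rows start `src_linesize` bytes apart into a tightly packed destination.
 */
void copy_rows_packed(uint8_t *dst, const uint8_t *src,
                      int src_linesize, int row_bytes, int height)
{
    assert(dst != NULL && src != NULL);
    assert(row_bytes >= 0 && src_linesize >= row_bytes && height >= 0);
    for (int y = 0; y < height; ++y)
        memcpy(dst + (size_t)y * (size_t)row_bytes,
               src + (size_t)y * (size_t)src_linesize,
               (size_t)row_bytes);
}
```

Writing the whole buffer with one call would only be correct when the line size happens to equal the visible row width.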

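PPM is a simple image format that contains only the format identifier, the image width and height, the bit depth (given as a maximum sample value) and the image data.
The image data can be stored either as ASCII or as binary. Only one of the simpler PPM variants is described here: a 24-bit color image stored in binary.
File header + RGB data:
P6\n
width height\n
255\n
rgbrgb...
Here P6 identifies this PPM variant; \n is a newline; width and height are the image dimensions, separated by a space; 255 is the maximum value of each color component; the RGB data is laid out from top to bottom and from left to right.

The header consists of three lines of text and can be read with fgets:
    1) The first line is "P6", which identifies the file type
    2) The second line is the image width and height
    3) The third line is the maximum pixel value
    The image data block follows, stored row by row. Each pixel takes 3 bytes, the red, green and blue channels in that order, and each channel is a 1-byte integer. The origin is the top-left corner.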
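Reading such a header back with fgets could look roughly like the following. This is a minimal sketch: `read_ppm_header` is a hypothetical helper, it accepts only the three-line layout described here with one byte per channel, and it does not handle the comment lines that the PPM format also permits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the three text lines of a binary PPM ("P6") header with fgets.
 * Returns 0 on success and -1 on any parse error. On success the stream
 * is positioned at the first byte of pixel data.
 */
int read_ppm_header(FILE *fp, int *width, int *height, int *maxval)
{
    char line[64];

    assert(fp != NULL && width != NULL && height != NULL && maxval != NULL);

    /* Line 1: the file type. */
    if (fgets(line, sizeof line, fp) == NULL || strncmp(line, "P6", 2) != 0)
        return -1;
    /* Line 2: width and height separated by a space. */
    if (fgets(line, sizeof line, fp) == NULL ||
        sscanf(line, "%d %d", width, height) != 2)
        return -1;
    /* Line 3: the maximum value of a color component. */
    if (fgets(line, sizeof line, fp) == NULL ||
        sscanf(line, "%d", maxval) != 1)
        return -1;
    /* This sketch only supports one byte per channel. */
    if (*width <= 0 || *height <= 0 || *maxval <= 0 || *maxval > 255)
        return -1;
    return 0;
}
```

After a successful call, width * height * 3 bytes of pixel data can be read with fread.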

Now, back to our main() function. Once we have finished reading from the video stream, we have to clean everything up:

 //Free the RGB image  
 av_free(buffer);  
 av_free(pFrameRGB);  
 //Free the YUV frame  
 av_free(pFrame);  
 //Close the codec  
 avcodec_close(pCodecCtx);  
 //Close the video file  
 #ifdef _FFMPEG_0_6_  
      av_close_input_file(pFormatCtx);  
 #else  
      avformat_close_input(&pFormatCtx);  
 #endif  
 //pFormatCtx has already been freed by the close call above, so it is not freed again  
 return 0;  

You will notice that we use av_free to release the memory we allocated with avcodec_alloc_frame and av_malloc.
The command I use to compile on my Linux system:

gcc ./tutorial01.c -o ./tutorial01 -lavutil -lavformat -lavcodec -lswscale -lz -lm -I /home/Jiakun/ffmpeg_build/include -L /home/Jiakun/ffmpeg_build/lib/
