FFmpeg Theoretical Foundations

龙城ne货92923

Published 2023-12-18 10:01:59


Category: ffmpeg; Tags: ffmpeg

Original article link: https://blog.csdn.net/u011780419/article/details/135055956


1. FFmpeg Framework Flowchart

1.1. Key Terms

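Media file: a container/file (Container/File), i.e. a multimedia file in a specific format such as MP4, FLV or MKV.
Media stream (Stream): a continuous run of data along the timeline, such as a piece of audio, video or subtitle data. It may be compressed or uncompressed; compressed data must be associated with a specific codec (some audio streams, by contrast, are plain uncompressed PCM).
Data frame/packet (Frame/Packet): a media stream usually consists of a large number of frames. For compressed data, a frame is the smallest unit a codec processes, and the frames of different streams are stored interleaved in the container.
Protocol parsing: turns data carried by a streaming protocol into data in the corresponding standard container format. Audio and video sent over a network usually travel over streaming protocols such as HTTP, RTMP or MMS. Alongside the audio/video data, these protocols carry signaling data, such as playback control (play, pause, stop) or descriptions of the network state. Protocol parsing strips the signaling data and keeps only the audio/video data. For example, data delivered over RTMP comes out of this step as FLV data.
Demuxing (demuxer): splits input data in a container format into the compressed audio stream and the compressed video stream. There are many container formats, such as MP4, MKV, RMVB, TS, FLV and AVI; their job is to pack already-compressed video and audio data together according to a defined layout. For example, demuxing FLV data yields an H.264 video bitstream and an AAC audio bitstream.
Decoding: turns compressed video/audio data into uncompressed raw video/audio data. Audio compression standards include AAC, MP3 and AC-3; video compression standards include H.264, MPEG-2 and VC-1. Decoding is the most important and most complex stage of the whole pipeline: it turns compressed video into uncompressed color data such as YUV420P or RGB, and compressed audio into uncompressed audio samples such as PCM.
Audio/video synchronization: uses the parameters obtained during demuxing (such as timestamps) to synchronize the decoded video and audio data, then sends them to the system's graphics card and sound card for playback.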
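The interleaving described above can be sketched in a few lines of C. The `ToyPacket` type and `toy_demux_count()` below are hypothetical stand-ins, not FFmpeg API: they only mimic how a player routes each packet returned by av_read_frame() to its stream using AVPacket.stream_index, here by counting packets and bytes per stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet type: only the fields needed for this sketch. */
typedef struct ToyPacket {
    int stream_index;   /* which stream the packet belongs to (like AVPacket.stream_index) */
    int64_t pts;        /* presentation timestamp in the stream's time base */
    size_t size;        /* compressed payload size in bytes */
} ToyPacket;

/* Demux-style split: walk the interleaved packet sequence and route each packet
 * to its stream. Here "routing" only accumulates per-stream packet counts and
 * byte totals; a real player would push the packet to that stream's decoder queue. */
void toy_demux_count(const ToyPacket *pkts, size_t n, int nb_streams,
                     int *count, size_t *bytes)
{
    for (int s = 0; s < nb_streams; s++) {
        count[s] = 0;
        bytes[s] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        int s = pkts[i].stream_index;
        if (s < 0 || s >= nb_streams)
            continue;               /* ignore packets of unknown streams */
        count[s]++;
        bytes[s] += pkts[i].size;
    }
}
```

With stream 0 as video and stream 1 as audio, the sequence V A A V yields two packets per stream. A real player would push each packet into a per-stream queue that feeds the matching decoder.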

1.2. Muxers and Demuxers

1.3. Codecs

2. Structure of the FFmpeg Libraries

libavcodec encoding/decoding library

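The encoding/decoding library, which implements the codec layer. Some codecs carry their own licenses, so FFmpeg does not include libraries such as libx264 or FDK-AAC by default. Instead, FFmpeg acts as a platform: third-party codecs can be added as plugins (enabled at configure time, e.g. --enable-libx264) and are then exposed to developers through the same unified interface.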

libavfilter graph-based frame editing library

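The audio/video filter library. It provides audio and video effects processing; when encoding or decoding through the FFmpeg API, applying effects to the audio/video data with this module is both convenient and efficient.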
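libavfilter connects filters into a graph; the simplest graph is a linear chain in which each filter's output is the next filter's input. The following C sketch shows only that chaining idea on a buffer of 8-bit samples. `ToyFilter`, `toy_brighten()`, `toy_invert()` and `toy_run_chain()` are hypothetical names for illustration; the real API builds graphs with AVFilterGraph and passes AVFrames between filters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter callback: edits a buffer of 8-bit samples in place. */
typedef void (*ToyFilterFn)(uint8_t *data, size_t n, int param);

typedef struct ToyFilter {
    const char *name;
    ToyFilterFn fn;
    int param;
} ToyFilter;

/* Add `param` to every sample, clamping the result to [0, 255]. */
static void toy_brighten(uint8_t *data, size_t n, int param)
{
    for (size_t i = 0; i < n; i++) {
        int v = data[i] + param;
        data[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Invert every sample (255 - x); param is unused. */
static void toy_invert(uint8_t *data, size_t n, int param)
{
    (void)param;
    for (size_t i = 0; i < n; i++)
        data[i] = (uint8_t)(255 - data[i]);
}

/* Run a linear chain: the output of filter k becomes the input of filter k + 1. */
void toy_run_chain(const ToyFilter *chain, size_t nb_filters, uint8_t *data, size_t n)
{
    for (size_t k = 0; k < nb_filters; k++)
        chain[k].fn(data, n, chain[k].param);
}
```

Because every filter shares one callback shape, the chain can be rearranged or extended without touching the driver loop; a filter graph generalizes this to filters with several inputs and outputs.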

libavformat I/O and muxing/demuxing library

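The container format and protocol library, one of the most important modules. It implements the protocol layer and the demuxer/muxer layer, so that protocols and formats are transparent to developers.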

libavdevice special devices muxing/demuxing library

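The input/output device library, covering capture and playback devices. When building the ffplay player, this module is typically enabled as well, and SDL (SDL2) must be built beforehand, because ffplay plays both audio and video through SDL.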

libavutil common utility library

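The core utility library. Most of the other libraries depend on it for basic operations such as memory management, math helpers, pixel/sample format definitions and logging.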

libswresample audio resampling, format conversion and mixing

Used for audio resampling: it converts digital audio between channel counts, sample formats and sample rates.
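Two of the conversions libswresample performs, sample format and channel count, can be illustrated with plain loops. This is a hedged sketch, not libswresample's implementation: `toy_s16_to_fltp()` turns interleaved signed 16-bit samples (the layout FFmpeg calls AV_SAMPLE_FMT_S16) into one float plane per channel (AV_SAMPLE_FMT_FLTP), and `toy_stereo_to_mono_s16()` downmixes stereo by averaging. Both function names are hypothetical, and sample-rate conversion, which needs proper filtering, is left out.

```c
#include <stdint.h>

/* Sample-format + layout conversion: interleaved S16 (L R L R ...) into one
 * float plane per channel, mapping [-32768, 32767] to [-1.0, 1.0). */
void toy_s16_to_fltp(const int16_t *in, int nb_samples, int channels, float **out)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            out[c][i] = in[i * channels + c] / 32768.0f;
}

/* Channel-count conversion: downmix interleaved stereo S16 to mono by averaging
 * the two channels (computed in 32 bits so the sum cannot overflow). */
void toy_stereo_to_mono_s16(const int16_t *in, int nb_samples, int16_t *out)
{
    for (int i = 0; i < nb_samples; i++)
        out[i] = (int16_t)(((int32_t)in[2 * i] + in[2 * i + 1]) / 2);
}
```

The planar output keeps each channel contiguous, which is the layout many decoders and encoders (for example AAC in FFmpeg) work with internally.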

libpostproc post processing library

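Used for post-processing of decoded video (deblocking, deringing and similar). The pp filter in libavfilter calls into this module, so it must be enabled at build time if you want to use that filter.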

libswscale color conversion and scaling library

The image conversion module: for example, it converts YUV data to RGB, or scales an image from 1280x720 to 800x480.
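Scaling can be illustrated with the simplest possible method, nearest-neighbour sampling of one 8-bit plane such as the Y plane of YUV420P. `toy_scale_nearest()` is a hypothetical helper written for this sketch; libswscale itself is driven through sws_getContext()/sws_scale() and offers better filters (for example SWS_BILINEAR or SWS_BICUBIC).

```c
#include <stdint.h>

/* Nearest-neighbour scaling of one 8-bit plane (e.g. the Y plane of YUV420P).
 * linesize = bytes per row, which may be larger than the width because of padding. */
void toy_scale_nearest(const uint8_t *src, int src_w, int src_h, int src_linesize,
                       uint8_t *dst, int dst_w, int dst_h, int dst_linesize)
{
    for (int y = 0; y < dst_h; y++) {
        int sy = (int)((int64_t)y * src_h / dst_h);      /* source row for this output row */
        for (int x = 0; x < dst_w; x++) {
            int sx = (int)((int64_t)x * src_w / dst_w);  /* source column for this output column */
            dst[y * dst_linesize + x] = src[sy * src_linesize + sx];
        }
    }
}
```

For 1280x720 to 800x480 the same loop would sample every 1.6th column and every 1.5th row; nearest-neighbour keeps the code short at the cost of visible aliasing.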

3. FFmpeg Data Structures

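AVIOContext: the structure that manages input/output data.

AVFormatContext: the container format context; holds the container-level information of a media file.

AVInputFormat: one per container format (e.g. FLV, MKV, MP4, AVI); used when demuxing and set by avformat_open_input().

AVOutputFormat: used when muxing; must be set before avformat_write_header() is called.

AVStream: holds the information of one video/audio stream (one AVStream per stream).

AVCodecContext: the codec context; holds the encoding/decoding state of a video (or audio) stream.

AVCodec: describes a codec; each video (audio) codec, such as the H.264 decoder, has one of these structures.

AVPacket: holds compressed (encoded) data and its properties.

AVFrame: holds decoded (raw) data and its properties.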

4. Relationships Between the FFmpeg Data Structures

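Each AVStream stores the data of one video/audio stream.
Each AVStream corresponds to one AVCodecContext, which stores how that stream is decoded (in current FFmpeg the stream itself only carries an AVCodecParameters in AVStream.codecpar, and the caller creates the AVCodecContext from it, e.g. with avcodec_parameters_to_context()).
Each AVCodecContext points to one AVCodec, the decoder for that stream.
Each decoder is described by one AVCodec structure.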
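The chain stream -> codec parameters -> codec -> codec context can be modeled with trimmed-down structures. Everything prefixed with `Toy`/`toy_` below is hypothetical and keeps only the fields needed to show the links; `toy_find_decoder()` mimics what avcodec_find_decoder() does with a codec_id, and each stream gets its own codec context, as in a typical decode loop.

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the FFmpeg structures. */
enum ToyCodecID { TOY_CODEC_NONE, TOY_CODEC_H264, TOY_CODEC_AAC };

typedef struct ToyCodec {            /* ~ AVCodec: one per codec implementation */
    const char *name;
    enum ToyCodecID id;
} ToyCodec;

typedef struct ToyCodecParameters {  /* ~ AVCodecParameters (AVStream.codecpar) */
    enum ToyCodecID codec_id;
} ToyCodecParameters;

typedef struct ToyStream {           /* ~ AVStream */
    int index;
    ToyCodecParameters codecpar;
} ToyStream;

typedef struct ToyFormatContext {    /* ~ AVFormatContext: owns the streams */
    int nb_streams;
    ToyStream streams[4];
} ToyFormatContext;

typedef struct ToyCodecContext {     /* ~ AVCodecContext: per-stream decoding state */
    const ToyCodec *codec;
} ToyCodecContext;

static const ToyCodec toy_h264 = { "h264", TOY_CODEC_H264 };
static const ToyCodec toy_aac  = { "aac",  TOY_CODEC_AAC  };
static const ToyCodec *const toy_codec_list[] = { &toy_h264, &toy_aac, NULL };

/* ~ avcodec_find_decoder(): linear search of the registered codecs by id. */
const ToyCodec *toy_find_decoder(enum ToyCodecID id)
{
    for (size_t i = 0; toy_codec_list[i]; i++)
        if (toy_codec_list[i]->id == id)
            return toy_codec_list[i];
    return NULL;
}

/* Walk stream -> codec parameters -> codec -> codec context for every stream.
 * Returns how many streams found a decoder. */
int toy_open_decoders(const ToyFormatContext *fmt, ToyCodecContext *ctxs)
{
    int opened = 0;
    for (int i = 0; i < fmt->nb_streams; i++) {
        ctxs[i].codec = toy_find_decoder(fmt->streams[i].codecpar.codec_id);
        if (ctxs[i].codec)
            opened++;
    }
    return opened;
}
```

Streams whose codec has no registered decoder simply end up with a NULL codec, which a real player would treat as "skip this stream".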

4.1. AVInputFormat/AVFormatContext/AVIContext

This section analyzes the relationship between the three structures AVInputFormat, AVFormatContext and AVIContext.

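AVFormatContext: plays the role of the "parent class".

AVInputFormat: plays the role of the "interface"; libavformat/avidec.c implements it for the AVI demuxer.

AVIContext: plays the role of the "subclass" and holds the state specific to AVI. It is allocated into AVFormatContext.priv_data, and the AVI demuxer functions reach it through s->priv_data.

AVFormatContext calls the AVI demuxer's functions through AVInputFormat; those functions fill in both the AVI-specific fields of AVIContext and the generic fields of AVFormatContext.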

// libavformat/avformat.h
typedef struct AVFormatContext {
    /**
     * A class for logging and @ref avoptions. Set by avformat_alloc_context().
     * Exports (de)muxer private options if they exist.
     */
    const AVClass *av_class;

    /**
     * The input container format.
     *
     * Demuxing only, set by avformat_open_input().
     */
    const struct AVInputFormat *iformat;

    /**
     * The output container format.
     *
     * Muxing only, must be set by the caller before avformat_write_header().
     */
    const struct AVOutputFormat *oformat;

    /**
     * Format private data. This is an AVOptions-enabled struct
     * if and only if iformat/oformat.priv_class is not NULL.
     *
     * - muxing: set by avformat_write_header()
     * - demuxing: set by avformat_open_input()
     */
    void *priv_data; // AVIContext

    /**
     * I/O context.
     *
     * - demuxing: either set by the user before avformat_open_input() (then
     *             the user must close it manually) or set by avformat_open_input().
     * - muxing: set by the user before avformat_write_header(). The caller must
     *           take care of closing / freeing the IO context.
     *
     * Do NOT set this field if AVFMT_NOFILE flag is set in
     * iformat/oformat.flags. In such a case, the (de)muxer will handle
     * I/O in some other way and this field will be NULL.
     */
    AVIOContext *pb;

    /**
     * Forced video codec.
     * This allows forcing a specific decoder, even when there are multiple with
     * the same codec_id.
     * Demuxing: Set by user
     */
    const AVCodec *video_codec;

    /**
     * Forced audio codec.
     * This allows forcing a specific decoder, even when there are multiple with
     * the same codec_id.
     * Demuxing: Set by user
     */
    const AVCodec *audio_codec;

    /**
     * Forced subtitle codec.
     * This allows forcing a specific decoder, even when there are multiple with
     * the same codec_id.
     * Demuxing: Set by user
     */
    const AVCodec *subtitle_codec;

    /**
     * Forced data codec.
     * This allows forcing a specific decoder, even when there are multiple with
     * the same codec_id.
     * Demuxing: Set by user
     */
    const AVCodec *data_codec;

    /**
     * A callback for opening new IO streams.
     *
     * Whenever a muxer or a demuxer needs to open an IO stream (typically from
     * avformat_open_input() for demuxers, but for certain formats can happen at
     * other times as well), it will call this callback to obtain an IO context.
     *
     * @param s the format context
     * @param pb on success, the newly opened IO context should be returned here
     * @param url the url to open
     * @param flags a combination of AVIO_FLAG_*
     * @param options a dictionary of additional options, with the same
     *                semantics as in avio_open2()
     * @return 0 on success, a negative AVERROR code on failure
     *
     * @note Certain muxers and demuxers do nesting, i.e. they open one or more
     * additional internal format contexts. Thus the AVFormatContext pointer
     * passed to this callback may be different from the one facing the caller.
     * It will, however, have the same 'opaque' field.
     */
    int (*io_open)(struct AVFormatContext *s, AVIOContext **pb, const char *url,
                   int flags, AVDictionary **options);

#if FF_API_AVFORMAT_IO_CLOSE
    /**
     * A callback for closing the streams opened with AVFormatContext.io_open().
     *
     * @deprecated use io_close2
     */
    attribute_deprecated
    void (*io_close)(struct AVFormatContext *s, AVIOContext *pb);
#endif
} AVFormatContext;
// AVInputFormat and AVOutputFormat are mutually exclusive: a given AVFormatContext uses iformat (demuxing) or oformat (muxing), never both.

typedef struct AVInputFormat {
    const AVClass *priv_class; ///< AVClass for the private context

    /**
     * Tell if a given file has a chance of being parsed as this format.
     * The buffer provided is guaranteed to be AVPROBE_PADDING_SIZE bytes
     * big so you do not have to check for that unless you need more.
     */
    int (*read_probe)(const AVProbeData *);

    /**
     * Read the format header and initialize the AVFormatContext
     * structure. Return 0 if OK. 'avformat_new_stream' should be
     * called to create new streams.
     */
    int (*read_header)(struct AVFormatContext *);

    /**
     * Read one packet and put it in 'pkt'. pts and flags are also
     * set. 'avformat_new_stream' can be called only if the flag
     * AVFMTCTX_NOHEADER is used and only in the calling thread (not in a
     * background thread).
     * @return 0 on success, < 0 on error.
     *         Upon returning an error, pkt must be unreferenced by the caller.
     */
    int (*read_packet)(struct AVFormatContext *, AVPacket *pkt);

    /**
     * Close the stream. The AVFormatContext and AVStreams are not
     * freed by this function
     */
    int (*read_close)(struct AVFormatContext *);

    /**
     * Seek to a given timestamp relative to the frames in
     * stream component stream_index.
     * @param stream_index Must not be -1.
     * @param flags Selects which direction should be preferred if no exact
     *              match is available.
     * @return >= 0 on success (but not necessarily the new offset)
     */
    int (*read_seek)(struct AVFormatContext *,
                     int stream_index, int64_t timestamp, int flags);

    /**
     * Get the next timestamp in stream[stream_index].time_base units.
     * @return the timestamp or AV_NOPTS_VALUE if an error occurred
     */
    int64_t (*read_timestamp)(struct AVFormatContext *s, int stream_index,
                              int64_t *pos, int64_t pos_limit);

    /**
     * Start/resume playing - only meaningful if using a network-based format
     * (RTSP).
     */
    int (*read_play)(struct AVFormatContext *);

    /**
     * Pause playing - only meaningful if using a network-based format
     * (RTSP).
     */
    int (*read_pause)(struct AVFormatContext *);

    /**
     * Seek to timestamp ts.
     * Seeking will be done so that the point from which all active streams
     * can be presented successfully will be closest to ts and within min/max_ts.
     * Active streams are all streams that have AVStream.discard < AVDISCARD_ALL.
     */
    int (*read_seek2)(struct AVFormatContext *s, int stream_index, int64_t min_ts, int64_t ts, int64_t max_ts, int flags);

    /**
     * Returns device list with it properties.
     * @see avdevice_list_devices() for more details.
     */
    int (*get_device_list)(struct AVFormatContext *s, struct AVDeviceInfoList *device_list);
} AVInputFormat;

// libavformat/demuxer_list.c (generated by configure)
static const AVInputFormat * const demuxer_list[] = {
    &ff_avi_demuxer,
    NULL
};

// libavformat/avidec.c
typedef struct AVIContext {
    const AVClass *class;
    /* ... AVI-specific demuxer state ... */
} AVIContext;

AVInputFormat ff_avi_demuxer = {
    .name           = "avi",
    .long_name      = NULL_IF_CONFIG_SMALL("AVI (Audio Video Interleaved)"),
    .priv_data_size = sizeof(AVIContext),
    .extensions     = "avi",
    .read_probe     = avi_probe,
    .read_header    = avi_read_header,
    .read_packet    = avi_read_packet,
    .read_close     = avi_read_close,
    .read_seek      = avi_read_seek,
    .priv_class     = &demuxer_class,
};
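To make the parent/interface/subclass split concrete, here is a self-contained C sketch of what happens when an input is opened: every registered demuxer's probe function scores the first bytes, the best one is kept, a private context of `priv_data_size` bytes is allocated into `priv_data`, and `read_header` runs with access to it. All `Toy`/`toy_` names are hypothetical; FFmpeg's real probing (av_probe_input_format() and related functions) also considers file extensions and uses scores up to AVPROBE_SCORE_MAX, so treat this as a model of the idea only.

```c
#include <stdlib.h>
#include <string.h>

typedef struct ToyFormatContext ToyFormatContext;

/* ~ AVInputFormat: the "interface", a table of function pointers plus metadata. */
typedef struct ToyInputFormat {
    const char *name;
    int priv_data_size;                                    /* size of the "subclass" context */
    int (*read_probe)(const unsigned char *buf, int size); /* 0..100 confidence score */
    int (*read_header)(ToyFormatContext *s);
} ToyInputFormat;

/* ~ AVFormatContext: the generic "parent" context. */
struct ToyFormatContext {
    const ToyInputFormat *iformat;
    void *priv_data;          /* points at the demuxer-specific context, e.g. ToyAVIContext */
};

/* ~ AVIContext: private state only the AVI demuxer understands. */
typedef struct ToyAVIContext {
    int headers_parsed;
} ToyAVIContext;

static int toy_avi_probe(const unsigned char *buf, int size)
{
    /* AVI files start with "RIFF", a 4-byte size, then "AVI ". */
    if (size >= 12 && !memcmp(buf, "RIFF", 4) && !memcmp(buf + 8, "AVI ", 4))
        return 100;
    return 0;
}

static int toy_avi_read_header(ToyFormatContext *s)
{
    ToyAVIContext *avi = s->priv_data;   /* the "downcast" from void * */
    avi->headers_parsed = 1;
    return 0;
}

static int toy_flv_probe(const unsigned char *buf, int size)
{
    return (size >= 3 && !memcmp(buf, "FLV", 3)) ? 100 : 0;
}

static int toy_flv_read_header(ToyFormatContext *s) { (void)s; return 0; }

static const ToyInputFormat toy_avi_demuxer = { "avi", sizeof(ToyAVIContext), toy_avi_probe, toy_avi_read_header };
static const ToyInputFormat toy_flv_demuxer = { "flv", 0, toy_flv_probe, toy_flv_read_header };
static const ToyInputFormat *const toy_demuxer_list[] = { &toy_flv_demuxer, &toy_avi_demuxer, NULL };

/* Rough analogue of what avformat_open_input() does: probe every registered
 * demuxer, keep the best score, allocate its private context, read the header. */
int toy_open_input(ToyFormatContext *s, const unsigned char *buf, int size)
{
    int best = 0;
    s->iformat = NULL;
    s->priv_data = NULL;
    for (size_t i = 0; toy_demuxer_list[i]; i++) {
        int score = toy_demuxer_list[i]->read_probe(buf, size);
        if (score > best) {
            best = score;
            s->iformat = toy_demuxer_list[i];
        }
    }
    if (!s->iformat)
        return -1;                                   /* no demuxer recognised the data */
    if (s->iformat->priv_data_size > 0) {
        s->priv_data = calloc(1, (size_t)s->iformat->priv_data_size);
        if (!s->priv_data)
            return -1;
    }
    return s->iformat->read_header(s);
}
```

The generic code never names ToyAVIContext; it only knows the size to allocate, which is exactly what lets new demuxers be added by registering one more table entry.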

4.2. URLProtocol/URLContext/AVIOContext/HTTPContext

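URLContext calls the HTTP protocol's functions through URLProtocol; those functions set the HTTP-specific fields of HTTPContext and the generic fields of URLContext.

AVIOContext sits on top of this layer. Its opaque pointer can point to a URLContext, so that reads and writes go through the matching URL protocol implementation, or to a structure of your own for which you implement read, write and similar callbacks.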

// libavformat/url.h
typedef struct URLContext {
    const AVClass *av_class;    /**< information for av_log(). Set by url_open(). */
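    const struct URLProtocol *prot; // the protocol implementation (file, http, ...): a "generalized" input file
    void *priv_data;  // protocol-private state such as HTTPContext (file descriptor, network socket, ...)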
} URLContext;

typedef struct URLProtocol {
    int     (*url_open)( URLContext *h, const char *url, int flags);
    const AVClass *priv_data_class;
    int priv_data_size;
} URLProtocol;
// libavformat/avio.h
typedef struct AVIOContext {
    void *opaque; // passed to the read/write/seek/...functions.
} AVIOContext;
// opaque carries out the generalized file I/O: it points to a URLContext, which in turn links to (and extends) a URLProtocol.

// libavformat/protocols.c
const URLProtocol *up = url_protocols[i];
// libavformat/protocol_list.c
static const URLProtocol * const url_protocols[] = {
    &ff_http_protocol,
};
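The two layers described above, choosing a protocol by URL scheme and reading bytes through an opaque pointer, can be sketched as follows. `ToyProtocol`, `ToyIOContext`, `MemSource` and the `toy_` functions are hypothetical. In real code you would hand a callback with the same shape as `mem_read()` (`int (*)(void *opaque, uint8_t *buf, int buf_size)`) to avio_alloc_context(); note that FFmpeg expects such a callback to return AVERROR_EOF at the end of data, whereas this sketch simply returns 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ---- Protocol lookup (~ URLProtocol table + scheme matching) ---- */
typedef struct ToyProtocol { const char *name; } ToyProtocol;

static const ToyProtocol toy_file = { "file" }, toy_http = { "http" };
static const ToyProtocol *const toy_protocols[] = { &toy_file, &toy_http, NULL };

/* Match the text before the first ':' against the protocol names;
 * a plain path without a scheme falls back to "file". */
const ToyProtocol *toy_find_protocol(const char *url)
{
    const char *colon = strchr(url, ':');
    size_t len = colon ? (size_t)(colon - url) : 0;
    for (size_t i = 0; toy_protocols[i]; i++) {
        const char *name = toy_protocols[i]->name;
        if (len ? (strlen(name) == len && !strncmp(url, name, len)) : !strcmp(name, "file"))
            return toy_protocols[i];
    }
    return NULL;
}

/* ---- Byte I/O through an opaque pointer (~ AVIOContext.opaque) ---- */
typedef struct ToyIOContext {
    void *opaque;                                             /* user state, never inspected here */
    int (*read_packet)(void *opaque, uint8_t *buf, int size); /* returns bytes read, 0 at EOF */
} ToyIOContext;

/* One possible opaque target: an in-memory buffer. */
typedef struct MemSource { const uint8_t *data; size_t size, pos; } MemSource;

static int mem_read(void *opaque, uint8_t *buf, int size)
{
    MemSource *m = opaque;
    size_t left = m->size - m->pos;
    size_t n = (size_t)size < left ? (size_t)size : left;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (int)n;
}

/* Generic reader: keeps calling the callback until `size` bytes or EOF. */
int toy_avio_read(ToyIOContext *io, uint8_t *buf, int size)
{
    int total = 0;
    while (total < size) {
        int n = io->read_packet(io->opaque, buf + total, size - total);
        if (n <= 0)
            break;
        total += n;
    }
    return total;
}
```

Because the generic reader only ever sees `void *opaque`, the same `toy_avio_read()` would work unchanged with a file-, socket- or URLContext-backed callback.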

4.3. AVCodec/AVCodecContext/MsrleContext

This section analyzes the relationship between the three structures AVCodec, AVCodecContext and MsrleContext.

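AVCodecContext: plays the role of the "parent class".

AVCodec: plays the role of the "interface"; libavcodec/msrle.c implements it for the MSRLE (Microsoft RLE) decoder.

MsrleContext: plays the role of the "subclass"; it holds a pointer back to the parent (avctx) plus the decoder's own fields.

AVCodecContext calls the MSRLE decoder's functions through AVCodec; those functions set the MSRLE-specific fields of MsrleContext and the generic fields of AVCodecContext.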

typedef struct AVCodecContext { // plays the role of the parent class
    void *priv_data;             // the codec's private context ("subclass"), may be absent
    const struct AVCodec *codec;
} AVCodecContext;

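// libavcodec/options.c (simplified)
if (codec && codec->priv_data_size) {
    s->priv_data = av_mallocz(codec->priv_data_size);   // allocate the "subclass" context
    if (!s->priv_data)
        return AVERROR(ENOMEM);
    if (codec->priv_class)                               // first field must be the AVClass pointer
        *(const AVClass **)s->priv_data = codec->priv_class;
}
// Allocating an AVCodecContext takes the codec as a parameter:
// dec_ctx = avcodec_alloc_context3(dec);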

typedef struct AVCodec { // plays the role of an interface
    const AVClass *priv_class;
    int priv_data_size;
} AVCodec;

// libavcodec/allcodecs.c: find the decoder (an AVCodec, not a context) for the stream's codec_id
dec = avcodec_find_decoder(st->codecpar->codec_id);
// ...which internally walks the registered codecs:
const AVCodec *c = codec_list[i];
// libavcodec/codec_list.c
static const AVCodec * const codec_list[] = {
    &ff_msrle_decoder,
    NULL
};

// libavcodec/msrle.c: the context of one concrete decoder
typedef struct MsrleContext {  // plays the role of a subclass
    AVCodecContext *avctx;     // back-pointer to the "parent": generic fields and functions
    AVFrame *frame;

    GetByteContext gb;

    uint32_t pal[256]; // decoder-private field (palette)
} MsrleContext;

AVCodec ff_msrle_decoder = {
    .name           = "msrle",
    .long_name      = NULL_IF_CONFIG_SMALL("Microsoft RLE"),
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = AV_CODEC_ID_MSRLE,
    .priv_data_size = sizeof(MsrleContext),
    .init           = msrle_decode_init,
    .close          = msrle_decode_end,
    .decode         = msrle_decode_frame,
    .flush          = msrle_decode_flush,
    .capabilities   = AV_CODEC_CAP_DR1,
};
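The parent/interface/subclass relationship, including the `*(const AVClass **)priv_data = priv_class` trick from options.c, can be reproduced in standalone C. All names below are hypothetical: `ToyCodec` plays AVCodec, `ToyCodecContext` plays AVCodecContext, and `ToyRleContext` plays a decoder-private context like MsrleContext. The toy decoder expands (run, value) byte pairs; it is not the real MSRLE bitstream format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyClass { const char *class_name; } ToyClass;   /* ~ AVClass */

typedef struct ToyCodecContext ToyCodecContext;

typedef struct ToyCodec {                  /* ~ AVCodec: the "interface" */
    const char *name;
    const ToyClass *priv_class;            /* may be NULL */
    int priv_data_size;
    int (*init)(ToyCodecContext *avctx);
    int (*decode)(ToyCodecContext *avctx, const uint8_t *in, int in_size,
                  uint8_t *out, int out_size);
} ToyCodec;

struct ToyCodecContext {                   /* ~ AVCodecContext: the "parent" */
    const ToyCodec *codec;
    void *priv_data;
};

typedef struct ToyRleContext {             /* ~ MsrleContext: the "subclass" */
    const ToyClass *class;                 /* must be first because the codec has a priv_class */
    ToyCodecContext *avctx;                /* back-pointer to the parent */
    int frames_decoded;
} ToyRleContext;

static int toy_rle_init(ToyCodecContext *avctx)
{
    ToyRleContext *s = avctx->priv_data;   /* "downcast" from the generic pointer */
    s->avctx = avctx;
    s->frames_decoded = 0;
    return 0;
}

/* Input is (run, value) byte pairs; output is the expanded bytes. */
static int toy_rle_decode(ToyCodecContext *avctx, const uint8_t *in, int in_size,
                          uint8_t *out, int out_size)
{
    ToyRleContext *s = avctx->priv_data;
    int pos = 0;
    for (int i = 0; i + 1 < in_size; i += 2) {
        int run = in[i];
        if (pos + run > out_size)
            return -1;                     /* output buffer too small */
        memset(out + pos, in[i + 1], (size_t)run);
        pos += run;
    }
    s->frames_decoded++;
    return pos;
}

static const ToyClass toy_rle_class = { "toy_rle" };
static const ToyCodec toy_rle_decoder = {
    "toy_rle", &toy_rle_class, sizeof(ToyRleContext), toy_rle_init, toy_rle_decode
};

/* Mirrors the options.c logic: allocate the codec's private context and,
 * if the codec declares a class, store it in the first field. */
int toy_codec_open(ToyCodecContext *avctx, const ToyCodec *codec)
{
    avctx->codec = codec;
    avctx->priv_data = NULL;
    if (codec->priv_data_size > 0) {
        avctx->priv_data = calloc(1, (size_t)codec->priv_data_size);
        if (!avctx->priv_data)
            return -1;
        if (codec->priv_class)
            *(const ToyClass **)avctx->priv_data = codec->priv_class;
    }
    return codec->init ? codec->init(avctx) : 0;
}
```

The generic code only knows `priv_data_size`, never the concrete type; the decoder casts `priv_data` back to its own structure, which is why the class pointer (when present) has to be the first field.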

4.4. AVStream/AVIStream/AVCodecParameters

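The AVStream structure is the context of one media stream. It focuses on the properties shared by all kinds of streams (whose values are only known at run time) and on the fields that link to other structures.

The codecpar field holds the parameters of the codec used by this audio/video stream; the priv_data field links to container-specific properties of the individual stream (for AVI, an AVIStream); the structure also carries frame index and timing information (such as time_base).

Unlike the previous two groups, these structures define no functions; they only hold stream properties.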


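// When demuxing, the streams are created inside the demuxer's read_header():
// avformat_open_input()
//     -> s->iformat->read_header(s)   // calls avformat_new_stream() per stream

typedef struct AVStream {
    void *priv_data;              // e.g. AVIStream for the AVI demuxer
    AVCodecParameters *codecpar;  // codec parameters of this stream
} AVStream;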
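The timing information mentioned above is expressed in each stream's time base (AVStream.time_base, an AVRational): a timestamp pts corresponds to pts * num / den seconds. The sketch below converts timestamps between time bases the way av_rescale_q() is documented to (a * bq / cq, rounded to nearest with halves away from zero). `ToyRational` and the `toy_` functions are hypothetical, and unlike FFmpeg this version does not guard against 64-bit overflow.

```c
#include <stdint.h>

typedef struct ToyRational { int num, den; } ToyRational;   /* ~ AVRational */

/* Timestamp in seconds: pts * time_base. */
double toy_pts_to_seconds(int64_t pts, ToyRational tb)
{
    return (double)pts * tb.num / tb.den;
}

/* a * bq / cq, rounded to nearest with halves away from zero.
 * No overflow protection: sketch only. */
int64_t toy_rescale_q(int64_t a, ToyRational bq, ToyRational cq)
{
    int64_t b = (int64_t)bq.num * cq.den;
    int64_t c = (int64_t)cq.num * bq.den;
    if (a < 0)
        return -((-a * b + c / 2) / c);
    return (a * b + c / 2) / c;
}
```

For example, a 90 kHz video timestamp of 3000 is about 33 ms; converting every stream's timestamps to a common unit like this is how a player can compare video and audio clocks that use different time bases.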

4.5. AVPacket/AVPacketList/AVFrame

typedef struct AVPacket {
    /**
     * A reference to the reference-counted buffer where the packet data is
     * stored.
     * May be NULL, then the packet data is not reference-counted.
     */
    AVBufferRef *buf;
} AVPacket;
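// AVPacket represents one compressed audio/video data frame: its own properties are flags, timestamps, and the address and size of the compressed data.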

typedef struct AVPacketList {
    AVPacket pkt;
    struct AVPacketList *next;
} AVPacketList;
// AVPacketList links AVPackets into a small singly linked list.

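typedef struct AVFrame {
    uint8_t *data[AV_NUM_DATA_POINTERS];  // picture planes (video) or channel planes (audio)
    int linesize[AV_NUM_DATA_POINTERS];   // bytes per row (video) or per plane (audio)
    /* ... width, height, nb_samples, format, pts, buf[] and more ... */
} AVFrame;
// One frame of decoded (raw) data.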
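A packet queue in the spirit of AVPacketList (one node per packet, linked through next) is what players typically put between the demuxer and each decoder. This is a minimal sketch with hypothetical `Toy`/`toy_` names; it stores packets by value and omits the locking a multi-threaded player would need.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyPacket { int stream_index; int64_t pts; } ToyPacket;

typedef struct ToyPacketList {           /* ~ AVPacketList node */
    ToyPacket pkt;
    struct ToyPacketList *next;
} ToyPacketList;

typedef struct ToyPacketQueue {          /* head and tail pointers make push and pop O(1) */
    ToyPacketList *first, *last;
    int nb_packets;
} ToyPacketQueue;

int toy_queue_push(ToyPacketQueue *q, const ToyPacket *pkt)
{
    ToyPacketList *node = malloc(sizeof(*node));
    if (!node)
        return -1;
    node->pkt = *pkt;                    /* copy by value; real code would move the buffer reference */
    node->next = NULL;
    if (q->last)
        q->last->next = node;
    else
        q->first = node;
    q->last = node;
    q->nb_packets++;
    return 0;
}

/* Returns 1 and fills *pkt if a packet was available, 0 if the queue is empty. */
int toy_queue_pop(ToyPacketQueue *q, ToyPacket *pkt)
{
    ToyPacketList *node = q->first;
    if (!node)
        return 0;
    *pkt = node->pkt;
    q->first = node->next;
    if (!q->first)
        q->last = NULL;
    q->nb_packets--;
    free(node);
    return 1;
}
```

Packets come out in the order they went in, which preserves the decode order the demuxer produced.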

5. AVPacket/AVFrame Memory Model

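When several AVPackets share the same buffer, FFmpeg uses reference counting:
The reference count starts at 0; it is set to 1 only when an AVBuffer is actually allocated.
Each time another packet references the shared buffer, the count is incremented by 1.
Each time a packet that references the shared buffer is released, the count is decremented by 1.
When the count reaches 0, the referenced buffer (AVBuffer) is freed.
AVFrame uses the same mechanism.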
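The reference-counting rules above can be modeled with a small buffer type. `ToyBuffer`/`ToyBufferRef` and the `toy_buffer_*` functions are hypothetical stand-ins that mirror the roles of AVBuffer/AVBufferRef and av_buffer_ref()/av_buffer_unref(); FFmpeg updates the count atomically so references can be shared across threads, which this single-threaded sketch does not attempt.

```c
#include <stdlib.h>

/* Shared payload (~ AVBuffer): the bytes plus a reference count. */
typedef struct ToyBuffer {
    unsigned char *data;
    size_t size;
    int refcount;
} ToyBuffer;

/* One reference to it (~ AVBufferRef); each packet or frame holds its own ref. */
typedef struct ToyBufferRef {
    ToyBuffer *buffer;
    unsigned char *data;
    size_t size;
} ToyBufferRef;

/* Allocate a new buffer; the creator holds the first reference (count = 1). */
ToyBufferRef *toy_buffer_alloc(size_t size)
{
    ToyBuffer *b = malloc(sizeof(*b));
    ToyBufferRef *ref = malloc(sizeof(*ref));
    unsigned char *data = calloc(1, size ? size : 1);   /* avoid a zero-size allocation */
    if (!b || !ref || !data) {
        free(b);
        free(ref);
        free(data);
        return NULL;
    }
    b->data = data;
    b->size = size;
    b->refcount = 1;
    ref->buffer = b;
    ref->data = data;
    ref->size = size;
    return ref;
}

/* New reference to the same bytes: no copy, count + 1 (~ av_buffer_ref()). */
ToyBufferRef *toy_buffer_ref(const ToyBufferRef *src)
{
    ToyBufferRef *ref = malloc(sizeof(*ref));
    if (!ref)
        return NULL;
    *ref = *src;
    ref->buffer->refcount++;
    return ref;
}

/* Drop a reference, count - 1; the bytes are freed when it reaches 0
 * (~ av_buffer_unref()). *pref is set to NULL either way. */
void toy_buffer_unref(ToyBufferRef **pref)
{
    ToyBufferRef *ref = *pref;
    if (!ref)
        return;
    *pref = NULL;
    if (--ref->buffer->refcount == 0) {
        free(ref->buffer->data);
        free(ref->buffer);
    }
    free(ref);
}
```

Writing through one reference is visible through the other because both point at the same bytes; that is the point of sharing, and also why shared buffers are normally treated as read-only.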

5.1. Common AVPacket APIs

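AVPacket *av_packet_alloc(void): allocates an AVPacket; at this point no data buffer is attached.
void av_packet_free(AVPacket **pkt): frees an AVPacket (unreferencing its data first); the counterpart of av_packet_alloc().
void av_init_packet(AVPacket *pkt): only initializes the packet's fields to default values (deprecated in recent FFmpeg versions).
int av_new_packet(AVPacket *pkt, int size): allocates the packet's buf with the given payload size; the reference count starts at 1.
int av_packet_ref(AVPacket *dst, const AVPacket *src): makes dst reference the same buffer as src, incrementing the reference count (if src is not reference-counted, the data is copied).
void av_packet_unref(AVPacket *pkt): releases the packet's reference, decrementing the count, and resets its fields to defaults.
void av_packet_move_ref(AVPacket *dst, AVPacket *src): moves the reference from src to dst without changing the count; src is reset.
AVPacket *av_packet_clone(const AVPacket *src): equivalent to av_packet_alloc() + av_packet_ref().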
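The packet-level semantics of av_new_packet(), av_packet_ref(), av_packet_move_ref() and av_packet_unref() can be sketched as follows. The `ToyPacket` type and `toy_` functions are hypothetical simplifications: they assume the source packet is already reference-counted (the real av_packet_ref() copies the data otherwise), they zero fields on unref where FFmpeg resets them to defaults such as pts = AV_NOPTS_VALUE, and they pad the payload by 1 byte where FFmpeg adds AV_INPUT_BUFFER_PADDING_SIZE.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyBuf { uint8_t *data; int refcount; } ToyBuf;   /* ~ AVBuffer */

typedef struct ToyPacket {          /* ~ AVPacket, trimmed */
    ToyBuf  *buf;                   /* NULL = no payload referenced */
    uint8_t *data;
    int      size;
    int64_t  pts;
} ToyPacket;

/* ~ av_packet_unref(): drop the buffer reference and reset every field. */
void toy_packet_unref(ToyPacket *pkt)
{
    if (pkt->buf && --pkt->buf->refcount == 0) {
        free(pkt->buf->data);
        free(pkt->buf);
    }
    memset(pkt, 0, sizeof(*pkt));
}

/* ~ av_new_packet(): allocate a payload owned by a fresh buffer (refcount 1). */
int toy_new_packet(ToyPacket *pkt, int size)
{
    memset(pkt, 0, sizeof(*pkt));
    pkt->buf = malloc(sizeof(*pkt->buf));
    if (!pkt->buf)
        return -1;
    pkt->buf->data = calloc(1, (size_t)size + 1);   /* +1 stands in for FFmpeg's padding */
    if (!pkt->buf->data) {
        free(pkt->buf);
        pkt->buf = NULL;
        return -1;
    }
    pkt->buf->refcount = 1;
    pkt->data = pkt->buf->data;
    pkt->size = size;
    return 0;
}

/* ~ av_packet_ref(): dst (assumed empty) shares src's buffer, count + 1, properties copied. */
void toy_packet_ref(ToyPacket *dst, const ToyPacket *src)
{
    *dst = *src;
    if (dst->buf)
        dst->buf->refcount++;
}

/* ~ av_packet_move_ref(): hand the reference over; count unchanged, src reset. */
void toy_packet_move_ref(ToyPacket *dst, ToyPacket *src)
{
    *dst = *src;
    memset(src, 0, sizeof(*src));
}
```

move_ref is the cheap way to hand a packet to a queue: ownership changes hands and the reference count stays the same.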

5.2. Common AVFrame APIs

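AVFrame *av_frame_alloc(void): allocates an AVFrame (without data buffers).
void av_frame_free(AVFrame **frame): frees an AVFrame, unreferencing its buffers first.
int av_frame_ref(AVFrame *dst, const AVFrame *src): makes dst reference the same buffers as src, incrementing the reference counts.
void av_frame_unref(AVFrame *frame): releases the frame's references, decrementing the counts, and resets its fields.
void av_frame_move_ref(AVFrame *dst, AVFrame *src): moves the references from src to dst; src is reset.
int av_frame_get_buffer(AVFrame *frame, int align): allocates data buffers based on the fields already set on the frame (format plus width/height for video, or format plus nb_samples and channel layout for audio).
AVFrame *av_frame_clone(const AVFrame *src): equivalent to av_frame_alloc() + av_frame_ref().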
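av_frame_get_buffer() derives the buffer layout from fields set beforehand (format, width and height for video). For YUV420P the idea can be sketched as below: a full-size Y plane, U and V planes halved in both directions, and each row padded so that linesize is a multiple of `align`. `ToyPlaneLayout`, `toy_align()` and `toy_yuv420p_layout()` are hypothetical, and FFmpeg's exact padding rules differ in details, so treat the numbers as an approximation.

```c
#include <stddef.h>

/* Round x up to a multiple of align (align must be a power of two), like FFALIGN. */
static int toy_align(int x, int align)
{
    return (x + align - 1) & ~(align - 1);
}

typedef struct ToyPlaneLayout {
    int linesize[3];      /* bytes per row for Y, U, V (~ AVFrame.linesize) */
    size_t plane_size[3]; /* bytes per plane = linesize * rows */
} ToyPlaneLayout;

/* YUV420P: full-resolution Y plane, U and V subsampled by 2 in both directions
 * (rounded up for odd sizes); every row padded to a multiple of `align`. */
void toy_yuv420p_layout(int width, int height, int align, ToyPlaneLayout *out)
{
    int cw = (width + 1) / 2, ch = (height + 1) / 2;
    out->linesize[0] = toy_align(width, align);
    out->linesize[1] = out->linesize[2] = toy_align(cw, align);
    out->plane_size[0] = (size_t)out->linesize[0] * height;
    out->plane_size[1] = out->plane_size[2] = (size_t)out->linesize[1] * ch;
}
```

A 1280x720 frame needs 1280 * 720 * 1.5 = 1,382,400 bytes of pixel data when no padding is required; for widths that are not a multiple of `align`, rows are longer than the visible width, which is why code must step between rows by linesize rather than by width.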