FFmpeg Beginner Notes, Part 1


DarwinLong


Article link: https://blog.csdn.net/darwinlong/article/details/78895405


Official documentation: http://ffmpeg.org/doxygen/2.0/index.html

AVCodecContext configuration parameters

1. Basic API

AVFormatContext *pFormatCtxEnc;
AVCodecContext *pCodecCtxEnc;
AVStream *video_st;
AVOutputFormat *pOutputFormat;

// Guess the container format from the file extension.
// The file name argument is missing in the original; "filename" is a placeholder.
pOutputFormat = av_guess_format(NULL, filename, NULL);

pFormatCtxEnc = avformat_alloc_context();
pFormatCtxEnc->oformat = pOutputFormat;
video_st = avformat_new_stream(pFormatCtxEnc, 0);

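2. Configuring the encoder context parameters

pCodecCtxEnc = video_st->codec;

// Codec ID. Here it is set to the H.264 encoder explicitly; it can also be taken from the codec ID stored in video_st.
pCodecCtxEnc->codec_id = AV_CODEC_ID_H264;

// Type of media the encoder handles
pCodecCtxEnc->codec_type = AVMEDIA_TYPE_VIDEO;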

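// Target average bit rate in bits per second. The higher the bit rate, the larger the output.
pCodecCtxEnc->bit_rate = 200000;

// Number of bits the stream may deviate from the target bit rate.
// A larger value gives rate control more freedom; it does not by itself make the output smaller.
pCodecCtxEnc->bit_rate_tolerance = 4000000;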

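// Frame size of the encoded video, in pixels
pCodecCtxEnc->width = 640;
pCodecCtxEnc->height = 480;

// Time base: the unit, in seconds, in which frame timestamps are expressed. For fixed-frame-rate content it is 1/frame rate.
// It is a fraction because many frame rates are not integers, e.g. NTSC runs at 29.97 fps (30000/1001).
pCodecCtxEnc->time_base.den = 30;
pCodecCtxEnc->time_base.num = 1;
//pCodecCtxEnc->time_base = (AVRational){1,25};

// Pixel format, i.e. the color space and layout used to represent each pixel
// (older FFmpeg releases spell this constant PIX_FMT_YUV420P)
pCodecCtxEnc->pix_fmt = AV_PIX_FMT_YUV420P;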

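// Emit at most one I-frame every 250 frames. Fewer I-frames make the video smaller.
pCodecCtxEnc->gop_size = 250;

// Maximum number of B-frames between two non-B-frames.
// 0 disables B-frames.
// More B-frames generally make the video smaller.
pCodecCtxEnc->max_b_frames = 0;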
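// Motion-estimation pre-pass
pCodecCtxEnc->pre_me = 2;

// Minimum and maximum Lagrange multiplier.
// The Lagrange multiplier (lambda) weighs bit cost against distortion in the encoder's rate-distortion decisions.
pCodecCtxEnc->lmin = 1;
pCodecCtxEnc->lmax = 5;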
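// Minimum and maximum quantizer. The smaller the quantizer, the sharper the video.
// Reference article: 《基于HEVC编码的无参考PSNR算法》
pCodecCtxEnc->qmin = 10;
pCodecCtxEnc->qmax = 50;

// The quantizer q floats between qmin and qmax.
// qblur controls how strongly changes in q are smoothed over time, from 0.0 to 1.0; 0 means no smoothing.
pCodecCtxEnc->qblur = 0.0;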

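// Strength of spatial complexity masking, from 0.0 to 1.0
pCodecCtxEnc->spatial_cplx_masking = 0.3;

// Comparison function used by the motion-estimation pre-pass (an FF_CMP_* constant); costlier functions lengthen encoding time
pCodecCtxEnc->me_pre_cmp = 2;

// How q is kept within [qmin, qmax]: 0 clips it, 1 limits it with a continuous function
pCodecCtxEnc->rc_qsquish = 1;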

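// Quantizer scale factor between I/P-frames and B-frames. The larger the value, the blurrier the B-frames.
// B-frame quantizer = quantizer of the previous P-frame * b_quant_factor + b_quant_offset
pCodecCtxEnc->b_quant_factor = 1.25;

// Quantizer offset between I/P-frames and B-frames. The larger the offset, the blurrier the B-frames.
pCodecCtxEnc->b_quant_offset = 1.25;

// Quantizer scale factor between P-frames and I-frames. The closer to 1, the closer P-frame quality is to I-frame quality.
// I-frame quantizer = quantizer of the previous P-frame * i_quant_factor + i_quant_offset
pCodecCtxEnc->i_quant_factor = 0.8;
pCodecCtxEnc->i_quant_offset = 0.0;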

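// Rate-control strategy; see the API documentation for the defined values
pCodecCtxEnc->rc_strategy = 2;

// B-frame decision strategy
pCodecCtxEnc->b_frame_strategy = 0;

// Single-coefficient elimination thresholds for luma and chroma
pCodecCtxEnc->luma_elim_threshold = 0;
pCodecCtxEnc->chroma_elim_threshold = 0;

// DCT algorithm (an FF_DCT_* constant). The settings are optimized for different CPU instruction sets; 0 lets FFmpeg choose.
pCodecCtxEnc->dct_algo = 0;

// Strength of masking for very bright and very dark scenes; 0 disables it
pCodecCtxEnc->lumi_masking = 0.0;
pCodecCtxEnc->dark_masking = 0.0;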
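
The AVCodecContext documentation describes the B-frame and I-frame quantizers, for positive factors, as q = lastp_q * factor + offset. The sketch below only reproduces that arithmetic for the values used in this configuration. It is not FFmpeg code: `derived_quantizer` is a hypothetical helper, and the P-frame quantizer of 20 used in the note after it is an assumed value.

```c
#include <assert.h>

/* Hypothetical helper, not an FFmpeg function.
 * Applies q = last_p_q * factor + offset, the relation documented
 * for positive b_quant_factor / i_quant_factor values. */
double derived_quantizer(double last_p_q, double factor, double offset)
{
    assert(factor > 0.0);   /* negative factors follow a different rule */
    return last_p_q * factor + offset;
}
```

With an assumed previous P-frame quantizer of 20, `derived_quantizer(20, 1.25, 1.25)` gives 26.25 for B-frames (coarser than the P-frame), and `derived_quantizer(20, 0.8, 0.0)` gives about 16 for I-frames (finer than the P-frame).
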
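
3. Configuration for specific requirements

(1) x264 encoding latency

Method 1:

The output delay of avcodec_encode_video2 depends mainly on max_b_frames.
For real-time encoding, set max_b_frames to 0 to remove the B-frame reordering delay.

Method 2:

1. Use the x264 options to set the encoding speed

av_opt_set(m_context->priv_data,"preset","ultrafast",0);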

// Set Option

AVDictionary *param = 0;

//H.264

if(pCodecCtx->codec_id == AV_CODEC_ID_H264) {

av_dict_set(&param, "preset", "slow", 0);

av_dict_set(&param, "tune", "zerolatency", 0);

//av_dict_set(&param, "profile", "main", 0);

}

//H.265

if(pCodecCtx->codec_id == AV_CODEC_ID_H265){

av_dict_set(&param, "preset", "ultrafast", 0);

av_dict_set(&param, "tune", "zero-latency", 0);

}

Reference article

http://blog.csdn.net/chance_yin/article/details/16335625

Video bit rate settings

16 kbit/s – videophone quality (minimum necessary for a consumer-acceptable "talking head" picture using various video compression schemes)
128–384 kbit/s – business-oriented videoconferencing quality using video compression
400 kbit/s YouTube 240p videos (using H.264)^[20]
750 kbit/s YouTube 360p videos (using H.264)^[20]
1 Mbit/s YouTube 480p videos (using H.264)^[20]
1.15 Mbit/s max – VCD quality (using MPEG1 compression)^[21]
2.5 Mbit/s YouTube 720p videos (using H.264)^[20]
3.5 Mbit/s typ – Standard-definition television quality (with bit-rate reduction from MPEG-2 compression)
3.8 Mbit/s YouTube 720p (at 60fps mode) videos (using H.264)^[20]
4.5 Mbit/s YouTube 1080p videos (using H.264)^[20]
6.8 Mbit/s YouTube 1080p (at 60 fps mode) videos (using H.264)^[20]
9.8 Mbit/s max – DVD (using MPEG2 compression)^[22]
8 to 15 Mbit/s typ – HDTV quality (with bit-rate reduction from MPEG-4 AVC compression)
19 Mbit/s approximate – HDV 720p (using MPEG2 compression)^[23]
24 Mbit/s max – AVCHD (using MPEG4 AVC compression)^[24]
25 Mbit/s approximate – HDV 1080i (using MPEG2 compression)^[23]
29.4 Mbit/s max – HD DVD
40 Mbit/s max – 1080p Blu-ray Disc (using MPEG2, MPEG4 AVC or VC-1 compression)^[25]
250 Mbit/s max – DCP (using JPEG 2000 compression)
1.4 Gbit/s – 10-bit 4:4:4 Uncompressed 1080p at 24fps
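
The last entry in the list can be checked by hand: an uncompressed bit rate is width × height × bits per pixel × frames per second. The sketch below assumes that 10-bit 4:4:4 means three 10-bit samples per pixel (30 bits); `raw_video_bit_rate` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Raw (uncompressed) video bit rate in bits per second:
 * width * height * bits per pixel * frames per second. */
int64_t raw_video_bit_rate(int width, int height, int bits_per_pixel, int fps)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0 && fps > 0);
    return (int64_t)width * height * bits_per_pixel * fps;
}
```

`raw_video_bit_rate(1920, 1080, 30, 24)` returns 1492992000, roughly 1.49 Gbit/s, in line with the figure quoted above. For comparison, 640×480 YUV420P (12 bits per pixel) at 30 fps is 110592000 bit/s uncompressed, several hundred times the 200000 bit/s target used in the encoder configuration earlier.
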

 
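AVFormatContext

When developing with FFmpeg, AVFormatContext is a data structure that runs through the whole program; many functions take it as a parameter. It is the structure behind FFmpeg's demuxing (FLV, MP4, RMVB, AVI) functionality. The main fields are listed below (for the decoding case):

struct AVInputFormat *iformat: container format of the input
AVIOContext *pb: I/O buffer for the input data
unsigned int nb_streams: number of audio/video streams
AVStream **streams: the audio/video streams
char filename[1024]: file name
int64_t duration: duration (in microseconds; divide by 1000000 to get seconds)
int bit_rate: bit rate (in bps; divide by 1000 to get kbps)
AVDictionary *metadata: metadata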
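
The two unit conversions mentioned for duration and bit_rate can be sketched as follows. The helper names are hypothetical, and the functions take plain integers rather than an AVFormatContext so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define US_PER_SECOND 1000000   /* same value as FFmpeg's AV_TIME_BASE */

/* Duration in microseconds -> seconds. */
double duration_us_to_seconds(int64_t duration_us)
{
    assert(duration_us >= 0);   /* an unknown duration is not handled here */
    return (double)duration_us / US_PER_SECOND;
}

/* Bit rate in bits per second -> kilobits per second (integer division). */
int64_t bps_to_kbps(int64_t bit_rate_bps)
{
    assert(bit_rate_bps >= 0);
    return bit_rate_bps / 1000;
}
```

For example, a duration of 90500000 converts to 90.5 seconds, and a bit rate of 200000 bps converts to 200 kbps.
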

The video's metadata can be read through AVDictionary. Each entry is stored in an AVDictionaryEntry structure, shown below:

typedef struct AVDictionaryEntry {  
    char *key;  
    char *value;  
} AVDictionaryEntry;  

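Each metadata entry has two attributes, key and value.

In FFmpeg, the av_dict_get() function retrieves the metadata.

Reading the metadata:

//MetaData------------------------------------------------------------
// Read from the AVDictionary
// Requires an AVDictionaryEntry pointer
//CString author,copyright,description;
CString meta, key, value;
AVDictionaryEntry *m = NULL;
// Iterate over all entries: an empty key with AV_DICT_IGNORE_SUFFIX matches every key
while ((m = av_dict_get(pFormatCtx->metadata, "", m, AV_DICT_IGNORE_SUFFIX))) {
    // Assign directly; passing metadata to Format() would treat it as a format string
    key = m->key;
    value = m->value;
    meta += key + "\t:" + value + "\r\n";
}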

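AVFormatContext also contains:

AVCodec *video_codec; AVCodec *audio_codec; AVCodec *subtitle_codec;

1. Extract video

2. Extract audio

3. Extract subtitles

FFmpeg has many structures.

The key ones fall into the following categories:

a) Protocol handling (http, rtsp, rtmp, mms)

AVIOContext, URLProtocol and URLContext mainly store the type and state of the protocol used by the audio/video source. URLProtocol stores the protocol used by the input; each protocol has its own URLProtocol structure. (Note: FFmpeg also treats a file as a protocol, "file".)

b) Demuxing (flv, avi, rmvb, mp4)

AVFormatContext mainly stores the information contained in the container format; AVInputFormat stores the container format used by the input. Each container format has its own AVInputFormat structure.

c) Decoding (h264, mpeg2, aac, mp3)

Each AVStream stores the data for one video or audio stream. Each AVStream has an AVCodecContext, which stores the data describing how that stream is decoded. Each AVCodecContext has an AVCodec, the decoder for that stream. Each decoder has its own AVCodec structure.

d) Data storage

For video, each structure usually holds one frame; for audio it may hold several frames.

Data before decoding: AVPacket

Data after decoding: AVFrame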

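Also, what exactly do PTS and DTS mean?

A: Timestamps are normally written into the media file at encoding time, so the PTS can be recovered from the file when decoding.

int64_t pts; ///< presentation time stamp in time_base units
int64_t dts; ///< decompression time stamp in time_base units

Lowering time_base to 10 frames per second makes playback run at close to normal speed, but it is unclear what frame rate suits an FLV file. Is there an authoritative answer?

I-frames and P-frames (I denotes a key frame, P a predicted frame)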

AVRational is defined as follows:

typedef struct AVRational{
    int num; // numerator
    int den; // denominator
} AVRational;

FFmpeg provides a function that converts an AVRational to a double:

/**
 * Convert rational to double.
 * @param a rational to convert
 */
static inline double av_q2d(AVRational a){
    return a.num / (double) a.den;
}
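
To see the conversion at work without linking against FFmpeg, the sketch below uses a stand-in `Rational` type with the same two fields as AVRational and a `q2d` function with the same body as av_q2d. `frame_rate_from_time_base` is a hypothetical helper, and it assumes a fixed frame rate with one time base tick per frame.

```c
#include <assert.h>

/* Stand-in for AVRational: same fields, local to this example. */
typedef struct Rational { int num; int den; } Rational;

/* Same computation as av_q2d: numerator divided by denominator. */
double q2d(Rational a)
{
    assert(a.den != 0);
    return a.num / (double)a.den;
}

/* For a one-tick-per-frame time base, the frame rate is its reciprocal. */
double frame_rate_from_time_base(Rational time_base)
{
    Rational rate = { time_base.den, time_base.num };
    return q2d(rate);
}
```

A time base of 1/30 converts to about 0.0333 seconds per tick, and a time base of 1001/30000 corresponds to the NTSC rate of about 29.97 frames per second, which is why the time base is stored as a fraction.
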
 
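How video is displayed and stored

https://www.cnblogs.com/yinxiangpei/articles/3892982.html

For a movie, frames are displayed in this order: I B B P. We need the information in the P-frame before we can display the B-frames, so the frames may be stored like this: I P B B. That is why there is both a decoding timestamp and a presentation timestamp. The decoding timestamp tells us when to decode a frame; the presentation timestamp tells us when to display it. In this case the stream might look like this:
PTS: 1 4 2 3
DTS: 1 2 3 4
Stream: I P B B
PTS and DTS usually differ only when the stream contains B-frames.
For example, if the decoded data is I B B P, then encoding it stores the data as I P B B.
Decoded data normally comes out in display order: the decoder reorders the data from coded order into display order on output, while the encoder reorders the data into the order the decoder needs before storing it.
Of course, all of this matters only when there are B-frames; with only I- and P-frames the coded order and the display order are the same.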
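
The reordering described above can be sketched as a small routine: each I- or P-frame is stored ahead of the B-frames that precede it in display order. This is an illustration only, not how any FFmpeg encoder is implemented; `Frame` and `display_to_decode_order` are hypothetical names, and timestamps are simple 1-based positions as in the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAMES 64

typedef struct Frame {
    char type;  /* 'I', 'P' or 'B' */
    int  pts;   /* 1-based position in display order */
    int  dts;   /* 1-based position in decode (storage) order */
} Frame;

/* Reorders frames given in display order into decode order.
 * Returns the number of frames written to out. Trailing B-frames with
 * no following I/P-frame are left out in this simplified sketch. */
int display_to_decode_order(const char *display_types, Frame *out)
{
    int n = (int)strlen(display_types);
    int pending_b[MAX_FRAMES];  /* display indices of B-frames awaiting the next I/P-frame */
    int pending = 0, written = 0;

    assert(n <= MAX_FRAMES);
    for (int i = 0; i < n; i++) {
        if (display_types[i] == 'B') {
            pending_b[pending++] = i;   /* cannot be decoded yet */
            continue;
        }
        /* I- or P-frame: store it first, then the B-frames that need it. */
        out[written] = (Frame){ display_types[i], i + 1, written + 1 };
        written++;
        for (int k = 0; k < pending; k++) {
            out[written] = (Frame){ 'B', pending_b[k] + 1, written + 1 };
            written++;
        }
        pending = 0;
    }
    return written;
}
```

For the input "IBBP" the routine produces I P B B with PTS 1 4 2 3 and DTS 1 2 3 4, matching the table above. Real streams additionally shift the DTS values earlier so that every packet keeps pts >= dts; the simplified numbering here ignores that.
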

DTS and PTS are defined in AVPacket:

typedef struct AVPacket {
    /**
     * A reference to the reference-counted buffer where the packet data is
     * stored.
     * May be NULL, then the packet data is not reference-counted.
     */
    AVBufferRef *buf;

    /** Presentation timestamp
     * Presentation timestamp in AVStream->time_base units; the time at which
     * the decompressed packet will be presented to the user.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     * pts MUST be larger or equal to dts as presentation cannot happen before
     * decompression, unless one wants to view hex dumps. Some formats misuse
     * the terms dts and pts/cts to mean something different. Such timestamps
     * must be converted to true pts/dts before they are stored in AVPacket.
     */
    int64_t pts;

    /** Decoding timestamp
     * Decompression timestamp in AVStream->time_base units; the time at which
     * the packet is decompressed.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     */
    int64_t dts;

    ......
} AVPacket;
       
2. AVFrame

typedef struct AVFrame {
    /**
     * Presentation timestamp in time_base units (time when frame should be shown to user).
     */
    int64_t pts;

    /**
     * PTS copied from the AVPacket that was decoded to produce this frame.
     */
    int64_t pkt_pts;

    /**
     * DTS copied from the AVPacket that triggered returning this frame. (if frame threading isn't used)
     * This is also the Presentation time of this AVFrame calculated from
     * only AVPacket.dts values without pts values.
     */
    int64_t pkt_dts;

    ......
} AVFrame;

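Note:
    pkt_pts and pkt_dts in AVFrame are copied from AVPacket and are likewise in AVStream->time_base units, while pts is meant for output (display) and is in AVCodecContext->time_base units. //FIXME

Audio and video streams both carry information about how fast and when they should be played. Audio streams have a sample rate; video streams have a frame rate in frames per second. However, if we synchronize video simply by counting frames and multiplying by the frame interval, it is quite likely to drift out of sync. As a supplement, packets in the stream carry a DTS (decoding timestamp) and a PTS (presentation timestamp). To understand these two values you need to know how movies are stored. Formats such as MPEG use what are called B-frames (B stands for bidirectional). The other two kinds of frames are I-frames and P-frames (I denotes a key frame, P a predicted frame). An I-frame contains a complete image. A P-frame depends on preceding I- and P-frames and is encoded as a comparison or difference against them. A B-frame is similar to a P-frame, but depends on information from both preceding and following frames. This also explains why a call to avcodec_decode_video may not return a frame.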
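Time units in FFmpeg

AV_TIME_BASE

FFmpeg's internal timing unit (time base). Timestamps that are not tied to a particular stream are expressed in this unit; for example, the duration field of AVFormatContext means the file lasts duration units of 1/AV_TIME_BASE seconds. AV_TIME_BASE is defined as:

#define AV_TIME_BASE 1000000

AV_TIME_BASE_Q

The fractional form of FFmpeg's internal time base; it is the reciprocal of AV_TIME_BASE, as its definition shows:

#define AV_TIME_BASE_Q (AVRational){1, AV_TIME_BASE}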
         
           
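The time position of a frame within the video can now be computed from its pts:

timestamp (seconds) = pts * av_q2d(st->time_base)

The length of the video is computed as:

time (seconds) = st->duration * av_q2d(st->time_base)

Here st is a pointer to an AVStream.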
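
Both formulas are the same multiplication, so one helper covers the frame position and the stream length. The sketch below uses a stand-in `Rational` type in place of AVRational and a hypothetical function name, and assumes a 1/90000 time base (the 90 kHz clock commonly seen with MPEG-TS) for the sample values.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same two fields as FFmpeg's AVRational. */
typedef struct Rational { int num; int den; } Rational;

/* seconds = ticks * time_base; ticks may be a pts or a stream duration. */
double ticks_to_seconds(int64_t ticks, Rational time_base)
{
    assert(time_base.den != 0);
    return (double)ticks * time_base.num / time_base.den;
}
```

With a 1/90000 time base, a pts of 450000 lies 5 seconds into the stream, and a stream duration of 8100000 corresponds to 90 seconds.
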
Time base conversion formulas

timestamp (FFmpeg internal timestamp) = AV_TIME_BASE * time (seconds)
time (seconds) = av_q2d(AV_TIME_BASE_Q) * timestamp (FFmpeg internal timestamp)
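So to seek the video to N seconds, the following can be used. When a stream index is passed, av_seek_frame expects the timestamp in that stream's time_base, so the AV_TIME_BASE value is rescaled first (pass -1 as the stream index to seek in AV_TIME_BASE units directly):

int64_t timestamp = N * (int64_t)AV_TIME_BASE;
timestamp = av_rescale_q(timestamp, AV_TIME_BASE_Q, fmtctx->streams[index_of_video]->time_base);
av_seek_frame(fmtctx, index_of_video, timestamp, AVSEEK_FLAG_BACKWARD);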
      
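FFmpeg also provides a function for converting between time bases:

int64_t av_rescale_q(int64_t a, AVRational bq, AVRational cq)

This function computes a * bq / cq to move a timestamp from one time base to another. Prefer it for time base conversions, because it avoids overflow.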
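
The arithmetic behind a * bq / cq can be sketched in a few lines. This is a simplified stand-in, not the FFmpeg implementation: `Rational` and `rescale_q_sketch` are hypothetical names, it handles only non-negative timestamps, it rounds to the nearest integer, and it lacks the overflow protection that makes the real av_rescale_q preferable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same two fields as FFmpeg's AVRational. */
typedef struct Rational { int num; int den; } Rational;

/* Computes a * bq / cq, rounded to nearest.
 * a * (bq.num / bq.den) / (cq.num / cq.den)
 *   = a * (bq.num * cq.den) / (cq.num * bq.den) */
int64_t rescale_q_sketch(int64_t a, Rational bq, Rational cq)
{
    int64_t b = (int64_t)bq.num * cq.den;
    int64_t c = (int64_t)cq.num * bq.den;
    assert(a >= 0 && b > 0 && c > 0);
    return (a * b + c / 2) / c;   /* may overflow for very large a, unlike av_rescale_q */
}
```

Converting 10 seconds expressed in microseconds (10000000 in a 1/1000000 time base) to a 1/90000 time base gives 900000, and converting 900000 back gives 10000000.
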
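III. Converting between time bases

FFmpeg provides av_rescale_q for converting between time bases. av_rescale_q(a, b, c) is equivalent to computing a*b/c; by choosing b and c, conversion between time bases is straightforward.
For example:
1. InputStream (AV_TIME_BASE) to AVPacket (AVStream->time_base)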

static int decode_video(InputStream *ist, AVPacket *pkt, int *got_output)
{
    pkt->dts = av_rescale_q(ist->dts, AV_TIME_BASE_Q, ist->st->time_base);
    ......
}
      
2. AVPacket (AVStream->time_base) to InputStream (AV_TIME_BASE)

static int process_input_packet(InputStream *ist, const AVPacket *pkt)
{
    if (pkt->dts != AV_NOPTS_VALUE) {
        ist->next_dts = ist->dts = av_rescale_q(pkt->dts, ist->st->time_base, AV_TIME_BASE_Q);
    }
    ......
}