ffmpeg pixel format basics

hjjdebug

Last modified 2022-09-20 16:01:19

First published 2022-08-04 14:51:58

Article link: https://blog.csdn.net/hejinjing_tom_com/article/details/126159464

----------------------------------------
A. What the YUV subsampling names mean
----------------------------------------
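The Y (luma) component is always kept at full resolution; only the U and V chroma components are subsampled, by an amount that depends on the format:
YUV4:4:4
YUV4:4:0
YUV4:2:2
YUV4:2:0
YUV4:1:1
YUV4:1:0
What exactly do these numbers mean?

Consider a block of 4*2 pixels: 8 pixels in 2 rows (an odd row and an even row).
The first value is always 4: luma is sampled 4 times in both the odd row and the even row, i.e. for every pixel.
The second value is 4, 2 or 1: the number of chroma samples taken in the first row. "Chroma" here means U or V, each counted separately.
The third value is 4, 2, 1 or 0: the number of chroma samples taken in the second row. In the formats listed above it is either equal to the second value or 0, where 0 means the second row has no chroma samples of its own and reuses those of the first row.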
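
As a quick check of the counting rule above, the sketch below derives the bits per pixel of an 8-bit format from the second and third values. The function name is made up for this illustration. The model only covers a 4*2 block, so it does not describe formats such as ffmpeg's yuv410p, which shares one chroma sample across a 4x4 block.

```c
#include <assert.h>

/* Bits per pixel for 8-bit samples under the 4:a:b counting rule:
 * a 4x2 block holds 8 luma samples plus (a + b) samples each of U and V. */
int bits_per_pixel_8bit(int a, int b)
{
    assert(a == 1 || a == 2 || a == 4);
    assert(b == 0 || b == a);
    int samples = 8 + 2 * (a + b);   /* Y + U + V samples in the block */
    return samples * 8 / 8;          /* 8 bits per sample, spread over 8 pixels */
}
```

For 4:2:0 this gives 12 and for 4:2:2 it gives 16, which matches the "12bpp" and "16bpp" comments on AV_PIX_FMT_YUV420P and AV_PIX_FMT_YUV422P in the enum shown in Definition 1.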

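For one and the same subsampling scheme, the samples can be laid out in memory in three ways:
1.packed: Y, U and V are interleaved in one buffer, e.g. YUYVYUYV...
2.planar: each component is stored separately: YYYYY..., UUUUU..., VVVVV...
3.semi_planar: Y is stored separately and U/V are interleaved: YYYY..., UVUVUV...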
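
To make the three layouts concrete, here is a minimal sketch that computes where the Y, U and V samples of pixel (x, y) live for one 8-bit example of each layout. It assumes a single tightly packed buffer with no row padding and even dimensions. The struct and function names are invented for this illustration; real ffmpeg frames use separate plane pointers and line sizes instead.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the Y, U and V samples that apply to one pixel. */
typedef struct { size_t y, u, v; } SampleOffsets;

/* Packed 4:2:2 (YUYV): each pair of pixels is stored as Y0 U Y1 V. */
SampleOffsets packed_yuyv422(int w, int x, int y)
{
    assert(w > 0 && w % 2 == 0 && x >= 0 && x < w && y >= 0);
    size_t row  = (size_t)y * w * 2;           /* 2 bytes per pixel */
    size_t pair = row + (size_t)(x / 2) * 4;   /* start of the 4-byte pair */
    SampleOffsets o = { pair + (size_t)(x % 2) * 2, pair + 1, pair + 3 };
    return o;
}

/* Planar 4:2:0: full Y plane, then a quarter-size U plane, then the V plane. */
SampleOffsets planar_yuv420p(int w, int h, int x, int y)
{
    assert(w % 2 == 0 && h % 2 == 0 && x >= 0 && x < w && y >= 0 && y < h);
    size_t y_size = (size_t)w * h;
    size_t c_w = (size_t)w / 2, c_size = c_w * ((size_t)h / 2);
    size_t c = (size_t)(y / 2) * c_w + (size_t)(x / 2);
    SampleOffsets o = { (size_t)y * w + x, y_size + c, y_size + c_size + c };
    return o;
}

/* Semi-planar 4:2:0 (NV12 style): full Y plane, then one interleaved UV plane. */
SampleOffsets semi_planar_nv12(int w, int h, int x, int y)
{
    assert(w % 2 == 0 && h % 2 == 0 && x >= 0 && x < w && y >= 0 && y < h);
    size_t y_size = (size_t)w * h;
    size_t c = (size_t)(y / 2) * w + (size_t)(x / 2) * 2;  /* a UV row is w bytes */
    SampleOffsets o = { (size_t)y * w + x, y_size + c, y_size + c + 1 };
    return o;
}
```

In the packed case all three offsets fall within the same few bytes, while in the planar case they are far apart, which is the practical difference between the layouts.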

Definition 1: AVPixelFormat, the enum that defines the pixel format IDs

enum AVPixelFormat {
    AV_PIX_FMT_NONE = -1,
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...
    AV_PIX_FMT_BGR24,     ///< packed RGB 8:8:8, 24bpp, BGRBGR...
    AV_PIX_FMT_YUV422P,   ///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)
    ...                   // remaining entries omitted
};

Definition 2: AVPixFmtDescriptor, the structure that describes how image data of a given format is organized in memory

(gdb) ptype av_pix_fmt_descriptors
type = const struct AVPixFmtDescriptor {
      const char *name;
      uint8_t nb_components;
      uint8_t log2_chroma_w;
      uint8_t log2_chroma_h;
      uint64_t flags;
      AVComponentDescriptor comp[4];
      const char *alias;
} [198]
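I won't copy all 198 pixel format entries here; an excerpt of the table is shown later in this article.

The members are described below:
const char *name : name of the pixel format.
uint8_t nb_components: number of components in a pixel, in the range 1 - 4.

For example, AV_PIX_FMT_GRAY8 has a single component, Y,
AV_PIX_FMT_YUV420P has three components: Y, U, V,
AV_PIX_FMT_NV12 also has three components: Y, U, V,
AV_PIX_FMT_ARGB has four components: A, R, G, B.

uint8_t log2_chroma_w: horizontal chroma subsampling factor.

It is a right-shift count: the number of bits by which the luma width must be shifted right to get the chroma width.
For example, for yuv420p at a resolution of 1280 x 720,
the luma width (luma samples per row) is 1280,
the chroma width (chroma samples per row) is 1280/2 = 640,
so log2_chroma_w is 1 (shift right by 1 bit).

uint8_t log2_chroma_h: vertical chroma subsampling factor, defined the same way for the height. In the yuv420p example the chroma height is 720 >> 1 = 360, so log2_chroma_h is also 1.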
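
The shift arithmetic above can be sketched as follows. The helper names are invented for this illustration. Rounding up is an assumption made here so that an odd luma size still gets a chroma sample for its last column or row; libavutil has a comparable macro, AV_CEIL_RSHIFT.

```c
#include <assert.h>

/* Shift right while rounding up, so an odd luma size still gets a chroma
 * sample for its last column or row. */
int ceil_rshift(int value, int shift)
{
    assert(value >= 0 && value <= (1 << 30) && shift >= 0 && shift <= 8);
    return (value + (1 << shift) - 1) >> shift;
}

/* Chroma plane size for a given luma size and descriptor shift values. */
void chroma_size(int luma_w, int luma_h, int log2_chroma_w, int log2_chroma_h,
                 int *chroma_w, int *chroma_h)
{
    *chroma_w = ceil_rshift(luma_w, log2_chroma_w);
    *chroma_h = ceil_rshift(luma_h, log2_chroma_h);
}
```

With the yuv420p values (1, 1) a 1280 x 720 frame yields 640 x 360 chroma planes; with the 4:2:2 values (1, 0) it yields 640 x 720.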

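uint64_t flags: a combination of pixel format flags,
for example AV_PIX_FMT_FLAG_BE | AV_PIX_FMT_FLAG_HWACCEL.
AV_PIX_FMT_FLAG_BE means the samples are stored big-endian.
AV_PIX_FMT_FLAG_HWACCEL means the pixel format is used for hardware acceleration, such as hardware decoding or encoding.

AVComponentDescriptor comp[4]
This member is very important. Each array element describes one component. Note that this is a component, not a plane: a single plane may hold several components.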

Definition 3: AVComponentDescriptor, the per-component descriptor

(gdb) ptype av_pix_fmt_descriptors[0].comp
type = struct AVComponentDescriptor {
      int plane;
      int step;
      int offset;
      int shift;
      int depth;
      int step_minus1;       // deprecated
      int depth_minus1;       // deprecated
      int offset_plus1;       // deprecated
} [4]
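AVComponentDescriptor describes exactly how each component is laid out in memory.

Its members are:
int plane : index of the plane that holds this component.

For example, the p010 format has three components (Y, U, V) but only two planes (Y and UV).
The Y plane looks like YYYY... and the UV plane looks like UVUVUV....
The plane value of the Y component is 0; the plane value of both the U and V components is 1, because U and V samples are interleaved in plane 1.

int step : the distance between two horizontally consecutive samples of this component.
If the pixel format is a bitstream format (flag AV_PIX_FMT_FLAG_BITSTREAM is set) the value is in bits, otherwise it is in bytes.

For example, in p010 the Y plane looks like YYYY..., the UV plane looks like UVUVUV..., and the bit depth is 10.
After alignment, every Y, U and V sample occupies 2 bytes,
so the step of the Y component is 2 (two consecutive Y samples are 2 bytes apart),
the step of the U component is 4 (two consecutive U samples are 4 bytes apart),
and the step of the V component is also 4 (two consecutive V samples are 4 bytes apart).

int offset : the amount of data that precedes the first sample of this component within its plane.
If the pixel format is a bitstream format (flag AV_PIX_FMT_FLAG_BITSTREAM is set) the value is in bits, otherwise it is in bytes.

For example, in p010 every U or V sample occupies 2 bytes, and the first V sample is preceded by the 2 bytes of the first U sample,
so the offset of the U component is 0 and the offset of the V component is 2.

int shift : a right-shift count: the number of bits by which the value read from memory must be shifted right to get the actual sample value.

For example, in p010 the bit depth is 10, but after alignment every Y, U and V sample occupies 16 bits.
Whether the 10 bits of data sit in the high 10 bits or the low 10 bits of the 16-bit unit is determined by shift.
In p010 the shift of every component is 6, which means the data sits in the high 10 bits.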

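Illustrative code that reads the first Y sample from the Y plane (assuming little-endian samples, as in p010le):
uint8_t y_plane[1280*2] = {0};     /* one row: 1280 samples, 2 bytes each */
/* build the 16-bit word from its two bytes instead of casting the pointer */
uint16_t word = (uint16_t)(y_plane[0] | (y_plane[1] << 8));
uint16_t y0 = word >> 6;           /* shift = 6 drops the low padding bits */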

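int depth: the number of bits in each sample of this component, i.e. the bit depth.
To summarize the members above:
plane is the index of the plane that holds the component,
offset gives the order of the components when several are interleaved in one plane (in the UV plane of p010, U comes first and V second),
step, shift and depth describe how samples are padded and aligned in memory.

The last three members are deprecated and are not covered here.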
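
The five members combine into one generic read, sketched below. The Component struct is a hypothetical stand-in for the non-deprecated AVComponentDescriptor fields, and the sketch assumes a byte-based (non-bitstream) format with little-endian samples of at most 16 bits. The p010 values are the ones given in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-deprecated AVComponentDescriptor fields. */
typedef struct {
    int plane, step, offset, shift, depth;
} Component;

/* p010 components as described in the text: Y in plane 0, U and V in plane 1. */
static const Component p010_comp[3] = {
    { 0, 2, 0, 6, 10 },   /* Y */
    { 1, 4, 0, 6, 10 },   /* U */
    { 1, 4, 2, 6, 10 },   /* V */
};

/* Read sample number x of one component from a row of its plane. */
uint16_t read_sample(const uint8_t *row, const Component *c, int x)
{
    assert(c->depth >= 1 && c->depth <= 16 && c->shift >= 0 && x >= 0);
    const uint8_t *p = row + c->offset + (size_t)x * c->step;
    unsigned word = p[0];
    if (c->shift + c->depth > 8)          /* sample spills into a second byte */
        word |= (unsigned)p[1] << 8;
    return (uint16_t)((word >> c->shift) & ((1u << c->depth) - 1));
}
```

Calling read_sample with p010_comp[1] and p010_comp[2] on the same UV row returns the U and V samples of the same position. Only offset differs between the two components.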

----------------------------------------
B. Image pixel formats (pix_fmts) used in ffmpeg
----------------------------------------

$ ffmpeg -pix_fmts lists all pixel formats.
They all come from the table below (defined in libavutil/pixdesc.c), of which two entries are shown:

static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = {
    [AV_PIX_FMT_YUV420P] = {
        .name = "yuv420p",
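        .nb_components = 3,       // 3 components: Y, U, V
        .log2_chroma_w = 1,       // chroma width is half the luma width (>>1)
        .log2_chroma_h = 1,       // chroma height is half the luma height (>>1)
        .comp = {
            /* fields: plane, step, offset, shift, depth; the last 3 values are the deprecated members */
            { 0, 1, 0, 0, 8, 0, 7, 1 },        /* Y: plane 0, 8-bit depth */
            { 1, 1, 0, 0, 8, 0, 7, 1 },        /* U: plane 1, 8-bit depth */
            { 2, 1, 0, 0, 8, 0, 7, 1 },        /* V: plane 2, 8-bit depth */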
        },
        .flags = AV_PIX_FMT_FLAG_PLANAR,
    },
    [AV_PIX_FMT_YUYV422] = {
        .name = "yuyv422",
        .nb_components = 3,    // 3 components: Y, U, V
        .log2_chroma_w = 1,     // chroma width is half the luma width (>>1)
        .log2_chroma_h = 0,     // chroma height equals the luma height (>>0)
        .comp = {
            { 0, 2, 0, 0, 8, 1, 7, 1 },        /* Y */
            { 0, 4, 1, 0, 8, 3, 7, 2 },        /* U */
            { 0, 4, 3, 0, 8, 3, 7, 4 },        /* V */
        },
    },
   .....
}
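
As an illustration of how the yuyv422 entry can be read, the sketch below splits one packed row into separate Y, U and V rows using only the step and offset values from the table. The array and function names are invented for this example, and only the first five values of each comp row (plane, step, offset, shift, depth) are copied.

```c
#include <assert.h>
#include <stdint.h>

/* plane, step, offset, shift, depth for yuyv422, copied from the table above */
static const int yuyv422_comp[3][5] = {
    { 0, 2, 0, 0, 8 },   /* Y */
    { 0, 4, 1, 0, 8 },   /* U */
    { 0, 4, 3, 0, 8 },   /* V */
};
enum { STEP = 1, OFFSET = 2 };   /* column indexes into a comp row */

/* Split one packed YUYV row of `width` pixels into separate Y, U and V rows.
 * log2_chroma_w is 1, so U and V each receive width/2 samples. */
void unpack_yuyv422_row(const uint8_t *src, int width,
                        uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(width > 0 && width % 2 == 0);
    uint8_t *dst[3] = { y, u, v };
    for (int c = 0; c < 3; c++) {
        int count = (c == 0) ? width : width >> 1;
        const uint8_t *p = src + yuyv422_comp[c][OFFSET];
        for (int i = 0; i < count; i++)
            dst[c][i] = p[i * yuyv422_comp[c][STEP]];
    }
}
```

All three components read from the same plane (plane 0). Only their offset and step differ, which is what makes the format packed.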

----------------------------------------
C. Audio sample formats (sample_fmt) used in ffmpeg
----------------------------------------
$ ffmpeg -sample_fmts lists all defined audio sample formats.

They come from the following table, defined in samplefmt.c:
static const SampleFmtInfo sample_fmt_info[AV_SAMPLE_FMT_NB] = {
    [AV_SAMPLE_FMT_U8]   = { .name = "u8",   .bits = 8,  .planar = 0, .altform = AV_SAMPLE_FMT_U8P  },
    [AV_SAMPLE_FMT_S16]  = { .name = "s16",  .bits = 16, .planar = 0, .altform = AV_SAMPLE_FMT_S16P },
    [AV_SAMPLE_FMT_S32]  = { .name = "s32",  .bits = 32, .planar = 0, .altform = AV_SAMPLE_FMT_S32P },
    [AV_SAMPLE_FMT_S64]  = { .name = "s64",  .bits = 64, .planar = 0, .altform = AV_SAMPLE_FMT_S64P },
    [AV_SAMPLE_FMT_FLT]  = { .name = "flt",  .bits = 32, .planar = 0, .altform = AV_SAMPLE_FMT_FLTP },
    [AV_SAMPLE_FMT_DBL]  = { .name = "dbl",  .bits = 64, .planar = 0, .altform = AV_SAMPLE_FMT_DBLP },
    [AV_SAMPLE_FMT_U8P]  = { .name = "u8p",  .bits = 8,  .planar = 1, .altform = AV_SAMPLE_FMT_U8   },
    [AV_SAMPLE_FMT_S16P] = { .name = "s16p", .bits = 16, .planar = 1, .altform = AV_SAMPLE_FMT_S16  },
    [AV_SAMPLE_FMT_S32P] = { .name = "s32p", .bits = 32, .planar = 1, .altform = AV_SAMPLE_FMT_S32  },
    [AV_SAMPLE_FMT_S64P] = { .name = "s64p", .bits = 64, .planar = 1, .altform = AV_SAMPLE_FMT_S64  },
    [AV_SAMPLE_FMT_FLTP] = { .name = "fltp", .bits = 32, .planar = 1, .altform = AV_SAMPLE_FMT_FLT  },
    [AV_SAMPLE_FMT_DBLP] = { .name = "dblp", .bits = 64, .planar = 1, .altform = AV_SAMPLE_FMT_DBL  },
};

(gdb) ptype sample_fmt_info
type = const struct SampleFmtInfo {
      char name[8];
      int bits;
      int planar;
      enum AVSampleFormat altform;
} [12]
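
The bits and planar fields are enough to work out how an audio buffer is split up. The sketch below is a hedged illustration: FmtInfo and both functions are invented names that mirror only those two fields, and no alignment padding is assumed. It treats planar = 0 as all channels interleaved in a single buffer and planar = 1 as one buffer per channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the two SampleFmtInfo fields used here. */
typedef struct { int bits; int planar; } FmtInfo;

/* Number of separate buffers (planes) needed for the format. */
int plane_count(FmtInfo f, int channels)
{
    assert(channels > 0);
    return f.planar ? channels : 1;
}

/* Bytes in each plane for nb_samples samples per channel, without padding. */
size_t plane_bytes(FmtInfo f, int channels, int nb_samples)
{
    assert(f.bits > 0 && f.bits % 8 == 0 && channels > 0 && nb_samples >= 0);
    size_t per_sample = (size_t)f.bits / 8;
    return per_sample * (size_t)nb_samples * (size_t)(f.planar ? 1 : channels);
}
```

For 1024 stereo samples, s16 (16 bits, interleaved) needs one buffer of 4096 bytes, while fltp (32 bits, planar) needs two buffers of 4096 bytes each.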