Camera: YUV-to-RGB conversion with C2D takes too long
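While working on the camera module of an in-vehicle project, I needed to convert the YUV data delivered by the driver to RGB (for display, among other reasons), and used Qualcomm's C2D library to do the conversion. However, when I timed the C2D calls, they turned out to be slow: most conversions took more than 30 ms. This had two effects. First, the rendering that follows the conversion was delayed. Second, because a frame is fetched from the driver every 30 ms, the gap between two driver-frame fetches grew to 30 + 30 ms or more. Together, these made the camera preview visibly stutter.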
Conversion sizes: 720x480 YUV to 1140x720 RGB.
To enable C2D logging, create a file named c2d_config.txt at the path below, containing the line below:
Path: /data/vendor/gpu/c2d_config.txt
Content: logMasks 0x00000000FF
The log for one graphics_draw_c2d_uyvy call with C2D logging enabled:
64606:06-07 09:38:48.924 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>1<<<
64607:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:383>: Target info
64608:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:384>: TId:[0xC2D00001] TC:[0x00000000]
64610:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:411>: Object #0
64612:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:415>: SId:[0xC2D00009] CM:[0x00000000]
64614:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:421>: TRect: x:[0x00000000 (0.000000)] y:[0x00000000 (0.000000)] w:[0x00000000 (0.000000)] h:[0x00000000 (0.000000)]
64615:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:428>: SRect: x:[0x00000000 (0.000000)] y:[0x00000000 (0.000000)] w:[0x00000000 (0.000000)] h:[0x00000000 (0.000000)]
64616:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_log_objects:435>: ScRect: x:[0] y:[0] w:[0] h:[0] Global_Alpha:[0/255]
64617:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_surface_draw:2694>: C2D Draw surface 0xc2d00001
64619:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_pipeline_init_vbo:946>: xy_coords: ( (-1.00000000, -1.00000000):(-1.00000000, 1.00000000):(1.00000000, 1.00000000):(1.00000000, -1.00000000) ) uv_coords: ( (0.00000000, 1.00000000):(0.00000000, 0.00000000):(1.00000000, 0.00000000):(1.00000000, 1.00000000) ) uv_dst_coords: ( (0.00000000, 1.00000000):(0.00000000, 0.00000000):(1.00000000, 0.00000000):(1.00000000, 1.00000000) ) clipping rect: ( (0, 0):(1140, 720) ) xy_coords_bin: ( (-1.00000000, -1.00000000):(-1.00000000, 1.00000000):(1.00000000, 1.00000000):(1.00000000,
64620:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1930>: Starting GPU ADDRESS 0x7fffa0000
64621:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:0 Offset:0x0 Size:0x100
64623:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:1 Offset:0x100 Size:0x100
64624:06-07 09:38:48.924 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:2 Offset:0x200 Size:0x100
64625:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:3 Offset:0x300 Size:0x100
64626:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:4 Offset:0x400 Size:0x100
64627:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:5 Offset:0x500 Size:0x100
64628:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:6 Offset:0x600 Size:0x100
64629:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:7 Offset:0x700 Size:0x100
64630:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:8 Offset:0x800 Size:0x100
64631:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:9 Offset:0x900 Size:0x400
64632:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:10 Offset:0xd00 Size:0x100
64633:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:11 Offset:0xe00 Size:0x100
64635:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:12 Offset:0xf00 Size:0x100
64636:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:13 Offset:0x1000 Size:0x200
64638:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:15 Offset:0x1200 Size:0x100
64639:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:16 Offset:0x1300 Size:0x100
64640:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:17 Offset:0x1400 Size:0x5000
64641:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:18 Offset:0x6400 Size:0x800
64642:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:19 Offset:0x0 Size:0x0
64643:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:21 Offset:0x0 Size:0x0
64645:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:20 Offset:0x0 Size:0x0
64646:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:22 Offset:0x0 Size:0x0
64648:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:23 Offset:0x0 Size:0x0
64649:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_draw:1935>: ID:0 Offset:0x0 Size:0x0
64650:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_pipeline_batch_submit:1182>: ===> submit (0x7fffa6400) with size (1964)
64653:06-07 09:38:48.925 I/Adreno-C2D( 592): <c2d_gsl_submit_bytestream:859>: ===>Cxt[3] submitted (0x7fffa6400)(0x7fffa6400) with size (1964) - timestamp (752)
64654:06-07 09:38:48.925 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>2<<<
64655:06-07 09:38:48.926 I/Adreno-C2D( 592): <c2d_surface_flush:2902>: C2D Flush surface 0xc2d00001
64656:06-07 09:38:48.926 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>3<<<
64657:06-07 09:38:48.926 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>4<<<
64659:06-07 09:38:48.926 I/Adreno-C2D( 592): <c2d_surface_wait:3067>: C2D surface wait timestamp 0xc2d00001
64660:06-07 09:38:48.926 I/Adreno-C2D( 592): <c2d_surface_wait_till_pipelines_done:3023>: C2D surface wait 0xa9446000
64668:06-07 09:38:48.928 I/Adreno-C2D( 592): <c2d_surface_copy_from_internal_buf:384>: Local_buffer:[0xffc45000] Dst_buffer:[0xa79e9000] Size:[0x321900] Dst_stride:[0x11d0] Dst_height: [0x2d0] Local_buffer_stride: [0x1200]
64881:06-07 09:38:48.976 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>5<<<
64882:06-07 09:38:48.976 E/EvsAISDriverSurface( 592): [Qdebug] graphics_draw_c2d_uyvy >>>6<<<
64883:06-07 09:38:48.976 D/EvsAISDriverSurface( 592): GL_DRM - TIME_PROFILE : graphics_draw_c2d_uyvy total delta_us 52525
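As the log shows, nearly 50 ms pass between [Qdebug] graphics_draw_c2d_uyvy >>>4<<< (09:38:48.926) and [Qdebug] graphics_draw_c2d_uyvy >>>5<<< (09:38:48.976), out of 52525 us for the whole call. The C2D call made between those two points is c2d_status = c2dWaitTimestamp(c2d_timestamp);, and the c2d_surface_copy_from_internal_buf step is logged inside that window.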
The cause was explained as follows:
I found that the stride of the target surface (RGB 1140x720) is not a multiple of 32 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
01-01 08:00:03.828 I/Adreno-C2D( 592): <c2d_surface_align_stride:242>: input stride is 4560 bytes
01-01 08:00:03.828 I/Adreno-C2D( 592): <c2d_surface_align_stride:267>: final stride is 4608 bytes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Can you please try again after changing the stride to a 32-byte-aligned value? (4560 is not 32-byte aligned.)
I think this is because the width, 1140, is not a multiple of 32.
For this test, please try again with a width of 1152 instead of 1140.
i.e.:
c2d_rgb_surface_def.width = 1152;   // was: user_ctxt[idx].width
c2d_rgb_surface_def.stride0 = 4608; // was: stride
Otherwise, C2D will do a "memcpy" internally to prepare the surface.
I have checked the code again and found that C2D checks whether the stride is 64-byte aligned.
If it is not, C2D runs "memcpy" to handle the surface.
So you simply need to change the input stride value in your code.
That is, you can use 4608 instead of 4560 for the RGB surface and 1472 instead of 1440 for the YUV surface, with no need to change the width.
With these changes, you should see the same value for "input stride" and "final stride" in the c2d_surface_align_stride log.
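In short: during conversion C2D computes an aligned stride, and if the stride it was given differs from it (that is, it is not 64-byte aligned), C2D calls memcpy to handle the surface, which adds the extra time. This shows up in the log as the c2d_surface_copy_from_internal_buf print from the C2D library: above, it copies Size 0x321900 (3,283,200 bytes = 4560 x 720) from a local buffer with stride 0x1200 (4608) into the destination buffer with stride 0x11d0 (4560) and height 0x2d0 (720).
They therefore suggested either changing the RGB width from 1140 to 1152, or changing the YUV width from 720 to 736. I changed the RGB width, because the YUV size is tied to the driver and hard to change. After the change, the conversion takes only 4-5 ms and the stutter is gone.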
The stride value is calculated as follows:
Regarding how the stride is calculated, you can simply check whether the following two values are the same:
06-07 11:30:46.211 I/Adreno-C2D( 588): <c2d_surface_align_stride:242>: input stride is 1440 bytes
06-07 11:30:46.211 I/Adreno-C2D( 588): <c2d_surface_align_stride:267>: final stride is 1472 bytes
If "input stride" and "final stride" differ when you use C2D, you can expect C2D to perform a "memcpy".
If you wish to avoid the "memcpy", you have to make the two values equal.
The equation depends on the format of the source and target surfaces.
For ARGB8888:
s = width * 32 / 8
Stride = (s + 63) & ~63
For example, width=1140,
s = 1140*32/8=4560
Stride = (4560 + 63)&~63 = 4608
For YUV422:
p = width & 0x1
s = (width + p) * 2
Stride = (s + 63) & ~63
For example, width=720
p = 720&0x1=0
s = (720+0)*2=1440
Stride = (1440 + 63)&~63 = 1472