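Extracting video key frames and their timestamps

The original article is at: http://www.videoproductionslondon.com/blog/scene-change-detection-during-encoding-key-frame-extraction-code


It explains how to extract the key frames of a video together with the timestamp of each key frame. With that you can build the effect the big video sites have: click the progress bar under the video and see a key frame image from near that point in time.

The author really knows the subject; I'm genuinely impressed.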





Scene change detection during encoding and key frame extraction code

In a previous post we explained how to generate thumbnails for each scene change using Edit Decision Lists from your non-linear editor of choice. Sometimes, though, you won't have an EDL along with your video assets (for instance, when you are using off-airs or videos edited by somebody else). In those situations we can still create a thumbnail for each scene change thanks to how frames are structured in digital compression.

According to Wikipedia, a GOP structure specifies the order in which intra- and inter-frames are arranged. A Group Of Pictures can contain the following frame types:

  • I-frame (intra coded picture) - reference picture, which represents a full image and which is independent of other picture types. Each GOP begins with this type of picture.
  • P-frame (predictive coded picture) - contains motion-compensated difference information from the preceding I- or P-frame.
  • B-frame (bidirectionally predictive coded picture) - contains difference information from the preceding and following I- or P-frame within a GOP.

The GOP structure is often referred to by two numbers, for example M=3, N=12, which gives IBBPBBPBBPBBI. The first number is the distance between two anchor frames (I or P) and the second is the distance between two I-frames (the GOP length).
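As a rough illustration of the M/N notation, the sketch below builds the frame-type sequence for one GOP. The function name `gop_pattern` and the `close_with_next_i` flag are made up for this example; it only models the simple fixed pattern described above, not what a real encoder with scene-cut detection would emit.

```python
def gop_pattern(m: int, n: int, close_with_next_i: bool = False) -> str:
    """Return the frame types of one fixed GOP as a string.

    m: distance between two anchor frames (I or P)
    n: distance between two I-frames (GOP length)
    close_with_next_i: also append the I-frame that starts the next GOP
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")      # every GOP starts with an I-frame
        elif i % m == 0:
            frames.append("P")      # anchor frame every m positions
        else:
            frames.append("B")      # everything in between is a B-frame
    if close_with_next_i:
        frames.append("I")
    return "".join(frames)


print(gop_pattern(3, 12, close_with_next_i=True))  # IBBPBBPBBPBBI
```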

In order to use key frames as a scene change detection method we need to use a flexible GOP structure, with minimum (min-keyint) and maximum (keyint) key frame intervals when encoding our video.

For example, a minimum setting equal to the frame rate of the video will prevent the encoded video from having two consecutive key frames within a second of each other. Please note that if your video has scenes shorter than a second you won't be able to detect all scene changes unless you reduce this setting.

Similarly, a maximum setting ensures that a key frame is inserted at least every X frames. A recommended setting is 10 times the frame rate, which equates to at most 10 seconds of video between key frames. We can set this to infinite so that no non-scenecut key frames are ever inserted, although this might cause problems when seeking (if you try to skip to a part of the video without a key frame, there won't be any video until the next key frame is reached).

In addition, we need to define the threshold for scenecut detection. The encoder calculates a metric for every frame to estimate how different it is from the previous frame. If the value is lower than the threshold, a scenecut is detected.
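To make the arithmetic concrete, here is a minimal sketch that derives the three settings from a frame rate. The helper names are hypothetical, the option names follow the x264-style spelling used above (min-keyint, keyint, scenecut), and the scenecut value of 40 is only a placeholder assumption, not a recommendation from the article. The colon-separated string is the form I would expect an x264 parameter list to take; check your encoder's documentation before relying on it.

```python
def keyframe_settings(fps: float, max_seconds: float = 10.0, scenecut: int = 40) -> dict:
    """Derive key frame interval settings from the frame rate.

    min-keyint = one second of video, keyint = max_seconds of video.
    The scenecut default of 40 is an assumed placeholder threshold.
    """
    return {
        "min-keyint": round(fps),
        "keyint": round(fps * max_seconds),
        "scenecut": scenecut,
    }


def as_x264_params(settings: dict) -> str:
    """Join the settings as key=value pairs separated by colons."""
    return ":".join(f"{key}={value}" for key, value in settings.items())


print(as_x264_params(keyframe_settings(25)))  # min-keyint=25:keyint=250:scenecut=40
```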

Once we have encoded the video we can then run the following ffmpeg command (I've used 0.8-win64-static):

ffmpeg -i yourvideo.mp4 -vf select="eq(pict_type\,PICT_TYPE_I)" -vsync 2 -s 73x41 -f image2 thumbnails-%02d.jpeg \
-loglevel debug 2>&1 | grep "pict_type:I -> select:1" | cut -d " " -f 6 - > keyframe-timecodes.txt

What follows -vf in an ffmpeg command line is a filtergraph description. The select filter chooses which frames are passed to the output. Here the expression compares the variable “pict_type” with the constant “PICT_TYPE_I”. In short, we are only passing key frames to the output.

-vsync 2 prevents ffmpeg from generating more than one copy of each key frame.

-f image2 writes video frames to image files. The output filenames are specified by a pattern, which can be used to produce sequentially numbered series of files. The pattern may contain the string "%d" or "%0Nd".
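The "%0Nd" pattern behaves like a printf-style integer format, which Python's % operator also implements, so the expansion can be sketched as below. The helper name is hypothetical, and starting the numbering at 1 is an assumption that matches the thumbnails-01.jpeg file name used later in this post.

```python
def thumbnail_name(pattern: str, index: int) -> str:
    """Expand a printf-style pattern such as 'thumbnails-%02d.jpeg'."""
    return pattern % index


pattern = "thumbnails-%02d.jpeg"
print([thumbnail_name(pattern, n) for n in range(1, 4)])
# ['thumbnails-01.jpeg', 'thumbnails-02.jpeg', 'thumbnails-03.jpeg']
```

Note that the width in "%02d" is a minimum: index 100 expands to thumbnails-100.jpeg, so a video with more than 99 key frames may be better served by "%03d" to keep the files sorted.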

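-loglevel debug makes the select filter log one line per frame. Redirected to keyframe-timecodes.txt without any further filtering, the output looks like this: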

[select @ 0000000001A88BE0] n:0 pts:0 t:0.000000 pos:1953 interlace_type:P key:0 pict_type:I -> select:1.000000
[select @ 0000000001A88BE0] n:1 pts:40000 t:0.040000 pos:4202 interlace_type:P key:0 pict_type:P -> select:0.000000

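I use “2>&1 | grep "pict_type:I -> select:1" | cut -d " " -f 6 -” to keep only the sixth space-separated field (the timestamp) of the lines for selected I-frames, which gives something more readable: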

t:0.000000
t:1.360000
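For readers without grep and cut at hand, the same filtering can be sketched in Python. This assumes the log lines have exactly the layout shown above (the layout may differ between ffmpeg versions); the function name `keyframe_times` is made up for this example.

```python
def keyframe_times(log_lines):
    """Return the timestamps (seconds) of frames logged as selected I-frames.

    Mirrors: grep "pict_type:I -> select:1" | cut -d " " -f 6
    """
    times = []
    for line in log_lines:
        if "pict_type:I -> select:1" not in line:
            continue                              # same test as the grep
        field = line.rstrip("\n").split(" ")[5]   # sixth field, e.g. "t:0.000000"
        times.append(float(field.split(":", 1)[1]))
    return times


sample = [
    "[select @ 0000000001A88BE0] n:0 pts:0 t:0.000000 pos:1953 interlace_type:P key:0 pict_type:I -> select:1.000000",
    "[select @ 0000000001A88BE0] n:1 pts:40000 t:0.040000 pos:4202 interlace_type:P key:0 pict_type:P -> select:0.000000",
]
print(keyframe_times(sample))  # [0.0]
```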

Finally, I can convert “keyframe-timecodes.txt” into a chapter navigation list and use the thumbnails to navigate the video:

<a onclick="jwplayer().seek(0); return false" href="#"><img src="thumbnails-01.jpeg"></a>
<a onclick="jwplayer().seek(1.36); return false" href="#"><img src="thumbnails-02.jpeg"></a>
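The conversion from “keyframe-timecodes.txt” to the links above is not shown in the article; one possible way to do it is sketched below. The function names are hypothetical, and the sketch assumes that the Nth timestamp in the file corresponds to the Nth thumbnail, with numbering starting at 1.

```python
def seek_value(seconds: float) -> str:
    """Format seconds without trailing zeros: 0.0 -> '0', 1.36 -> '1.36'."""
    return f"{seconds:.6f}".rstrip("0").rstrip(".")


def chapter_links(timecode_lines, pattern="thumbnails-%02d.jpeg"):
    """Build one jwplayer seek link per 't:<seconds>' line."""
    links = []
    index = 0
    for line in timecode_lines:
        line = line.strip()
        if not line.startswith("t:"):
            continue                      # skip blank or unexpected lines
        index += 1                        # assumed: thumbnails are numbered from 1
        seconds = float(line[2:])
        links.append(
            f'<a onclick="jwplayer().seek({seek_value(seconds)}); return false" href="#">'
            f'<img src="{pattern % index}"></a>'
        )
    return links


for link in chapter_links(["t:0.000000", "t:1.360000"]):
    print(link)
```

In practice the list passed to `chapter_links` would come from reading keyframe-timecodes.txt line by line.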
