Video Synopsis for IR Imagery (Paper Translation)

A note before we start: this is a 2016 paper on video synopsis for infrared (IR) video. Some of the sentences are hard to follow and the wording is not very clear (though that may just be my English). It is not a top-venue paper, so a quick read is enough, but it is interesting because it specifically targets IR video. Some of the formula-heavy parts later on were not translated; read those alongside the original paper.

Original paper link: click here.
DOI: 10.1007/978-981-10-2104-6_21 (you can also grab it from Sci-Hub yourself).


Video Synopsis for IR Imagery Considering Video as a 3D Data Cuboid

Abstract

Video synopsis is a way to transform a recorded video into a temporally compact representation. Surveillance videos generally contain a huge amount of recorded data, with many inherent spatio-temporal redundancies in the form of segments containing no activity; browsing and retrieving such huge data has always been inconvenient. We present an approach to video synopsis for IR imagery in which the considered video is mapped into a temporally compact and chronologically consistent form by significantly removing these inherent spatio-temporal redundancies. A group of frames of the video sequence is taken to form a 3D data cuboid with X, Y and T axes, and this cuboid is re-represented as a stack of contiguous XT slices. With the help of Canny's edge detection and Hough transform-based line detection, the contents of these slices are analysed and segments carrying spatio-temporal redundancy are eliminated. Hence, the recorded video is dynamically summarized on the basis of its content.

Keywords: Video synopsis · Video summarization · IR · MWIR · Canny's edge detection · Hough transform-based line detection · Spatio-temporal redundancy

1 Introduction

The popularity of thermal imaging systems in surveillance technology has drawn a lot of attention from the vision community in the past few decades. The increasing population of such systems is generating vast amounts of data in the form of recorded videos; with the help of video summarization, a compact but informative representation of a video sequence may be provided. Since timing information of events is important for surveillance, the chronology of events is also maintained in the compact representation. Generally, IR (infra-red) signatures of targets are more prominent than background and clutter; this contrast is commonly used as a clue for change detection. We also adopt a contrast-based clue for detecting representative segments with motion, but instead of processing the video sequence in the X−Y plane, we have chosen the X−T plane. Spatio-temporal regularity is utilized for labelling representative segments with motion.

2 Related Work

The goal of this section is to review and classify the state-of-the-art video synopsis generation methods and identify new trends. Our aim is to extract information from unscripted and unstructured data obtained from the recorder of a surveillance system. Ding [2] categorized video synopsis techniques into the following three levels:

● Feature-Based Extraction: In such approaches low level features like number of foreground pixels and distance between histograms are used to identify frames with higher information content.

● Object-Based Extraction: In such approaches objects of interest like vehicle, pedestrian are used for labelling frames with higher information content.

● Event-Based Extraction: In such approaches events like entrance of a vehicle, pedestrian in field of view are used for setting pointers with high semantic level. Such approaches are more application specific.

Li et al. [3] presented an optical flow based approach for surveillance video summarization. It is a motion analysis-based video skimming scheme in which play-back speed depends upon motion behaviour.

Ji et al. [4] presented an approach based on motion detection and trajectory extraction. Video is segmented based on the moving objects detection and trajectories are extracted from each moving object. Then, only key frames along with the trajectories are selected to represent the video summarization.

Cullen et al. [5] presented an approach to detect boats, cars and people at coastal area. For this, the region of interest is decided and validated. It is taken as input for video condensation algorithm to remove inactive time space.

Rav-Acha et al. [6] presented a method for dynamic video synopsis in which several activities are compressed into a shorter time, where the density of activities is much higher. For better summarization of video events, chronology is not maintained, as several events are merged into a few frames.

Petrovic et al. [7] presented an approach for adaptive video fast forward. A likelihood function based upon content of video is formulated and playback speed is modelled accordingly.

Hoferlin et al. [8] presented an information based adaptive fast forward approach in which the playback speed depends on the density of temporal information in the video. The temporal information between two frames is computed by the divergence between the absolute frame difference and noise distribution.

Porikli [9] presented a multiple-camera surveillance and tracking system based on an object-based summarization approach. For this, only the video sequence for each object is stored instead of storing the video for each camera. Then, the object is tracked by background subtraction and mean shift analysis.

Most of the approaches discussed above rely on motion detection-based techniques in the X−Y plane for video summarization, but in the case of IR sequences with poor SNR and targets confined to a very small fraction of the X−Y plane, it becomes challenging to detect targets. To tackle such scenarios, a novel approach to video summarization is presented in the subsequent sections. Instead of detecting targets in the X−Y plane, the trajectory of motion is extracted from X−T slices, where it covers a relatively larger fraction of the slice.

3 Methodology

3.1 Overview of the Approach

The problem of video synopsis can be defined as a mapping generation problem between a video sequence and its temporally compact version. In the present approach to mapping generation, the considered video sequence is analysed in the XT plane and spatio-temporally redundant segments are eliminated. The trajectory of moving objects is utilized for this job.

First, a video sequence is represented as a 3D data cuboid VXYT; then this cuboid is chopped into contiguous XT slices and a set of slices I(y)XT is generated. A set of binary edge maps EXT(y) is obtained from I(y)XT with the help of Canny's edge detection. A consolidated edge map 𝜉XT is generated by registration of all elements of EXT(y).

Using Hough transform-based line detection with a number of constraints, representative segments with motion are labelled in 𝜉XT′ and 𝜁XT is generated, where 𝜉XT′ is the binary edge map obtained from 𝜉XT. For the transformation from VXYT to 𝛹XYT a transformation matrix 𝜏T is needed, which is extracted from 𝜏XT, where 𝜏XT is formed by subtracting 𝜉XT′ from 𝜁XT, and 𝛹XYT is the 3D data cuboid representation of the temporally compact version of the video sequence.

3.2 Data Cuboid Representation of a Video Sequence

Fig. 1 Data cuboid representation of a video sequence a Frames from video sequence Windmill, b Video sequence Windmill represented as a 3D data cuboid, c Data cuboid chopped in contiguous X − T slices and d Data cuboid chopped in contiguous Y − T slices

As in Fig. 1a, b, a group of frames of the video sequence is taken to form a 3D data cuboid VXYT with X, Y and T as axes. VXYT can be expressed [10, 11] as follows:

VXYT = {I(t)XY, ∀t ∈ {1, 2, 3, …, p}} (1)

where I(t)XY is a frame of the video sequence with X and Y axes at any particular time t, and p is the number of such frames of size m × n.

As shown in Fig. 1c, the data cuboid VXYT has an alternative representation [10, 11] as a stack of m contiguous X−T slices I(y)XT of size n × p.

VXYT = {I(y)XT, ∀y ∈ {1, 2, 3, …, m}} (2)

Yet another way to represent [10, 11] the same data cuboid VXYT is suggested in Fig. 1d, by stacking n contiguous Y−T slices I(x)YT.

VXYT = {I(x)YT, ∀x ∈ {1, 2, 3, …, n}} (3)

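To make Eqs. (1)–(3) concrete, the following minimal sketch (not from the paper) builds the data cuboid with NumPy and extracts one X−T and one Y−T slice; the frame loader and the chosen slice indices are placeholder assumptions.

```python
import numpy as np

# Hypothetical loader: returns p grayscale frames of size m x n (rows x cols).
# In practice these would be IR frames read from the recorded sequence.
def load_frames(p=100, m=480, n=640):
    rng = np.random.default_rng(0)
    return [rng.integers(0, 256, size=(m, n), dtype=np.uint8) for _ in range(p)]

frames = load_frames()                 # I(t)_XY, t = 1..p                 (Eq. 1)
V = np.stack(frames, axis=-1)          # data cuboid V_XYT, shape (m, n, p)

m, n, p = V.shape
xt_slice = V[240, :, :]                # I(y)_XT for y = 240, size n x p   (Eq. 2)
yt_slice = V[:, 320, :]                # I(x)_YT for x = 320, size m x p   (Eq. 3)

print(V.shape, xt_slice.shape, yt_slice.shape)
```
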
Fig. 2 A typical XT slice representing features of stationary as well as moving objects in the corresponding XY plane of the video sequence; moving objects appear as curved trajectories.

3.3 Mapping Between Contents of X − Y and X − T Planes of Data Cuboid

If the content of an X−Y frame is stationary, then in the X−T slices there will be a number of horizontal features parallel to the T axis. Since the present approach assumes that the video is recorded from a stationary IR system, such horizontal features are the most likely content of X−T slices. The most important conclusion related to the present work is that if there are pixels with local motion in an X−Y frame, then the trajectory of motion appears in the features of the X−T slices containing those pixels. The geometry of this trajectory can be approximated by combining a number of inclined line segments. If there is any acceleration in the motion, then there will be a number of curves in the trajectory, but any curve can be approximated by combining a number of small inclined line segments. This fact is utilized for labelling segments with motion. In Fig. 2, an X−T slice is shown which corresponds to an X−Y frame containing stationary as well as moving objects; hence a combination of the corresponding features appears in the figure.

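As a small synthetic check of this mapping (an illustration only, using the same (m, n, p) cuboid layout as in the earlier snippet), a stationary bright pixel leaves a feature parallel to the T axis in its X−T slice, while a pixel translating along X leaves an inclined trajectory:

```python
import numpy as np

m, n, p = 64, 64, 50
V = np.zeros((m, n, p), dtype=np.uint8)

V[10, 20, :] = 255                     # stationary target: fixed (y, x) for all t
for t in range(p):                     # moving target: x advances by one pixel per frame
    V[30, 5 + t, t] = 255

xt_static = V[10]                      # X-T slice at y = 10: a line parallel to T
xt_moving = V[30]                      # X-T slice at y = 30: an inclined line

print(np.unique(np.nonzero(xt_static)[0]))   # -> [20], x never changes
print(np.nonzero(xt_moving)[0][:5])          # -> [5 6 7 8 9], x grows with t
```
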
Formation of a Set of Binary Edge Maps. From Eq. (2), a set {I(y)XT, ∀y ∈ {1, 2, 3, …, m}} is obtained from VXYT. In this section we obtain EXT(y) from {I(y)XT, ∀y ∈ {1, 2, 3, …, m}} using Canny's edge detection [12], which is one of the most widely used edge detection algorithms. Even though it is quite old, it has become one of the standard edge detection methods and is still used in research [13]. Canny redefined the edge detection problem as a signal processing optimization problem and defined an objective function with the following criteria:

– Detection: Probability of detecting real edge points should be maximized while the probability of falsely detecting non-edge points should be minimized.

– Localization: Amount of error between detected edge and real edge should be minimum.

– Number of responses: For one real edge there should be only one detected edge; though this point is implicit in the first one, it is still important.

Consolidated Edge Map Generation. Since we are mapping the video sequence into a temporally compact representation, the information carried along the Y axis of VXYT is redundant, at least for labelling representative segments with motion; therefore, for further processing we use a consolidated edge map formed by utilizing all elements of EXT(y). As EXT(y) is generated from I(y)XT, whose elements are contiguous slices, all elements of EXT(y) are already registered in the spatial domain; hence the consolidated edge map 𝜉XT of VXYT is generated by applying a logical OR operation over all elements of EXT(y).

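A minimal sketch of this step, assuming OpenCV's Canny implementation and the (m, n, p) cuboid layout used in the earlier snippets; the Canny thresholds are illustrative values, not taken from the paper.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def consolidated_edge_map(V, low=50, high=150):
    """Compute E_XT(y) = Canny(I(y)_XT) for every X-T slice of the cuboid V
    and fuse all of them into the consolidated map xi_XT by a logical OR."""
    m, n, p = V.shape
    xi = np.zeros((n, p), dtype=bool)          # consolidated edge map xi_XT
    for y in range(m):
        edges = cv2.Canny(V[y], low, high)     # binary edge map E_XT(y), values 0/255
        xi |= edges > 0                        # slices are already spatially registered
    return xi
```
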
3.4 Extraction of Representative Segments with Motion from Consolidated Binary Edge Map

As discussed earlier, to extract segments having motion we have to extract inclined line segments from XT slices; hence our goal is to find the set of inclined line segments 𝛶XT. But as 𝛶XT ⊆ LXT, where LXT is the set of lines with cardinality r in any XT slice, the elements of 𝛶XT are obtained from LXT with imposed constraints. Hough transform-based line detection is used to find the elements of LXT.

Hough Transform-Based Line Detection. Now we have to explore a set of line segments LXT from the binary edge map 𝜉XT′ of the consolidated edge map 𝜉XT, which is mathematically a set of points. Hough transform [14] based line detection is a very popular, accurate, easy, voting-based approach for such operations [15]. The Hough transform is based upon the line-point duality between the X−Y and M−C domains, where y = mx + c is the equation of a line. By quantizing the M−C space appropriately, a two-dimensional matrix H is initialized with zeros. A voting-based method is used for finding the elements H(mi, ci) of the H matrix, giving the frequency of edge points corresponding to certain (m, c) values.

Considered Constraints. The following constraints are assumed while implementing Hough transform-based line detection:

– Slope constraint: If LXT = {liXT(y), ∀i ∈ {1, 2, 3, …, r}}, where liXT, ∀i ∈ {1, 2, 3, …, r}, are line segments with slopes {𝜃iXT, ∀i ∈ {1, 2, 3, …, r}} in any IXT(y), ∀y ∈ {1, 2, 3, …, m}, then liXT ∈ 𝛶XT if 𝜃low < 𝜃iXT < 𝜃high, ∀i ∈ {1, 2, 3, …, r}, where 𝜃high depends upon the global motion in I(t)XY, t ∈ {1, 2, 3, …, p}, and 𝜃low depends upon the velocity of moving objects and clutter in the scene.

– Maximum length constraint: In the present approach we are using Hough transform-based line detection for labelling representative segments having motion, so a few constraints have been imposed on this method. Temporal redundancy increases if an object moving with a similar pose remains in view for more than a few frames. Since 𝜉XT generates the transformation matrix between VXYT and 𝛹XYT, inclined line segments of a fixed slope longer than a threshold length are replaced with inclined line segments of the same slope at the threshold length. By setting an upper threshold on the H matrix of the Hough transform, line segments longer than a certain length can be avoided.

– Minimum length constraint: As is obvious in real-time scenarios, there will be a substantial amount of clutter in captured scenes in the form of unwanted motions due to various causes, e.g., the motion of leaves due to wind; it becomes necessary to tackle such scenarios for the robustness of the proposed approach. By analysing such unwanted motions we can conclude that they also generate inclined trajectories in XT slices, but of shorter length; hence, by selecting a lower length threshold in the H matrix of the Hough transform, these can be eliminated.

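One possible realization of the constrained line detection is sketched below with OpenCV's probabilistic Hough transform. The slope bounds, length limits and vote threshold are tunable assumptions rather than values given in the paper, and the maximum-length constraint is enforced here by clipping detected segments, whereas the paper imposes it through an upper threshold on the Hough accumulator H.

```python
import numpy as np
import cv2

def motion_segments(xi, theta_low=5.0, theta_high=80.0,
                    min_len=15, max_len=120, votes=30):
    """Label representative segments with motion in a consolidated edge map xi (X x T).

    Near-horizontal lines (stationary content) and near-vertical lines are rejected
    by the slope constraint; very short segments (clutter such as leaves moving in
    the wind) are dropped by minLineLength; over-long segments are clipped to max_len.
    Returns zeta_XT, a binary map containing only the accepted line segments.
    """
    img = xi.astype(np.uint8) * 255
    zeta = np.zeros_like(img)
    lines = cv2.HoughLinesP(img, rho=1, theta=np.pi / 180, threshold=votes,
                            minLineLength=min_len, maxLineGap=3)
    if lines is None:
        return zeta > 0
    for x1, y1, x2, y2 in lines[:, 0]:
        # In an X-T slice the horizontal image axis is T and the vertical axis is X,
        # so a segment's angle is measured against the T axis.
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle > 90:
            angle = 180 - angle            # fold the angle into [0, 90]
        length = np.hypot(x2 - x1, y2 - y1)
        if theta_low < angle < theta_high:
            if length > max_len:           # maximum-length constraint
                frac = max_len / length
                x2 = x1 + frac * (x2 - x1)
                y2 = y1 + frac * (y2 - y1)
            cv2.line(zeta, (int(x1), int(y1)), (int(x2), int(y2)), 255, 1)
    return zeta > 0
```
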
3.5 Labelling of Representative Segments with Motion

From Eq. (4), the set of representative segments with motion 𝜏XT is the difference of 𝜁XT, as in Fig. 3b, and 𝜉XT′, as in Fig. 3a.

𝜏XT = 𝜁XT − 𝜉XT′ (4)

3.6 Extraction of Representative Segments with Motion

A sparse set 𝜏T is generated from 𝜏XT, with unity entries corresponding to the frame numbers that contain representative motion segments. This set is used as the transformation matrix for obtaining 𝛹XYT from VXYT.

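Continuing the same sketch, Eq. (4) and the frame selection of Sect. 3.6 can be read as a set difference of binary maps followed by a column-wise (per-frame) OR; this is an interpretation under the layout assumed in the earlier snippets, not the authors' code.

```python
import numpy as np

def temporal_compact(V, zeta, xi_prime):
    """Select the frames of V_XYT that contain representative motion segments.

    tau_XT = zeta_XT minus xi'_XT     (set difference of binary maps, Eq. 4)
    tau_T[t] = 1 if any pixel in column t of tau_XT is set
    Psi_XYT  = the frames of V picked out by tau_T, in their original order.
    """
    tau_xt = zeta & ~xi_prime          # representative segments with motion, tau_XT
    tau_t = tau_xt.any(axis=0)         # sparse frame-indicator vector tau_T
    psi = V[:, :, tau_t]               # temporally compact cuboid Psi_XYT
    return psi, tau_t
```

Because the selected frames keep their original order, the chronology of events is preserved in 𝛹XYT, as required in the introduction.
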
4 Results

Results of video synopsis along with the video sequences are presented based on our approach on two datasets. As there are very limited datasets available for such sequences, we have tried to generate a robust test bed of thermal imaging sequences captured in different environmental conditions using a 640 × 512 detecting elements-based MWIR imaging system. The number of frames in the temporal compact representation depends upon the motion content of the considered video sequence. There are also a few false alarms in the form of frames containing no motion information, due to outlier line segments during Hough transform-based line detection. As in Fig. 4a, the Room dataset contains an IR video sequence of 1393 frames, out of which 939 frames contain object(s) with motion; in its temporal summarized representation there are 525 frames, out of which 55 frames are false positives. This implies that we are getting an almost 2.65 times compressed sequence. There are three randomly moving persons in this sequence, and it can be concluded that almost all important events related to motion are captured in its compact representation. Analysis is also done for the Road dataset containing a MWIR video sequence, Road-2, of 2440 frames, out of which 1425 frames contain object(s) with motion; it is transformed into a compact representation containing 1084 frames (almost 2.25 times compression), out of which 243 frames are false positives. Similarly, as in Fig. 4b, for the Road-1 sequence containing a thermal video of 823 frames we get an almost two times temporal compact representation with 411 frames.

Fig. 3 For the Room video sequence of 1393 frames, with the X axis representing T (frame number) and the Y axis representing X: a 𝜉XT′, binary edge map obtained from Canny's edge detection of the consolidated binary edge map 𝜉XT; b 𝜁XT, result of Hough-based line detection with imposed constraints (in red); c 𝜏XT, representative segments with motion (in red); and d selected frame numbers for compact representation (in white)

Fig. 4 For both a and b, the sequence shown above is the considered video and the sequence shown below is its temporal compact representation. a Synopsis generation for the Room dataset: in its compact representation there are four non-contiguous frames; the first frame corresponds to the entrance of person-1, the second and third frames correspond to a pose change, and the fourth frame corresponds to the entrance of person-2. b Synopsis generation for the Road-1 sequence: in its compact representation there are four non-contiguous frames representing the entrance of a pedestrian, car, bus, and truck, respectively.

5 Limitations

Although we have obtained very promising results from the present approach, there are certain limitations. As the Hough transform is a voting-based mechanism for detecting geometries from a set of points, and we are using it with some imposed constraints, it is obvious that there will be a number of outliers and missing segments.

When the transformation matrix is generated using these outliers, a few frames unnecessarily become part of the temporal compact representation 𝛹XYT, and hence we are unable to completely eliminate spatio-temporal redundancy. On the other hand, if the missing segments should have been part of 𝜏XT, then a few important events may be missing from 𝛹XYT. The number of such outlier or missing segments can be reduced by adjusting the upper and lower thresholds of Canny's edge detection.

6 Conclusion

We have presented a novel approach for video synopsis in IR imagery. Although a number of approaches have been suggested in the literature, Hough transform-based line detection has barely been used to solve this kind of problem. We make use of Canny's edge detection and Hough transform-based line detection; fortunately, both are old and well-established algorithms, which makes the implementation of the present model very simple. The results are promising, barring the stated limitations, and the model is extremely simple.

Acknowledgements We take this opportunity to express our sincere gratitude to Dr. S.S. Negi, OS and Sc ‘H’, Director, IRDE, Dehradun for his encouragement. As good things cannot proceed without good company, we would like to thank Mrs Meenakshi Massey, Sc ‘C’ for not only bearing with us and our problems but also for her support in generating datasets.
