[Computer Science] [2018.04] Deep Learning-Based Anomaly Detection - CSDN Blog

This post presents a master's thesis from Queensland University of Technology, Australia (author: Rohit Ramesh), 93 pages in total.


Object detection and tracking are an important part of surveillance systems. Outdoor surveillance systems mostly use visible-light cameras for applications such as pedestrian detection, face recognition and human pose estimation. Cameras are installed in and around city streets, bus stations, metros and airports to monitor threats or dangers posed by a single person or a group of people, so that immediate action can be taken to prevent or respond to them. An activity such as dropping off a bag in a crowded public place can signal an unusual event of interest and may provide a visual clue to a potential threat to the local environment.

In computer vision, unusual event detectors can be viewed from two perspectives. The first concerns possible real-time threats, such as a bag being dropped off. The second concerns unpredictable behaviour, such as skating, cycling, running and chasing, which is also treated as unusual when compared with the normal behaviour of the crowd, such as walking.

The goal of this Master's degree project is to investigate abnormal event detection using reliable, descriptive image features obtained with state-of-the-art deep learning methods, and to compare the results with existing handcrafted features. To this end, C3D (a 3D Convolutional Neural Network) has been chosen as the feature extractor for a baseline model. The process consists of several steps: training the model, extracting features from C3D, and carrying out various preprocessing steps in MATLAB, leading to both frame-based and patch-based detection. The experiments are performed on the UCSD dataset, one of the most widely used datasets for abnormality detection.
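The frame-based and patch-based detection steps described above can be illustrated with a minimal sketch. The thesis's actual classifier is not described in this summary, so everything here is an assumption: each spatio-temporal patch is represented by a precomputed feature vector (for example a C3D or LBP-TOP descriptor), a simple nearest-neighbour distance rule (swapped in for illustration) scores each patch by its Euclidean distance to the closest feature from normal training footage, a patch is flagged when its score exceeds a hypothetical `threshold`, and a frame is flagged if any of its patches is. The names `nearest_normal_distance` and `detect` are hypothetical.

```python
import numpy as np


def nearest_normal_distance(test_feats: np.ndarray, normal_feats: np.ndarray) -> np.ndarray:
    """Euclidean distance from each test feature (N, D) to its closest normal
    training feature (M, D). A larger distance means a more unusual patch."""
    # Pairwise squared distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
    d2 = (
        (test_feats ** 2).sum(axis=1)[:, None]
        - 2.0 * test_feats @ normal_feats.T
        + (normal_feats ** 2).sum(axis=1)[None, :]
    )
    # Clamp tiny negative values caused by rounding before taking the root.
    return np.sqrt(np.maximum(d2, 0.0).min(axis=1))


def detect(patch_feats: np.ndarray, normal_feats: np.ndarray, threshold: float):
    """patch_feats has shape (F, P, D): F frames, P patches per frame, D-dim features.
    Returns a patch-level mask of shape (F, P) and a frame-level mask of shape (F,)."""
    F, P, D = patch_feats.shape
    scores = nearest_normal_distance(patch_feats.reshape(F * P, D), normal_feats).reshape(F, P)
    patch_mask = scores > threshold        # patch-based decision (hypothetical threshold)
    frame_mask = patch_mask.any(axis=1)    # assumption: frame is abnormal if any patch is
    return patch_mask, frame_mask


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 16))  # hypothetical features from normal footage
test = rng.normal(0.0, 1.0, size=(4, 6, 16))   # 4 frames x 6 patches x 16-dim features
test[2, 3] += 10.0                             # inject one clearly unusual patch
patches, frames = detect(test, normal, threshold=8.0)
print(frames)
```

In this design the frame-level result is derived from the patch-level one, so a single pass over the patch features yields both outputs; the threshold would normally be tuned on held-out data.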

The key contribution of this Master's thesis is the use of a state-of-the-art deep learning network, C3D, to produce both frame-based and patch-based results for abnormal event detection. First, unusual event detection in crowded scenes using Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) on the UCSD dataset is adopted as the baseline system for feature extraction. Next, the 3D convolutional neural network (C3D) is implemented within the same baseline model, and the differences between the two sets of features are examined. The experimental results demonstrate the strong potential of a deep learning approach for detecting abnormalities in crowds observed through video feeds.
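The LBP-TOP baseline describes a space-time volume by computing Local Binary Pattern codes on the XY, XT and YT planes and concatenating one histogram per plane. The following is a simplified sketch of that idea, not the thesis's implementation. It assumes a grey-scale volume stored as a NumPy array of shape (T, H, W), uses a square 8-neighbour ring of radius 1 without interpolation, and uses the hypothetical function names `lbp_plane_codes` and `lbp_top`.

```python
import numpy as np

# Offsets (da, db) of the 8 neighbours on a square ring of radius 1, taken
# clockwise from the top-left. Square sampling without interpolation is a
# simplifying assumption of this sketch.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_plane_codes(volume: np.ndarray, axes: tuple[int, int]) -> np.ndarray:
    """8-bit LBP codes for interior voxels of a (T, H, W) volume, with the
    neighbours sampled in the plane spanned by the two given axes."""
    v = volume.astype(np.float64)
    T, H, W = v.shape
    center = v[1:T - 1, 1:H - 1, 1:W - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (da, db) in enumerate(RING):
        shift = [0, 0, 0]
        shift[axes[0]], shift[axes[1]] = da, db
        dt, dy, dx = shift
        neighbour = v[1 + dt:T - 1 + dt, 1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_top(volume: np.ndarray) -> np.ndarray:
    """Concatenate normalised 256-bin LBP histograms from the XY, XT and YT planes."""
    # Axis order of the volume is (t, y, x).
    planes = {"XY": (1, 2), "XT": (0, 2), "YT": (0, 1)}
    hists = []
    for axes in planes.values():
        codes = lbp_plane_codes(volume, axes)
        hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)


rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(8, 32, 32))  # hypothetical 8-frame grey-scale patch
feature = lbp_top(clip)
print(feature.shape)  # (768,)
```

Concatenating three 256-bin histograms gives a 768-dimensional descriptor per volume. In a patch-based setting, one such descriptor would be computed for each spatio-temporal patch, which is the role the C3D features take over in the deep learning variant.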