Classic introductory works for video research
intro: CVPR 2016
intro: Lead–Exceed Neural Network (LENN), LSTM
paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/CVPR16_webly_final.pdf
intro: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, "Movie fill-in-the-blank" Challenge, UCF_CRCV
intro: Video-Fill-in-the-Blank (ViFitB)
arxiv: https://arxiv.org/abs/1610.04062
intro: Google DeepMind
arxiv: https://arxiv.org/abs/1610.00527
arxiv: https://arxiv.org/abs/1610.05985
intro: CVPR 2017. Max Planck Institute for Intelligent Systems & Bernstein Center for Computational Neuroscience
project page: https://varunjampani.github.io/vpn/
arxiv: https://arxiv.org/abs/1612.05478
github(Caffe): https://github.com/varunjampani/video_prop_networks
project page: https://liuziwei7.github.io/projects/VoxelFlow.html
arxiv: https://arxiv.org/abs/1702.02463
intro: Stanford InfoLab
keywords: NoScope, difference detectors, specialized models
arxiv: https://arxiv.org/abs/1703.02529
github: https://github.com/stanford-futuredata/noscope
github: https://github.com/stanford-futuredata/tensorflow-noscope
http://dawn.cs.stanford.edu/2017/06/22/noscope/
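The keywords above point at NoScope's core trick: a cheap difference detector decides which frames have changed enough to be worth running through the expensive model, and the rest are skipped. A toy sketch of that idea (the threshold, shapes, and frame values below are invented for illustration, not NoScope's actual code):

```python
import numpy as np

def difference_detector(frames, threshold=10.0):
    """Flag frames whose mean absolute pixel difference from the last
    fired frame exceeds `threshold`; only those would be sent on to the
    full model. Toy version of NoScope's difference-detector idea."""
    fired = [0]                                  # always process the first frame
    reference = frames[0].astype(np.float32)
    for i, frame in enumerate(frames[1:], start=1):
        diff = np.abs(frame.astype(np.float32) - reference).mean()
        if diff > threshold:
            fired.append(i)
            reference = frame.astype(np.float32)
    return fired

# Synthetic clip: 10 identical dark frames, then a sudden change at frame 10.
static = np.zeros((10, 8, 8), dtype=np.uint8)
changed = np.full((5, 8, 8), 200, dtype=np.uint8)
clip = np.concatenate([static, changed])
print(difference_detector(clip))   # → [0, 10]
```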
intro: CVPR 2017. Stanford University & University of Southern California
arxiv: https://arxiv.org/abs/1703.02521
https://arxiv.org/abs/1703.09788
intro: Baidu Research
intro: "The experiments demonstrated the potential applications of UL layers and online learning algorithm to head orientation estimation and moving object localization"
arxiv: https://arxiv.org/abs/1705.08918
intro: DeepMind
intro: "Audio-Visual Correspondence" learning
arxiv: https://arxiv.org/abs/1705.08168
intro: Peking University
arxiv: https://arxiv.org/abs/1706.04124
github: https://github.com/gitpub327/VideoImagination
intro: CVPR 2017. Stanford University & CMU & Simon Fraser University
arxiv: https://arxiv.org/abs/1706.02884
intro: accepted at the second International Workshop on Egocentric Perception, Interaction and Computing (EPIC) at the International Conference on Computer Vision (ICCV 2017)
arxiv: https://arxiv.org/abs/1709.06495
intro: AAAI 2018
project page: http://research.nvidia.com/publication/2018-02_Learning-Binary-Residual
arxiv: https://arxiv.org/abs/1712.05087
intro: CVPR 2018
arxiv: https://arxiv.org/abs/1803.10628
intro: CVPR 2018
arxiv: https://arxiv.org/abs/1804.07667
intro: two-stream ConvNet
arxiv: https://arxiv.org/abs/1804.10021
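The two-stream ConvNet design feeds RGB frames to a spatial stream and stacked optical flow to a temporal stream, then fuses their class scores. A minimal late-fusion sketch (the logits and the flow-stream weight here are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_stream_fusion(rgb_logits, flow_logits, w_flow=1.5):
    """Late fusion: average per-stream softmax scores, optionally
    up-weighting the flow stream. Toy sketch of the two-stream idea."""
    scores = softmax(rgb_logits) + w_flow * softmax(flow_logits)
    return int(np.argmax(scores))

rgb = np.array([2.0, 1.0, 0.1])      # spatial stream mildly favors class 0
flow = np.array([0.1, 3.0, 0.2])     # temporal stream strongly favors class 1
print(two_stream_fusion(rgb, flow))  # → 1
```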
intro: CVPR 2018
arxiv: https://arxiv.org/abs/1805.02792
https://arxiv.org/abs/1806.04620
intro: CMU
arxiv: https://arxiv.org/abs/1805.07339
intro: DeepMind & University of Oxford
arxiv: https://arxiv.org/abs/1806.03863
intro: LIRIS & Facebook AI Research
arxiv: https://arxiv.org/abs/1806.06157
intro: BMVC 2018
arxiv: https://arxiv.org/abs/1807.06980
Survey of video action recognition and detection: IDT, TSN, CNN-LSTM, C3D, CDC, R-C3D
Video Classification
intro: CVPR 2014
project page: http://cs.stanford.edu/people/karpathy/deepvideo/
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf
intro: Video-level event detection: extract deep features for each frame, then average the frame-level deep features
arxiv: http://arxiv.org/abs/1503.04144
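The pooling strategy the entry above describes, a deep feature per frame averaged into one video-level descriptor, can be sketched as follows (the 2048-d feature size is an assumption, typical of a ResNet-style backbone):

```python
import numpy as np

def video_descriptor(frame_features):
    """Average frame-level deep features into a single video-level
    descriptor, then L2-normalize so clips of different lengths
    yield comparable vectors."""
    pooled = np.mean(frame_features, axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)

feats = np.random.rand(30, 2048)     # e.g. 30 frames, 2048-d CNN features
desc = video_descriptor(feats)
print(desc.shape)                    # (2048,)
```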
intro: CNN + LSTM
arxiv: http://arxiv.org/abs/1503.08909
demo: http://pan.baidu.com/s/1eQ9zLZk
intro: ACM Multimedia, 2015
arxiv: http://arxiv.org/abs/1504.01561
Video Content Recognition with Deep Learning
author: Zuxuan Wu, Fudan University
slides: http://vision.ouc.edu.cn/valse/slides/20160420/Zuxuan%20Wu%20-%20Video%20Content%20Recognition%20with%20Deep%20Learning-Zuxuan%20Wu.pdf
author: Yu-Gang Jiang, Lab for Big Video Data Analytics (BigVid), Fudan University
slides: http://www.yugangjiang.info/slides/DeepVideoTalk-2015.pdf
intro: Google
arxiv: http://arxiv.org/abs/1505.06250
arxiv: http://arxiv.org/abs/1509.06086
paper: http://jmlr.org/proceedings/papers/v48/fernando16.html
paper: http://jmlr.csail.mit.edu/proceedings/papers/v48/fernando16.pdf
summary(by Hugo Larochelle): http://www.shortscience.org/paper?bibtexKey=conf/icml/FernandoG16#hlarochelle
arxiv: http://arxiv.org/abs/1609.06782
arxiv: https://arxiv.org/abs/1611.06453
intro: CVPR 2017
intro: It provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos)
arxiv: https://arxiv.org/abs/1611.07715
github(official, MXNet): https://github.com/msracver/Deep-Feature-Flow
youtube: https://www.youtube.com/watch?v=J0rMHE6ehGw
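Deep Feature Flow computes the expensive features only on sparse key frames and propagates them to the frames in between by warping along an optical-flow field. A toy warp (nearest-neighbor sampling for brevity, where the paper uses bilinear warping; all shapes and values are invented):

```python
import numpy as np

def warp_features(key_feat, flow):
    """Warp a key-frame feature map to the current frame.
    key_feat: (C, H, W); flow: (2, H, W) giving per-position (dy, dx)
    offsets pointing back into the key frame."""
    C, H, W = key_feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[1]).astype(int), 0, W - 1)
    return key_feat[:, src_y, src_x]

# A feature map with one active cell; the flow shifts it one pixel right.
feat = np.zeros((1, 4, 4)); feat[0, 1, 1] = 1.0
flow = np.zeros((2, 4, 4)); flow[1, :, :] = -1.0   # current x samples key x-1
warped = warp_features(feat, flow)
print(int(warped[0, 1, 2]))   # → 1
```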
https://arxiv.org/abs/1706.04488
intro: solution to Kaggle's Google Cloud & YouTube-8M Video Understanding Challenge
arxiv: https://arxiv.org/abs/1706.04572
github: https://github.com/mpekalski/Y8M
intro: CVPR17 Youtube 8M workshop. Kaggle 1st place
arxiv: https://arxiv.org/abs/1706.06905
github: https://github.com/antoine77340/LOUPE
intro: Youtube-8M Challenge, 4th place
arxiv: https://arxiv.org/abs/1707.00803
https://arxiv.org/abs/1707.01786
intro: Classification Challenge Track paper in CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
arxiv: https://arxiv.org/abs/1707.03296
intro: CVPR2017 Workshop on Youtube-8M Large-scale Video Understanding
arxiv: https://arxiv.org/abs/1707.04045
intro: CVPR'17 Workshop on YouTube-8M
arxiv: https://arxiv.org/abs/1707.04143
github: https://github.com/ffmpbgrnn/yt8m
https://arxiv.org/abs/1707.02069
intro: CVPR 2017 Youtube-8M Workshop
arxiv: https://arxiv.org/abs/1707.04272
intro: ACM Multimedia, 2017
arxiv: https://arxiv.org/abs/1708.00973
intro: CVPR 2018. CMU & Facebook AI Research
arxiv: https://arxiv.org/abs/1711.07971
github(Caffe2): https://github.com/facebookresearch/video-nonlocal-net
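The non-local block computes the response at each position as a weighted sum over all positions, i.e. a self-attention operation added back residually. A minimal dot-product sketch (the learned theta/phi/g projections and the output transform from the paper are omitted for brevity):

```python
import numpy as np

def nonlocal_block(x):
    """y_i = x_i + sum_j softmax_j(x_i . x_j) * x_j
    x: (N, d) with N = T*H*W flattened spacetime positions."""
    sim = x @ x.T                                  # pairwise dot products
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)        # row-wise softmax
    return x + attn @ x                            # residual connection

x = np.random.rand(6, 4)          # 6 positions, 4-d features
y = nonlocal_block(x)
print(y.shape)                    # (6, 4)
```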
https://arxiv.org/abs/1711.08200
arxiv: https://arxiv.org/abs/1711.09125
github: https://github.com/wanglimin/ARTNet
intro: ECCV 2018. Google Research & University of California San Diego
arxiv: https://arxiv.org/abs/1712.04851
https://arxiv.org/abs/1807.00983
Deep Architectures and Ensembles for Semantic Video Classification
https://arxiv.org/abs/1807.01026
intro: ECCV 2018
arxiv: https://arxiv.org/abs/1807.08259
intro: BMVC 2018
arxiv: https://arxiv.org/abs/1808.03232
intro: ECCV 2018 workshop on YouTube-8M Large-Scale Video Understanding
intro: Tencent AI Lab & Fudan University
arxiv: https://arxiv.org/abs/1810.00207
Learnable Pooling Methods for Video Classification
intro: Youtube 8M ECCV18 Workshop
arxiv: https://arxiv.org/abs/1810.00530
github: https://github.com/pomonam/LearnablePoolingMethods
-
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification
intro: ECCV 2018 workshop
arxiv: https://arxiv.org/abs/1811.05014
github: https://github.com/linrongc/youtube-8m
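NeXtVLAD builds on NetVLAD-style aggregation: each frame feature contributes its residual to every cluster center, weighted by a soft assignment, and the residuals are summed into a fixed-size video descriptor. A toy version of that underlying aggregation (without NeXtVLAD's grouping and channel-reduction tricks; all shapes are invented):

```python
import numpy as np

def vlad_aggregate(frame_feats, centers, alpha=10.0):
    """Soft-assignment VLAD pooling. frame_feats: (T, D), centers: (K, D);
    returns a (K*D,) L2-normalized video descriptor."""
    dists = ((frame_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    logits = -alpha * dists
    logits -= logits.max(axis=1, keepdims=True)
    a = np.exp(logits); a /= a.sum(axis=1, keepdims=True)   # (T, K) soft assign
    resid = frame_feats[:, None, :] - centers[None, :, :]   # (T, K, D)
    vlad = (a[:, :, None] * resid).sum(axis=0)              # (K, D)
    vlad /= np.linalg.norm(vlad) + 1e-12                    # global L2 norm
    return vlad.ravel()

feats = np.random.rand(30, 16)     # 30 frames, 16-d features
centers = np.random.rand(8, 16)    # 8 cluster centers
print(vlad_aggregate(feats, centers).shape)   # (128,)
```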
Video Action Recognition / Action Detection
paper: http://www.cs.odu.edu/~sji/papers/pdf/Ji_ICML10.pdf
paper: http://liris.cnrs.fr/Documents/Liris-5228.pdf
arxiv: http://arxiv.org/abs/1406.2199
intro: "built action models from shape and motion cues. They start from the image proposals and select the motion salient subset of them and extract spatio-temporal features to represent the video using the CNNs."
arxiv: http://arxiv.org/abs/1411.6031
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Du_Hierarchical_Recurrent_Neural_2015_CVPR_paper.pdf
intro: CVPR 2015. TDD
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf
ext: http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_105_ext.pdf
poster: https://wanglimin.github.io/papers/WangQT_CVPR15_Poster.pdf
github: https://github.com/wanglimin/TDD
paper: http://cvgl.stanford.edu/papers/tian2015.pdf
Contextual Action Recognition with R*CNN
arxiv: http://arxiv.org/abs/1505.01197
github: https://github.com/gkioxari/RstarCNN
arxiv: http://arxiv.org/abs/1507.02159
github: https://github.com/yjxiong/caffe
intro: LSTM / RNN
arxiv: http://arxiv.org/abs/1511.04119
project page: http://shikharsharma.com/projects/action-recognition-attention/
github(Python/Theano): https://github.com/kracwarlock/action-recognition-visual-attention
intro: CVPR 2016
project page: http://ai.stanford.edu/~syyeung/frameglimpses.html
arxiv: http://arxiv.org/abs/1511.06984
paper: http://vision.stanford.edu/pdf/yeung2016cvpr.pdf
arxiv: http://arxiv.org/abs/1603.06829
Active Learning for Online Recognition of Human Activities from Streaming Videos
arxiv: http://arxiv.org/abs/1604.02855
arxiv: http://arxiv.org/abs/1604.06573
github: https://github.com/feichtenhofer/twostreamfusion
arxiv: http://arxiv.org/abs/1604.08880
arxiv: http://arxiv.org/abs/1605.03324
paper: http://web.mit.edu/vondrick/prediction.pdf
VideoLSTM Convolves, Attends and Flows for Action Recognition
arxiv: http://arxiv.org/abs/1607.01794
arxiv: http://arxiv.org/abs/1607.06416
arxiv: http://arxiv.org/abs/1607.07043
arxiv: http://arxiv.org/abs/1607.08584
intro: won the 1st place in the untrimmed video classification task of ActivityNet Challenge 2016. TSN
arxiv: http://arxiv.org/abs/1608.00797
github: https://github.com/yjxiong/anet2016-cuhk
intro: CVPR 2016. H-FCN
project page: http://wanglimin.github.io/actionness_hfcn/index.html
paper: http://wanglimin.github.io/papers/WangQTV_CVPR16.pdf
github: https://github.com/wanglimin/actionness-estimation/
intro: CVPR 2016
project page: http://zbwglory.github.io/MV-CNN/index.html
paper: http://wanglimin.github.io/papers/ZhangWWQW_CVPR16.pdf
github: https://github.com/zbwglory/MV-release
intro: ECCV 2016. HMDB51: 69.4%, UCF101: 94.2%
arxiv: http://arxiv.org/abs/1608.00859
paper: http://wanglimin.github.io/papers/WangXWQLTV_ECCV16.pdf
github: https://github.com/yjxiong/temporal-segment-networks
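TSN's sparse sampling splits a video into K equal segments, draws one snippet from each, and fuses the per-snippet predictions with a segmental consensus (averaging by default), so the whole duration is covered with only K forward passes. A minimal sketch (the segment count and scores below are illustrative):

```python
import numpy as np

def tsn_sample_indices(num_frames, num_segments=3, rng=None):
    """Draw one snippet index from each of K equal temporal segments."""
    rng = rng or np.random.default_rng(0)
    edges = np.linspace(0, num_frames, num_segments + 1).astype(int)
    return [int(rng.integers(lo, hi)) for lo, hi in zip(edges[:-1], edges[1:])]

def segmental_consensus(snippet_scores):
    """Average per-snippet class scores (TSN's default consensus)."""
    return np.mean(snippet_scores, axis=0)

idx = tsn_sample_indices(300, num_segments=3)
print(idx)   # one index from each third of the 300-frame video
scores = np.array([[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]])
print(segmental_consensus(scores))   # → [0.5 0.5]
```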
intro: An extension of submission http://arxiv.org/abs/1608.00859
arxiv: https://arxiv.org/abs/1705.02953
intro: CVPR 2016
arxiv: http://arxiv.org/abs/1608.03217
arxiv: http://arxiv.org/abs/1608.04339
intro: CVPR 2016
paper: http://users.cecs.anu.edu.au/~sgould/papers/cvpr16-dynamic_images.pdf
github: https://github.com/hbilen/dynamic-image-nets
arxiv: http://arxiv.org/abs/1608.07876
arxiv: http://arxiv.org/abs/1608.08242
ECCV 2016 workshop: http://bravenewmotion.github.io/
intro: Bachelor Thesis Report at ETSETB TelecomBCN
project page: https://imatge-upc.github.io/activitynet-2016-cvprw/
arxiv: http://arxiv.org/abs/1608.08128
github: https://github.com/imatge-upc/activitynet-2016-cvprw
arxiv: http://arxiv.org/abs/1609.03056
arxiv: https://arxiv.org/abs/1610.03898
intro: NIPS 2016
arxiv: https://arxiv.org/abs/1611.02155
arxiv: https://arxiv.org/abs/1611.02447
arxiv: https://arxiv.org/abs/1611.03607
arxiv: https://arxiv.org/abs/1611.05215
arxiv: https://arxiv.org/abs/1611.05267
-
AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos
arxiv: https://arxiv.org/abs/1611.08240
arxiv: https://arxiv.org/abs/1612.03052
intro: Australian Center for Robotic Vision & Data61/CSIRO
arxiv: https://arxiv.org/abs/1701.05432
https://arxiv.org/abs/1703.10664
project page: http://yjxiong.me/others/ssn/
arxiv: https://arxiv.org/abs/1704.06228
github: https://github.com/yjxiong/action-detection
https://arxiv.org/abs/1706.08807
https://arxiv.org/abs/1708.07590
intro: International Conference on Computer Vision Workshops (ICCVW), 2017
arxiv: https://arxiv.org/abs/1708.09268
https://arxiv.org/abs/1708.09522
https://arxiv.org/abs/1710.03383
keywords: Deep networks with Temporal Pyramid Pooling (DTPP)
arxiv: https://arxiv.org/abs/1711.04161
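Temporal pyramid pooling, the key idea the DTPP keywords refer to, pools frame features over 1, 2, 4, ... temporal bins and concatenates the results, so the descriptor keeps coarse-to-fine temporal structure instead of a single global average. A sketch (the pyramid levels, max pooling, and shapes are illustrative assumptions):

```python
import numpy as np

def temporal_pyramid_pool(frame_feats, levels=(1, 2, 4)):
    """Max-pool frame features over each bin of each pyramid level and
    concatenate. frame_feats: (T, D); returns (sum(levels) * D,)."""
    T, D = frame_feats.shape
    parts = []
    for L in levels:
        edges = np.linspace(0, T, L + 1).astype(int)
        for lo, hi in zip(edges[:-1], edges[1:]):
            parts.append(frame_feats[lo:max(hi, lo + 1)].max(axis=0))
    return np.concatenate(parts)

feats = np.random.rand(16, 32)     # 16 frames, 32-d features
desc = temporal_pyramid_pool(feats)
print(desc.shape)                  # (1+2+4) bins * 32 dims = (224,)
```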
intro: WACV 2018
arxiv: https://arxiv.org/abs/1801.03983
https://arxiv.org/abs/1801.07230
-
A Fusion of Appearance based CNNs and Temporal evolution of Skeleton with LSTM for Daily Living Action Recognition
https://arxiv.org/abs/1802.00421
https://arxiv.org/abs/1802.08362
intro: CVPR 2018. Facebook Research
intro: R(2+1)D and Mixed-Convolutions for Action Recognition.
project page: https://dutran.github.io/R2Plus1D/
arxiv: https://arxiv.org/abs/1711.11248
github: https://github.com/facebookresearch/R2Plus1D
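R(2+1)D factorizes each t x d x d 3D convolution into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal one, choosing the intermediate width M so the parameter count matches the original 3D layer: M = floor(t * d^2 * N_in * N_out / (d^2 * N_in + t * N_out)). A small worked example of that formula (layer sizes chosen arbitrarily for illustration):

```python
import math

def r2plus1d_mid_channels(t, d, n_in, n_out):
    """Intermediate channel count M that matches the parameter budget
    of a t x d x d 3D conv with n_in -> n_out channels."""
    return math.floor(t * d * d * n_in * n_out / (d * d * n_in + t * n_out))

# Factorize a 3x3x3 conv with 64 -> 64 channels.
t, d, n_in, n_out = 3, 3, 64, 64
m = r2plus1d_mid_channels(t, d, n_in, n_out)
params_3d = t * d * d * n_in * n_out            # original 3D conv weights
params_2plus1d = d * d * n_in * m + t * m * n_out
print(m)                                        # → 144
print(params_2plus1d <= params_3d)              # → True: within budget
```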
https://arxiv.org/abs/1805.08162
https://arxiv.org/abs/1810.04511
Projects
intro: CS231n student project report
paper: http://cs231n.stanford.edu/reports2016/221_Report.pdf
github: https://github.com/garythung/torch-lrcn
github: https://github.com/jrbtaylor/ActivityNet
github: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition/
github(MXNet): https://github.com/Ldpe2G/DeepLearningForFun/tree/master/Mxnet-Scala/HumanActivityRecognition
intro: locate and recognize faces in a video, detect shots in a film, search videos by image
github: https://github.com/scanner-research/scanner
intro: Activity Recognition Algorithms for the Charades Dataset
github: https://github.com/gsig/charades-algorithms
intro: MXNet implementation of Non-Local and Squeeze-Excitation network
github: https://github.com/WillSuen/NonLocalandSEnet
Event Recognition
arxiv: http://arxiv.org/abs/1510.02899
arxiv: https://arxiv.org/abs/1701.00599
github: https://github.com/znaoya/aenet
Event Detection
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Gan_DevNet_A_Deep_2015_CVPR_paper.pdf
intro: CVPR 2016
arxiv: http://arxiv.org/abs/1511.02917
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Ramanathan_Detecting_Events_and_CVPR_2016_paper.pdf
paper: http://vision.stanford.edu/pdf/johnson2016cvpr.pdf
blog: http://www.leiphone.com/news/201606/l1TKIRFLO3DUFNNu.html
intro: INTERSPEECH 2016
arxiv: https://arxiv.org/abs/1604.07160
arxiv: https://arxiv.org/abs/1612.07403
intro: Joint Event Detection and Description Network (JEDDi-Net)
arxiv: https://arxiv.org/abs/1802.10250