Preface
Update, 2021.4.21
My advisor changed my topic again; I'm now working on large-scale detection. Speechless.
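So this is where my video detection notes end, but let me add a few off-topic remarks. Current video detection methods all combine preceding and following frames. None of them runs online, and they exist purely to push accuracy numbers, whereas industry needs real-time, online detection. The studies so far run at under 20 fps, so they can only stay in academia.
In practice, people still take a still-image object detection algorithm and apply it directly. The most common choice is YOLO. I doubt anyone working on object detection hasn't heard of YOLO, because YOLO is the only truly real-time detector.
Things like R-CNN and SSD can't achieve true real-time detection. What industry needs most is efficiency; without speed, no amount of accuracy helps. Thirty or forty seconds to process one image?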
Finally, you're welcome to star my GitHub project:
darknet
It implements repulsion loss.
It improves mAP by about 2% over the original YOLOv4.
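Completely useless, though. For a detailed explanation, see my other blog post:
Repulsion Loss: A Loss Function Designed to Handle Occlusion in Dense Crowd Detection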
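My research direction overlapped with another student's in the lab, so I had to switch. I happened to come across video object detection as a topic and switched to it, and, unsurprisingly, fell right into the pit. Tears.
That said, still-image object detection is already in really good shape, and I'm not going to come up with something like anchor-free methods the way the big names did. First there was one-stage versus two-stage, and now the field is divided into anchor-based and anchor-free. Just look at those names, how elegant. It all feels very systematic.
Papers on video object detection only started appearing in 2016, because the first ImageNet VID challenge was held in 2015. At top conferences you basically need competition results before your paper has a chance of being accepted. Sadly, most of them are not open source.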
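Links to all the papers below are here; my Baidu links keep getting taken down.
Link: https://pan.baidu.com/s/15fnD5FpmKd-EBm7kvrDniw
Extraction code: i9hf
I'll skip traditional methods and start from deep learning. All translations are my own, done paper by paper, word by word and sentence by sentence.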
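Some references
A fairly detailed summary, a post on cnblogs:
https://www.cnblogs.com/mcgl/p/13993171.html
2016 (the VID challenge only started in 2015, so papers were first accepted and published in 2016; entries are mostly in chronological order)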
- Seq-NMS: Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. “Seq-NMS for Video Object Detection”. ArXiv(2016). (no code)
- Object detection from video tubelets with convolutional neural networks: Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. “Object detection from video tubelets with convolutional neural networks”. CVPR(2016). [code]
- T-CNN: Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang. “T-CNN: Tubelets with convolutional neural networks for object detection from videos”. IEEE Transactions on Circuits and Systems for Video Technology(2017). [code][translation]
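Link: https://pan.baidu.com/s/1dlMljp-if635zxUngIWFJA
Extraction code: 1998
The second and third papers are essentially by the same authors, so you can just read the third. It only supports offline detection, because it first detects frame by frame and, only after the whole video has been processed, removes the redundant low-scoring detection boxes. It no longer has much reference value.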
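To make the "offline only" point concrete, here is a minimal Python sketch of this kind of whole-video post-processing. It is a simplified illustration under my own assumptions, not the T-CNN implementation: the `Detection` tuple, the function `suppress_unlikely_classes`, and the `keep_classes` and `min_score` parameters are all hypothetical names made up for this example. The idea shown is only that the filter ranks classes by evidence collected over every frame, so it cannot start until the last frame has been detected.

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    frame: int
    label: str
    score: float
    box: tuple  # (x1, y1, x2, y2)


def suppress_unlikely_classes(detections, keep_classes=1, min_score=0.3):
    """Offline filter: needs the detections of every frame before it can run."""
    # Best score each class reaches anywhere in the video.
    best = defaultdict(float)
    for det in detections:
        best[det.label] = max(best[det.label], det.score)
    # Rank classes by that video-level evidence and keep only the top ones.
    ranked = sorted(best, key=best.get, reverse=True)
    likely = set(ranked[:keep_classes])
    # Drop boxes of unlikely classes and boxes with a low score.
    return [d for d in detections if d.label in likely and d.score >= min_score]


# Toy per-frame detections (hypothetical values).
video_dets = [
    Detection(0, "dog", 0.9, (10, 10, 50, 50)),
    Detection(0, "cat", 0.4, (12, 11, 52, 49)),
    Detection(1, "dog", 0.8, (11, 10, 51, 50)),
    Detection(1, "dog", 0.1, (200, 200, 220, 220)),
    Detection(2, "cat", 0.35, (13, 12, 53, 51)),
]
kept = suppress_unlikely_classes(video_dets)
```

With the toy data, the "cat" boxes are removed because "dog" has the stronger video-level evidence, and the 0.1 "dog" box is removed as low-scoring. An online detector cannot do this, because at frame 0 it does not yet know what frames 1 and 2 will contain.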
2017
CVPR 2017
- DFF: Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei. “Deep Feature Flow for Video Recognition”. CVPR(2017). [code]
DFF and FGFA are both work by Jifeng Dai, a big name in the field. Both use optical flow, and the code is built on MXNet. FGFA improves accuracy, but it is slower.
ICCV 2017
- FGFA: Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei. “Flow-Guided Feature Aggregation for Video Object Detection”. ICCV(2017). [code]
- D&T: Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman. “Detect to Track and Track to Detect”. ICCV(2017). [code]
2018
ECCV 2018
- STMN: Fanyi Xiao, Yong Jae Lee. “Video Object Detection with an Aligned Spatial-Temporal Memory”. ECCV(2018). [code, in Lua][translation]
Link: https://pan.baidu.com/s/1dlMljp-if635zxUngIWFJA
Extraction code: 1998
I can't read Lua...
- STSN: Gedas Bertasius, Lorenzo Torresani, Jianbo Shi. “Object Detection in Video with Spatiotemporal Sampling Networks”. ECCV(2018). (no code)
- MANet: Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng. “Fully Motion-Aware Network for Video Object Detection”. ECCV(2018). (no code)
2019
AAAI 2019
- LWDN: Zhengkai Jiang, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, Chunhong Pan. “Video Object Detection with Locally-Weighted Deformable Neighbors”. AAAI(2019). (no code)
- DorT: Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng. “Detect or Track: Towards Cost-Effective Video Object Detection/Tracking”. AAAI(2019). (no code)
ICCV 2019
- RDN: Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei. “Relation Distillation Networks for Video Object Detection”. ICCV(2019). [paper]
- SELSA: Haiping Wu, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang. “Sequence Level Semantics Aggregation for Video Object Detection”. ICCV(2019). [code]
- LLTR: Mykhailo Shvets, Wei Liu, Alexander C. Berg. “Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection”. ICCV(2019). [no code]
- OGEMN: Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, Haibing Guan. “Object Guided External Memory Network for Video Object Detection”. ICCV(2019). [no code]
- PSLA: Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan. “Progressive Sparse Local Attention for Video Object Detection”. ICCV(2019). [no code]
- A Delay Metric for Video Object Detection: What Average Precision Fails to Tell: Huizi Mao, Xiaodong Yang, William J. Dally. “A Delay Metric for Video Object Detection: What Average Precision Fails to Tell”. ICCV(2019). [no code]
Update, 2020.12.14: a paper from CVTA, which is also a top conference. It was published in July 2019.
Seq-Bbox Matching: Belhassen H, Zhang H, Fresse V, et al. Improving Video Object Detection by Seq-Bbox Matching[C]// 14th International Conference on Computer Vision Theory and Applications. 2019. (also a post-processing step, but faster than T-CNN)
2020
CVPR 2020
- MEGA: Yihong Chen, Yue Cao, Han Hu, Liwei Wang. “Memory Enhanced Global-Local Aggregation for Video Object Detection”. CVPR(2020). [code]
Link: https://pan.baidu.com/s/1NrUTlikUo_Z3qRkb6mjInQ
Extraction code: f3ak
ECCV 2020
- LSTS: Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan. “Learning Where to Focus for Efficient Video Object Detection”. ECCV(2020). [code]
Uses the MXNet framework.
- OLTA: Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang. “Video Object Detection via Object-level Temporal Aggregation”. ECCV(2020). [no code]
- HVRNet: Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao. “Mining Inter-Video Proposal Relations for Video Object Detection”. ECCV(2020). [no code]
- CHP: Zhujun Xu, Emir Hrustic, Damien Vivet. “CenterNet Heatmap Propagation for Real-time Video Object Detection”. ECCV(2020). [no code]
As you can see, out of all these papers only one or two come with code. What a miserable state of affairs!!!!