
Deep Residual Networks

Deep Residual Learning for Image Recognition 

Identity Mappings in Deep Residual Networks (by Kaiming He)

arxiv: http://arxiv.org/abs/1603.05027 
github: https://github.com/KaimingHe/resnet-1k-layers 
github: https://github.com/bazilas/matconvnet-ResNet 
github: https://github.com/FlorianMuellerklein/Identity-Mapping-ResNet-Lasagne

Wide Residual Networks

arxiv: http://arxiv.org/abs/1605.07146 
github: https://github.com/szagoruyko/wide-residual-networks 
github: https://github.com/asmith26/wide_resnets_keras

Inception-V4, Inception-Resnet And The Impact Of Residual Connections On Learning (Workshop track - ICLR 2016)

intro: “achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge” 
arxiv: http://arxiv.org/abs/1602.07261 
paper: http://beta.openreview.net/pdf?id=q7kqBkL33f8LEkD3t7X9 
github: https://github.com/lim0606/torch-inception-resnet-v2

Object detection 
Object detection via a multi-region & semantic segmentation-aware CNN model 

DeepBox: Learning Objectness with Convolutional Networks ICCV2015 
proposal re-ranker 

Object-Proposal Evaluation Protocol is ‘Gameable’ (includes a lot of proposal code)

Fast R-CNN 

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks 
https://github.com/ShaoqingRen/faster_rcnn MATLAB 
https://github.com/rbgirshick/py-faster-rcnn Python

YOLO : Real-Time Object Detection 

SSD: Single Shot MultiBox Detector (both faster and better than Faster R-CNN!)

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection 

Image Question Answering 
Stacked Attention Networks for Image Question Answering CVPR2016 

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction CVPR2016


SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

Learning to Track: Online Multi-Object Tracking by Decision Making ICCV2015 
Uses Markov Decision Processes for tracking; probably fairly slow, but the results should be decent

Fully-Convolutional Siamese Networks for Object Tracking 

Car detection: 
Integrating Context and Occlusion for Car Detection by Hierarchical And-or Model ECCV2014 

Face detection


Face detection without bells and whistles 
Talk: http://videolectures.net/eccv2014_mathias_face_detection/ (a good talk)

From Facial Parts Responses to Face Detection: A Deep Learning Approach ICCV2015 (email the authors to get code and model)

A Fast and Accurate Unconstrained Face Detector 2015 PAMI 
Simple, fast, effective

Face Alignment 
Face Alignment by Coarse-to-Fine Shape Searching 

High-Fidelity Pose and Expression Normalization for Face Recognition in the Wild

Face Recognition 
Deep face recognition 

Do We Really Need to Collect Millions of Faces for Effective Face Recognition? 

Person Re-identification:

Person Re-identification Results 

Learning a Discriminative Null Space for Person Re-identification 
code http://www.eecs.qmul.ac.uk/~lz/

Query-Adaptive Late Fusion for Image Search and Person Re-identification 

Efficient Person Re-identification by Hybrid Spatiogram and Covariance Descriptor CVPR2015 Workshops 

Person Re-Identification by Iterative Re-Weighted Sparse Ranking PAMI 2015 
http://www.micc.unifi.it/masi/code/isr-re-id/ (no feature-extraction code)

Person re-identification by Local Maximal Occurrence representation and metric learning CVPR2015 

Head detection 
Context-aware CNNs for person head detection 
Matlab code & dataset available

Pedestrian detection

Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning PAMI 2015 
Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features ECCV2014 

Is Faster R-CNN Doing Well for Pedestrian Detection 
Matlab code: https://github.com/zhangliliang/RPN_BF/tree/RPN-pedestrian

Deep Learning 
Deeply Learned Attributes for Crowded Scene Understanding 

Quantized Convolutional Neural Networks for Mobile Devices 

Human Pose Estimation 
DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR2014 
https://github.com/mitmul/deeppose (unofficial implementation)

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations NIPS 2014 

Learning Human Pose Estimation Features with Convolutional Networks 

Flowing ConvNets for Human Pose Estimation in Videos 

Unsupervised Learning of Visual Representations using Videos (very promising!)

Learning Deep Representations of Fine-Grained Visual Descriptions 

Fast Detection of Curved Edges at Low SNR 

Unsupervised Processing of Vehicle Appearance for Automatic Understanding in Traffic Surveillance

code: https://medusa.fit.vutbr.cz/traffic/research-topics/fine-grained-vehicle-recognition/unsupervised-processing-of-vehicle-appearance-for-automatic-understanding-in-traffic-surveillance/

Image Retrieval 
Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks 

Deep Supervised Hashing for Fast Image Retrieval 

Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-identification 

MPII Human Pose Dataset 

WIDER FACE: A Face Detection Benchmark (dataset)

Porting voc-release4.0.1 from Linux to Windows


FASText: Efficient Unconstrained Scene Text Detector


1. Feature Extraction:


2. Image Segmentation:

  • Normalized Cut [1] [Matlab code]
  • Greg Mori's Superpixel code [2] [Matlab code]
  • Efficient Graph-based Image Segmentation [3] [C++ code] [Matlab wrapper]
  • Mean-Shift Image Segmentation [4] [EDISON C++ code] [Matlab wrapper]
  • OWT-UCM Hierarchical Segmentation [5] [Resources]
  • TurboPixels [6] [Matlab code 32bit] [Matlab code 64bit] [Updated code]
  • Quick-Shift [7] [VLFeat]
  • SLIC Superpixels [8] [Project]
  • Segmentation by Minimum Code Length [9] [Project]
  • Biased Normalized Cut [10] [Project]
  • Segmentation Tree [11-12] [Project]
  • Entropy Rate Superpixel Segmentation [13] [Code]
  • Fast Approximate Energy Minimization via Graph Cuts[Paper][Code]
  • Efficient Planar Graph Cuts with Applications in Computer Vision[Paper][Code]
  • Isoperimetric Graph Partitioning for Image Segmentation[Paper][Code]
  • Random Walks for Image Segmentation[Paper][Code]
  • Blossom V: A new implementation of a minimum cost perfect matching algorithm[Code]
  • An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision[Paper][Code]
  • Geodesic Star Convexity for Interactive Image Segmentation[Project]
  • Contour Detection and Image Segmentation Resources[Project][Code]
  • Biased Normalized Cuts[Project]
  • Max-flow/min-cut[Project]
  • Chan-Vese Segmentation using Level Set[Project]
  • A Toolbox of Level Set Methods[Project]
  • Re-initialization Free Level Set Evolution via Reaction Diffusion[Project]
  • Improved C-V active contour model[Paper][Code]
  • A Variational Multiphase Level Set Approach to Simultaneous Segmentation and Bias Correction[Paper][Code]
  • Level Set Method Research by Chunming Li[Project]
  • ClassCut for Unsupervised Class Segmentation[code]
  • SEEDS: Superpixels Extracted via Energy-Driven Sampling [Project][other]


3. Object Detection:

  • A simple object detector with boosting [Project]
  • INRIA Object Detection and Localization Toolkit [1] [Project]
  • Discriminatively Trained Deformable Part Models [2] [Project]
  • Cascade Object Detection with Deformable Part Models [3] [Project]
  • Poselet [4] [Project]
  • Implicit Shape Model [5] [Project]
  • Viola and Jones’s Face Detection [6] [Project]
  • Bayesian Modelling of Dynamic Scenes for Object Detection[Paper][Code]
  • Hand detection using multiple proposals[Project]
  • Color Constancy, Intrinsic Images, and Shape Estimation[Paper][Code]
  • Gradient Response Maps for Real-Time Detection of Texture-Less Objects: LineMOD [Project]
  • Image Processing On Line[Project]
  • Robust Optical Flow Estimation[Project]
  • Where's Waldo: Matching People in Images of Crowds[Project]
  • Scalable Multi-class Object Detection[Project]
  • Class-Specific Hough Forests for Object Detection[Project]
  • Deformed Lattice Detection In Real-World Images[Project]


4. Saliency Detection:

  • Itti, Koch, and Niebur's saliency detection [1] [Matlab code]
  • Frequency-tuned salient region detection [2] [Project]
  • Saliency detection using maximum symmetric surround [3] [Project]
  • Attention via Information Maximization [4] [Matlab code]
  • Context-aware saliency detection [5] [Matlab code]
  • Graph-based visual saliency [6] [Matlab code]
  • Saliency detection: A spectral residual approach. [7] [Matlab code]
  • Segmenting salient objects from images and videos. [8] [Matlab code]
  • Saliency Using Natural statistics. [9] [Matlab code]
  • Discriminant Saliency for Visual Recognition from Cluttered Scenes. [10] [Code]
  • Learning to Predict Where Humans Look [11] [Project]
  • Global Contrast based Salient Region Detection [12] [Project]
  • Bayesian Saliency via Low and Mid Level Cues[Project]
  • Top-Down Visual Saliency via Joint CRF and Dictionary Learning[Paper][Code]
  • Saliency Detection: A Spectral Residual Approach[Code]


5. Image Classification, Clustering:

  • Pyramid Match [1] [Project]
  • Spatial Pyramid Matching [2] [Code]
  • Locality-constrained Linear Coding [3] [Project] [Matlab code]
  • Sparse Coding [4] [Project] [Matlab code]
  • Texture Classification [5] [Project]
  • Multiple Kernels for Image Classification [6] [Project]
  • Feature Combination [7] [Project]
  • SuperParsing [Code]
  • Large Scale Correlation Clustering Optimization[Matlab code]
  • Detecting and Sketching the Common[Project]
  • Self-Tuning Spectral Clustering[Project][Code]
  • User Assisted Separation of Reflections from a Single Image Using a Sparsity Prior[Paper][Code]
  • Filters for Texture Classification[Project]
  • Multiple Kernel Learning for Image Classification[Project]
  • SLIC Superpixels[Project]


6. Image Matting:

  • A Closed Form Solution to Natural Image Matting [Code]
  • Spectral Matting [Project]
  • Learning-based Matting [Code]


7. Object Tracking:

  • A Forest of Sensors - Tracking Adaptive Background Mixture Models [Project]
  • Object Tracking via Partial Least Squares Analysis[Paper][Code]
  • Robust Object Tracking with Online Multiple Instance Learning[Paper][Code]
  • Online Visual Tracking with Histograms and Articulating Blocks[Project]
  • Incremental Learning for Robust Visual Tracking[Project]
  • Real-time Compressive Tracking[Project]
  • Robust Object Tracking via Sparsity-based Collaborative Model[Project]
  • Visual Tracking via Adaptive Structural Local Sparse Appearance Model[Project]
  • Online Discriminative Object Tracking with Local Sparse Representation[Paper][Code]
  • Superpixel Tracking[Project]
  • Learning Hierarchical Image Representation with Sparsity, Saliency and Locality[Paper][Code]
  • Online Multiple Support Instance Tracking [Paper][Code]
  • Visual Tracking with Online Multiple Instance Learning[Project]
  • Object detection and recognition[Project]
  • Compressive Sensing Resources[Project]
  • Robust Real-Time Visual Tracking using Pixel-Wise Posteriors[Project]
  • Tracking-Learning-Detection[Project][OpenTLD/C++ Code]
  • HandVu: vision-based hand gesture interface[Project]
  • Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities[Project]





  • 3D Reconstruction of a Moving Object[Paper] [Code]
  • Shape From Shading Using Linear Approximation[Code]
  • Combining Shape from Shading and Stereo Depth Maps[Project][Code]
  • Shape from Shading: A Survey[Paper][Code]
  • A Spatio-Temporal Descriptor based on 3D Gradients (HOG3D)[Project][Code]
  • Multi-camera Scene Reconstruction via Graph Cuts[Paper][Code]
  • A Fast Marching Formulation of Perspective Shape from Shading under Frontal Illumination[Paper][Code]
  • Reconstruction:3D Shape, Illumination, Shading, Reflectance, Texture[Project]
  • Monocular Tracking of 3D Human Motion with a Coordinated Mixture of Factor Analyzers[Code]
  • Learning 3-D Scene Structure from a Single Still Image[Project]



  • Matlab class for computing Approximate Nearest Neighbor (ANN) [Matlab class providing interface to ANN library]
  • Random Sampling[code]
  • Probabilistic Latent Semantic Analysis (pLSA)[Code]
  • FASTANN and FASTCLUSTER for approximate k-means (AKM)[Project]
  • Fast Intersection / Additive Kernel SVMs[Project]
  • SVM[Code]
  • Ensemble learning[Project]
  • Deep Learning[Net]
  • Deep Learning Methods for Vision[Project]
  • Neural Network for Recognition of Handwritten Digits[Project]
  • Training a deep autoencoder or a classifier on MNIST digits[Project]
  • THE MNIST DATABASE of handwritten digits[Project]
  • Ersatz:deep neural networks in the cloud[Project]
  • Deep Learning [Project]
  • sparseLM : Sparse Levenberg-Marquardt nonlinear least squares in C/C++[Project]
  • Weka 3: Data Mining Software in Java[Project]
  • Invited talk "A Tutorial on Deep Learning" by Dr. Kai Yu (余凯)[Video]
  • CNN - Convolutional neural network class[Matlab Tool]
  • Yann LeCun's Publications[Website]
  • LeNet-5, convolutional neural networks[Project]
  • Deep Learning leading figure Geoffrey E. Hinton's HomePage[Website]
  • Multiple Instance Logistic Discriminant-based Metric Learning (MildML) and Logistic Discriminant-based Metric Learning (LDML)[Code]
  • Sparse coding simulation software[Project]
  • Visual Recognition and Machine Learning Summer School[Software]


11. Object, Action Recognition:

  • Action Recognition by Dense Trajectories[Project][Code]
  • Action Recognition Using a Distributed Representation of Pose and Appearance[Project]
  • Recognition Using Regions[Paper][Code]
  • 2D Articulated Human Pose Estimation[Project]
  • Fast Human Pose Estimation Using Appearance and Motion via Multi-Dimensional Boosting Regression[Paper][Code]
  • Estimating Human Pose from Occluded Images[Paper][Code]
  • Quasi-dense wide baseline matching[Project]
  • ChaLearn Gesture Challenge: Principal motion: PCA-based reconstruction of motion histograms[Project]
  • Real Time Head Pose Estimation with Random Regression Forests[Project]
  • 2D Action Recognition Serves 3D Human Pose Estimation[Project]
  • A Hough Transform-Based Voting Framework for Action Recognition[Project]
  • Motion Interchange Patterns for Action Recognition in Unconstrained Videos[Project]
  • 2D articulated human pose estimation software[Project]
  • Learning and detecting shape models [code]
  • Progressive Search Space Reduction for Human Pose Estimation[Project]
  • Learning Non-Rigid 3D Shape from 2D Motion[Project]



  • Distance Transforms of Sampled Functions[Project]
  • The Computer Vision Homepage[Project]
  • Efficient appearance distances between windows[code]
  • Image Exploration algorithm[code]
  • Motion Magnification [Project]
  • Bilateral Filtering for Gray and Color Images [Project]
  • A Fast Approximation of the Bilateral Filter using a Signal Processing Approach [Project]
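
For readers new to the bilateral filter entries above, here is a minimal 1-D sketch of the idea: each output sample is a weighted average of its neighbors, where the weight combines spatial closeness and intensity similarity, so sharp edges survive the smoothing. The function name, the parameter names (radius, sigma_s, sigma_r) and the toy signal are assumptions made for illustration only; this is the brute-force form, not the fast approximation from the linked project.

```python
import math

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Brute-force bilateral filter on a 1-D list of floats (illustrative only)."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        # Only neighbors inside the window [i - radius, i + radius] contribute.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            similarity = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            w = spatial * similarity
            num += w * signal[j]
            den += w
        out.append(num / den)  # den > 0 because j == i always has weight 1
    return out

# Toy noisy step edge: low values on the left, high values on the right.
step = [0.0, 0.02, -0.01, 0.01, 1.0, 0.98, 1.01, 0.99]
smoothed = bilateral_1d(step)
```

With these settings the samples on either side of the jump barely influence each other, because the intensity term is close to zero across the edge.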



  • EGT: a Toolbox for Multiple View Geometry and Visual Servoing[Project] [Code]
  • a development kit of matlab mex functions for OpenCV library[Project]
  • Fast Artificial Neural Network Library[Project]



  • finger-detection-and-gesture-recognition [Code]
  • Hand and Finger Detection using JavaCV[Project]
  • Hand and fingers detection[Code]



  • Nonparametric Scene Parsing via Label Transfer [Project]


16. Optical Flow:

  • High accuracy optical flow using a theory for warping [Project]
  • Dense Trajectories Video Description [Project]
  • SIFT Flow: Dense Correspondence across Scenes and its Applications[Project]
  • KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker [Project]
  • Tracking Cars Using Optical Flow[Project]
  • Secrets of optical flow estimation and their principles[Project]
  • Implementation of the Black and Anandan dense optical flow method[Project]
  • Optical Flow Computation[Project]
  • Beyond Pixels: Exploring New Representations and Applications for Motion Analysis[Project]
  • A Database and Evaluation Methodology for Optical Flow[Project]
  • optical flow relative[Project]
  • Robust Optical Flow Estimation [Project]
  • optical flow[Project]


17. Image Retrieval:

  • Semi-Supervised Distance Metric Learning for Collaborative Image Retrieval [Paper][code]


18. Markov Random Fields:

  • Markov Random Fields for Super-Resolution [Project]
  • A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors [Project]


19. Motion Detection:

  • Moving Object Extraction, Using Models or Analysis of Regions [Project]
  • Background Subtraction: Experiments and Improvements for ViBe [Project]
  • A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications [Project]
  • changedetection.net: A new change detection benchmark dataset[Project]
  • ViBe - a powerful technique for background detection and subtraction in video sequences[Project]
  • Background Subtraction Program[Project]
  • Motion Detection Algorithms[Project]
  • Stuttgart Artificial Background Subtraction Dataset[Project]
  • Object Detection, Motion Estimation, and Tracking[Project]


Feature Detection and Description

General Libraries: 

  • VLFeat – Implementation of various feature descriptors (including SIFT, HOG, and LBP) and covariant feature detectors (including DoG, Hessian, Harris Laplace, Hessian Laplace, Multiscale Hessian, Multiscale Harris). Easy-to-use Matlab interface. See also Modern features: Software (slides demonstrating VLFeat, with links to other software) and the VLFeat hands-on session training.
  • OpenCV – Various implementations of modern feature detectors and descriptors (SIFT, SURF, FAST, BRIEF, ORB, FREAK, etc.)


Fast Keypoint Detectors for Real-time Applications: 

  • FAST – High-speed corner detector implementation for a wide variety of platforms
  • AGAST – Even faster than the FAST corner detector. A multi-scale version of this method is used for the BRISK descriptor (ECCV 2010).


Binary Descriptors for Real-Time Applications: 

  • BRIEF – C++ code for a fast and accurate interest point descriptor (not invariant to rotations and scale) (ECCV 2010)
  • ORB – OpenCV implementation of the Oriented FAST and Rotated BRIEF (ORB) descriptor (invariant to rotations, but not scale)
  • BRISK – Efficient Binary descriptor invariant to rotations and scale. It includes a Matlab mex interface. (ICCV 2011)
  • FREAK – Faster than BRISK (invariant to rotations and scale) (CVPR 2012)
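
The binary descriptors listed above are bit strings, so two of them are usually compared with the Hamming distance (the number of differing bits), which is what makes them cheap to match. Below is a minimal sketch of brute-force nearest-neighbor matching on such descriptors. The function names and the toy 2-byte descriptors are assumptions for illustration; ORB descriptors in OpenCV are 32 bytes long, and a real pipeline would normally use a library matcher instead of this loop.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def match(queries, train):
    """For each query, return (index of the nearest train descriptor, distance)."""
    results = []
    for q in queries:
        dists = [hamming(q, t) for t in train]
        best = min(range(len(train)), key=dists.__getitem__)
        results.append((best, dists[best]))
    return results

# Toy 2-byte descriptors (hypothetical data, far shorter than real ones).
train = [bytes([0b00000000, 0b11111111]), bytes([0b10101010, 0b01010101])]
queries = [bytes([0b00000001, 0b11111111]), bytes([0b10101010, 0b01010111])]
matches = match(queries, train)  # [(0, 1), (1, 1)]
```

Each query differs from its nearest train descriptor by a single bit, so both matches come back with distance 1.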


SIFT and SURF Implementations: 


Other Local Feature Detectors and Descriptors: 

  • VGG Affine Covariant features – Oxford code for various affine covariant feature detectors and descriptors.
  • LIOP descriptor – Source code for the Local Intensity Order Pattern (LIOP) descriptor (ICCV 2011).
  • Local Symmetry Features – Source code for matching of local symmetry features under large variations in lighting, age, and rendering style (CVPR 2012).


Global Image Descriptors: 

  • GIST – Matlab code for the GIST descriptor
  • CENTRIST – Global visual descriptor for scene categorization and object detection (PAMI 2011)


Feature Coding and Pooling 

  • VGG Feature Encoding Toolkit – Source code for various state-of-the-art feature encoding methods – including Standard hard encoding, Kernel codebook encoding, Locality-constrained linear encoding, and Fisher kernel encoding.
  • Spatial Pyramid Matching – Source code for feature pooling based on spatial pyramid matching (widely used for image classification)
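
As a rough illustration of the spatial pyramid pooling mentioned above: quantized local features are histogrammed over the whole image and then over successively finer grids, and the per-cell histograms are concatenated into one vector. The sketch below assumes features arrive as (x, y, word) triples with coordinates normalized to [0, 1); the function name and input format are hypothetical, and the per-level weighting used in the original spatial pyramid matching kernel is deliberately left out.

```python
def spatial_pyramid(features, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids.

    features: iterable of (x, y, word), x and y in [0, 1), 0 <= word < vocab_size.
    Simplified sketch: no per-level weights and no normalization.
    """
    features = list(features)
    pyramid = []
    for level in range(levels):
        cells = 2 ** level  # grid is cells x cells at this level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in features:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1
        pyramid.extend(hist)
    return pyramid

# Three toy features with a 2-word vocabulary.
feats = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.9, 0.9, 1)]
vec = spatial_pyramid(feats, vocab_size=2, levels=2)
# vec == [1, 2, 1, 0, 0, 1, 0, 0, 0, 1]
```

The first two entries are the global histogram; the remaining eight are the four 2x2 cells in row-major order, which is how the vector keeps coarse location information that a plain bag of words discards.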


Convolutional Nets and Deep Learning 

  • EBLearn – C++ Library for Energy-Based Learning. It includes several demos and step-by-step instructions to train classifiers based on convolutional neural networks.
  • Torch7 – Provides a Matlab-like environment for state-of-the-art machine learning algorithms, including a fast implementation of convolutional neural networks.
  • Deep Learning – Various links for deep learning software.


Part-Based Models 


Attributes and Semantic Features 


Large-Scale Learning 

  • Additive Kernels – Source code for fast additive kernel SVM classifiers (PAMI 2013).
  • LIBLINEAR – Library for large-scale linear SVM classification.
  • VLFeat – Implementation for Pegasos SVM and Homogeneous Kernel map.


Fast Indexing and Image Retrieval 

  • FLANN – Library for performing fast approximate nearest neighbor.
  • Kernelized LSH – Source code for Kernelized Locality-Sensitive Hashing (ICCV 2009).
  • ITQ Binary codes – Code for generation of small binary codes using Iterative Quantization and other baselines such as Locality-Sensitive-Hashing (CVPR 2011).
  • INRIA Image Retrieval – Efficient code for state-of-the-art large-scale image retrieval (CVPR 2011).


Object Detection 


3D Recognition 


Action Recognition 





  • Animals with Attributes – 30,475 images of 50 animals classes with 6 pre-extracted feature representations for each image.
  • aYahoo and aPascal – Attribute annotations for images collected from Yahoo and Pascal VOC 2008.
  • FaceTracer – 15,000 faces annotated with 10 attributes and fiducial points.
  • PubFig – 58,797 face images of 200 people with 73 attribute classifier outputs.
  • LFW – 13,233 face images of 5,749 people with 73 attribute classifier outputs.
  • Human Attributes – 8,000 people with annotated attributes. Check also this link for another dataset of human attributes.
  • SUN Attribute Database – Large-scale scene attribute database with a taxonomy of 102 attributes.
  • ImageNet Attributes – Variety of attribute labels for the ImageNet dataset.
  • Relative attributes – Data for OSR and a subset of PubFig datasets. Check also this link for the WhittleSearch data.
  • Attribute Discovery Dataset – Images of shopping categories associated with textual descriptions.


Fine-grained Visual Categorization 


Face Detection 

  • FDDB – UMass face detection dataset and benchmark (5,000+ faces)
  • CMU/MIT – Classical face detection dataset.


Face Recognition 

  • Face Recognition Homepage – Large collection of face recognition datasets.
  • LFW – UMass unconstrained face recognition dataset (13,000+ face images).
  • NIST Face Homepage – includes face recognition grand challenge (FRGC), vendor tests (FRVT) and others.
  • CMU Multi-PIE – contains more than 750,000 images of 337 people, with 15 different views and 19 lighting conditions.
  • FERET – Classical face recognition dataset.
  • Deng Cai’s face dataset in Matlab Format – Easy to use if you want to play with simple face datasets including Yale, ORL, PIE, and Extended Yale B.
  • SCFace – Low-resolution face dataset captured from surveillance cameras.


Handwritten Digits 

  • MNIST – large dataset containing a training set of 60,000 examples, and a test set of 10,000 examples.


Pedestrian Detection


Generic Object Recognition 

  • ImageNet – Currently the largest visual recognition dataset in terms of number of categories and images.
  • Tiny Images – 80 million 32x32 low resolution images.
  • Pascal VOC – One of the most influential visual recognition datasets.
  • Caltech 101 / Caltech 256 – Popular image datasets containing 101 and 256 object categories, respectively.
  • MIT LabelMe – Online annotation tool for building computer vision databases.


Scene Recognition


Feature Detection and Description 


Action Recognition


RGBD Recognition 






