Progress of Deep Learning in Computer Vision (CV) and Some New Techniques That Grew Out of Deep Learning

技术挖掘者

Article link: https://blog.csdn.net/WZZ18191171661/article/details/70161595

[Figure: overview of the CV field]

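1. Progress: as the figure above shows, CV currently splits into two broad directions: "low-level perception" and "high-level cognition".

2. Main application areas: video surveillance, face recognition, medical image analysis, autonomous driving, robotics, AR, and VR.

3. Main techniques: classification, object detection (recognition), segmentation, object tracking, edge detection, pose estimation, understanding CNNs, super-resolution reconstruction, sequence learning, feature detection and matching, image captioning, video captioning, question answering, image generation (text-to-image), visual attention and saliency (quality assessment), face recognition, 3D reconstruction, recommender systems, fine-grained image analysis, and image compression.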

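Classification answers the question: "What am I?"
Object detection answers: "What am I, and where am I?"
Segmentation answers: "What am I, where am I, and can you separate me out correctly?"
Object tracking answers: "Can you keep up with me and find me again quickly?"
Edge detection answers: "How can the edges of an object be detected accurately?"
Human pose estimation answers: "Can you tell what I am doing from my pose?"
Understanding CNNs answers: "How can we understand, at a deep theoretical level, how CNNs work?"
Super-resolution reconstruction answers: "How do you obtain a high-quality image from a low-quality one?"
Sequence learning answers: "Do you know what my next image or next video frame will be?"
Feature detection and matching answers: "Can you detect the features of an image and judge how similar two images are?"
Image captioning answers: "Can you say what is in the image and what is happening in it?"
Video captioning answers: "Do you know what these few frames of video show?"
Question answering answers: "Can you correctly answer my question based on the image?"
Image generation answers: "Can I accurately generate the matching image from the information you give me?"
Visual attention and saliency answers: "How can we build models that mimic the human visual attention mechanism?"
Face recognition answers: "How can a machine accurately recognize the same person's face under different conditions?"
3D reconstruction answers: "Can you produce a high-quality 3D point cloud from the images I give you?"
Recommender systems answer: "Can you give an accurate output for my input?"
Fine-grained image analysis covers finer tasks such as: "Can you tell which breed of dog I am?"
Image compression answers: "How can the original image be represented, lossily or losslessly, with fewer bits?"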
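To make the first three questions in the list above concrete, it can help to look at the shape of what each task returns. The sketch below is illustrative only: the type names (`Classification`, `Detection`, `Segmentation`), the helper `mask_to_box`, and the toy 4x4 mask are assumptions introduced here, not part of any model or library listed in this article.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classification:
    label: str  # answers "what am I?"


@dataclass
class Detection:
    label: str  # answers "what am I?"
    # answers "where am I?" as inclusive pixel indices (x_min, y_min, x_max, y_max)
    box: Tuple[int, int, int, int]


@dataclass
class Segmentation:
    label: str
    mask: List[List[int]]  # per-pixel answer: 1 where the object is, 0 elsewhere


def mask_to_box(mask: List[List[int]]) -> Tuple[int, int, int, int]:
    """Tightest box around the nonzero pixels of a binary mask.

    Hypothetical helper; assumes the mask has at least one nonzero pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Each richer output can be reduced to the coarser one, but not the reverse.
seg = Segmentation(label="dog", mask=toy_mask)
det = Detection(label=seg.label, box=mask_to_box(seg.mask))
cls = Classification(label=det.label)
```

The point of the reduction at the end is that a segmentation mask implies a bounding box, and a bounding box with a label implies a class, which is one way to read the increasingly demanding questions in the list.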

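Notes:
1. Below I go through the sub-fields of CV one by one and summarize the network models in each. The lists cover essentially every sub-field and aim to collect the classic models as completely as possible. Entries are ordered roughly chronologically, so the last ones in each list are generally the most recent proposals. My main goal is to organize this material for my own use and for others: instead of gathering piles of references from around the web, you can spend your time analyzing these models carefully and proposing new ones of your own. The papers collected here are of high quality and come mainly from top venues and sources such as ECCV, ICCV, CVPR, PAMI, arXiv, ICLR, and ACM, and every paper has a link, which should save you a good deal of time. I have also picked out each paper's key network model or overall architecture so that you can compare them easily and get a better overall picture. For the details you will need to read the papers closely. My time and energy are limited, so this is as far as I can take it; I hope you will understand. Thank you.
2. I will use my spare time to add new models, but with limited time and energy the lists may not be complete. I hope everyone can contribute: if you come across a new model, contact me and I will reply promptly. I look forward to your joining in, so that together we can serve everyone!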

As shown in the figure below:
[Figure]

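Classification: this is a foundational research topic. Accuracy is already very high, and in some settings models have surpassed human-level performance.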
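Accuracy figures for classifiers are commonly reported as top-k accuracy (ImageNet results, for example, usually quote top-1 and top-5). The following is a minimal sketch of that metric in plain Python; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration, not taken from any of the papers listed here.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class.
    labels: list of true class indices, same length as scores.
    """
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 4 samples and 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # best guess is class 1, true label 1
    [0.5, 0.3, 0.2],  # best guess is class 0, true label 1 (second best)
    [0.2, 0.2, 0.6],  # best guess is class 2, true label 2
    [0.6, 0.3, 0.1],  # best guess is class 0, true label 2 (ranked last)
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With larger k the metric can only stay the same or increase, which is why top-5 numbers look better than top-1 numbers for the same model.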

Typical network models

LeNet
http://yann.lecun.com/exdb/lenet/index.html
AlexNet
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
https://arxiv.org/pdf/1502.01852.pdf
Batch Normalization
https://arxiv.org/pdf/1502.03167.pdf
GoogLeNet
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
VGGNet
https://arxiv.org/pdf/1409.1556.pdf
ResNet
https://arxiv.org/pdf/1512.03385.pdf
InceptionV4 (Inception-ResNet)
https://arxiv.org/pdf/1602.07261.pdf

[Figure: LeNet network 1]
[Figure: LeNet network 2]
[Figure: AlexNet network 1]
[Figure: AlexNet network 2]
[Figure: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification]
[Figure: GoogLeNet network 1]
[Figure: GoogLeNet network 2]
[Figure: Batch Normalization]
[Figure: VGGNet network 1]
[Figure: VGGNet network 2]
[Figure: ResNet network]
[Figure: InceptionV4 network]

Object detection: this line of work builds on image classification; in short, it is classification plus localization.
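Because detection combines a class label with a location, a prediction is usually judged on both. A common convention, assumed here, is to score localization with intersection over union (IoU) between the predicted and ground-truth boxes and to count a detection as correct when the label matches and the IoU reaches a threshold such as 0.5. The function names and example boxes below are hypothetical and only sketch the idea.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    Coordinates are treated as continuous, so a box's area is width * height.
    """
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that non-overlapping boxes get an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_label, pred_box, true_label, true_box, threshold=0.5):
    """Classification plus localization: the label must match and the boxes must overlap enough."""
    return pred_label == true_label and iou(pred_box, true_box) >= threshold


true_box = (0, 0, 10, 10)
close_box = (2, 0, 12, 10)   # overlaps 8x10 of the true box
far_box = (5, 0, 15, 10)     # overlaps only 5x10 of the true box

close_iou = iou(true_box, close_box)  # 80 / 120
far_iou = iou(true_box, far_box)      # 50 / 150
```

Under the 0.5 threshold assumed above, `close_box` with the right label would count as a correct detection, while `far_box` would not, even though its label is right.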

Typical networks

OverFeat
https://arxiv.org/pdf/1312.6229.pdf
R-CNN
https://arxiv.org/pdf/1311.2524.pdf
SPP-Net
https://arxiv.org/pdf/1406.4729.pdf
DeepID-Net
https://arxiv.org/pdf/1409.3505.pdf
Fast R-CNN
https://arxiv.org/pdf/1504.08083.pdf
R-CNN minus R
https://arxiv.org/pdf/1506.06981.pdf
End-to-end people detection in crowded scenes
https://arxiv.org/pdf/1506.04878.pdf
DeepBox
https://arxiv.org/pdf/1505.02146.pdf
MR-CNN
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf
Faster R-CNN
https://arxiv.org/pdf/1506.01497.pdf
YOLO
https://arxiv.org/pdf/1506.02640.pdf
DenseBox
https://arxiv.org/pdf/1509.04874.pdf
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
https://arxiv.org/pdf/1503.00949.pdf
R-FCN
https://arxiv.org/pdf/1605.06409.pdf
SSD
https://arxiv.org/pdf/1512.02325v2.pdf
Inside-Outside Net
https://arxiv.org/pdf/1512.04143.pdf
G-CNN
https://arxiv.org/pdf/1512.07729.pdf
PVANET
https://arxiv.org/pdf/1608.08021.pdf
Speed/accuracy trade-offs for modern convolutional object detectors
https://arxiv.org/pdf/1611.10012v1.pdf

[Figure: OverFeat network]
[Figure: R-CNN network]
[Figure: SPP-Net network]
[Figure: DeepID-Net network]
[Figure: DeepBox network]
[Figure: MR-CNN network]
[Figure: Fast R-CNN network]
[Figure: R-CNN minus R network]
[Figure: End-to-end people detection in crowded scenes]
[Figure: Faster R-CNN network]
[Figure: DenseBox network]
[Figure: Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning]
[Figure: R-FCN network]
[Figure: YOLO and SSD networks]
[Figure: Inside-Outside Net network]
[Figure: G-CNN network]
[Figure: PVANET network]
[Figure: Speed/accuracy trade-offs for modern convolutional object detectors]

Image segmentation

Classic network models:

FCN
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
SegNet
https://arxiv.org/pdf/1511.00561.pdf
DeepLab
https://arxiv.org/pdf/1606.00915.pdf
DeconvNet
https://arxiv.org/pdf/1505.04366.pdf
Conditional Random Fields as Recurrent Neural Networks
http://www.robots.ox.ac.uk/~szheng/papers/CRFasRNN.pdf
Semantic Segmentation using Adversarial Networks
https://arxiv.org/pdf/1611.08408.pdf
SEC: Seed, Expand and Constrain
http://pub.ist.ac.at/~akolesnikov/files/ECCV2016/main.pdf
Efficient piecewise training of deep structured models for semantic segmentation
https://arxiv.org/pdf/1504.01013.pdf
Semantic Image Segmentation via Deep Parsing Network
https://arxiv.org/pdf/1509.02634.pdf
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
https://arxiv.org/pdf/1503.01640.pdf
Learning Deconvolution Network for Semantic Segmentation
https://arxiv.org/pdf/1505.04366.pdf
Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation
https://arxiv.org/pdf/1506.04924.pdf
Pushing the Boundaries of Boundary Detection Using Deep Learning
https://arxiv.org/pdf/1511.07386.pdf
Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network
https://arxiv.org/pdf/1512.07928.pdf
Feedforward Semantic Segmentation With Zoom-Out Features
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf
Joint Calibration for Semantic Segmentation
https://arxiv.org/pdf/1507.01581.pdf
Hypercolumns for Object Segmentation and Fine-Grained Localization
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Hariharan_Hypercolumns_for_Object_2015_CVPR_paper.pdf
Scene Parsing with Multiscale Feature Learning
http://yann.lecun.com/exdb/publis/pdf/farabet-icml-12.pdf
Learning Hierarchical Features for Scene Labeling
http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf
Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Izadinia_Segment-Phrase_Table_for_ICCV_2015_paper.pdf
Multi-Scale Context Aggregation by Dilated Convolutions
https://arxiv.org/pdf/1511.07122v2.pdf
Weakly supervised graph based semantic segmentation by learning communities of image-parts
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Pourian_Weakly_Supervised_Graph_ICCV_2015_paper.pdf

[Figure: FCN network 1]
[Figure: FCN network 2]
[Figure: SegNet network]
[Figure: DeepLab network]
[Figure: DeconvNet network]
[Figure: Conditional Random Fields as Recurrent Neural Networks]
[Figure: Semantic Segmentation using Adversarial Networks]
[Figure: SEC: Seed, Expand and Constrain]
[Figure: Efficient piecewise training of deep structured models for semantic segmentation]
[Figure: Semantic Image Segmentation via Deep Parsing Network]
[Figure: BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation]
[Figure: Learning Deconvolution Network for Semantic Segmentation]
[Figure: Pushing the Boundaries of Boundary Detection Using Deep Learning]
[Figure: Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation]
[Figure: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network]
[Figure: Feedforward Semantic Segmentation With Zoom-Out Features]
[Figure: Joint Calibration for Semantic Segmentation]
[Figure: Hypercolumns for Object Segmentation and Fine-Grained Localization]
[Figure: Learning Hierarchical Features for Scene Labeling]
[Figure: Multi-Scale Context Aggregation by Dilated Convolutions]
[Figure: Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing]
[Figure: Weakly supervised graph based semantic segmentation by learning communities of image-parts]
[Figure: Scene Parsing with Multiscale Feature Learning]

Object tracking

Classic networks:

DLT
https://pdfs.semanticscholar.org/b218/0fc4f5cb46b5b5394487842399c501381d67.pdf
Transferring Rich Feature Hierarchies for Robust Visual Tracking
https://arxiv.org/pdf/1501.04587.pdf
FCNT
http://202.118.75.4/lu/Paper/ICCV2015/iccv15_lijun.pdf
Hierarchical Convolutional Features for Visual Tracking
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Ma_Hierarchical_Convolutional_Features_ICCV_2015_paper.pdf
MDNet
https://arxiv.org/pdf/1510.07945.pdf
Recurrently Target-Attending Tracking
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Cui_Recurrently_Target-Attending_Tracking_CVPR_2016_paper.pdf
DeepTracking
http://www.bmva.org/bmvc/2014/files/paper028.pdf
DeepTrack
http://www.bmva.org/bmvc/2014/files/paper028.pdf
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
https://arxiv.org/pdf/1502.06796.pdf

[Figure: DLT network]
[Figure: Transferring Rich Feature Hierarchies for Robust Visual Tracking]
[Figure: FCNT network]
[Figure: Hierarchical Convolutional Features for Visual Tracking]
[Figure: MDNet network]
[Figure: DeepTracking network]
[Figure: Recurrently Target-Attending Tracking]
[Figure: DeepTrack network]
[Figure: Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network]

Edge detection

Classic models:

HED
https://arxiv.org/pdf/1504.06375.pdf
DeepEdge
https://arxiv.org/pdf/1412.1123.pdf
DeepContour
http://mc.eistar.net/UpLoadFiles/Papers/DeepContour_cvpr15.pdf

[Figure: HED network]
[Figure: DeepEdge network]
[Figure: DeepContour network]

Human pose estimation

Classic models:

DeepPose
https://arxiv.org/pdf/1312.4659.pdf
JTCN
https://www.robots.ox.ac.uk/~vgg/rg/papers/tompson2014.pdf
Flowing convnets for human pose estimation in videos
https://arxiv.org/pdf/1506.02897.pdf
Stacked hourglass networks for human pose estimation
https://arxiv.org/pdf/1603.06937.pdf
Convolutional pose machines
https://arxiv.org/pdf/1602.00134.pdf
Deepcut
https://arxiv.org/pdf/1605.03170.pdf
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
https://arxiv.org/pdf/1611.08050.pdf

[Figure: DeepPose network]
[Figure: JTCN network]
[Figure: Flowing convnets for human pose estimation in videos]
[Figure: Stacked hourglass networks for human pose estimation]
[Figure: Convolutional pose machines]
[Figure: Deepcut network]
[Figure: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields]

Understanding CNNs

Classic networks:

Visualizing and Understanding Convolutional Networks
https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf
Inverting Visual Representations with Convolutional Networks
https://arxiv.org/pdf/1506.02753.pdf
Object Detectors Emerge in Deep Scene CNNs
https://arxiv.org/pdf/1412.6856.pdf
Understanding Deep Image Representations by Inverting Them
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf
Understanding image representations by measuring their equivariance and equivalence
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

[Figure: Visualizing and Understanding Convolutional Networks]
[Figure: Inverting Visual Representations with Convolutional Networks]
[Figure: Object Detectors Emerge in Deep Scene CNNs]
[Figure: Understanding Deep Image Representations by Inverting Them]
[Figure: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images]
[Figure: Understanding image representations by measuring their equivariance and equivalence]

Super-resolution reconstruction

Classic models:

Learning Iterative Image Reconstruction
http://www.ais.uni-bonn.de/behnke/papers/ijcai01.pdf
Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid
http://www.ais.uni-bonn.de/behnke/papers/ijcia01.pdf
Learning a Deep Convolutional Network for Image Super-Resolution
http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepresolution.pdf
Image Super-Resolution Using Deep Convolutional Networks
https://arxiv.org/pdf/1501.00092.pdf
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
https://arxiv.org/pdf/1511.04587.pdf
Deeply-Recursive Convolutional Network for Image Super-Resolution
https://arxiv.org/pdf/1511.04491.pdf
Deep Networks for Image Super-Resolution with Sparse Prior
http://www.ifp.illinois.edu/~dingliu2/iccv15/iccv15.pdf
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
https://arxiv.org/pdf/1603.08155.pdf
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
https://arxiv.org/pdf/1609.04802v3.pdf

[Figure: Learning Iterative Image Reconstruction]
[Figure: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid]
[Figure: Learning a Deep Convolutional Network for Image Super-Resolution]
[Figure: Image Super-Resolution Using Deep Convolutional Networks]
[Figure: Accurate Image Super-Resolution Using Very Deep Convolutional Networks]
[Figure: Deeply-Recursive Convolutional Network for Image Super-Resolution]
[Figure: Deep Networks for Image Super-Resolution with Sparse Prior]
[Figure: Perceptual Losses for Real-Time Style Transfer and Super-Resolution]
[Figure: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network]

Image captioning

Classic models:

Explain Images with Multimodal Recurrent Neural Networks
https://arxiv.org/pdf/1410.1090.pdf
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
https://arxiv.org/pdf/1411.2539.pdf
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
https://arxiv.org/pdf/1411.4389.pdf
A Neural Image Caption Generator
https://arxiv.org/pdf/1411.4555.pdf
Deep Visual-Semantic Alignments for Generating Image Description
http://cs.stanford.edu/people/karpathy/cvpr2015.pdf
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
https://arxiv.org/pdf/1412.4729.pdf
Learning a Recurrent Visual Representation for Image Caption Generation
https://arxiv.org/pdf/1411.5654.pdf
From Captions to Visual Concepts and Back
https://arxiv.org/pdf/1411.4952.pdf
Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention
http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf
Phrase-based Image Captioning
https://arxiv.org/pdf/1502.03671.pdf
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
https://arxiv.org/pdf/1504.06692.pdf
Exploring Nearest Neighbor Approaches for Image Captioning
https://arxiv.org/pdf/1505.04467.pdf
Image Captioning with an Intermediate Attributes Layer
https://arxiv.org/pdf/1506.01144.pdf
Learning language through pictures
https://arxiv.org/pdf/1506.03694.pdf
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
https://arxiv.org/pdf/1507.01053.pdf
Image Representations and New Domains in Neural Image Captioning
https://arxiv.org/pdf/1508.02091.pdf
Learning Query and Image Similarities with Ranking Canonical Correlation Analysis
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Learning_Query_and_ICCV_2015_paper.pdf
Generative Adversarial Text to Image Synthesis
https://arxiv.org/pdf/1605.05396.pdf
Generating Images from Captions with Attention
https://arxiv.org/pdf/1511.02793.pdf

[Figure: Explain Images with Multimodal Recurrent Neural Networks]
[Figure: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models]
[Figure: Long-term Recurrent Convolutional Networks for Visual Recognition and Description]
[Figure: A Neural Image Caption Generator]
[Figure: Deep Visual-Semantic Alignments for Generating Image Description]
[Figure: Translating Videos to Natural Language Using Deep Recurrent Neural Networks]
[Figure: Learning a Recurrent Visual Representation for Image Caption Generation]
[Figure: From Captions to Visual Concepts and Back]
[Figure: Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention]
[Figure: Phrase-based Image Captioning]
[Figure: Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images]
[Figure: Exploring Nearest Neighbor Approaches for Image Captioning]
[Figure: Image Captioning with an Intermediate Attributes Layer]
[Figure: Learning language through pictures]
[Figure: Describing Multimedia Content using Attention-based Encoder-Decoder Networks]
[Figure: Image Representations and New Domains in Neural Image Captioning]
[Figure: Learning Query and Image Similarities with Ranking Canonical Correlation Analysis]
[Figure: Generative Adversarial Text to Image Synthesis]
[Figure: Generating Images from Captions with Attention]

Video captioning

Classic models:

Long-term Recurrent Convolutional Networks for Visual Recognition and Description
https://arxiv.org/pdf/1411.4389.pdf
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
https://arxiv.org/pdf/1412.4729.pdf
Joint Modeling Embedding and Translation to Bridge Video and Language
https://arxiv.org/pdf/1505.01861.pdf
Sequence to Sequence–Video to Text
https://arxiv.org/pdf/1505.00487.pdf
Describing Videos by Exploiting Temporal Structure
https://arxiv.org/pdf/1502.08029.pdf
The Long-Short Story of Movie Description
https://arxiv.org/pdf/1506.01698.pdf
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
https://arxiv.org/pdf/1506.06724.pdf
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
https://arxiv.org/pdf/1507.01053.pdf
Temporal Tessellation for Video Annotation and Summarization
https://arxiv.org/pdf/1612.06950.pdf
Summarization-based Video Caption via Deep Neural Networks
http://delivery.acm.org/10.1145/2810000/2806314/p1191-li.pdf?ip=123.138.79.12&id=2806314&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492135731_7c7cb5d6bf7455db7f4aa75b341d1a78
Deep Learning for Video Classification and Captioning
https://arxiv.org/pdf/1609.06782.pdf

[Figure: Long-term Recurrent Convolutional Networks for Visual Recognition and Description]
[Figure: Translating Videos to Natural Language Using Deep Recurrent Neural Networks]
[Figure: Joint Modeling Embedding and Translation to Bridge Video and Language]
[Figure: Sequence to Sequence–Video to Text]
[Figure: Describing Videos by Exploiting Temporal Structure]
[Figure: The Long-Short Story of Movie Description]
[Figure: Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books]
[Figure: Describing Multimedia Content using Attention-based Encoder-Decoder Networks]
[Figure: Temporal Tessellation for Video Annotation and Summarization]
[Figure: Summarization-based Video Caption via Deep Neural Networks]
[Figure: Deep Learning for Video Classification and Captioning]

Question answering

Classic models:

VQA: Visual Question Answering
https://arxiv.org/pdf/1505.00468.pdf
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
https://arxiv.org/pdf/1505.01121.pdf
Image Question Answering: A Visual Semantic Embedding Model and a New Dataset
https://arxiv.org/pdf/1505.02074.pdf
Stacked Attention Networks for Image Question Answering
https://arxiv.org/pdf/1511.02274v2.pdf
Dataset and Methods for Multilingual Image Question Answering
https://arxiv.org/pdf/1505.05612.pdf
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Dynamic Memory Networks for Visual and Textual Question Answering
https://arxiv.org/pdf/1603.01417v1.pdf
Multimodal Residual Learning for Visual QA
https://arxiv.org/pdf/1606.01455.pdf
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
https://arxiv.org/pdf/1606.01847.pdf
Training Recurrent Answering Units with Joint Loss Minimization for VQA
https://arxiv.org/pdf/1606.03647.pdf
Hadamard Product for Low-rank Bilinear Pooling
https://arxiv.org/pdf/1610.04325.pdf
Question Answering Using Deep Learning
https://cs224d.stanford.edu/reports/StrohMathur.pdf

[Figure: VQA: Visual Question Answering]
[Figure: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images]
[Figure: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset]
[Figure: Stacked Attention Networks for Image Question Answering]
[Figure: Dataset and Methods for Multilingual Image Question Answering]
[Figure: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction]
[Figure: Dynamic Memory Networks for Visual and Textual Question Answering]
[Figure: Multimodal Residual Learning for Visual QA]
[Figure: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding]
[Figure: Training Recurrent Answering Units with Joint Loss Minimization for VQA]
[Figure: Hadamard Product for Low-rank Bilinear Pooling]
[Figure: Question Answering Using Deep Learning]

Image generation (CNN, RNN, LSTM, GAN)

Classic models:

Conditional Image Generation with PixelCNN Decoders
https://arxiv.org/pdf/1606.05328v2.pdf
Learning to Generate Chairs with Convolutional Neural Networks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf
DRAW: A Recurrent Neural Network For Image Generation
https://arxiv.org/pdf/1502.04623v2.pdf
Generative Adversarial Networks
https://arxiv.org/pdf/1406.2661.pdf
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
https://arxiv.org/pdf/1506.05751.pdf
A note on the evaluation of generative models
https://arxiv.org/pdf/1511.01844.pdf
Variationally Auto-Encoded Deep Gaussian Processes
https://arxiv.org/pdf/1511.06455v2.pdf
Generating Images from Captions with Attention
https://arxiv.org/pdf/1511.02793v2.pdf
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
https://arxiv.org/pdf/1511.06390v1.pdf
Censoring Representations with an Adversary
https://arxiv.org/pdf/1511.05897v3.pdf
Distributional Smoothing with Virtual Adversarial Training
https://arxiv.org/pdf/1507.00677v8.pdf
Generative Visual Manipulation on the Natural Image Manifold
https://arxiv.org/pdf/1609.03552v2.pdf
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
https://arxiv.org/pdf/1511.06434.pdf
Wasserstein GAN
https://arxiv.org/pdf/1701.07875.pdf
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
https://arxiv.org/pdf/1701.06264.pdf
Conditional Generative Adversarial Nets
https://arxiv.org/pdf/1411.1784.pdf
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
https://arxiv.org/pdf/1606.03657.pdf
Conditional Image Synthesis With Auxiliary Classifier GANs
https://arxiv.org/pdf/1610.09585.pdf
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
https://arxiv.org/pdf/1609.05473.pdf
Improved Training of Wasserstein GANs
https://arxiv.org/pdf/1704.00028.pdf
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
https://arxiv.org/pdf/1704.04086.pdf

[Figure: Conditional Image Generation with PixelCNN Decoders]
[Figure: Learning to Generate Chairs with Convolutional Neural Networks]
[Figure: DRAW: A Recurrent Neural Network For Image Generation]
[Figure: Generative Adversarial Networks]
[Figure: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks]
[Figure: A note on the evaluation of generative models]
[Figure: Variationally Auto-Encoded Deep Gaussian Processes]
[Figure: Generating Images from Captions with Attention]
[Figure: Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks]
[Figure: Censoring Representations with an Adversary]
[Figure: Distributional Smoothing with Virtual Adversarial Training]
[Figure: Generative Visual Manipulation on the Natural Image Manifold]
[Figure: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
[Figure: Wasserstein GAN]
[Figure: Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities]
[Figure: Conditional Generative Adversarial Nets]
[Figure: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets]
[Figure: Conditional Image Synthesis With Auxiliary Classifier GANs]
[Figure: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient]
[Figure: Improved Training of Wasserstein GANs]
[Figure: Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis]

Visual attention and saliency

Classic models:

Predicting Eye Fixations using Convolutional Neural Networks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Predicting_Eye_Fixations_2015_CVPR_paper.pdf
Learning a Sequential Search for Landmarks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Singh_Learning_a_Sequential_2015_CVPR_paper.pdf
Multiple Object Recognition with Visual Attention
https://arxiv.org/pdf/1412.7755.pdf
Recurrent Models of Visual Attention
http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
Capacity Visual Attention Networks
http://easychair.org/publications/download/Capacity_Visual_Attention_Networks
Fully Convolutional Attention Networks for Fine-Grained Recognition
https://arxiv.org/pdf/1603.06765.pdf

[Figure: Predicting Eye Fixations using Convolutional Neural Networks]
[Figure: Learning a Sequential Search for Landmarks]
[Figure: Multiple Object Recognition with Visual Attention]
[Figure: Recurrent Models of Visual Attention]
[Figure: Capacity Visual Attention Networks]
[Figure: Fully Convolutional Attention Networks for Fine-Grained Recognition]

Feature detection and matching (patches)

Classic models:

TILDE: A Temporally Invariant Learned DEtector
https://arxiv.org/pdf/1411.4568.pdf
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
https://pdfs.semanticscholar.org/81b9/24da33b9500a2477532fd53f01df00113972.pdf
Discriminative Learning of Deep Convolutional Feature Point Descriptors
http://cvlabwww.epfl.ch/~trulls/pdf/iccv-2015-deepdesc.pdf
Learning to Assign Orientations to Feature Points
https://arxiv.org/pdf/1511.04273.pdf
PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors
https://arxiv.org/pdf/1601.05030.pdf
Multi-scale Pyramid Pooling for Deep Convolutional Representation
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7301274
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
https://arxiv.org/pdf/1406.4729.pdf
Learning to Compare Image Patches via Convolutional Neural Networks
https://arxiv.org/pdf/1504.03641.pdf
PixelNet: Representation of the pixels, by the pixels, and for the pixels
http://www.cs.cmu.edu/~aayushb/pixelNet/pixelnet.pdf
LIFT: Learned Invariant Feature Transform
https://arxiv.org/pdf/1603.09114.pdf

[Figure: TILDE: A Temporally Invariant Learned DEtector]
[Figure: MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching]
[Figure: Discriminative Learning of Deep Convolutional Feature Point Descriptors]
[Figure: Learning to Assign Orientations to Feature Points]
[Figure: PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors]
[Figure: Multi-scale Pyramid Pooling for Deep Convolutional Representation]
[Figure: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition]
[Figure: Learning to Compare Image Patches via Convolutional Neural Networks]
[Figure: PixelNet: Representation of the pixels, by the pixels, and for the pixels]
[Figure: LIFT: Learned Invariant Feature Transform]

Face recognition

Classic models:

Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks
http://vis-www.cs.umass.edu/papers/HuangCVPR12.pdf
Deep Convolutional Network Cascade for Facial Point Detection
http://mmlab.ie.cuhk.edu.hk/archive/CNN/data/CNN_FacePoint.pdf
Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification
http://delivery.acm.org/10.1145/2400000/2396303/p749-cai.pdf?ip=123.138.79.12&id=2396303&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492152722_04e9cce5378080a18ec7e700dfb4cd28
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf
Deep learning face representation by joint identification-verification
https://arxiv.org/pdf/1406.4773.pdf
Deep learning face representation from predicting 10,000 classes
http://mmlab.ie.cuhk.edu.hk/pdf/YiSun_CVPR14.pdf
Deeply learned face representations are sparse, selective, and robust
https://arxiv.org/pdf/1412.1265.pdf
Deepid3: Face recognition with very deep neural networks
https://arxiv.org/pdf/1502.00873.pdf
FaceNet: A Unified Embedding for Face Recognition and Clustering
https://arxiv.org/pdf/1503.03832.pdf
Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness
https://arxiv.org/pdf/1609.07304.pdf
Large-pose Face Alignment via CNN-based Dense 3D Model Fitting
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Jourabloo_Large-Pose_Face_Alignment_CVPR_2016_paper.pdf
Unconstrained 3D face reconstruction
http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR2015.pdf
Adaptive contour fitting for pose-invariant 3D face shape reconstruction
http://akme-a2.iosb.fraunhofer.de/ETGS15p/2015_Adaptive%20contour%20fitting%20for%20pose-invariant%203D%20face%20shape%20reconstruction.pdf
High-fidelity pose and expression normalization for face recognition in the wild
http://www.cbsr.ia.ac.cn/users/xiangyuzhu/papers/CVPR2015_High-Fidelity.pdf
Adaptive 3D face reconstruction from unconstrained photo collections
http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR16.pdf
Dense 3D face alignment from 2d videos in real-time
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7163142
Robust facial landmark detection under significant head poses and occlusion
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Wu_Robust_Facial_Landmark_ICCV_2015_paper.pdf
A convolutional neural network cascade for face detection
http://users.eecs.northwestern.edu/~xsh835/assets/cvpr2015_cascnn.pdf
Deep Face Recognition Using Deep Convolutional Neural Network
http://aiehive.com/deep-face-recognition-using-deep-convolution-neural-network/
Multi-view Face Detection Using Deep Convolutional Neural Networks
http://delivery.acm.org/10.1145/2750000/2749408/p643-farfade.pdf?ip=123.138.79.12&id=2749408&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492157015_8ffa84e6632810ea05ff005794fed8d5
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
https://arxiv.org/pdf/1603.01249.pdf
Wider face: A face detection benchmark
http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/paper.pdf
Joint training of cascaded cnn for face detection
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Qin_Joint_Training_of_CVPR_2016_paper.pdf
Face detection with end-to-end integration of a convnet and a 3d model
https://arxiv.org/pdf/1606.00850.pdf
Face Detection using Deep Learning: An Improved Faster RCNN Approach
https://arxiv.org/pdf/1701.08289.pdf

Comparison of old and new methods:
[Figure]

Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks:
[Figure]

Deep Convolutional Network Cascade for Facial Point Detection:
[Figure]

Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification:
[Figure]

DeepFace: Closing the Gap to Human-Level Performance in Face Verification:
[Figure]

Deep learning face representation by joint identification-verification:
[Figure]

Deep learning face representation from predicting 10,000 classes:
[Figure]

Deeply learned face representations are sparse, selective, and robust:
[Figure]

Deepid3: Face recognition with very deep neural networks:
[Figure]

FaceNet: A Unified Embedding for Face Recognition and Clustering:
[Figure]

Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness:
[Figure]

Large-pose Face Alignment via CNN-based Dense 3D Model Fitting:
[Figure]

Unconstrained 3D face reconstruction:
[Figure]

Adaptive contour fitting for pose-invariant 3D face shape reconstruction:
[Figure]

High-fidelity pose and expression normalization for face recognition in the wild:
[Figure]

Adaptive 3D face reconstruction from unconstrained photo collections:
[Figure]

Regressing a 3D face shape from a single image:
[Figure]

Dense 3D face alignment from 2d videos in real-time:
[Figure]

Robust facial landmark detection under significant head poses and occlusion:
[Figure]

A convolutional neural network cascade for face detection:
[Figure]

Deep Face Recognition Using Deep Convolutional Neural Network:
[Figure]

Multi-view Face Detection Using Deep Convolutional Neural Networks:
[Figure]

HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition:
[Figure]

WIDER FACE: A Face Detection Benchmark:
[Figure]

Joint training of cascaded cnn for face detection:
[Figure]

Face detection with end-to-end integration of a convnet and a 3d model:
[Figure]

Face Detection using Deep Learning: An Improved Faster RCNN Approach:
[Figure]

3D Reconstruction

Classic models:

3D ShapeNets: A Deep Representation for Volumetric Shapes
https://people.csail.mit.edu/khosla/papers/cvpr2015_wu.pdf
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
https://arxiv.org/pdf/1604.00449.pdf
Learning to generate chairs with convolutional neural networks
https://arxiv.org/pdf/1411.5928.pdf
Category-specific object reconstruction from a single image
http://people.eecs.berkeley.edu/~akar/categoryshapes.pdf
Enriching Object Detection with 2D-3D Registration and Continuous Viewpoint Estimation
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7298866
ShapeNet: An Information-Rich 3D Model Repository
https://arxiv.org/pdf/1512.03012.pdf
3D reconstruction of synapses with deep learning based on EM Images
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7558866
Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces
https://arxiv.org/pdf/1605.06240.pdf
Unsupervised Learning of 3D Structure from Images
https://arxiv.org/pdf/1607.00662.pdf
Deep learning 3d shape surfaces using geometry images
http://download.springer.com/static/pdf/605/chp%253A10.1007%252F978-3-319-46466-4_14.pdf?originUrl=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F978-3-319-46466-4_14&token2=exp=1492181498~acl=%2Fstatic%2Fpdf%2F605%2Fchp%25253A10.1007%25252F978-3-319-46466-4_14.pdf%3ForiginUrl%3Dhttp%253A%252F%252Flink.springer.com%252Fchapter%252F10.1007%252F978-3-319-46466-4_14*~hmac=b772943d8cd5f914e7bc84a30ddfdf0ef87991bee1d52717cb4930e3eccb0e63
FPNN: Field Probing Neural Networks for 3D Data
https://arxiv.org/pdf/1605.06240.pdf
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
https://arxiv.org/pdf/1505.05641.pdf
Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling
https://arxiv.org/pdf/1610.07584.pdf
SurfNet: Generating 3D shape surfaces using deep residual networks
https://arxiv.org/pdf/1703.04079.pdf

3D ShapeNets: A Deep Representation for Volumetric Shapes:
[Figure]

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction:
[Figure]

Learning to generate chairs with convolutional neural networks:
[Figure]

Category-specific object reconstruction from a single image:
[Figure]

Enriching Object Detection with 2D-3D Registration and Continuous Viewpoint Estimation:
[Figure]

Completing 3d object shape from one depth image:
[Figure]

ShapeNet: An Information-Rich 3D Model Repository:
[Figure]

3D reconstruction of synapses with deep learning based on EM Images:
[Figure]

Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces:

FPNN: Field Probing Neural Networks for 3D Data:
[Figure]

Unsupervised Learning of 3D Structure from Images:
[Figure]

Deep learning 3d shape surfaces using geometry images:
[Figure]

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views:
[Figure]

Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling:
[Figure]

SurfNet: Generating 3D shape surfaces using deep residual networks:
[Figure]

Recommender Systems

Classic models:

Autorec: Autoencoders meet collaborative filtering
http://users.cecs.anu.edu.au/~akmenon/papers/autorec/autorec-paper.pdf
User modeling with neural network for review rating prediction
http://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/download/11051/10849
Collaborative Deep Learning for Recommender Systems
https://arxiv.org/pdf/1409.2944.pdf
A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf
A neural probabilistic model for context based citation recommendation
http://www.personal.psu.edu/wzh112/publications/aaai_slides.pdf
Hybrid Recommender System based on Autoencoders
http://delivery.acm.org/10.1145/2990000/2988456/p11-strub.pdf?ip=123.138.79.12&id=2988456&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=751612499&CFTOKEN=37099060&acm=1492356698_958d1b64105cd41b9719c8d285736396
Wide & Deep Learning for Recommender Systems
https://arxiv.org/pdf/1606.07792.pdf
Deep Neural Networks for YouTube Recommendations
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf
Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks
http://www.wanghao.in/paper/NIPS16_CRAE.pdf
Neural Collaborative Filtering
http://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
Recurrent Recommender Networks
http://alexbeutel.com/papers/rrn_wsdm2017.pdf

Autorec: Autoencoders meet collaborative filtering:
[Figure]

User modeling with neural network for review rating prediction:
[Figure]

A neural probabilistic model for context based citation recommendation:
[Figure]

A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems:
[Figure]

Collaborative Deep Learning for Recommender Systems:
[Figure]

Wide & Deep Learning for Recommender Systems:
[Figure]

Deep Neural Networks for YouTube Recommendations:
[Figure]

Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks:
[Figure]

Neural Collaborative Filtering:
[Figure]

Recurrent Recommender Networks:
[Figure]

Fine-Grained Image Analysis

Classic models:

Part-based R-CNNs for Fine-grained Category Detection
https://people.eecs.berkeley.edu/~nzhang/papers/eccv14_part.pdf
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
http://www.bmva.org/bmvc/2014/files/paper071.pdf
Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
https://arxiv.org/pdf/1605.06878.pdf
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xiao_The_Application_of_2015_CVPR_paper.pdf
Bilinear CNN Models for Fine-grained Visual Recognition
http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf
Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval
https://arxiv.org/pdf/1604.04994.pdf
Near Duplicate Image Detection: min-Hash and tf-idf Weighting
https://www.robots.ox.ac.uk/~vgg/publications/papers/chum08a.pdf
Fine-grained image search
https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf
Efficient large-scale structured learning
http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Branson_Efficient_Large-Scale_Structured_2013_CVPR_paper.pdf
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
https://arxiv.org/pdf/1504.08289.pdf

Part-based R-CNNs for Fine-grained Category Detection:
[Figure]

Bird Species Categorization Using Pose Normalized Deep Convolutional Nets:
[Figure]

Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition:
[Figure]

The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification:
[Figure]

Bilinear CNN Models for Fine-grained Visual Recognition:
[Figure]

Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval:
[Figure]

Near Duplicate Image Detection: min-Hash and tf-idf Weighting:
[Figure]

Fine-grained image search:
[Figure]

Efficient large-scale structured learning:
[Figure]

Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks:
[Figure]

Image Compression

Classic models:

Auto-Encoding Variational Bayes
https://arxiv.org/pdf/1312.6114.pdf
k-Sparse Autoencoders
https://arxiv.org/pdf/1312.5663.pdf
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
http://www.iro.umontreal.ca/~lisa/pointeurs/ICML2011_explicit_invariance.pdf
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf
Tutorial on Variational Autoencoders
https://arxiv.org/pdf/1606.05908.pdf
End-to-end Optimized Image Compression
https://openreview.net/pdf?id=rJxdQ3jeg
Guetzli: Perceptually Guided JPEG Encoder
https://arxiv.org/pdf/1703.04421.pdf

Auto-Encoding Variational Bayes:
[Figure]

k-Sparse Autoencoders:
[Figure]

Contractive Auto-Encoders: Explicit Invariance During Feature Extraction:
[Figure]

Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion:
[Figure]

Tutorial on Variational Autoencoders:
[Figure]

End-to-end Optimized Image Compression:
[Figure]

Guetzli: Perceptually Guided JPEG Encoder:
[Figure]

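NLP
Tutorial: http://cs224d.stanford.edu/syllabus.html
Notes:
1) I have only scratched the surface of this field so far and will update this section gradually.
2) I also hope that friends who work in this field will contribute; I look forward to your participation.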

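Speech Recognition
Notes:
1) I have not yet studied speech recognition in detail and will add material gradually.
2) I also hope that friends who work in this field will contribute; I look forward to your participation.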

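AGI (Artificial General Intelligence)
Notes:
1) I have not yet studied this field in detail and will add material gradually.
2) I also hope that friends who work in this field will contribute; I look forward to your participation.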

New techniques arising from deep learning:

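Transfer learning: a mainstream approach proposed in recent years in artificial intelligence for handling recognition problems across different scenarios. Compared with the simple methods of the shallow-learning era, deep neural network models transfer far better, and there is a simple, effective recipe for doing so: pre-train a base model on a complex task, then fine-tune it on the specific task.
Joint learning (JL):
Reinforcement learning (RL): reinforcement learning (also known as evaluative learning) is an important machine learning method with many applications in intelligent control, robotics, and analysis and prediction. Traditional taxonomies of machine learning did not mention reinforcement learning, whereas connectionist learning divides learning algorithms into three types: unsupervised learning, supervised learning, and reinforcement learning.
Video tutorial:
https://cn.udacity.com/course/reinforcement-learning--ud600

Note: I have not studied this part yet and only know of it as a new concept; I will add more later.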
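
The pre-train and fine-tune recipe mentioned under transfer learning above can be illustrated with a toy example. The following is only a minimal sketch in pure Python, not code from any of the papers listed here: the two-layer network, the synthetic "source" and "target" tasks, and the names `forward`, `train`, `make_data`, and `freeze_features` are all assumptions made for illustration. It shows one common variant of fine-tuning, in which the pre-trained feature layer is frozen and only a new output head is trained on the small target dataset.

```python
import math
import random


def forward(params, x):
    """Tiny two-layer net: tanh hidden features, then a sigmoid output head."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(params["W"], params["b"])]
    z = sum(v * hj for v, hj in zip(params["v"], h)) + params["c"]
    return h, 1.0 / (1.0 + math.exp(-z))


def train(params, data, epochs, lr, freeze_features=False):
    """Plain SGD on binary cross-entropy; optionally leave W and b untouched."""
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(params, x)
            dz = p - y  # gradient of the loss with respect to the logit z
            if not freeze_features:
                for j, hj in enumerate(h):
                    da = dz * params["v"][j] * (1.0 - hj * hj)
                    params["W"][j] = [w - lr * da * xi
                                      for w, xi in zip(params["W"][j], x)]
                    params["b"][j] -= lr * da
            params["v"] = [v - lr * dz * hj for v, hj in zip(params["v"], h)]
            params["c"] -= lr * dz
    return params


def make_data(rng, n, rule):
    """Random 2-D points in [-1, 1]^2 labelled by a hypothetical decision rule."""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, 1.0 if rule(p) else 0.0) for p in points]


def accuracy(params, data):
    hits = sum((forward(params, x)[1] > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
hidden = 4
params = {
    "W": [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)],
    "b": [0.0] * hidden,
    "v": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
    "c": 0.0,
}

# 1) Pre-train all layers on a larger "source" task (assumed synthetic rule).
source = make_data(rng, 200, lambda p: p[0] + p[1] > 0)
train(params, source, epochs=20, lr=0.1)
pretrained_W = [row[:] for row in params["W"]]

# 2) Fine-tune on a small "target" task: reset the head, freeze the features.
target = make_data(rng, 20, lambda p: p[0] + 0.5 * p[1] > 0)
params["v"] = [0.0] * hidden
params["c"] = 0.0
train(params, target, epochs=50, lr=0.1, freeze_features=True)

print("target accuracy:", accuracy(params, target))
```

In practice the same idea is usually applied to a large network pre-trained on a dataset such as ImageNet; whether to freeze the early layers or update them with a small learning rate is a design choice that depends on how much target data is available.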

Deep reinforcement learning (DRL):
Tutorial: http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
Course: http://rll.berkeley.edu/deeprlcourse/
DeepMind:
https://deepmind.com/blog/deep-reinforcement-learning/

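Closing Remarks

Notes:
1. That is about it. Writing this took a lot of time, but I learned a great deal from compiling it, and I have come to see that deep learning now runs through the whole CV field. If you work in CV, I suggest you spend some time learning deep learning; after all, it is upending the field.
2. My experience is limited, so there may be mistakes; please bear with me. If you have any questions, send me a message and I will reply promptly.
3. This blog post is my original work; please contact me before reposting.
Email: 1575262785@qq.com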
