ACM MM 2018 Conference Papers: arXiv Links

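ACM MM is the top conference in the multimedia field, a venue every researcher in the area hopes to publish in.
One of my teachers told me about a PhD student of his who, three years into the program, had submitted to MM several times without a single acceptance. The student asked on his own to postpone graduation, saying: "In three years I could have won over a girl, yet I still can't get a paper into this conference..."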
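This year's ACM MM will be held October 22-26 in Seoul, South Korea. The conference program has been published on the official website:
http://www.acmmm.org/2018/
The list of accepted papers is below: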

Accepted Papers

Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing

Incremental Deep Hidden Attribute Learning

Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector

Visual Domain Adaptation with Manifold Embedded Distribution Alignment

Object-Difference Attention: A simple relational attention for Visual Question Answering

Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes

Multi-Human Parsing Machines

Deep Priority Hashing

CropNet: Real-Time Thumbnailing

Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search

Supervised Online Hashing via Hadamard Codebook Learning

Shared Linear Encoder-based Gaussian Process Latent Variable Model for Visual Classification

Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval

Fine-grained Grocery Product Recognition by One-shot Learning

Fine-grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Style Separation and Synthesis via Generative Adversarial Networks

Attention-based Pyramid Aggregation Network for Visual Place Recognition

Dance with Melody : An LSTM-autoencoder Approach on Music-oriented Dance Synthesis

Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering

Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data

Post Tuned Hashing: A New Approach to Indexing High-dimensional Data

Joint Sign Language Recognition and Education System with ST-Net

Aesthetic-Driven Image Enhancement by Adversarial Learning

Cascaded Feature Augmentation with Diffusion for Image Retrieval

Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

Temporal Sequence Distillation: Towards Few Frame Action Recognition

Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval

Multi-View Image Generation from a Single-View

Slackliner — An Interactive Slackline Training Assistant

Hierarchical Memory Modelling for Video Captioning

Group Re-Identification: Leveraging and Integrating Multi-Grain Information

Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions

Multi-modal Preference Modeling for Product Search

GraphNet: Learning Image Pseudo Annotations for Weakly-Supervised Semantic Segmentation

Deep Triplet Quantization

Previewer for Multiple-Scale Object Detector

QARC: Video Quality Aware Rate Control for Real-Time Video Streaming based on Deep Reinforcement Learning

What dress fits me best? Fashion Recommendation on the Clothing Style for Personal Body Shape

SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval

OSMO: Online Specific Models for Occlusion in Multiple Object Tracking under Surveillance Scene

Cross-modal Moment Localization in Videos

Attribute-Aware Attention Model for Fine-grained Representation Learning

Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency

Learning Discriminative Features with Multiple Granularities for Person Re-Identification

StripNet: Towards Topology Consistent Strip Structure Segmentation

Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment

An End-to-End Quadrilateral Regression Network for Comic Panel Extraction

CLS: A Cross-user Learning based System for Improving QoE in 360-degree Video Adaptive Streaming

Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training

Life-long Cross-media Correlation Learning

Text-to-image Synthesis via Symmetrical Distillation Networks

Multi-Scale Correlation for Sequential Cross-modal Hashing Learning

Jaguar: Low Latency Mobile Augmented Reality with Flexible Tracking

Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation

Explore Multi-Step Reasoning in Video Question Answering

Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network

Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond

Watch, Think and Attend: End-to-End Video Classification via Dynamic Knowledge Evolution Modeling

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

Fast and Light Manifold CNN based 3D Facial Expression Recognition across Pose Variations

Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling

Participation-Contributed Temporal Dynamic Model for Group Activity Recognition

A Unified Generative Adversarial Framework for Image Generation and Person Re-identification

Facial Expression Recognition in the Wild: A Cycle-Consistent Adversarial Attention Transfer Approach

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs

Mining Semantics-Preserving Attention for Group Activity Recognition

Causally Regularized Learning on Data with Agnostic Bias

I read, I saw, I tell: Texts Assisted Fine-Grained Visual Classification

Context-Aware Unsupervised Text Stylization

Bridge The Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Database and A Deep Learning Model

When to Learn What: Deep Cognitive Subspace Clustering

Look Deeper See Richer: Depth-aware Image Paragraph Captioning

Depth Structure Preserving Scene Image Generation

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification

Learning Multimodal Taxonomy via Variational Deep Graph Embedding and Clustering

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning

A Distributed Approach for Bitrate Selection in HTTP Adaptive Streaming

Generative Adversarial Product Quantisation

EmotionGAN: Unsupervised Domain Adaptation for Learning Discrete Probability Distributions of Image Emotions

Few-Shot Adaptation for Video Semantic Indexing

Historical Context-based Style Classification of Painting Images via Label Distribution Learning

Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation

High-Quality Exposure Correction of Underexposed Photos

Fashion Sensitive Clothing Recommendation using Hierarchical Collocation Model

A Margin-based MLE for Crowdsourced Partial Ranking

Personalized Serious Games for Cognitive Intervention with Lifelog Visual Analytics

PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation

iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera

Face-Voice Matching using Cross-modal Embeddings

Multi-Scale Context Attention Network for Image Retrieval

When Deep Fool Meets Deep Prior: Adversarial Attack on Image Super-Resolution

Musicality-Novelty Generative Adversarial Nets for Algorithmic Composition

Knowledge-aware Multimodal Dialogue Systems

Cross-Domain Adversarial Feature Learning for Sketch Re-identification

Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval

Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning

CSAN: Contextual Self-Attention Network for User Sequential Recommendation

Semantic Human Matting

Visual Spatial Attention Network for Relationship Detection

Geometry Guided Adversarial Facial Expression Synthesis

Personalized multiple facial action unit recognition through generative adversarial recognition network

Learning Joint Multimodal Representation with Adversarial Attention Networks

Detecting Abnormality without Knowing Normality: A Two-stage Approach for Unsupervised Video Abnormal Event Detection

WildFish: A Large Benchmark for Fish Recognition in the Wild

Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction

BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network

Songle Sync: A Large-Scale Web-based Platform for Controlling Various Devices in Synchronization with Music

CloudVR: Cloud Accelerated Interactive Mobile Virtual Reality

RGCNN: Regularized Graph CNN for Point Cloud Segmentation

Video-based Person Re-identification via Self-Paced Learning and Deep Reinforcement Learning Framework

Photo Squarization by Deep Multi-Operator Retargeting

Predicting Visual Context for Unsupervised Event Segmentation in Continuous Photo-streams

Semantic Image Inpainting with Progressive Generative Networks

Attentive Interactive Convolutional Matching for Community Question Answering in Social Multimedia

Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval

LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation

Direction-aware Neural Style Transfer

Reconfigurable Inverted Index

Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

A Unified Framework for Multimodal Domain Adaptation

Trusted Guidance Pyramid Network for Human Parsing

USAR: an interactive user-specific aesthetic ranking framework for images

Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining

Structure Guided Photorealistic Style Transfer

Tracking-assisted Weakly Supervised Online Visual Object Segmentation in Unconstrained Videos

An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks

Decoupled Novel Object Captioner

ThoughtViz: Visualizing Human Thoughts Using Generative Adversarial Network

Optimizing Personalized Interaction Experience in Crowd-Interactive Livecast: A Cloud-Edge Approach

End-to-End Blind Quality Assessment of Compressed Video Using Deep Neural Networks

Dynamic Sound Field Synthesis for Speech and Music Optimization

Local Convolutional Neural Networks for Person Re-Identification

Interpretable Multimodal Retrieval for Fashion Products

Conditional Expression Synthesis with Face Parsing Transformation

A Feature-Adaptive Semi-Supervised Framework for Co-Saliency Detection

Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification

iSPA-Net: Iterative Semantic Pose Alignment Network

Extractive Video Summarizer with Memory Augmented Neural Networks

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations

Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images

From data to knowledge: deep learning model compression, transmission and communication

ChipGAN: A Generative Adversarial Network for Chinese Ink Wash Painting Style Transfer

Dest-ResNet: a Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction

Boosting Scene Parsing Performance via Reliable Scale Prediction

Deep Cross modal learning for Caricature Verification and Identification (CaVINet)

Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks

Enhancing Visual Question Answering Using Dropout

Online Inter-Camera Trajectory Association Exploiting Person Re-Identification and Camera Topology

Improving QoE of ABR Streaming Sessions through QUIC Retransmissions

Temporal Cross-Media SubSpaces Learning with Soft-Constraints

Learning Local Descriptors with Adversarial Enhancer from Volumetric Geometry Patches

SibNet: Sibling Convolutional Encoder for Video Captioning

Context-Dependent Diffusion Network for Visual Relationship Detection

Your Attention is Unique: Detecting 360-Degree Video Saliency in Head-Mounted Display for Head Movement Prediction

Generating Defensive Plays in Basketball Games

Connectionist temporal fusion for Sign Language Translation

JPEG Decompression in the Homomorphic Encryption Domain

BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs

Support Neighbor Loss for Person Re-Identification

A Large Scale RGB-D Database for Arbitrary-view Human Action Recognition

FlexStream: Towards Flexible Adaptive Video Streaming on End Devices using Extreme SDN

Spotting and Aggregating Salient Regions for Video Captioning

Structural inpainting

Partial Multi-View Subspace Clustering

FoV-Aware Edge Caching for Adaptive 360° Video Streaming

Attentive LSTM Crowd Flow Machines

Perceptual Temporal Incoherence Aware Stereo Video Retargeting

Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels

Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval

Investigation of Small Group Social Interactions using Deep Visual Activity-Based Nonverbal Features

Dissimilarity Representation Learning for Generalized Zero-Shot Recognition

Examine before You Answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA

Cumulative Nets for Edge Detection

Beyond the Product: Discovering Image Posts for Brands in Social Media

Robustness and Discrimination Oriented Hashing Combining Texture and Invariant Vector Distance

SLIONS: A Karaoke Application to Enhance Foreign Language Learning

Drawing in a Virtual 3D Space – Introducing VR Drawing in Elementary School Art Education

Semi-Supervised DFF: Decoupling Detection and Feature Flow for Video Object Detectors

Residual-Guide Feature Fusion Network for Single Image Deraining

Paragraph generation network with visual relationship detection

Hybrid Point Cloud Attribute Compression Using Slice-based Layered Structure and Block-based Intra Prediction

CIRCE: Real-Time Caching for Instance Recognition on Cloud Environments and Multi-Core Architectures

From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition

Unsupervised Learning of 3D Model Reconstruction from Hand-Drawn Sketches

Learning to Synthesize 3D Indoor Scenes from Monocular Images

DASH for 3D Networked Virtual Environment

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition

The Effect of Foveation on High Dynamic Range Video Perception

GestureGAN for Hand Gesture-to-Gesture Translation in the Wild

MiniView Layout for Bandwidth-Efficient 360-Degree Video

An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images

Deep Multimodal Image-Repurposing Detection

Video-to-Video Translation with Global Temporal Consistency

Robust Correlation Filter Tracking with Shepherded Instance-Aware Proposals

Cross-Species Learning: A Low-Cost Approach to Learning Human Fight from Animal Fight

PoB: Toward Reasoning Patterns of Beauty in Image Data

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Deep Adaptive Temporal Pooling for Activity Recognition

Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder

Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning

Crossing-Domain Generative Adversarial Networks for Unsupervised Multi-Domain Image-to-Image Translation

Person Re-identification with Hierarchical Deep Learning Feature and efficient XQDA Metric

EmoCeleb: Emotion recognition in speech using Cross-Modal Transfer in the wild

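A quick search for the keyword "generative" turns up 14 papers, which suggests that GANs, VAEs and related generative models remain a major research direction.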
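The keyword count above amounts to a case-insensitive substring search over the titles. The sketch below illustrates this under a few assumptions: `count_keyword` is a hypothetical helper written for this post, and `sample_titles` is a seven-title excerpt of the list, not the full set. Pasting every title from the list into it should reproduce the count of 14.

```python
def count_keyword(titles, keyword):
    """Count titles containing `keyword` as a case-insensitive substring."""
    needle = keyword.lower()
    return sum(1 for title in titles if needle in title.lower())


# Hypothetical excerpt of the accepted-paper titles above (NOT the full list).
sample_titles = [
    "Style Separation and Synthesis via Generative Adversarial Networks",
    "Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data",
    "Generative Adversarial Product Quantisation",
    "Semantic Image Inpainting with Progressive Generative Networks",
    "EmotionGAN: Unsupervised Domain Adaptation for Learning Discrete Probability Distributions of Image Emotions",
    "Multi-View Image Generation from a Single-View",
    "Deep Priority Hashing",
]

print(count_keyword(sample_titles, "generative"))  # prints 4
```

A plain substring match only catches the literal word, so titles such as "EmotionGAN" or "GestureGAN" that name a GAN without using "generative", and titles containing only "Generation", are not counted. The 14 is therefore a lower bound on the number of generative-model papers in the list.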
