Computer Vision Papers - 2021-09-08

1, TITLE: Automatic Landmarks Correspondence Detection in Medical Images with An Application to Deformable Image Registration
AUTHORS: Monika Grewal ; Jan Wiersma ; Henrike Westerveld ; Peter A. N. Bosman ; Tanja Alderliesten
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a Deep Convolutional Neural Network (DCNN), called DCNN-Match, that learns to predict landmark correspondences in 3D images in a self-supervised manner.

2, TITLE: CovarianceNet: Conditional Generative Model for Correct Covariance Prediction in Human Motion Prediction
AUTHORS: Aleksey Postnikov ; Aleksander Gamayunov ; Gonzalo Ferrer
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We present a new method to correctly predict the uncertainty associated with the predicted distribution of future trajectories.

3, TITLE: Vision Transformers For Weeds and Crops Classification Of High Resolution UAV Images
AUTHORS: Reenul Reedha ; Eric Dericquebourg ; Raphael Canals ; Adel Hafiane
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we adopt the self-attention mechanism via the ViT models for plant classification of weeds and crops: red beet, off-type beet (green leaves), parsley and spinach.

4, TITLE: Graph Attention Layer Evolves Semantic Segmentation for Road Pothole Detection: A Benchmark and Algorithms
AUTHORS: Rui Fan ; Hengli Wang ; Yuan Wang ; Ming Liu ; Ioannis Pitas
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: The former approaches typically employ 2-D image analysis/understanding or 3-D point cloud modeling and segmentation algorithms to detect road potholes from vision sensor data.

5, TITLE: nnFormer: Interleaved Transformer for Volumetric Segmentation
AUTHORS: HONG-YU ZHOU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, in this paper, we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.

6, TITLE: Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
AUTHORS: Katsuyuki Nakamura ; Hiroki Ohashi ; Mitsuhiro Okada
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a method for effectively utilizing the sensor data in combination with the video data on the basis of an attention mechanism that dynamically determines the modality that requires more attention, taking the contextual information into account.

7, TITLE: Few-shot Learning Via Dependency Maximization and Instance Discriminant Analysis
AUTHORS: Zejiang Hou ; Sun-Yuan Kung
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In contrast, we propose a simple approach to exploit unlabeled data accompanying the few-shot task for improving few-shot performance.

8, TITLE: STRIVE: Scene Text Replacement In Videos
AUTHORS: VIJAY KUMAR B G et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose replacing scene text in videos using deep style transfer and learned photometric transformations. Building on recent progress on still image text replacement, we present extensions that alter text while preserving the appearance and motion characteristics of the original video. Compared to the problem of still image text replacement, our method addresses additional challenges introduced by video, namely effects induced by changing lighting, motion blur, diverse variations in camera-object pose over time, and preservation of temporal consistency. We introduce new synthetic and real-world datasets with paired text objects.

9, TITLE: Robustness and Generalization Via Generative Adversarial Training
AUTHORS: Omid Poursaeed ; Tianxing Jiang ; Harry Yang ; Serge Belongie ; SerNam Lim
CATEGORY: cs.CV [cs.CV, cs.CR, cs.LG]
HIGHLIGHT: In this paper we present Generative Adversarial Training, an approach to simultaneously improve the model's generalization to the test set and out-of-domain samples as well as its robustness to unseen adversarial attacks.

10, TITLE: Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation
AUTHORS: Guan-Nan Dong ; Chi-Man Pun ; Zheng Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a novel deep collaborative multi-modal learning (DCML) to integrate the underlying information presented in facial properties in an adaptive manner to strengthen the facial details for effective unsupervised kinship verification.

11, TITLE: Kinship Verification Based on Cross-Generation Feature Interaction Learning
AUTHORS: Guan-Nan Dong ; Chi-Man Pun ; Zheng Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.

12, TITLE: DeepFakes: Detecting Forged and Synthetic Media Content Using Machine Learning
AUTHORS: SM ZOBAED et al.
CATEGORY: cs.CV [cs.CV, cs.IR]
HIGHLIGHT: This study presents challenges, research trends, and directions related to DeepFake creation and detection techniques by reviewing the notable research in the DeepFake domain, to facilitate the development of more robust approaches that can deal with more advanced DeepFakes in the future.

13, TITLE: Evaluation of An Audio-Video Multimodal Deepfake Dataset Using Unimodal and Multimodal Detectors
AUTHORS: Hasam Khalid ; Minha Kim ; Shahroz Tariq ; Simon S. Woo
CATEGORY: cs.CV [cs.CV, cs.MM, cs.SD, eess.AS, eess.IV, I.4.9; I.5.4]
HIGHLIGHT: With the emerging threat of the potential harm that deepfakes can cause, researchers have proposed deepfake detection methods.

14, TITLE: Journalistic Guidelines Aware News Image Captioning
AUTHORS: Xuewen Yang ; Svebor Karaman ; Joel Tetreault ; Alex Jaimes
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a new approach to this task, motivated by caption guidelines that journalists follow.

15, TITLE: Grassmannian Graph-attentional Landmark Selection for Domain Adaptation
AUTHORS: Bin Sun ; Shaofan Wang ; Dehui Kong ; Jinghua Li ; Baocai Yin
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To better combine the advantages of the two schemes, we propose a Grassmannian graph-attentional landmark selection (GGLS) framework for domain adaptation.

16, TITLE: ICCAD Special Session Paper: Quantum-Classical Hybrid Machine Learning for Image Classification
AUTHORS: Mahabubul Alam ; Satwik Kundu ; Rasit Onur Topaloglu ; Swaroop Ghosh
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In any typical DL-based image classification, we use convolutional neural network (CNN) to extract features from the image and multi-layer perceptron network (MLP) to create the actual decision boundaries.

17, TITLE: Zero-Shot Open Set Detection By Extending CLIP
AUTHORS: Sepideh Esmaeilpour ; Bing Liu ; Eric Robertson ; Lei Shu
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper proposes a novel and yet simple method (called ZO-CLIP) to solve the problem.

18, TITLE: GCsT: Graph Convolutional Skeleton Transformer for Action Recognition
AUTHORS: RUWEN BAI et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this work, we present a novel architecture, named Graph Convolutional skeleton Transformer (GCsT), which addresses limitations in GCNs by introducing Transformer.

19, TITLE: Pano3D: A Holistic Benchmark and A Solid Baseline for $360^\circ$ Depth Estimation
AUTHORS: GEORGIOS ALBANIS et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We use it as a basis for an extended analysis seeking to offer insights into classical choices for depth estimation.

20, TITLE: Support Vector Machine for Handwritten Character Recognition
AUTHORS: Jomy John
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, a system for recognition of unconstrained handwritten Malayalam characters is proposed.

21, TITLE: Fair Comparison: Quantifying Variance in Results for Fine-grained Visual Categorization
AUTHORS: Matthew Gwilliam ; Adam Teuscher ; Connor Anderson ; Ryan Farrell
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: From this analysis, we both highlight the importance of reporting and comparing methods based on information beyond overall accuracy, as well as point out techniques that mitigate variance in FGVC results.

22, TITLE: WhyAct: Identifying Action Reasons in Lifestyle Vlogs
AUTHORS: Oana Ignat ; Santiago Castro ; Hanwen Miao ; Weiji Li ; Rada Mihalcea
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.

23, TITLE: Single-Camera 3D Head Fitting for Mixed Reality Clinical Applications
AUTHORS: Tejas Mane ; Aylar Bayramova ; Kostas Daniilidis ; Philippos Mordohai ; Elena Bernardis
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Our goal is to reconstruct the head model of each person to enable future mixed reality applications.

24, TITLE: Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution
AUTHORS: Chuanguang Yang ; Zhulin An ; Linhang Cai ; Yongjun Xu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To fully take advantage of hierarchical feature maps, we propose to append several auxiliary branches at various hidden layers.

25, TITLE: Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers
AUTHORS: Lucas Fernando Alvarenga e Silva ; Daniel Carlos Guimarães Pedronette ; Fábio Augusto Faria ; João Paulo Papa ; Jurandy Almeida
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we argue that it is not sufficient to handle domain shift only based on domain-level features, but it is also essential to align such information on the feature space.

26, TITLE: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
AUTHORS: YUNING DU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In order to improve the accuracy of PP-OCR and keep high efficiency, in this paper, we propose a more robust OCR system, i.e. PP-OCRv2.

27, TITLE: CIM: Class-Irrelevant Mapping for Few-Shot Classification
AUTHORS: SHUAI SHAO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this challenge, we propose a simple, flexible method, dubbed as Class-Irrelevant Mapping (CIM).

28, TITLE: Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification
AUTHORS: Jianzhe Lin ; Tianze Yu ; Z. Jane Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Our method contributions are two-fold: an active learning approach is proposed to acquire salient labels for multi-label images; and a novel Adaptive Temperature Associated Model (ATAM), specifically using partial annotations, is proposed for multi-label image classification.

29, TITLE: Smart Traffic Monitoring System Using Computer Vision and Edge Computing
AUTHORS: GUANXIONG LIU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we focus on two common traffic monitoring tasks, congestion detection and speed detection, and propose a two-tier edge computing based model that takes into account both the limited computing capability in cloudlets and the unstable network condition to the TMC.

30, TITLE: Improving Dietary Assessment Via Integrated Hierarchy Food Classification
AUTHORS: RUNYU MAO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we introduce a new food classification framework to improve the quality of predictions by integrating the information from multiple domains while maintaining the classification accuracy.

31, TITLE: Unpaired Adversarial Learning for Single Image Deraining with Rain-Space Contrastive Constraints
AUTHORS: XIANG CHEN et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address such limitation, we develop an effective unpaired SID method, named CDR-GAN, which explores mutual properties of the unpaired exemplars in a contrastive learning manner within a GAN framework.

32, TITLE: FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
AUTHORS: RUI LIU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here we aim to tackle this problem by proposing FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations.

33, TITLE: Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition Datasets
AUTHORS: Matthew Gwilliam ; Srinidhi Hegde ; Lade Tinubu ; Alex Hanson
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Many existing works have made great strides towards reducing racial bias in face recognition.

34, TITLE: Rendezvous: Attention Mechanisms for The Recognition of Surgical Action Triplets in Endoscopic Videos
AUTHORS: CHINEDU INNOCENT NWOYE et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To achieve this task, we introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels.

35, TITLE: Self-supervised Tumor Segmentation Through Layer Decomposition
AUTHORS: Xiaoman Zhang ; Weidi Xie ; Chaoqin Huang ; Ya Zhang ; Yanfeng Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a self-supervised approach for tumor segmentation.

36, TITLE: Learning to Combine The Modalities of Language and Video for Temporal Moment Localization
AUTHORS: Jungkyoo Shin ; Jinyoung Moon
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address these shortcomings, we introduce a novel recurrent unit, cross-modal long short-term memory (CM-LSTM), by mimicking the human cognitive process of localizing temporal moments that focuses on the part of a video segment related to the part of a query, and accumulates the contextual information across the entire video recurrently.

37, TITLE: Brand Label Albedo Extraction of ECommerce Products Using Generative Adversarial Network
AUTHORS: Suman Sapkota ; Manish Juneja ; Laurynas Keleras ; Pranav Kotwal ; Binod Bhattarai
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper we present our solution to extract albedo of branded labels for e-commerce products. To this end, we generate a large-scale photo-realistic synthetic data set for albedo extraction followed by training a generative model to translate images with diverse lighting conditions to albedo.

38, TITLE: Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene
AUTHORS: HUY Q. VO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper contributes a new high-quality dataset for hand gesture recognition in hand hygiene systems, named "MFH".

39, TITLE: Training Deep Networks from Zero to Hero: Avoiding Pitfalls and Going Beyond
AUTHORS: Moacir Antonelli Ponti ; Fernando Pereira dos Santos ; Leo Sampaio Ferraz Ribeiro ; Gabriel Biscaro Cavallari
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This tutorial covers the basic steps as well as more recent options to improve models, in particular, but not restricted to, supervised learning.

40, TITLE: Efficient ADMM-based Algorithms for Convolutional Sparse Coding
AUTHORS: Farshad G. Veshki ; Sergiy A. Vorobyov
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This letter presents a solution to this subproblem, which improves the efficiency of the state-of-the-art algorithms.

41, TITLE: Fishr: Invariant Gradient Variances for Out-of-distribution Generalization
AUTHORS: Alexandre Rame ; Corentin Dancette ; Matthieu Cord
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this paper, we propose a new learning scheme to enforce domain invariance in the space of the gradients of the loss function: specifically, we introduce a regularization term that matches the domain-level variances of gradients across training domains.
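The Fishr highlight describes a concrete mechanism: a regularization term that matches the domain-level variances of per-sample loss gradients across training domains. A minimal NumPy sketch of that idea follows; the function name `fishr_penalty` and the flat per-sample gradient layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fishr_penalty(domain_grads):
    """Fishr-style penalty: match per-domain gradient variances.

    domain_grads: list of arrays, each of shape (n_samples, n_params),
    holding per-sample loss gradients for one training domain.
    Returns the mean squared distance between each domain's elementwise
    gradient variance and the average variance across domains.
    """
    # One variance vector per domain (elementwise over samples).
    variances = [g.var(axis=0) for g in domain_grads]
    mean_var = np.mean(variances, axis=0)
    # Penalize deviation of each domain's variance from the average.
    return float(np.mean([(v - mean_var) ** 2 for v in variances]))
```

In training, this penalty would be added to the task loss with a weighting coefficient; when gradient statistics are identical across domains, the penalty is zero.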

42, TITLE: Learning Fast Sample Re-weighting Without Reward Data
AUTHORS: Zizhao Zhang ; Tomas Pfister
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This paper addresses these two problems and presents a novel learning-based fast sample re-weighting (FSR) method that does not require additional reward data.

43, TITLE: Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
AUTHORS: Tanaka Kanji
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: In this study, we consider an active self-localization task by an active observer and present a novel reinforcement learning (RL)-based next-best-view (NBV) planner.

44, TITLE: Intelligent Motion Planning for A Cost-effective Object Follower Mobile Robotic System with Obstacle Avoidance
AUTHORS: Sai Nikhil Gona ; Prithvi Raj Bandhakavi
CATEGORY: cs.RO [cs.RO, cs.CV, 68T40, I.2.9]
HIGHLIGHT: So, we propose a robotic system which uses robot vision and deep learning to get the required linear and angular velocities, ν and ω, respectively.

45, TITLE: Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds
AUTHORS: Dengxin Dai ; Arun Balajee Vasudevan ; Jiri Matas ; Luc Van Gool
CATEGORY: cs.SD [cs.SD, cs.CV, eess.AS]
HIGHLIGHT: To this aim, we propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera.

46, TITLE: Improving Phenotype Prediction Using Long-Range Spatio-Temporal Dynamics of Functional Connectivity
AUTHORS: Simon Dahan ; Logan Z. J. Williams ; Daniel Rueckert ; Emma C. Robinson
CATEGORY: q-bio.NC [q-bio.NC, cs.CV, cs.LG, eess.IV]
HIGHLIGHT: We evaluate this approach using the Human Connectome Project (HCP) dataset on sex classification and fluid intelligence prediction.

47, TITLE: Crash Report Data Analysis for Creating Scenario-Wise, Spatio-Temporal Attention Guidance to Support Computer Vision-based Perception of Fatal Crash Risks
AUTHORS: Yu Li ; Muhammad Monjurul Karim ; Ruwen Qin
CATEGORY: stat.AP [stat.AP, cs.CV]
HIGHLIGHT: Therefore, this paper develops a data analytics model, named scenario-wise, Spatio-temporal attention guidance, from fatal crash report data, which can estimate the relevance of detected objects to fatal crashes from their environment and context information.

48, TITLE: Perceptual Video Compression with Recurrent Conditional GAN
AUTHORS: Ren Yang ; Luc Van Gool ; Radu Timofte
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper proposes a Perceptual Learned Video Compression (PLVC) approach with recurrent conditional generative adversarial network.

49, TITLE: FDA: Feature Decomposition and Aggregation for Robust Airway Segmentation
AUTHORS: MINGHUI ZHANG et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we proposed a new dual-stream network to address the variability between the clean domain and noisy domain, which utilizes the clean CT scans and a small amount of labeled noisy CT scans for airway segmentation.
