Computer Vision Papers - 2021-06-28

SophiaCV

Article link: https://blog.csdn.net/Sophia_11/article/details/118308211

This column collects papers in computer vision. Date: June 28, 2021. Source: Paper Digest.

1, TITLE: CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
AUTHORS: Daniel McDuff et al.
CATEGORY: cs.AI [cs.AI, cs.CV, cs.LG]
HIGHLIGHT: To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in the safety-critical context.

2, TITLE: A Picture May Be Worth A Hundred Words for Visual Question Answering
AUTHORS: Yusuke Hirota et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose to take description-question pairs as input, instead of deep visual features, and feed them into a language-only Transformer model, simplifying the process and reducing the computational cost.

3, TITLE: SRPN: Similarity-based Region Proposal Networks for Nuclei and Cells Detection in Histology Images
AUTHORS: Yibao Sun ; Xingru Huang ; Huiyu Zhou ; Qianni Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Considering this, we propose similarity based region proposal networks (SRPN) for nuclei and cells detection in histology images.

4, TITLE: PVTv2: Improved Baselines with Pyramid Vision Transformer
AUTHORS: Wenhai Wang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we improve the original Pyramid Vision Transformer (PVTv1) by adding three improvement designs, which include (1) locally continuous features with convolutions, (2) position encodings with zero paddings, and (3) linear complexity attention layers with average pooling.

5, TITLE: Graph Pattern Loss Based Diversified Attention Network for Cross-Modal Retrieval
AUTHORS: Xueying Chen ; Rong Zhang ; Yibing Zhan
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval to deeply analyze correlations among representations.

6, TITLE: RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection
AUTHORS: Pei Sun et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Towards this goal, we propose Range Sparse Net (RSN), a simple, efficient, and accurate 3D object detector, to tackle real-time 3D object detection in this extended detection regime.

7, TITLE: To The Point: Efficient 3D Object Detection in The Range Image with Graph Convolution Kernels
AUTHORS: Yuning Chai et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view.

8, TITLE: Single Image Texture Translation for Data Augmentation
AUTHORS: Boyi Li ; Yin Cui ; Tsung-Yi Lin ; Serge Belongie
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation.

9, TITLE: Efficient Document Image Classification Using Region-Based Graph Neural Network
AUTHORS: Jaya Krishna Mandivarapu ; Eric Bunch ; Qian You ; Glenn Fung
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an efficient document image classification framework that uses graph convolution neural networks and incorporates textual, visual and layout information of the document.

10, TITLE: Energy-Based Generative Cooperative Saliency Prediction
AUTHORS: Jing Zhang ; Jianwen Xie ; Zilong Zheng ; Nick Barnes
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating the prediction as a sampling process.

11, TITLE: Bayesian Eye Tracking
AUTHORS: Qiang Ji ; Kang Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose a Bayesian framework for model-based eye tracking.

12, TITLE: On The Robustness of Pretraining and Self-Supervision for A Deep Learning-based Analysis of Diabetic Retinopathy
AUTHORS: Vignesh Srinivasan et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we aim to assess the broader implications of these approaches.

13, TITLE: Countering Adversarial Examples: Combining Input Transformation and Noisy Training
AUTHORS: Cheng Zhang ; Pan Gao
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: Specifically, based on an analysis of the frequency coefficient, we design a NN-favored quantization table for compression.

14, TITLE: Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation
AUTHORS: Wanqing Xie et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.MM]
HIGHLIGHT: In this work, we collect a novel dataset of 200 subjects to evidence the validity of self-rating questionnaires with their corresponding question-wise video recording.

15, TITLE: HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition
AUTHORS: Jianbo Liu ; Ying Wang ; Shiming Xiang ; Chunhong Pan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, the self-attention mechanism is introduced to alleviate this problem.

16, TITLE: Semi-supervised Meta-learning with Disentanglement for Domain-generalised Medical Image Segmentation
AUTHORS: Xiao Liu ; Spyridon Thermos ; Alison O'Neil ; Sotirios A. Tsaftaris
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this problem, we propose a novel semi-supervised meta-learning framework with disentanglement.

17, TITLE: Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair
AUTHORS: Sho Maeoki ; Yusuke Mukuta ; Tatsuya Harada
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper we undertake the task of text-based video moment retrieval from a corpus of videos.

18, TITLE: DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
AUTHORS: Giorgos Kordopatis-Zilos ; Christos Tzelepis ; Symeon Papadopoulos ; Ioannis Kompatsiaris ; Ioannis Patras
CATEGORY: cs.CV [cs.CV, cs.IR, cs.MM]
HIGHLIGHT: In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets.

19, TITLE: Generalized One-Class Learning Using Pairs of Complementary Classifiers
AUTHORS: Anoop Cherian ; Jue Wang
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we explore novel objectives for one-class learning, which we collectively refer to as Generalized One-class Discriminative Subspaces (GODS).

20, TITLE: Image-to-image Transformation with Auxiliary Condition
AUTHORS: Robert Leer ; Hessi Roma ; James Amelia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To overcome this problem, we propose to introduce the label information of subjects, e.g., pose and type of objects, in the training of CycleGAN, and lead it to obtain label-wise transformation models.

21, TITLE: Projection-wise Disentangling for Fair and Interpretable Representation Learning: Application to 3D Facial Shape Analysis
AUTHORS: Xianjing Liu et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, we propose to mitigate the bias while keeping almost all information in the latent representations, which enables us to observe and interpret them as well.

22, TITLE: Partially Fake It Till You Make It: Mixing Real and Fake Thermal Images for Improved Object Detection
AUTHORS: Francesco Bongini ; Lorenzo Berlincioni ; Marco Bertini ; Alberto Del Bimbo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes.

23, TITLE: Vision Transformer Architecture Search
AUTHORS: Xiu Su et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we make a further step by examining the intrinsic structure of transformers for vision tasks and propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets.

24, TITLE: Interactive Multi-level Stroke Control for Neural Style Transfer
AUTHORS: Max Reimann ; Benito Buchheim ; Amir Semmo ; Jürgen Döllner ; Matthias Trapp
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: For additional level-of-control, we propose a network agnostic method for stroke-orientation adjustment by utilizing the rotation-variance of CNNs.

25, TITLE: Physics Perception in Sloshing Scenes with Guaranteed Thermodynamic Consistency
AUTHORS: Beatriz Moya ; Alberto Badias ; David Gonzalez ; Francisco Chinesta ; Elias Cueto
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose a strategy to learn the full state of sloshing liquids from measurements of the free surface.

26, TITLE: "Zero Shot" Point Cloud Upsampling
AUTHORS: Kaiyue Zhou ; Ming Dong ; Suzan Arslanturk
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present an unsupervised approach that upsamples point clouds internally, referred to as "Zero Shot" Point Cloud Upsampling (ZSPU), at a holistic level.

27, TITLE: Shape Registration in The Time of Transformers
AUTHORS: Giovanni Trappolini ; Luca Cosmo ; Luca Moschella ; Riccardo Marin ; Emanuele Rodolà
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: In this paper, we propose a transformer-based procedure for the efficient registration of non-rigid 3D point clouds.

28, TITLE: NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation
AUTHORS: Xiaohui Zeng ; Raquel Urtasun ; Richard Zemel ; Sanja Fidler ; Renjie Liao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.

29, TITLE: Generative Modeling for Multi-task Visual Learning
AUTHORS: Zhipeng Bao ; Martial Hebert ; Yu-Xiong Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks.

30, TITLE: Connecting Sphere Manifolds Hierarchically for Regularization
AUTHORS: Damien Scieur ; Youngsung Kim
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper considers classification problems with hierarchically organized classes.

31, TITLE: Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering
AUTHORS: Long Hoang Dang ; Thao Minh Le ; Vuong Le ; Truyen Tran
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Toward reaching this goal, we propose an object-oriented reasoning approach in which video is abstracted as a dynamic stream of interacting objects.

32, TITLE: Diversifying Semantic Image Synthesis and Editing Via Class- and Layer-wise VAEs
AUTHORS: Yuki Endo ; Yoshihiro Kanamori
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces.

33, TITLE: Building Intelligent Autonomous Navigation Agents
AUTHORS: Devendra Singh Chaplot
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e., building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making.

34, TITLE: Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
AUTHORS: Hongwei Xue et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

35, TITLE: Animatable Neural Radiance Fields from Monocular RGB Video
AUTHORS: Jianchuan Chen et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present animatable neural radiance fields for detailed human avatar creation from monocular videos.

36, TITLE: Free-viewpoint Indoor Neural Relighting from Multi-view Stereo
AUTHORS: Julien Philip ; Sébastien Morgenthaler ; Michaël Gharbi ; George Drettakis
CATEGORY: cs.GR [cs.GR, cs.CV]
HIGHLIGHT: We introduce a neural relighting algorithm for captured indoor scenes that allows interactive free-viewpoint navigation.

37, TITLE: Domain-guided Machine Learning for Remotely Sensed In-Season Crop Growth Estimation
AUTHORS: George Worrall ; Anand Rangarajan ; Jasmeet Judge
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this study, we demonstrate the use of agronomic knowledge of crop growth drivers in a Long Short-Term Memory-based, Domain-guided neural network (DgNN) for in-season crop progress estimation.

38, TITLE: Re-parameterizing VAEs for Stability
AUTHORS: David Dehaene ; Rémy Brossard
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: We propose a theoretical approach to the numerical stability of training Variational AutoEncoders (VAE).

39, TITLE: Multiview Video Compression Using Advanced HEVC Screen Content Coding
AUTHORS: Jarosław Samelak ; Marek Domański
CATEGORY: cs.MM [cs.MM, cs.CV, I.4.2]
HIGHLIGHT: The paper presents a new approach to multiview video coding using Screen Content Coding.

40, TITLE: Semantic Annotation for Computational Pathology: Multidisciplinary Experience and Best Practice Recommendations
AUTHORS: Noorul Wahab et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we address this shortcoming by presenting the experience and best practices acquired during the execution of a large-scale annotation exercise involving a multidisciplinary team of pathologists, ML experts and researchers as part of the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) consortium.

41, TITLE: FOVQA: Blind Foveated Video Quality Assessment
AUTHORS: Yize Jin ; Anjul Patney ; Richard Webb ; Alan Bovik
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: Towards advancing the development of foveated compression / streaming algorithms, we have devised a no-reference (NR) foveated video quality assessment model, called FOVQA, which is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS).

42, TITLE: Circumpapillary OCT-Focused Hybrid Learning for Glaucoma Grading Using Tailored Prototypical Neural Networks
AUTHORS: Gabriel García et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Unlike most of the state-of-the-art studies focused on glaucoma detection, in this paper, we propose, for the first time, a novel framework for glaucoma grading using raw circumpapillary B-scans.

43, TITLE: A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images
AUTHORS: Gabriel García ; Anna Esteve ; Adrián Colomer ; David Ramos ; Valery Naranjo
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this work, we focus on the MIBC subtype because it is of the worst prognosis and can spread to adjacent organs.

44, TITLE: Generalized Unsupervised Clustering of Hyperspectral Images of Geological Targets in The Near Infrared
AUTHORS: Angela F. Gao et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: Here we develop a fully unsupervised workflow for feature extraction and clustering informed by both expert spectral geologist input and quantitative metrics.