This column collects papers in computer vision. Date: September 9, 2021. Source: Paper Digest.
Follow the original WeChat official account 【计算机视觉联盟】 and reply 【西瓜书手推笔记】 to get my hand-derived machine learning notes.
1, TITLE: Identification of Social-Media Platform of Videos Through The Use of Shared Features
AUTHORS: Luca Maiano ; Irene Amerini ; Lorenzo Ricciardi Celsi ; Aris Anagnostopoulos
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To mitigate this limitation, in this work we propose two different solutions based on transfer learning and multitask learning to determine whether a video has been uploaded from or downloaded to a specific social platform through the use of shared features with images trained on the same task.
2, TITLE: Temporal RoI Align for Video Object Recognition
AUTHORS: Tao Gong et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, considering the features of the same object instance are highly similar among frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames feature maps for current frame proposals by utilizing feature similarity.
3, TITLE: YouRefIt: Embodied Reference Understanding with Language and Gesture
AUTHORS: Yixin Chen et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle this problem, we introduce YouRefIt, a new crowd-sourced dataset of embodied reference collected in various physical scenes; the dataset contains 4,195 unique reference clips in 432 indoor scenes.
4, TITLE: Shuffled Patch-Wise Supervision for Presentation Attack Detection
AUTHORS: Alperen Kantarcı ; Hasan Dertli ; Hazım Kemal Ekenel
CATEGORY: cs.CV [cs.CV, cs.CR, cs.LG]
HIGHLIGHT: To this end, we propose a new PAD approach, which combines pixel-wise binary supervision with patch-based CNN.
5, TITLE: GTT-Net: Learned Generalized Trajectory Triangulation
AUTHORS: Xiangyu Xu ; Enrique Dunn
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present GTT-Net, a supervised learning framework for the reconstruction of sparse dynamic 3D geometry.
6, TITLE: FaceCook: Face Generation Based on Linear Scaling Factors
AUTHORS: Tianren Wang ; Can Peng ; Teng Zhang ; Brian Lovell
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, we propose a new approach to mapping the latent vectors of the generative model to the scaling factors through solving a set of multivariate linear equations.
7, TITLE: Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification
AUTHORS: Zhongxing Ma ; Yifan Zhao ; Jia Li
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, we propose a Pose-guided inter-and intra-part relational transformer (Pirt) for occluded person Re-Id, which builds part-aware long-term correlations by introducing transformers.
8, TITLE: Level Set Binocular Stereo with Occlusions
AUTHORS: Jialiang Wang ; Todd Zickler
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This paper introduces an energy and level-set optimizer that improves boundaries by encoding occlusion geometry.
9, TITLE: Which and Where to Focus: A Simple Yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images
AUTHORS: Youhui Guo ; Yu Zhou ; Xugong Qin ; Weiping Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
10, TITLE: Learning Local-Global Contextual Adaptation for Fully End-to-End Bottom-Up Human Pose Estimation
AUTHORS: Nan Xue ; Tianfu Wu ; Zhen Zhang ; Gui-Song Xia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a method of learning Local-GlObal Contextual Adaptation for fully end-to-end and fast bottom-up human Pose estimation, dubbed as LOGO-CAP.
11, TITLE: Panoptic SegFormer
AUTHORS: Zhiqi Li et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present Panoptic SegFormer, a general framework for end-to-end panoptic segmentation with Transformers.
12, TITLE: Melatect: A Machine Learning Model Approach For Identifying Malignant Melanoma in Skin Growths
AUTHORS: Vidushi Meel ; Asritha Bodepudi
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: This paper presents Melatect, a machine learning model that identifies potential malignant melanoma.
13, TITLE: Unfolding Taylor's Approximations for Image Restoration
AUTHORS: Man Zhou et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To solve the above problems, inspired by Taylor's Approximations, we unfold Taylor's Formula to construct a novel framework for image restoration.
14, TITLE: Scaled ReLU Matters for Training Vision Transformers
AUTHORS: Pichao Wang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We verify, both theoretically and empirically, that scaled ReLU in the conv-stem not only improves training stabilization, but also increases the diversity of patch tokens, thus boosting peak performance by a large margin while adding few parameters and FLOPs.
15, TITLE: Panoptic NuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking
AUTHORS: Whye Kit Fong et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: In this paper, we introduce the large-scale Panoptic nuScenes benchmark dataset that extends our popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks.
16, TITLE: Certifiable Outlier-Robust Geometric Perception: Exact Semidefinite Relaxations and Scalable Global Optimization
AUTHORS: Heng Yang ; Luca Carlone
CATEGORY: cs.CV [cs.CV, cs.RO, math.OC]
HIGHLIGHT: We propose the first general and scalable framework to design certifiable algorithms for robust geometric perception in the presence of outliers.
17, TITLE: On Recognizing Occluded Faces in The Wild
AUTHORS: Mustafa Ekrem Erakın ; Uğur Demir ; Hazım Kemal Ekenel
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present the Real World Occluded Faces (ROF) dataset, that contains faces with both upper face occlusion, due to sunglasses, and lower face occlusion, due to masks.
18, TITLE: FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation Decoding
AUTHORS: Yiming Zhao ; Lin Bai ; Xinming Huang
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this paper, we propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step.
19, TITLE: Simple Video Generation Using Neural ODEs
AUTHORS: David Kanaa ; Vikram Voleti ; Samira Ebrahimi Kahou ; Christopher Pal
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: Following this line of work and building on top of a family of models introduced in prior work, Neural ODE, we investigate an approach that models time-continuous dynamics over a continuous latent space with a differential equation with respect to time.
20, TITLE: Unsupervised Clothing Change Adaptive Person ReID
AUTHORS: Ziyue Zhang ; Shuai Jiang ; Congzhentao Huang ; Richard YiDa Xu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we aim to solve both problems at the same time.
21, TITLE: Egocentric View Hand Action Recognition By Leveraging Hand Surface and Hand Grasp Type
AUTHORS: Sangpil Kim et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce a multi-stage framework that uses mean curvature on a hand surface and focuses on learning interaction between hand and object by analyzing hand grasp type for hand action recognition in egocentric videos.
22, TITLE: Matching in The Dark: A Dataset for Matching Image Pairs of Low-light Scenes
AUTHORS: W. Song et al.
CATEGORY: cs.CV [cs.CV, 68T40, 68T07]
HIGHLIGHT: This paper considers matching images of low-light scenes, aiming to widen the frontier of SfM and visual SLAM applications. To consider if and how well we can utilize such information stored in RAW-format images for image matching, we have created a new dataset named MID (matching in the dark).
23, TITLE: Mask Is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection
AUTHORS: Xugong Qin et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we argue that the performance degradation results from the learning confusion issue in the mask head.
24, TITLE: Master Face Attacks on Face Recognition Systems
AUTHORS: Huy H. Nguyen ; Sébastien Marcel ; Junichi Yamagishi ; Isao Echizen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we perform an extensive study on latent variable evolution (LVE), a method commonly used to generate master faces.
25, TITLE: RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management
AUTHORS: Zhuoxiao Chen et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.MM]
HIGHLIGHT: Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that can support 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and inputting data by users, and 4) a backend containing a well-structured database and developed APIs.
26, TITLE: RGB-D Salient Object Detection with Ubiquitous Target Awareness
AUTHORS: Yifan Zhao ; Jiawei Zhao ; Jia Li ; Xiaowu Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Conventional RGB-D salient object detection methods aim to leverage depth as complementary information to find the salient regions in both modalities.
27, TITLE: Learning to Discriminate Information for Online Action Detection: Analysis and Application
AUTHORS: Sumin Lee et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To overcome this problem, we propose a novel recurrent unit, named Information Discrimination Unit (IDU), which explicitly discriminates the information relevancy between an ongoing action and others to decide whether to accumulate the input information.
28, TITLE: Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition
AUTHORS: Fernanda Hernández-Luquin ; Hugo Jair Escalante
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Specifically, in this paper we propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units that aims at exploiting local information at the final stage of the learning process.
29, TITLE: Deriving Explanation of Deep Visual Saliency Models
AUTHORS: Sai Phani Kumar Malladi ; Jayanta Mukhopadhyay ; Chaker Larabi ; Santanu Chaudhury
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models by applying human perception theories and the conventional concepts of saliency.
30, TITLE: LiDARTouch: Monocular Metric Depth Estimation with A Few-beam LiDAR
AUTHORS: Florent Bartoccioni ; Éloi Zablocki ; Patrick Pérez ; Matthieu Cord ; Karteek Alahari
CATEGORY: cs.CV [cs.CV, cs.AI, cs.RO, 68T45]
HIGHLIGHT: In this paper, we propose a new alternative of densely estimating metric depth by combining a monocular camera with a light-weight LiDAR, e.g., with 4 beams, typical of today's automotive-grade mass-produced laser scanners.
31, TITLE: Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
AUTHORS: Cheng Gong et al.
CATEGORY: cs.CV [cs.CV, eess.SP, B.2.4.a; I.2.6.g; I.5.1.d; I.5.4.b]
HIGHLIGHT: We propose a new method called elastic significant bit quantization (ESB) that controls the number of significant bits of quantized values to obtain better inference accuracy with fewer resources.
32, TITLE: Digitize-PID: Automatic Digitization of Piping and Instrumentation Diagrams
AUTHORS: Shubham Paliwal ; Arushi Jain ; Monika Sharma ; Lovekesh Vig
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: A novel and efficient kernel-based line detection and a two-step method for detection of complex symbols based on a fine-grained deep recognition technique is presented in the paper. In addition, we have created an annotated synthetic dataset, Dataset-P&ID, of 500 P&IDs by incorporating different types of noise and complex symbols which is made available for public use (currently there exists no public P&ID dataset).
33, TITLE: Disentangling Alzheimer's Disease Neurodegeneration from Typical Brain Aging Using Machine Learning
AUTHORS: Gyujoon Hwang et al.
CATEGORY: cs.LG [cs.LG, cs.CV, physics.med-ph, q-bio.NC]
HIGHLIGHT: We present a methodology toward disentangling the two.
34, TITLE: Recalibrating The KITTI Dataset Camera Setup for Improved Odometry Accuracy
AUTHORS: Igor Cvišić ; Ivan Marković ; Ivan Petrović
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: In this paper, we propose a new approach for one shot calibration of the KITTI dataset multiple camera setup.
35, TITLE: Tactile Image-to-Image Disentanglement of Contact Geometry from Motion-Induced Shear
AUTHORS: Anupam K. Gupta ; Laurence Aitchison ; Nathan F. Lepora
CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose a supervised convolutional deep neural network model that learns to disentangle, in the latent space, the components of sensor deformations caused by contact geometry from those due to sliding-induced shear.
36, TITLE: Time Alignment Using Lip Images for Frame-based Electrolaryngeal Voice Conversion
AUTHORS: Yi-Syuan Liou et al.
CATEGORY: cs.SD [cs.SD, cs.CL, cs.CV, eess.AS]
HIGHLIGHT: In this work, we propose to use lip images for time alignment, as we assume that the lip movements of laryngectomee remain normal compared to healthy people.
37, TITLE: Capturing The Objects of Vision with Neural Networks
AUTHORS: Benjamin Peters ; Nikolaus Kriegeskorte
CATEGORY: q-bio.NC [q-bio.NC, cs.CV]
HIGHLIGHT: Here, we review related work in both fields and examine how these fields can help each other.
38, TITLE: Toward Real-World Super-Resolution Via Adaptive Downsampling Models
AUTHORS: Sanghyun Son ; Jaeha Kim ; Wei-Sheng Lai ; Ming-Hsuan Yang ; Kyoung Mu Lee
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This study proposes a novel method to simulate an unknown downsampling process without imposing restrictive prior knowledge.
39, TITLE: FastMRI+: Clinical Pathology Annotations for Knee and Brain Fully Sampled Multi-Coil MRI Data
AUTHORS: Ruiyang Zhao et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, physics.med-ph]
HIGHLIGHT: This work introduces fastMRI+, which consists of 16154 subspecialist expert bounding box annotations and 13 study-level labels for 22 different pathology categories on the fastMRI knee dataset, and 7570 subspecialist expert bounding box annotations and 643 study-level labels for 30 different pathology categories for the fastMRI brain dataset.
40, TITLE: Self-Supervised Representation Learning Using Visual Field Expansion on Digital Pathology
AUTHORS: Joseph Boyd et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this study, we propose a novel generative framework that can learn powerful representations for such tiles by learning to plausibly expand their visual field.
41, TITLE: SSEGEP: Small SEGment Emphasized Performance Evaluation Metric for Medical Image Segmentation
AUTHORS: Ammu R ; Neelam Sinha
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To address this, we propose a novel evaluation metric for segmentation performance, emphasizing smaller segments, by assigning higher weightage to smaller segment pixels.
42, TITLE: Axial Multi-layer Perceptron Architecture for Automatic Segmentation of Choroid Plexus in Multiple Sclerosis
AUTHORS: Marius Schmidt-Mengin et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, q-bio.NC, q-bio.QM, I.2]
HIGHLIGHT: In this paper, we propose to automatically segment CP from non-contrast enhanced T1-weighted MRI.
43, TITLE: Cross-Site Severity Assessment of COVID-19 from CT Images Via Domain Adaptation
AUTHORS: Geng-Xin Xu et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a novel domain adaptation (DA) method with two components to address these problems.
44, TITLE: Adaptive Few-Shot Learning PoC Ultrasound COVID-19 Diagnostic System
AUTHORS: Michael Karnes ; Shehan Perera ; Srikar Adhikari ; Alper Yilmaz
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper presents a novel ultrasound imaging point-of-care (PoC) COVID-19 diagnostic system.