CVPR 2025 Accepted Papers

Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Jianyang Xie · Yitian Zhao · Yanda Meng · He Zhao · Anh Nguyen · Yalin Zheng
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang · Roni Paiss · Shiran Zada · Nikhil Karnad · David E. Jacobs · Yael Pritch · Inbar Mosseri · Mike Zheng Shou · Neal Wadhwa · Nataniel Ruiz
3D Prior Is All You Need: Cross-Task Few-shot 2D Gaze Estimation
Yihua Cheng · Hengfei Wang · Zhongqun Zhang · Yang Yue · Boeun Kim · Feng Lu · Hyung Jin Chang
DICE: Discrete Inversion Enabling Controllable Editing for Masked Generative Models
Xiaoxiao He · Ligong Han · Quan Dao · Song Wen · Minhao Bai · Di Liu · Han Zhang · Felix Juefei-Xu · Chaowei Tan · Bo Liu · Martin Renqiang Min · Kang Li · Faez Ahmed · Akash Srivastava · Hongdong Li · Junzhou Huang · Dimitris N. Metaxas
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi · Matteo Farina · Massimiliano Mancini · Elisa Ricci · Nicola Strisciuglio
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An · Feng Tian · Sicong Leng · Jiahao Nie · Haonan Lin · QianYing Wang · Ping Chen · Xiaoqin Zhang · Shijian Lu
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang · Xufang Luo · Dongqi Han · Yunjian Xu · Dongsheng Li
Learning Visual Composition through Improved Semantic Guidance
Austin Stone · Hagen Soltau · Robert Geirhos · Xi Yi · Ye Xia · Bingyi Cao · Kaifeng Chen · Abhijit Ogale · Jonathon Shlens
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing
Xuanbai Chen · Xiang Xu · Zhihua Li · Tianchen Zhao · Pietro Perona · Qin ZHANG · Yifan Xing
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand · Peiran Yu · Shangqian Gao · Gowthami Somepalli · Tom Goldstein · Heng Huang
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Shuai Liu · Mingyue Cui · Boyang Li · Quanmin Liang · Tinghe Hong · Kai Huang · yunxiao shan · Kai Huang
DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis
Ziyin Zeng · Mingyue Dong · Jian Zhou · Huan Qiu · Zhen Dong · Man Luo · Bijun Li
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen · Yucel Yemez
Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang · Jiahao Wang · Sucheng Ren · Guoyizhe Wei · Jieru Mei · Wei Shao · Yuyin Zhou · Alan L. Yuille · Cihang Xie
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham · Hee-Seon Kim · Sangmin Woo · Changick Kim
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection
Houzhang Fang · Xiaolin Wang · Zengyang Li · Lu Wang · Qingshan Li · Yi Chang · Luxin Yan
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani · Gianluca Villani · Theodoros Panagiotakopoulos · Marc Botet Colomer · Linus Härenstam-Nielsen · Mattia Segu · Pier Luigi Dovesi · Jussi Karlgren · Daniel Cremers · Federico Tombari · Matteo Poggi
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration
Johan Edstedt · André Mateus · Alberto Jaenal
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
Mohd Hozaifa Khan · Ravi Kiran Sarvadevabhatla
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li · Yuquan Xie · Rui Shao · Gongwei Chen · Dongmei Jiang · Liqiang Nie
RoadSocial: A Diverse Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh · Deepti Rawat · Rakshitha R. T. · Tathagata Ghosh · Ravi Kiran Sarvadevabhatla
RDD: Robust Feature Detector and Descriptor using Deformable Transformer
Gonglin Chen · Tianwen Fu · Haiwei Chen · Wenbin Teng · Hanyuan Xiao · Yajie Zhao
Sample- and Parameter-Efficient Auto-Regressive Image Models
Elad Amrani · Leonid Karlinsky · Alex M. Bronstein
Split Adaptation for Pre-trained Vision Transformers
Lixu Wang · Bingqi Shang · Yi Li · Payal Mohapatra · Wei Dong · Xiao Wang · Qi Zhu
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei · Shaojie Lyu · Gong Chen · Ke Ma · Qianqian Xu · Yingfei Sun · Qingming Huang
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Jiahao Lu · Tianyu Huang · Peng Li · Zhiyang Dou · Cheng Lin · Zhiming Cui · Zhen Dong · Sai-Kit Yeung · Wenping Wang · Yuan Liu
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu · Jiajun Deng · Guoliang You · Yifan Duan · Houqiang Li · Yanyong Zhang
What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Omri Kaduri · Shai Bagon · Tali Dekel
DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework
Yalong Xu · Lin Zhao · Chen Gong · Guangyu Li · Di Wang · Nannan Wang
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann · Yannick Wattenberg · Tamaz Amiranashvili · Suprosanna Shit · Bjoern Menze
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson · Marcelino M. de Almeida · Sachit Mahajan · Martin Labrie · Maani Ghaffari · Omid Ghasemalizadeh · Min Sun · Cheng-Hao Kuo · Arnab Sen
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
Mojtaba Nafez · Amirhossein Koochakian · Arad Maleki · Jafar Habibi · Mohammad Rohban
Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution
Naveen Kumar Kummari · Ranjeet Ranjan Jha · Krishna Mohan Chalavadi · Ravindra Babu Tallamraju
HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving
Farchan Hakim Raswa · Chun-Shien Lu · Jia-Ching Wang
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang · Yang Sui · Jinqi Xiao · Lingyi Huang · Yu Gong · Chendi Li · Jinghua Yan · Yu Bai · Ponnuswamy Sadayappan · Xia Hu · Bo Yuan
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong · Ran Ding · Wei Li · Zhang Peng · Guobin Tang · Jia Guo
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou · Bangjie Yin · Hefei Ling · Qianyu Zhou · Wenxuan Wang
Video Depth without Video Models
Bingxin Ke · Dominik Narnhofer · Shengyu Huang · Lei Ke · Torben Peters · Katerina Fragkiadaki · Anton Obukhov · Konrad Schindler
Stop learning it all to mitigate visual hallucination, Focus on the hallucination target.
Dokyoon Yoon · Youngsook Song · Woomyoung Park
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang · Xinpeng Ding · Chunwei Wang · Jianhua Han · Yulong Liu · Hengshuang Zhao · Hang Xu · Lu Hou · Wei Zhang · Xiaodan Liang
GENIUS: A Generative Framework for Universal Multimodal Search
Sungyeon Kim · Xinliang Zhu · Xiaofan Lin · Muhammet Bastan · Douglas Gray · Suha Kwak
Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering
Biplab Das · Viswanath Gopalakrishnan
Structure-from-Motion with a Non-Parametric Camera Model
Yihan Wang · Linfei Pan · Marc Pollefeys · Viktor Larsson
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
Yiyang Chen · Tianyu Ding · Lei Wang · Jing Huo · Yang Gao · Wenbin Li
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go · byeongjun park · Jiho Jang · Jin-Young Kim · Soonwoo Kwon · Changick Kim
AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration
Jiong Lin · Lechen Zhang · Kwansoo Lee · Jialong Ning · Judah A Goldfeder · Hod Lipson
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
Taha Koleilat · Hojat Asgariandehkordi · Hassan Rivaz · Yiming Xiao
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
Shape Abstraction via Marching Differentiable Support Functions
Sunkyung Park · Jeongmin Lee · Dongjun Lee
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang · Jue Zhang · Xiaoting Qin · Jing Yu · Gaopeng Gou · Gang Xiong · Qingwei Lin · Saravan Rajmohan · Dongmei Zhang · Qi Wu
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng · Hao Tang · Yihua Shao · Siyu Chen · Ling Shao · Yan Wang
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Tianyu Yu · Haoye Zhang · Qiming Li · Qixin Xu · Yuan Yao · Da Chen · Xiaoman Lu · Ganqu Cui · Yunkai Dang · Taiwen He · Xiaocheng Feng · Jun Song · Bo Zheng · Zhiyuan Liu · Tat-seng Chua · Maosong Sun
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Kha Do · Kang Han · Phu Lai · Khoa T. Phan · Wei Xiang
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye · Kareem elgohary · Anvita A. Srinivas · Fauzan Zaid · Joël L. Lavanchy · Nicolas Padoy
Diff2Flow: Bridging the Gap between Diffusion and Flow Matching with Minimal Cost
Johannes Schusterbauer · Ming Gui · Frank Fundel · Björn Ommer
A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
Shilhora Akshay · Niveditha Lakshmi Narasimhan · Jacob George · Vineeth Balasubramanian
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
Sean Wu · Shamk Basu · Tim Broedermann · Luc Van Gool · Christos Sakaridis
Enhancing Adversarial Transferability with Checkpoints of a Single Model’s Training
Shixin Li · Chaoxiang He · Xiaojing Ma · Bin Benjamin Zhu · Shuo Wang · Hongsheng Hu · Dongmei Zhang · Linchen Yu
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Zongsheng Yue · Kang Liao · Chen Change Loy
Context-Aware Multimodal Pretraining
Karsten Roth · Zeynep Akata · Dima Damen · Ivana Balazevic · Olivier J Henaff
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou · Songze Li · Duanyi Yao
Multi-modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery
Jiahua Rao · hanjing Lin · Leyu Chen · Jiancong Xie · Shuangjia Zheng · Yuedong Yang
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui · Xun Guo · Wengang Zhou · Yan Lu
Automated Proof of Polynomial Inequalities via Reinforcement Learning
Banglong Liu · Niuniu Qi · Xia Zeng · Lydia Dehbi · Zhengfeng Yang
Improving Gaussian Splatting with Localized Points Management
Haosen Yang · Chenhao Zhang · Wenqing Wang · Marco Volino · Adrian Hilton · Li Zhang · Xiatian Zhu
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang · Tianhong Dai · Bingfeng Zhang · Siyue Yu · ENG GEE LIM · Jimin Xiao
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang · Shuzhao Xie · Chengwei Ren · Siyi Xie · Chen Tang · Shijia Ge · Mingzi Wang · Zhi Wang
SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models
Xiaofan Bai · Shixin Li · Xiaojing Ma · Bin Benjamin Zhu · Dongmei Zhang · Linchen Yu
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye · Yukang Gan · Xiaoke Huang · Yixiao Ge · Yansong Tang
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li · Renping Zhou · Jiawei Zhou · Yingwei Song · Johannes Herter · Minghan Qin · Gao Huang · Hanspeter Pfister
Locally Orderless Images for Optimization in Differentiable Rendering
Ishit Mehta · Manmohan Chandraker · Ravi Ramamoorthi
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song · Xiaoye Qu · Jiawei Zhou · Yu Cheng
UniScene: Unified Occupancy-centric Driving Scene Generation
Bohan Li · Jiazhe Guo · Hongsi Liu · Yingshuang Zou · Yikang Ding · Xiwu Chen · Hu ZHU · Feiyang Tan · Chi Zhang · Tiancai Wang · Shuchang Zhou · Li Zhang · Xiaojuan Qi · Hao Zhao · Mu Yang · Wenjun Zeng · Xin Jin
Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion
Xiangfeng Xu · Pinyi Zhang · Wenxuan Huang · Yunhang Shen · Haosheng Chen · Jingzhong Lin · Wei Li · Gaoqi He · Jiao Xie · Shaohui Lin
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou · Hengjian Zhou · Haibo Hu · Zihao WEN · Jianping Wang · Yung-Hui Li · Yu-Kai Huang
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng · Jinrui Zhang · Qingni Wang · Teng Wang · Jinming Duan · Feng Zheng
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei · Yuan Feng · Long Ma · Chen Wang · Yuanfeng Zhou · Changjian Li
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas · Christos Diou
GenAssets: Generating in-the-wild 3D Assets in Latent Space
Ze Yang · Jingkang Wang · Haowei Zhang · Sivabalan Manivasagam · Yun Chen · Raquel Urtasun
Perceptual Inductive Bias Is What You Need Before Contrastive Learning
Junru Zhao · Tianqin Li · Dunhan Jiang · Shenghao Wu · Alan Ramirez · Tai Sing Lee
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee · Hyeongcheol Park · Jiyoung Seo · Eunbyung Park · Hyunje Park · Ha Dam Baek · Shin sangheon · sangmin kim · Sangpil Kim
GEM: A Generalizable Ego-vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan · Sebastian Stapf · Ahmad Rahimi · Pedro M B Rezende · Yasaman Haghighi · David Brüggemann · Isinsu Katircioglu · Lin Zhang · Xiaoran Chen · Suman Saha · Marco Cannici · Elie Aljalbout · Botao Ye · Xi Wang · Aram Davtyan · Mathieu Salzmann · Davide Scaramuzza · Marc Pollefeys · Paolo Favaro · Alex Alahi
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li · Jike Zhong · Tianle Chen · Yuxiang Lai · Konstantinos Psounis
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
Ning Ni · Libao Zhang
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed · Davide Buscaldi · Vanier Sonia · Aymen Shabou
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
Buzhen Huang · Chen Li · Chongyang Xu · Dongyue Lu · Jinnan Chen · Yangang Wang · Gim Hee Lee
Synthetic Visual Genome
Jae Sung Park · Zixian Ma · Linjie Li · Chenhao Zheng · Cheng-Yu Hsieh · Ximing Lu · Khyathi Chandu · Quan Kong · Norimasa Kobori · Ali Farhadi · Yejin Choi · Ranjay Krishna
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski · Weitong Zhang · Hadrien Reynaud · Sarah Cechnicka · Bernhard Kainz
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Hossein Kashiani · Niloufar Alipour Talemi · Fatemeh Afghah
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu · Ji Zhang · Pengpeng Zeng · Lianli Gao · Jingkuan Song · Heng Tao Shen
Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye · Zihan Wang · Haosen Sun · Keshigeyan Chandrasegaran · Zane Durante · Cristobal Eyzaguirre · Yonatan Bisk · Juan Carlos Niebles · Ehsan Adeli · Li Fei-Fei · Jiajun Wu · Manling Li
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani · Dinura Dissanayake · Hasindri Watawana · Noor Ahsan · Nevasini Sasikumar · Omkar Thawakar · Henok Biadglign Ademtew · Yahya Hmaiti · Amandeep Kumar · Kartik Kuckreja · Mykola Maslych · Wafa Al Ghallabi · Mihail Minkov Mihaylov · Chao Qin · Abdelrahman Shaker · Mike Zhang · Mahardika Krisna Ihsani · Amiel Gian Esplana · Monil Gokani · Shachar Mirkin · Harsh Singh · Ashay Srivastava · Endre Hamerlik · Fathinah Asma Izzati · Fadillah Adamsyah Maani · Sebastian Cavada · Jenny Chim · Rohit Gupta · Sanjay Manjunath · Kamila Zhumakhanova · Feno Heriniaina Rabevohitra · Azril Hafizi Amirudin · Muhammad Ridzuan · Daniya Najiha Abdul Kareem · Ketan Pravin More · Kunyang Li · Pramesh Shakya · Muhammad Saad · Amirpouya Ghasemaghaei · Amirbek Djanibekov · Dilshod Azizov · Branislava Jankovic · Naman Bhatia · Alvaro Cabrera Berobide · Johan Obando-Ceron · Olympiah Otieno · Fabian Farestam · Muztoba Rabbani · Sanoojan Baliah · Santosh Sanjeev · Abduragim Shtanchaev · Maheen Fatima · Thao Nguyen · Amrin Kareem · Toluwani Aremu · Nathan Augusto Zacarias Xavier · Amit Bhatkal · Hawau Olamide Toyin · Aman Chadha · Hisham Cholakkal · Rao Anwer · Michael Felsberg · Jorma Laaksonen · Thamar Solorio · Monojit Choudhury · Ivan Laptev · Mubarak Shah · Salman Khan · Fahad Shahbaz Khan
MOS: Modeling Object-Scene Associations in Generalized Category Discovery
Zhengyuan Peng · Jinpeng Ma · Zhimin Sun · Ran Yi · Haichuan Song · Xin Tan · Lizhuang Ma
Sea-ing in Low-light
Nisha Varghese · A. N. Rajagopalan
Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels
Quanjiang Li · Tingjin Luo · Jiahui Liao
EBS-EKF: Accurate and High Frequency Event-based Star Tracking
Albert Reed · Connor Hashemi · Dennis Melamed · Nitesh Menon · Keigo Hirakawa · Scott McCloskey
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng · Tianyu He · Li Jiang · Tianyu Wang · Feras Dayoub · Ian Reid
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue · Xiangyu Yin · Boyuan Yang · Wei Gao
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu · Wuxuan Shi · Jinqiao Wang · Mang Ye
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao · Yao Lu · Moo Jin Kim · Zipeng Fu · Zhuoyang Zhang · Yecheng Wu · Max Li · Qianli Ma · Song Han · Chelsea Finn · Ankur Handa · Tsung-Yi Lin · Gordon Wetzstein · Ming-Yu Liu · Donglai Xiang
Learning Partonomic 3D Reconstruction from Image Collections
Xiaoqian Ruan · Pei Yu · Dian Jia · Hyeonjeong Park · Peixi Xiong · Wei Tang
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko · Tinghuai Wang · Wassim Swaileh · Shiyan Sun · Ioannis Patras
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao · Jinlong Li · Shuang Wang · Mengyao Wu · Qi Zang · Nicu Sebe · Zhun Zhong
Unconstrained 3D gaze estimation with Gaze-Aware 3D Context Encoding
Yuki Kawana · Shintaro Shiba · Quan Kong · Norimasa Kobori
All-directional Disparity Estimation for Real-world QPD Images
Hongtao Yu · Shaohui Song · Lihu Sun · Wenkai Su · Xiaodong Yang · Chengming Liu
Bridging the Gap between Diffusion Models and Universal Quantization for Image Compression
Lucas Relic · Roberto Azevedo · Yang Zhang · Markus Gross · Christopher Schroers
Targeted Forgetting of Image Subgroups in CLIP Models
Zeliang Zhang · Gaowen Liu · Charles Fleming · Ramana Kompella · Chenliang Xu
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model
Changchang Sun · Gaowen Liu · Charles Fleming · Yan Yan
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang · Yang Yu · Yucheng Chen · Xulei Yang · Si Yong Yeo
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong · Vladimir Guzov · Hyo Jin Kim · Yuting Ye · Richard Newcombe · Ziwei Liu · Lingni Ma
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri · Carola-Bibiane Schönlieb · Bruno Ribeiro · Chaim Baskin · Moshe Eliasof
Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model
Hang Chen · Yin Xie · Xiaoxiu Peng · Lihu Sun · Wenkai Su · Xiaodong Yang · Chengming Liu
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Otto Brookes · Maksim Kukushkin · Majid Mirmehdi · Colleen Stephens · Paula Dieguez · Thurston Cleveland Hicks · Sorrel CZ Jones · Kevin C. Lee · Maureen S. McCarthy · Amelia C. Meier · NORMAND E. · Erin G. Wessling · Roman M. Wittig · Kevin Langergraber · Klaus Zuberbühler · Lukas Boesch · Thomas Schmid · Mimi Arandjelovic · Hjalmar S. Kühl · Tilo Burghardt
Boosting Point-Supervised Temporal Action Localization through Integrating Query Reformation and Optimal Transport
Mengnan Liu · Le Wang · Sanping Zhou · Kun Xia · Xiaolong Sun · Gang Hua
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang · Yichen Wang · Shihui Yan · Chenyu Zhu · Ziqi Zhou · Linshan Hou · Shengshan Hu · Minghui Li · Yanjun Zhang · Leo Yu Zhang
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
Mayug Maniparambil · Raiymbek Akshulakov · YASSER ABDELAZIZ DAHOU DJILALI · Sanath Narayan · Ankit Singh · Noel O’Connor
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie · Yequan Bie · Jianda Mao · Yangqiu Song · Yang Wang · Hao Chen · Kani Chen
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen · Muhammad Akhtar Munir · Sanoojan Baliah · Salman Khan · Muhammad Haris Khan
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation
TANUJ SUR · Samrat Mukherjee · Kaizer Rahaman · Subhasis Chaudhuri · Muhammad Haris Khan · Biplab Banerjee
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang · Yeda Song · Sungryull Sohn · Lajanugen Logeswaran · Tiange Luo · Dong-Ki Kim · GyungHoon Bae · Honglak Lee
Seeing 3D World in A Grain of Sand
Yufan Zhang · Yu Ji · Yu Guo · Jinwei Ye
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
Yiyang Shen · Kun Zhou · He Wang · Yin Yang · Tianjia Shao
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu · Keyu Fan · Weihao Yu · Chenxin Li · Hao Lu · Yixuan Yuan
Generative Video Propagation
Shaoteng Liu · Tianyu Wang · Jui-Hsien Wang · Qing Liu · Zhifei Zhang · Joon-Young Lee · Yijun Li · Bei Yu · Zhe Lin · Soo Ye Kim · Jiaya Jia
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening
Jie Huang · Haorui Chen · Jiaxuan Ren · Siran Peng · Liang-Jian Deng
UniPose: A Unified MultiModal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li · RuiBing Hou · Hong Chang · Shiguang Shan · Xilin Chen
Fractal Calibration for long-tailed object detection
Konstantinos Alexandridis · Ismail Elezi · Jiankang Deng · Anh Nguyen · Shan Luo
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi · Alessandro Conti · Gaowen Liu · Elisa Ricci · Massimiliano Mancini
Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
Hua Yu · Weiming Liu · Gui Xu · Yaqing Hou · Yew-Soon Ong · Qiang Zhang
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li · Xingxuan Zhang · Hao Zou · Yige Guo · Renzhe Xu · Yilong Liu · Chuzhao Zhu · Yue He · Peng Cui
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Jiho Choi · Seonho Lee · Minhyun Lee · Seungho Lee · Hyunjung Shim
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
Qiang Zhang · Mengsheng Zhao · Jiawei Liu · Fanrui Zhang · Yongchao Xu · Zheng-Jun Zha
Opportunistic Single-Photon Time of Flight
Sotiris Nousias · Mian Wei · Howard Xiao · Maxx Wu · Shahmeer Athar · Kevin J Wang · Anagh Malik · David A. Barmherzig · David B. Lindell · Kiriakos Kutulakos
GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations
Yiwei Bao · Zhiming Wang · Feng Lu
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang · Yaxiong Wang · Lechao Cheng · Zhun Zhong · Dan Guo · Meng Wang
SimAvatar: Simulation-Ready Clothed Gaussian Avatars from Text
Xueting Li · Ye Yuan · Shalini De Mello · Miles Macklin · Jonathan Leaf · Gilles Daviet · Jan Kautz · Umar Iqbal
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li · Haiyang Jiang · Daisuke Iso
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Ahmad Rahimi · Po-Chien Luan · Yuejiang Liu · Frano Rajič · Alex Alahi
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Yuhao Dong · Zuyan Liu · Hai-Long Sun · Jingkang Yang · Winston Hu · Yongming Rao · Ziwei Liu
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
Xiyue Guo · Jiarui Hu · Junjie Hu · Hujun Bao · Guofeng Zhang
M3amba: Memory Mamba is All You Need for Whole Slide Image Classification
Tingting Zheng · Kui Jiang · Yi Xiao · Sicheng Zhao · Hongxun Yao
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu · Jinghan Xu · Yi Chang · Hanyu Zhou · Haozhi Zhao · Lin Wang · Luxin Yan
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li · Mi Zhang · Yiming Sun · Min Yang
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis
You Wang · Li Fang · Hao Zhu · Fei Hu · Long Ye · Zhan Ma
pFedMixF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation
Yifei Zhang · Hao Zhu · Alysa Ziying Tan · Dianzhi Yu · Longtao Huang · Han Yu
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang · Jing Yu · Keke Gai · Jiamin Zhuang · Gang Xiong · Gaopeng Gou · Qi Wu
Improved monocular depth prediction using distance transform over pre-semantic contours with self-supervised neural networks
Marwane Hariat · Antoine Manzanera · David Filliat
Using diffusion priors for video amodal segmentation
Kaihua Chen · Deva Ramanan · Tarasha Khurana
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche · Simon Leglaive · Xavier Alameda-Pineda · Francesc Moreno-Noguer
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
Weiguang Zhao · Rui Zhang · Qiufeng Wang · Guangliang Cheng · Kaizhu Huang
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
Jianan Ye · Weiguang Zhao · Xi Yang · Guangliang Cheng · Kaizhu Huang
Diffusion-based Event Generation for High-Quality Image Deblurring
Xinan Xie · Qing Zhang · Wei-Shi Zheng
VIRES: Video Instance Repainting with Sketch and Text Guidance
Shuchen Weng · Haojie Zheng · Peixuan Zhang · Yuchen Hong · Han Jiang · Si Li · Boxin Shi
BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation
Shengze Wang · Jiefeng Li · Tianye Li · Ye Yuan · Henry Fuchs · Koki Nagano · Shalini De Mello · Michael Stengel
Instance-wise Supervision-level Optimization in Active Learning
Shinnosuke Matsuo · Riku Togashi · Ryoma Bise · Seiichi Uchida · Masahiro Nomura
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo · Zhen Zhao · Zicheng Wang · Tong Chen · YUNYI LIU · Luping Zhou
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park · Keun-Soo Heo · Dong-Hee Shin · Young-Han Son · Ji-Hye Oh · Tae-Eui Kam
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu · Yan Bai · Yujia Zhang · Zhuoling Li · Fei Xia · Kwan-Yee K. Wong · Jianqiang Wang · Hengshuang Zhao
SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction
Xinran Yang · Donghao Ji · Yuanqi Li · Jie Guo · Yanwen Guo · Junyuan Xie
EdgeTAM: On-Device Track Anything Model
Chong Zhou · Chenchen Zhu · Yunyang Xiong · Saksham Suri · Fanyi Xiao · Lemeng Wu · Raghuraman Krishnamoorthi · Bo Dai · Chen Change Loy · Vikas Chandra · Bilge Soran
Revisiting Generative Replay for Class Incremental Object Detection
Shizhou Zhang · Xueqiang Lv · Yinghui Xing · Qirui Wu · Di Xu · Yanning Zhang
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild
Wei Liu · Yufei Chen · Xiaodong Yue
Meta-Learning Hyperparameters for Foundation Model Adaptation in Remote-Sensing Imagery
Zichen Tian · Yaoyao Liu · Qianru Sun
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan · Varun Gupta · Darshan Singh · Zeeshan Khan · Vineet Gandhi · Makarand Tapaswi
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu · Ruining Deng · Tianyuan Yao · Juming Xiong · Chongyu Qu · Junlin Guo · Siqi Lu · Mengmeng Yin · Yu Wang · Shilin Zhao · Haichun Yang · Yuankai Huo
Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
Joohyun Kwon · Hanbyel Cho · Junmo Kim
Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
Kaixin Fan · Pengfei Ren · Jingyu Wang · Haifeng Sun · Qi Qi · Zirui Zhuang · Jianxin Liao
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking
Zekai Shao · Yufan Hu · Bin Fan · Hongmin Liu
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Zeyi Huang · Yuyang Ji · Xiaofang Wang · Nikhil Mehta · Tong Xiao · Donghyun Lee · Sigmund VanValkenburgh · Shengxin Zha · Bolin Lai · Licheng Yu · Ning Zhang · Yong Jae Lee · Miao Liu
SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning
Yuncong Yang · Han Yang · Jiachen Zhou · Peihao Chen · Hongxin Zhang · Yilun Du · Chuang Gan
Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning
Juntae Lee · Munawar Hayat · Sungrack Yun
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs
Zicheng Zhang · Ziheng Jia · Haoning Wu · Chunyi Li · Zijian Chen · Yingjie Zhou · Wei Sun · Xiaohong Liu · Xiongkuo Min · Weisi Lin · Guangtao Zhai
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
Mengqiu XU · Kaixin Chen · Heng Guo · Yixiang Huang · Ming Wu · Zhenwei Shi · Chuang Zhang · Jun Guo
Sufficient Invariant Learning for Distribution Shift
Taero Kim · Subeen Park · Sungjun Lim · Yonghan Jung · Krikamol Muandet · Kyungwoo Song
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim · Soohyun Ryu · Yeonsung Jung · Hyunkoo Lee · Joowon Kim · June Yong Yang · Jaeryong Hwang · Eunho Yang
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong · Seyun Bae · Yeonsung Jung · Jaeryong Hwang · Eunho Yang
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Hui En Pang · Shuai Liu · Zhongang Cai · Lei Yang · Tianwei Zhang · Ziwei Liu
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Tongyuan Bai · Wangyuanfan Bai · Dong Chen · Tieru Wu · Manyi Li · Rui Ma
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Mahtab Bigverdi · Zelun Luo · Cheng-Yu Hsieh · Ethan Shen · Dongping Chen · Linda Shapiro · Ranjay Krishna
Satellite Observations-guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
Siwei Tu · Ben Fei · Weidong Yang · Fenghua Ling · Hao Chen · Zili Liu · Kun Chen · Hang Fan · Wanli Ouyang · Lei Bai
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal · Xiang Yue · Erion Plaku · Ziyu Yao
EventFly: Event Camera Perception from Ground to the Sky
Lingdong Kong · Dongyue Lu · Xiang Xu · Lai Xing Ng · Wei Tsang Ooi · Benoit Cottereau
Learning Endogenous Attention for Incremental Object Detection
Xiang Song · Yuhang He · Jingyuan Li · Qiang Wang · Yihong Gong
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li · Weijian Ma · Xueyang Li · Yunzhong Lou · Guichun Zhou · Xiangdong Zhou
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation
Jiaxin Cai · Jingze Su · Qi Li · Wenjie Yang · Shu Wang · Tiesong Zhao · Shengfeng He · Wenxi Liu
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang · Anpei Chen · Yumin Wan · Zikai Song · Junqing Yu · Yawei Luo · Wei Yang
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li · Cristiano Saltori · Fabio Poiesi · Nicu Sebe
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu · Yuxin Zheng · Zelong Li · Chi Wang · Renshu Gu · Weiwei Xu · Gang Xu
EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds
Yujun Liu · Ruisheng Wang · Shangfeng Huang · GuoRong Cai
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding
Hongjia Zhai · Hai Li · Zhenzhe Li · Xiaokun Pan · Yijia He · Guofeng Zhang
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni · Ge Yan · Chung-En Sun · Tuomas Oikarinen · Tsui-Wei Weng
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin · Dingkang Liang · Zhenghao Qi · Xiang Bai
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation Distillation
Yuxiang Fu · Qi Yan · Ke Li · Lele Wang · Renjie Liao
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
Kyungmin Jo · Jooyeol Yun · Jaegul Choo
Three-view Focal Length Recovery From Homographies
Yaqing Ding · Viktor Kocur · Zuzana Berger Haladova · Qianliang Wu · Shen Cai · Jian Yang · Zuzana Kukelova
Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples
WEIWEI LI · Junzhuo Liu · Yuanyuan Ren · Yuchen Zheng · Yahao Liu · Wen Li
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu · Quyu Kong · Kechun Xu · Xunlong Xia · Bing Deng · Jieping Ye · Rong Xiong · Yue Wang
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang · Nana Yu · Zihao Zhang · Yahong Han
PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dongkyu Cho · Inwoo Hwang · Sanghack Lee
A Physics-Informed Blur Learning Framework for Imaging Systems
Liqun Chen · Yuxuan Li · Jun Dai · Jinwei Gu · Tianfan Xue
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen · Wei-Ting Chen · Yu-Wei Liu · Yuan-Chun Chiang · Sy-Yen Kuo · Ming-Hsuan Yang
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu · Shuaiyong Li · Yongqiang Zhu
Birth and Death of a Rose
Chen Geng · Yunzhi Zhang · Shangzhe Wu · Jiajun Wu
AeSPa : Attention-guided Self-supervised Parallel imaging for MRI Reconstruction
Jinho Joo · Hyeseong Kim · Hyeyeon Won · Deukhee Lee · Taejoon Eo · Dosik Hwang
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo · Hanshen Zhu · Ziyang Zhang · Dingkang Liang · Xudong Xie · Yuliang Liu · Xiang Bai
T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning
Seong-Hyeon Hwang · Minsu Kim · Steven Euijong Whang
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen · Yikai Wang · Wenqiang Sun · Feng Wang · Yiwen Chen · Huaping Liu
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Shu-Wei Lu · Yi-Hsuan Tsai · Yi-Ting Chen
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong · Bokai Lai · Yiyi Zhou · Gen Luo · Yunhang Shen · Ke Li · Xiaoshuai Sun · Rongrong Ji
Optimizing for the Shortest Path in Denoising Diffusion Model
Ping Chen · Xingpeng Zhang · Zhaoxiang Liu · Huan Hu · Xiang Liu · Kai Wang · Min Wang · Yanlin Qian · Shiguo Lian
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang · Jiawei Feng · Weibin Luo · Dani Lischinski · Daniel Cohen-Or · Hui Huang
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han · Xiangzuo Wu · Huan Liao · Zunnan Xu · Zhongyuan Hu · Ronghui Li · Yachao Zhang · Xiu Li
Spectral Informed Mamba for Robust Point Cloud Processing
Ali Bahri · Moslem Yazdanpanah · Mehrdad Noori · Sahar Dastani · Milad Cheraghalikhani · David OSOWIECHI · Gustavo Vargas Hakim · Farzad Beizaee · Ismail Ben Ayed · Christian Desrosiers
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo · Xiaoshui Huang · Hao jiacheng · Yunpeng Bai · Hongping Gan · Yilei Shi
BIOMEDICA: An Open Biomedical Image-Caption Archive with Vision-Language Models derived from Scientific Literature
Alejandro Lozano · Min Woo Sun · James Burgess · Liangyu Chen · Jeffrey J Nirschl · Jeffrey Gu · Ivan Lopez · Josiah Aklilu · Austin Wolfgang Katzer · Collin Chiu · Anita Rau · Xiaohan Wang · Yuhui Zhang · Alfred Seunghoon Song · Robert Tibshirani · Serena Yeung
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu · Jichao Zhang · Paolo Rota · Nicu Sebe
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin · Ye eun Lee · Hanbyol Jang · Geonhui Son · Hyeongyu Kim · Dosik Hwang
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen · Gehui Li · Rongyuan Wu · Xindong Zhang · Jie Chen · Jian Zhang · Lei Zhang
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin · Xueyang Yu · Ziqi Pang · Yu-Xiong Wang
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
Ding Ding · Yueming Pan · Ruoyu Feng · Qi Dai · Kai Qiu · Jianmin Bao · Chong Luo · Zhenzhong Chen
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du CHEN · Tianhe Wu · Kede Ma · Lei Zhang
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang · Yake Wei · Zequn Yang · Di Hu
IceDiff: High Resolution and High-Quality Arctic Sea Ice Forecasting with Generative Diffusion Prior
Jingyi Xu · Siwei Tu · Weidong Yang · Ben Fei · Shuhao Li · Keyi Liu · Yeqi Luo · Lipeng Ma · Lei Bai
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen · Pha Nguyen · Jackson Cothren · Alper Yilmaz · Khoa Luu
Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling
Xingyu Chen · Zihao Feng · Kun Qian · Xinyu Zhang
EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
Xinran Yang · Donghao Ji · Yuanqi Li · Junyuan Xie · Jie Guo · Yanwen Guo
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph
Rao Fu · Jianmin Zheng · Liang Yu
Samba: A Unified Mamba-based Framework for General Salient Object Detection
Jiahao He · Keren Fu · Xiaohong Liu · Qijun Zhao
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High Intensity Surgical Environments
Ege Özsoy · Chantal Pellegrini · Tobias Czempiel · Felix Tristram · Kun yuan · David Bani-Harouni · Ulrich Eck · Benjamin Busam · Matthias Keicher · Nassir Navab
Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
Suho Ryu · Kihyun Kim · Eugene Baek · Dongsoo Shin · Joonseok Lee
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang · Munan Ning · Zheyuan Liu · Yanbo Wang · Jiayi Ye · Yue Huang · Shuo Yang · Xiao Chen · Yibing Song · Li Yuan
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang · Shuai Liu · Hongming Guo · Yuhao Dong · Xiamengwei Zhang · Sicheng Zhang · Pengyun Wang · Zitang Zhou · Binzhu Xie · Ziyue Wang · Bei Ouyang · Zhengyu Lin · Marco Cominelli · Zhongang Cai · Bo Li · Yuanhan Zhang · Peiyuan Zhang · Fangzhou Hong · Joerg Widmer · Francesco Gringoli · Lei Yang · Ziwei Liu
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur · Jeong-hun Hong · Dong-hun Lee · Dabin Kang · Semin Myeong · Sang-hyo Park · Hyeyoung Park
Unlocking the potential of unlabeled data in semi-supervised domain generalization
Dongkwan Lee · Kyomin Hwang · Nojun Kwak
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi · Jing Xu · Lijing Lu · Zhihang Li · Kai Hu
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream
Hao Li · Sicheng Li · Xiang Gao · AbudouaihatiBatuer · Lu Yu · Yiyi Liao
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing · Lihao Liu · Yijun Yang · Hongqiu Wang · Tian Ye · Sixiang Chen · Wenxue Li · Guang Liu · Lei Zhu
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing
Yanjun Li · Zhaoyang Li · Honghui Chen · li’Zhi Xu
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals
Changhao Peng
Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation
Zhaoyang Li · Yuan Wang · Wangkai Li · Tianzhu Zhang · Xiang Liu
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
Yu Cao · Zengqun Zhao · Ioannis Patras · Shaogang Gong
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang · Chaoqi Chen · Yuxuan Yuan · Zeyu Zhang · Yue Huang · Kun Zhang
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu · Dingxi Zhang · Alex Jiang · Wanjia Fu · Austin Funk · Daniel Ritchie · Srinath Sridhar
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang · Tianhan Xu · Yuta Kikuchi
Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss
Ravishankar Evani · Deepu Rajan · Shangbo Mao
Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs
Mauricio Byrd Victorica · György Dán · Henrik Sandberg
CSC-PA: Cross-image Semantic Correlation via Prototype Attentions for Single-network Semi-supervised Breast Tumor Segmentation
Zhenhui Ding · Guilian Chen · Qin Zhang · Huisi Wu · Jing Qin
PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers
WooJu Lee · Juhye Park · Dasol Hong · Changki Sung · Youngwoo Seo · DongWan Kang · Hyun Myung
Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images
Junning Qiu · Minglei Lu · Fei Wang · Yu Guo · Yonggen Ling
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang · Zhihao Qian · Ye Zhu · Olga Russakovsky · Yu Wu
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization
You Shen · Zhipeng Zhang · Xinyang Li · Yansong Qu · Yu Lin · Shengchuan Zhang · Liujuan Cao
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
Silin Cheng · Yang Liu · Xinwei He · Sebastien Ourselin · Lei Tan · Gen Luo
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye · Huijuan Huang · Xintao Wang · Pengfei Wan · Di ZHANG · Wenhan Luo
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
Jeonghyeon Kim · Sangheum Hwang
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan · Zeshi Yang · Zhiyang Dou · Wenjia Wang · Buzhen Huang · Bo Dai · Taku Komura · Jingbo Wang
HOT: Hadamard-based Optimized Training
Seonggon Kim · Juncheol Shin · Seung-taek Woo · Eunhyeok Park
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen · Zhuheng Song · Xiaoke Jiang · Yaoqing Hu · Junzhi Yu · Lei Zhang
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé · Luca Versari · Emilien Dupont · Hyunjik Kim · Matthias Bauer
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Wenyuan Zhang · Emily Yue-ting Jia · Junsheng Zhou · Baorui Ma · Kanle Shi · Yu-Shen Liu · Zhizhong Han
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
Patrick Rim · Hyoungseob Park · Suchisrit Gangopadhyay · Ziyao Zeng · Younjoon Chung · Alex Wong
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu · Tianyi Zhu · Yi Gu · Li Lin · Pujin Cheng · Yebin Liu · Xiaoying Tang · Liang An
MeshArt: Generating Articulated Meshes with Structure-guided Transformers
Daoyi Gao · Mohd Yawar Nihal Siddiqui · Lei Li · Angela Dai
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao · Ziquan Liu · Yu Cao · Shaogang Gong · Ioannis Patras
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse · Jan van Gemert
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
Markus Karmann · Onay Urfalioglu
MaSS13K: A Matting-level Semantic Segmentation Benchmark
Chenxi Xie · Minghan LI · Hui Zeng · Jun Luo · Lei Zhang
Probabilistic Prompt Distribution Learning for Animal Pose Estimation
Jiyong Rao · Brian Nlong Zhao · Yu Wang
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye · Sebastin Santy · Jena D. Hwang · Amy X Zhang · Ranjay Krishna
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu · Yuheng Zhou · Xiuwei Xu · Ziwei Wang · Haibin Yan
Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration
Jiani Ni · He Zhao · Jintong Gao · Dandan Guo · Hongyuan Zha
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang · Dan Song · pengxin zhan · Tianyu Chang · Jianhao Zeng · Qing-Guo Chen · Weihua Luo · An-An Liu
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt · Leonard Schenk · Leo Schwinn · Stephan Günnemann
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney · Liangze Jiang · Florin Gogianu · Ehsan Abbasnejad
Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu · Haochuan Li · Wenjie Wang · Xiang Liu · Juncheng Li · Liqiang Nie · Tat-seng Chua
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker · Tamar Rott Shaham · Kristine Zheng · Alex Zhao · Judith Fan · Antonio Torralba
OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining
Bingzhi Chen · Sisi Fu · Xiaocheng Fang · Jieyi Cai · Boya Zhang · Minhua Lu · Yishu Liu
Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution
Zhiyuan You · Xin Cai · Jinjin Gu · Tianfan Xue · Chao Dong
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang · Rishabh Dabral · Diogo Luvizon · Zhe Cao · Lingjie Liu · Thabo Beeler · Christian Theobalt
FIFA: Fine-grained Inter-frame Attention for Driver’s Video Gaze Estimation
Daosong Hu · Mingyue Cui · Kai Huang
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen · Juze Zhang · Shrinidhi Kowshika Lakshmikanth · Yusu Fang · Ruizhi Shao · Gordon Wetzstein · Li Fei-Fei · Ehsan Adeli
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance
Jin-Liang Xiao · Ting-Zhu Huang · Liang-Jian Deng · Guang Lin · Zihan Cao · Chao Li · Qibin Zhao
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee · Joonmyung Choi · Hyunwoo J. Kim
Consistency Posterior Sampling for Diverse Image Synthesis
Vishal Purohit · Matthew Repasky · Jianfeng Lu · Qiang Qiu · Yao Xie · Xiuyuan Cheng
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang · Lin Zhu · Zewen Sun · Hengyu Liu · Qinying Gu · Nanyang Ye
Understanding multi-layered transmission matrices
Marina Alterman · Anat Levin
MambaIRv2: Attentive State Space Restoration
Hang Guo · Yong Guo · Yaohua Zha · Yulun Zhang · Wenbo Li · Tao Dai · Shu-Tao Xia · Yawei Li
Taming Large Multimodal Agents for Ultra-low Bitrate Semantically Disentangled Image Compression
Juan Song · Lijie Yang · Mingtao Feng
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
Kevin Zhang · Jia-Bin Huang · Jose Echevarria · Stephen DiVerdi · Aaron Hertzmann
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng · Haoyu Zhang · Meng Liu · Weili Guan · Liqiang Nie
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Jihoon Kim · Jeongsoo Choi · Jaehun Kim · Chaeyoung Jung · Joon Chung
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang · Enna Sachdeva · Piyush Gupta · Sangjae Bae · Kwonjoon Lee
Efficient Diffusion as Low Light Enhancer
Guanzhou Lan · Qianli Ma · YUQI YANG · Zhigang Wang · Dong Wang · Xuelong Li · Bin Zhao
Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features
Wenhuan Huang · Yi JI · guiqian zhu · Ying Li · chunping Liu
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank · Jim Davis
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu · Lingdong Kong · Tianxin Huang · Gim Hee Lee
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
Amaya Gallagher-Syed · Henry Senior · Omnia Alwazzan · Elena Pontarini · Michele Bombardieri · Costantino Pitzalis · Myles J. Lewis · Michael R Barnes · Luca Rossi · Greg Slabaugh
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
Dongki Jung · Jaehoon Choi · Yonghan Lee · Somi Jeong · Taejae Lee · Dinesh Manocha · Suyong Yeon
CleanDIFT: Diffusion Features without Noise
Nick Stracke · Stefan Andreas Baumann · Kolja Bauer · Frank Fundel · Björn Ommer
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang · Anh Tien Nguyen · Xi Han · Vincent Quoc-Huy Trinh · Hong Qin · Dimitris Samaras · Mahdi Hosseini
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang · Peng Zhang · Donglin Yang · Yuan Tian · Hai Lin · Xiaosong Wang
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
Haiyi Qiu · Minghe Gao · Long Qian · Kaihang Pan · Qifan Yu · Juncheng Li · Wenjie Wang · Siliang Tang · Yueting Zhuang · Tat-seng Chua
Incomplete Multi-View Multi-label Learning via Disentangled Representation and Label Semantic Embedding
Xu Yan · Jun Yin · Jie Wen
A Selective Re-learning Mechanism for Hyperspectral Fusion Imaging
Yuanye Liu · jinyang liu · Renwei Dian · Shutao Li
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang · Yixiao Yang · Han Huang · Liang Han · Kanle Shi · Yu-Shen Liu · Zhizhong Han
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
Bangbang Zhou · Zuan Gao · Zixiao Wang · Boqiang Zhang · Yuxin Wang · Zhineng Chen · Hongtao Xie
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang · Yutao Hu · Wenqi Shao · Ying Fu
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu chen · Zhen Wang · Runshi Li · Bowei Zhu · Long Chen
Improving the Training of Data Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling
Zhaoyu Zhang · Yang Hua · Guanxiong Sun · Hui Wang · Seán F. McLoone
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng · Yuhang Yang · Wen Li · Lixin Duan
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot · Shie Mannor · Assaf Shocher · Gal Chechik · Assaf Hallak
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu · Yan Huang · Min Gao · Yuzhen Niu · Yuzhong Chen · Qiang Wu
Latent Space Imaging
Matheus Souza · Yidan Zheng · Kaizhang Kang · Yogeshwar Nath Mishra · Qiang Fu · Wolfgang Heidrich
Understanding Multi-Task Activities from Single-Task Videos
Yuhan Shen · Ehsan Elhamifar
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang · Xiaolong Li · Fei Gao · Jie Guo · Xinbo Gao · Jing Zhang
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
Emanuele Aiello · Umberto Michieli · Diego Valsesia · Mete Ozay · Enrico Magli
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia · Jan Warchocki · Jan van Gemert · Daan Brinks · Nergis Tomen
Uncertain Multimodal Intention and Emotion Understanding in the Wild
Qu Yang · QingHongYa Shi · Tongxin Wang · Mang Ye
Arc2Avatar: Generating Expressive 3D Avatars from a single image via ID Guidance
Dimitrios Gerogiannis · Foivos Paraperas Papantoniou · Rolandos Alexandros Potamias · Alexandros Lattas · Stefanos Zafeiriou
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
Jack Li · Tianchen Zhao · Xiang Xu · Zheng Zhang · Zhihua Li · Xuanbai Chen · Qin ZHANG · Alessandro Bergamo · Anil Kumar Jain · Yifan Xing
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
Xianwei Zhuang · Zhihong Zhu · Yuxin Xie · Liming Liang · Yuexian Zou
SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation
Feng Yu · Jiacheng Cao · Li Liu · Minghua Jiang
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko · Ayana Bharadwaj · Zhi Yang Lim · Vasily Ilin · Ben Caffee · Dongping Chen · Reza Salehi · Cheng-Yu Hsieh · Ranjay Krishna
SACB-Net: Spatial-awareness Convolutions for Medical Image Registration
Xinxing Cheng · Tianyang Zhang · Wenqi Lu · Qingjie Meng · Alejandro F Frangi · Jinming Duan
VidComposition: Can MLLMs Analyze Compositions in Compiled Video?
Yunlong Tang · JunJia Guo · Hang Hua · Susan Liang · Mingqian Feng · Xinyang Li · Rui Mao · Chao Huang · Jing Bi · Zeliang Zhang · Pooyan Fazli · Chenliang Xu
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng · Jiehua Zhang · Xichun Sheng · Chenggang Yan · Yaoqi Sun · Ying Fu · Liang Li
Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov · Denis Kuznedelev · Mikhail Khoroshikh · Valentin Khrulkov · Dmitry Baranchuk
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection
Ji Du · Fangwei Hao · Mingyang Yu · Desheng Kong · Jiesheng Wu · Bin Wang · Jing XU · Ping Li
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
Longquan Dai · He Wang · Jinhui Tang
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
Songpengcheng Xia · Yu Zhang · Zhuo Su · Xiaozheng Zheng · Zheng Lv · Guidong Wang · Yongjie Zhang · Qi Wu · Lei Chu · Ling Pei
Yo’Chameleon: Personalized Vision and Language Generation
Thao Nguyen · Krishna Kumar Singh · Jing Shi · Trung Bui · Yong Jae Lee · Yuheng Li
GenFusion: Closing the loop between Reconstruction and Generation via Videos
Sibo Wu · Congrong Xu · Binbin Huang · Andreas Geiger · Anpei Chen
3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping
Chenhui Shi · Fulin Tang · Ning An · Yihong Wu
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu · Xiaofei Wei · Zixuan Zeng · Yuxuan Yao · Li Zhang
Towards Realistic Example-based Modeling via 3D Gaussian Stitching
Xinyu Gao · Ziyi Yang · Bingchen Gong · Xiaoguang Han · Sipeng Yang · Xiaogang Jin
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Siyan Dong · Shuzhe Wang · Shaohui Liu · Lulu Cai · Qingnan Fan · Juho Kannala · Yanchao Yang
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen · Mohamed Elhoseiny
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians
Changfeng Ma · Ran Bi · Jie Guo · Chongjun Wang · Yanwen Guo
Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu · Zhixing Zhang · Yanyu Li · Yanwu Xu · Anil Kag · Yang Sui · Huseyin Coskun · Ke Ma · Aleksei Lebedev · Ju Hu · Dimitris N. Metaxas · Yanzhi Wang · Sergey Tulyakov · Jian Ren
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu · Weicai Ye · Yan Luximon · Pengfei Wan · Di ZHANG
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
Shun Iwase · Zubair Irshad · Katherine Liu · Vitor Guizilini · Robert Lee · Takuya Ikeda · Ayako Amma · Koichi Nishiwaki · Kris Kitani · Rares Andrei Ambrus · Sergey Zakharov
Explain in Diffusion: Explaining a Classifier with Diffusion Semantics
Tahira Kazimi · Ritika Allada · Pinar Yanardag
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Peiqing Yang · Shangchen Zhou · Jixin Zhao · Qingyi Tao · Chen Change Loy
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu · Hongbo Fu · Xintao Wang · Weicai Ye · Pengfei Wan · Di ZHANG · Lin Gao
Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients
Li Lun · Kunyu Feng · Qinglong Ni · Ling Liang · Yuan Wang · Ying Li · dunshan yu · Xiaoxin CUI
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou · Lingrui Li · Le Lu · Minfeng Xu
Associative Transformer
Yuwei Sun · Hideya Ochiai · Zhirong Wu · Stephen Lin · Ryota Kanai
Generative Photomontage
Sean J. Liu · Nupur Kumari · Ariel Shamir · Jun-Yan Zhu
LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning
Peng Wu · Xiankai Lu · Hao Hu · Yongqin Xian · Jianbing Shen · Wenguan Wang
Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment
Weiming Liu · Jun Dan · Fan Wang · Xinting Liao · Junhao Dong · Hua Yu · Shunjie Dong · Lianyong Qi
Dense-SfM: Structure from Motion with Dense Consistent Matching
JongMin Lee · Sungjoo Yoo
Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics
Yair Smadar · Assaf Hoogi
4Real-Video: Learning Generalizable Photo-realistic 4D Video Diffusion
Chaoyang Wang · Peiye Zhuang · Tuan Duc Ngo · Willi Menapace · Aliaksandr Siarohin · Michael Vasilkovsky · Ivan Skorokhodov · Sergey Tulyakov · Peter Wonka · Hsin-Ying Lee
Towards Cost-Effective Learning: A Synergy of Semi-Supervised and Active Learning
Tianxiang Yin · Ningzhong Liu · Han Sun
Monocular Depth Priors for Robust Structure-from-Motion
Zador Pataki · Paul-Edouard Sarlin · Johannes Schönberger · Marc Pollefeys
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Henrique Morimitsu · Xiaobin Zhu · Roberto M. Cesar Jr · Xiangyang Ji · Xu-Cheng Yin
FlexUOD: The Answer to Real-world Unsupervised Image Outlier Detection
Zhonghang Liu · Kun Zhou · Changshuo Wang · Daniel Lin · Jiangbo Lu
Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
Liyan Chen · Gregory P. Meyer · Zaiwei Zhang · Eric M. Wolff · Paul Vernaza
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry · Jacob Krantz · Stefan Lee
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang · Binxiao Huang · Yulun Zhang · Dahai Yu · Yujiu Yang · Ngai Wong
Deterministic Image-to-Image Translations via Brownian Bridge Denoising Models with Dual Approximators
Bohan Xiao · PEIYONG WANG · Qisheng He · Ming Dong
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
Ren Wang · Haoliang Sun · Yuxiu Lin · Chuanhui Zuo · Yongshun Gong · Yilong Yin · Wenjia Meng
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa · Kai Wang · C.Alejandro Parraga · Bartłomiej Twardowski · Jesus Malo · Javier Vazquez-Corral · Joost van de Weijer
Image Quality Assessment: From Human to Machine Preference
Chunyi Li · Yuan Tian · Xiaoyue Ling · Zicheng Zhang · Haodong Duan · Haoning Wu · Ziheng Jia · Xiaohong Liu · Xiongkuo Min · Guo Lu · Weisi Lin · Guangtao Zhai
Practical solutions to the relative pose of three calibrated cameras
Charalambos Tzamos · Viktor Kocur · Yaqing Ding · Daniel Barath · Zuzana Berger Haladova · Torsten Sattler · Zuzana Kukelova
TailedCore : Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Yoon Gyo Jung · Jaewoo Park · Jaeho Yoon · Kuan-Chuan Peng · Wonchul Kim · Andrew Beng Jin Teoh · Octavia Camps
Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement
Xinjie Li · Ziyi Chen · Xinlu Yu · Iek-Heng Chu · Peng Chang · Jing Xiao
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
Yitang Li · Mingxian Lin · Zhuo Lin · Yipeng Deng · Yue Cao · Li Yi
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So · Jiwoong Shin · Chaeyeon Jang · Eunhyeok Park
Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent
Philip Doldo · Derek Everett · Amol Khanna · Andre T Nguyen · Edward Raff
The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation
Marcus Nordström · Atsuto Maki · Henrik Hult
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang · Gang Wang · Huan Zhang
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
Ziliang Chen · Xin Huang · Xiaoxuan Fan · Keze Wang · Yuyu Zhou · Quanlong Guan · Liang Lin
Adapting to Observation Length of Trajectory Prediction via Contrastive Learning
Ruiqi Qiu · JUN GONG · Xinyu Zhang · Siqi Luo · Bowen Zhang · Yi Cen
Boost the Inference with Co-training: A Depth-guided Mutual Learning Framework for Semi-supervised Medical Polyp Segmentation
Yuxin Li · Zihao Zhu · Yuxiang Zhang · Yifan Chen · Zhibin Yu
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
Jiaxin Zhang · Junjun Jiang · Youyu Chen · Kui Jiang · Xianming Liu
Language-Guided Image Tokenization for Generation
Kaiwen Zha · Lijun Yu · Alireza Fathi · David A. Ross · Cordelia Schmid · Dina Katabi · Xiuye Gu
DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image
Ziwei Zhao · Zhixing Zhang · Yuhang Liu · Zhao Zhang · Haojun Yu · Dong Wang · Liwei Wang
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang · Guoyu Lu
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
Youyu Chen · Junjun Jiang · Kui Jiang · Xiao Tang · Zhihao Li · Xianming Liu · Yinyu Nie
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
Eduard Zamfir · Zongwei Wu · Nancy Mehta · Yuedong Tan · Danda Paudel · Yulun Zhang · Radu Timofte
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection
Ting Li · Mao Ye · Tianwen Wu · Nianxin Li · Shuaifeng Li · Song Tang · Luping Ji
A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering
Zheming Xu · He Liu · Congyan Lang · Tao Wang · Yidong Li · Michael C. Kampffmeyer
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Roberto Henschel · Levon Khachatryan · Hayk Poghosyan · Daniil Hayrapetyan · Vahram Tadevosyan · Zhangyang Wang · Shant Navasardyan · Humphrey Shi
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
Guillem Font Font · Antonio Rubio · Luis Ferraz · Antonio Agudo
MAGE: Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model
Haoyuan Wang · Zhenwei Wang · Xiaoxiao Long · Cheng Lin · Gerhard Hancke · Rynson W.H. Lau
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation
Qi Zhang · Jibin Peng · Zhao Huang · Wei Feng · Di Lin
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho · Samuel Madden
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu · Hetarth Chopra · Ansel Blume · Jeonghwan Kim · Revanth Gangi Reddy · Heng Ji
RICCARDO: Radar Hits Prediction and Convolution for Target Detection with Radar-Camera Fusion
Yunfei Long · Abhinav Kumar · Xiaoming Liu · Daniel Morris
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani · Savas Ozkan · Sijun Cho · Hyo-Won Kim · Eunchung Noh · Jeongwon Min · Cho Jung Min · Dookun Park · Mete Ozay
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging
Xianrui Li · Yufei Cui · Jun Li · Antoni B. Chan
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
Yoojin Jung · Byung Cheol Song
Exploiting Deblurring Networks for Radiance Fields
Haeyun Choi · Heemin Yang · Janghyeok Han · Sunghyun Cho
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
Mingzhen Huang · Fu-Jen Chu · Bugra Tekin · Kevin Liang · Haoyu Ma · Weiyao Wang · Xingyu Chen · Pierre Gleize · Hongfei Xue · Siwei Lyu · Kris Kitani · Matt Feiszli · Hao Tang
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo · Andrew Rouditchenko · Yuan Gong · Saurabhchand Bhati · Samuel Thomas · Brian Kingsbury · Leonid Karlinsky · Rogerio Feris · James Glass · Hilde Kuehne
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
Peng Li · Wangguandong Zheng · Yuan Liu · Tao Yu · Yangguang Li · Xingqun Qi · Xiaowei Chi · Siyu Xia · Yan-Pei Cao · Wei Xue · Wenhan Luo · Yike Guo
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Yihao Wang · Marcus Klasson · Matias Turkulainen · Shuzhe Wang · Juho Kannala · Arno Solin
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng · Xinyi Wu · Yongxing Yang · Congchao Zhu · Song Wang · Zhenyao Wu
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation
Shuling Zhao · Fa-Ting Hong · Xiaoshui Huang · Dan Xu
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding
Jiaxin Shi · Mingyue Xiang · Hao Sun · Yixuan Huang · Zhi Weng
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu · Jiayi Zhang · Jiaxin Chen · Jinyang Guo · Di Huang · Yunhong Wang
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting
Jiajun Tang · Fan Fei · Zhihao Li · Xiao Tang · Shiyong Liu · Youyu Chen · Binxiao Huang · Dave Zhenyu Chen · Xiaofei Wu · Boxin Shi
Memories of Forgotten Concepts
Matan Rusanovsky · Shimon Malnick · Amir Jevnisek · Ohad Fried · Shai Avidan
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang · Yuchang Su · Yiming Liu · Xiaohan Wang · James Burgess · Elaine Sui · Chenyu Wang · Josiah Aklilu · Alejandro Lozano · Anjiang Wei · Ludwig Schmidt · Serena Yeung
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An · Jin Hyeon Kim · Seonghoon Park · Sunghwan Hong · Jaewoo Jung · Jisang Han · Seungryong Kim
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions
Shihang Du · Sanqing Qu · Tianhang Wang · Xudong Zhang · Yunwei Zhu · Jian Mao · Fan Lu · Qiao Lin · Guang Chen
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin · Zixu Lin · Haoyu Chen · Panwang Pan · Chenxin Li · Sixiang Chen · Kairun Wen · Yeying Jin · Wenbo Li · Xinghao Ding
Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection
Zhuo Xu · Xiang Xiang · Yifan Liang
CARL: A Framework for Equivariant Image Registration
Hastings Greer · Lin Tian · François-Xavier Vialard · Roland Kwitt · Raúl San José Estépar · Marc Niethammer
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu · Qi Liu · Xinchen Liu · Jie Li · Yongdong Zhang · Jiebo Luo · Xiaodong He · Wu Liu
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang · Xiaoming Liu · Yu Kong
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
Rong Wang · Fabian Prada · Ziyan Wang · Zhongshi Jiang · Chengxiang Yin · Junxuan Li · Shunsuke Saito · Igor Santesteban · Javier Romero · Rohan Joshi · Hongdong Li · Jason Saragih · Yaser Sheikh
Spatial-Temporal Visual Representation for Self-Supervised Motion Planning
Yichen Xie · Runsheng Xu · Tong He · Jyh-Jing Hwang · Katie Z Luo · Jingwei Ji · Hubert Lin · Letian Chen · Yiren Lu · Zhaoqi Leng · Dragomir Anguelov · Mingxing Tan
MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction
Xiaohao Xu · Feng Xue · Shibo Zhao · Yike Pan · Sebastian Scherer · Xiaonan Huang
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang · Jiayi He · Lechao Cheng · Jingjing Wu · Dan Guo · Richang Hong
MaRI: Material Retrieval Integration across Domains
Jianhui Wang · Zhifei Yang · Yangfan He · Huixiong Zhang · Yuxuan Chen · Jingwei Huang
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views
Ningli Xu · Rongjun Qin
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu · Lingdong Kong · hui shuai · Liang Pan · Ziwei Liu · Qingshan Liu
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
Martin Spitznagel · Jan Vaillant · Janis Keuper
PhaseScene: Dynamic Scene Generation with Phase-Specific Action Modeling for Embodied AI
Sangmin Lee · Sungyong Park · Heewon Kim
CGMatch: A Different Perspective of Semi-supervised Learning
Bo Cheng · Jueqing Lu · Yuan Tian · Haifeng Zhao · Yi Chang · Lan Du
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
Jeongryong Lee · Yejee Shin · Geonhui Son · Dosik Hwang
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
Jiawei Zhang · Zijian Wu · Zhiyang Liang · Yicheng Gong · Dongfang Hu · Yao Yao · Xun Cao · Hao Zhu
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
Sayak Nag · Udita Ghosh · Calvin-Khang Ta · Sarosij Bose · Jiachen Li · Amit K. Roy-Chowdhury
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing
Jiayi Fu · Siyu Liu · Zikun Liu · Chun-Le Guo · Hyunhee Park · Rui-Qi Wu · Guoqing Wang · Chongyi Li
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei · Vishal M. Patel · Mojtaba Sahraee-Ardakan · Hossein Talebi · Peyman Milanfar · Mauricio Delbracio
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang · Wenqi Jia · Wei Niu · Miao Yin
LUCAS: Layered Universal Codec Avatars
Di Liu · Teng Deng · Giljoo Nam · Yu Rong · Stanislav Pidhorskyi · Junxuan Li · Jason Saragih · Dimitris N. Metaxas · Chen Cao
NADER: Neural Architecture Design via Multi-Agent Collaboration
Zekang Yang · Wang ZENG · Sheng Jin · Chen Qian · Ping Luo · Wentao Liu
HORP: Human-Object Relation Priors Guided HOI Detection
Pei Geng · Jian Yang · Shanshan Zhang
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang · Zizhang Li · Matt Zhou · Shangzhe Wu · Jiajun Wu
Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models
Jianlong Jin · Chenglong Zhao · Ruixin Zhang · Sheng Shang · Jianqing Xu · Jingyun Zhang · ShaoMing Wang · Yang Zhao · Shouhong Ding · Wei Jia · Yunsheng Wu
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery · Jonathan Hayase · Pang Wei Koh · Sewoong Oh
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang · Xuan Dong · Xizhou Zhu · Weijie Su · Jiahao Wang · Hao Tian · Zhe Chen · Wenhai Wang · Lewei Lu · Jifeng Dai
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara · Heng Zhao · Demetrio Labate · Vishal M. Patel
ControlFace: Harnessing Facial Parametric Control for Face Rigging
Wooseok Jang · Youngjun Hong · Geonho Cha · Seungryong Kim
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim · Siwon Kim · Minjun Park · Rahim Entezari · Sungroh Yoon
Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training
Lexington Whalen · Zhenbang Du · Haoran You · Chaojian Li · Sixu Li · Yingyan (Celine) Lin
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang · Yuxiao Wu · Weiye Xu · Yating Huang · Chao Zhang · Zongpeng Li · Mingwei Xu · Zhenkai Liang
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park · Minh-Quan Viet Bui · Juan Luis Gonzalez Bello · Jaeho Moon · Jihyong Oh · Munchurl Kim
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song · Daemin Park · Kanghyun Baek · Sangyub Lee · Jooyoung Choi · Eunji Kim · Sungroh Yoon
Convex Combination Star Shape Prior for Data-driven Image Semantic Segmentation
Xinyu Zhao · Jun Xie · Shengzhe Chen · Jun Liu
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection
Xinjie Cui · Yuezun Li · Ao Luo · Jiaran Zhou · Junyu Dong
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR
Xugong Qin · peng zhang · Jun Jie Ou Yang · Gangyan Zeng · Yubo Li · Yuanyuan Wang · Wanqian Zhang · Pengwen Dai
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Shijia Zhao · Qiming Xia · Xusheng Guo · Pufan Zou · Maoji Zheng · Hai Wu · Chenglu Wen · Cheng Wang
Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
Shuaiwei Yuan · Junyu Dong · Yuezun Li
Linear Attention Modeling for Learned Image Compression
Donghui Feng · Zhengxue Cheng · Shen Wang · Ronghua Wu · Hongwei Hu · Guo Lu · Li Song
HotSpot: Screened Poisson Equation for Signed Distance Function Optimization
Zimo Wang · Cheng Wang · Taiki Yoshino · Sirui Tao · Ziyang Fu · Tzu-Mao Li
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang · Zhiyu Pan · Zhan Peng · Jian Cheng · Liwen Xiao · Wei Jiang · Zhiguo Cao
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Aleksei Bokhovkin · Quan Meng · Shubham Tulsiani · Angela Dai
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu · ASM Iftekhar · Gaurav Mittal · Tianjian Meng · Xiawei Wang · Cheng Zhao · Rohith Kukkala · Ehsan Elhamifar · Mei Chen
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye · Weiyang Liu · Pan He
SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Redefinition and Split Optimization
Junchen Yu · Si-Yuan Cao · Runmin Zhang · Chenghao Zhang · Zhu Yu · Shujie Chen · Bailin Yang · Hui-Liang Shen
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan · Guiwei Zhang · Changxiao Ma · Tianyi Zhang · Guanglin Niu
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin · Jooyoung Choi · Heeseung Kim · Sungroh Yoon
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
Sijie Wang · Rui She · Qiyu Kang · Siqi Li · Disheng Li · Tianyu Geng · Shangshu Yu · Wee Peng Tay
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao · Nikhil Parthasarathy · Muhammad Ferjad Naeem · Talfan Evans · Samuel Albanie · Federico Tombari · Yongqin Xian · Alessio Tonioni · Olivier J Henaff
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang · Chenghui Lv · Xian Li · Shichao Dong · Huadong Li · kelu Yao · Chao Li · Wenqi Shao · Ping Luo
KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities
Tianyi Liu · Haochuan Jiang · Kaizhu Huang
NGV: Neural Gaussian Velocity for 3D Physics Modeling from Dynamic Videos
Jinxi Li · Ziyang Song · Siyuan Zhou · Bo Yang
OSDFace: One-Step Diffusion Model for Face Restoration
Jingkai Wang · Jue Gong · Lin Zhang · Zheng Chen · Xing Liu · Hong Gu · Yutong Liu · Yulun Zhang · Xiaokang Yang
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis
Jeonghwan Park · Niall McLaughlin · Ihsen Alouani
UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning
Weiqi Yan · Lvhai Chen · Huaijia Kou · Shengchuan Zhang · Yan Zhang · Liujuan Cao
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak · Joonsoo Kim · Jun Young Jeong · Won-Sik Cheong · Jihyong Oh · Munchurl Kim
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Chaojun Ni · Guosheng Zhao · Xiaofeng Wang · Zheng Zhu · Wenkang Qin · Guan Huang · Chen Liu · Yuyin Chen · Yida Wang · Xueyang Zhang · Yifei Zhan · Kun Zhan · Peng Jia · XianPeng Lang · Xingang Wang · Wenjun Mei
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia · Wenbo Gou · Haoye Dong
Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception
Luke Chen · Junyao Wang · Trier Mortlock · Pramod Khargonekar · Mohammad Al Faruque
Identifying and Mitigating Spurious Correlation in Multi-Task Learning
Junyi Chai · Shenyu Lu · Xiaoqian Wang
Generative Modeling of Class Probability for Multi Modal Representation Learning
JungKyoo Shin · Bumsoo Kim · Eunwoo Kim
Flexible Group Count Enables Hassle-Free Structured Pruning
Jiamu Zhang · Shaochen (Henry) Zhong · Andrew Ye · Zirui Liu · Sebastian Zhao · Kaixiong Zhou · Li Li · Soo-Hyun Choi · Rui Chen · Xia Hu · Shuai Xu · Vipin Chaudhary
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin · Yoad Tewel · Hilit Segev · Eran Hirsch · Royi Rassin · Gal Chechik
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li · Luyuan Zhang · Zedong Wang · Juanxi Tian · Cheng Tan · Zicheng Liu · Chang Yu · Qingsong Xie · Haonan Lu · Haoqian Wang · Zhen Lei
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen · Weilong Chen · Yifan Zuo · Yuming Fang
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong · Junjun Jiang · Kui Jiang · Jiahan Li · Yongbing Zhang
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao · Pengtao Chen · Chong Yu · Yan Wen · Xudong Tan · Tao Chen
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen · Zelin Peng · Yu Huang · Xiaokang Yang · Wei Shen
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong · Hao Fang · Bin Chen · Xulin Gu · Meikang Qiu · Shuhan Qi · Shu-Tao Xia
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
Muchen Li · Sammy Christen · Chengde Wan · Yujun Cai · Renjie Liao · Leonid Sigal · Shugao Ma
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
Daisheng Jin · Jiangbei Hu · Baixin Xu · Yuxin Dai · Chen Qian · Ying He
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
Huakai Lai · Guoxin Xiong · Huayu Mai · Xiang Liu · Tianzhu Zhang
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
Max Gutbrod · David Rauber · Danilo Weber Nunes · Christoph Palm
ArtiFade: Learning to Generate High-quality Subject from Blemished Image
Shuya Yang · Shaozhe Hao · Yukang Cao · Kwan-Yee K. Wong
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
Dileepa Pitawela · Gustavo Carneiro · Hsiang-Ting Chen
Separation of powers: On segregating knowledge from observation in LLM-enabled knowledge-based visual question answering
Zhen Yang · Zhuo Tao · Qi Chen · Yuankai Qi · Liang Li · Anton van den Hengel · Qingming Huang
Correlative and Discriminative Label Grouping for Multi-Label VPT
Leilei Ma · Shuo Xu · Ming-Kun Xie · Lei Wang · Dengdi Sun · Haifeng Zhao
Training-free Video Semantic Segmentation based on Diffusion Models
Qian Wang · Abdelrahman Eldesokey · Mohit Mendiratta · Fangneng Zhan · Adam Kortylewski · Christian Theobalt · Peter Wonka
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin · Xin Yang · Meixi Chen · Xu Yingjie · Dongyu Yan · Leyi Wu · Xinli Xu · Lie XU · Shunsi Zhang · Ying-Cong Chen
Star with Bilinear Mapping
Zelin Peng · Yu Huang · Zhengqin Xu · feilong tang · Ming Hu · Xiaokang Yang · Wei Shen
MET3R: Measuring Multi-View Consistency in Generated Images
Mohammad Asim · Christopher Wewer · Thomas Wimmer · Bernt Schiele · Jan Lenssen
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
Minhyeok Lee · Suhwan Cho · Jungho Lee · Sunghun Yang · Heeseung Choi · Ig-Jae Kim · Sangyoun Lee
Doppelgängers and Adversarial Vulnerability
George Kamberov
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng · Zhengqin Xu · Zhilin Zeng · Changsong Wen · Yu Huang · Menglin Yang · feilong tang · Wei Shen
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang · Tongkun Guan · Pei Fu · Chen Duan · Qianyi Jiang · Zhentao Guo · Shan Guo · Junfeng Luo · Wei Shen · Xiaokang Yang
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
Long Ma · Tengyu Ma · Ziye Li · Yuetong Wang · Jinyuan Liu · Chengpei Xu · Risheng Liu
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Siyi Du · Xinzhe Luo · Declan O'Regan · Chen Qin
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman · Xiaoman Zhang · Emma Chen · Sung Eun Kim · Pranav Rajpurkar
Multi-View Pose-Agnostic Change Localization with Zero Labels
Chamuditha Jayanga Galappaththige · Jason Lai · Lloyd Windrim · Donald G. Dansereau · Niko Suenderhauf · Dimity Miller
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
Xinrui Wang · Lanqing Guo · Xiyu Wang · Siyu Huang · Bihan Wen
GeoDepth: From Point-to-Depth to Plane-to-Depth Modeling for Self-Supervised Monocular Depth Estimation
Haifeng Wu · Shuhang Gu · Lixin Duan · Wen Li
Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework
Hanrui Zhao · Niuniu Qi · Mengxin Ren · Banglong Liu · Shuming Shi · Zhengfeng Yang
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang · Yu Jin · Wentao Wu · Wei Zhang · Lin Zhu · Bo Jiang · Yonghong Tian
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation
Jiawei Fu · ZHANG Tiantian · Kai Chen · Qi Dou
Navigation World Models
Amir Bar · Gaoyue Zhou · Danny Tran · Trevor Darrell · Yann LeCun
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun · Barath Lakshmanan · Maying Shen · Shiyi Lan · Jingde Chen · Jose M. Alvarez
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar · Gursimran Singh · Mohammad Akbari · Yong Zhang
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui · Lingzhi Chen · Zhenyu Tang · Lilong Wang · Mianxin Liu · Shaoting Zhang · Xiaosong Wang
Neural Inverse Rendering from Propagating Light
Anagh Malik · Benjamin Attal · Andrew Xie · Matthew O’Toole · David B. Lindell
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
Jinseong Jang · Chunfei Ma · Byeongwon Lee
Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation
Yue Zhang · Mingyue Bin · Yuyang Zhang · Zhongyuan Wang · Zhen Han · Chao Liang
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining
Tong Wang · Mingkang Wang · Zhongze Wang · Hongkai Wang · Qi Xu · Fengyu Cong · Hongming Xu
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization
Sihao Liu · Yibo Yang · Xiaojie Li · David A. Clifton · Bernard Ghanem
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo · Conghao Xiong · Jiabo MA · Qichen Sun · Lishuang Feng · Jinzhuo Wang · Hao Chen
UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units
Huakun Liu · Hiroki Ota · Xin Wei · Yutaro Hirao · Monica Perusquia-Hernandez · Hideaki Uchiyama · Kiyoshi Kiyokawa
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
Xuecheng Wu · Heli Sun · Yifan Wang · Jiayu Nie · Jie Zhang · Yabing Wang · Junxiao Xue · Liang He
M³-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Zixuan Chen · Jiaxin Li · Junxuan Liang · Liming Tan · Yejie Guo · Cewu Lu · Yonglu Li
DV-Matcher: Deformation-based Non-rigid Point Cloud Matching Guided by Pre-trained Visual Features
Zhangquan Chen · Puhua Jiang · Ruqi Huang
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models
Kamil Garifullin · Maxim Nikolaev · Andrey Kuznetsov · Aibek Alanov
Homogeneous Dynamics Space for Heterogeneous Humans
Xinpeng Liu · Junxuan Liang · Chenshuo Zhang · Zixuan Cai · Cewu Lu · Yonglu Li
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu · Yuan Meng · Jiayi Ji · Xiaoshuai Sun
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Yukang Lin · Hokit Fung · Jianjin Xu · Zeping Ren · Adela S.M. Lau · Guosheng Yin · Xiu Li
Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval
Siyuan Duan · Yuan Sun · Dezhong Peng · Zheng Liu · Xiaomin Song · Peng Hu
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
Dong Li · Wenqi Zhong · Wei Yu · Yingwei Pan · Dingwen Zhang · Ting Yao · Junwei Han · Tao Mei
Interleaved-modal Chain-of-Thought
Jun Gao · Yongqi Li · Ziqiang Cao · Wenjie Li
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen · Yunhao Gou · Runhui Huang · Zhili Liu · Daxin Tan · Jing Xu · Chunwei Wang · Yi Zhu · yihan zeng · Kuo Yang · Dingdong WANG · Kun Xiang · Haoyuan Li · Haoli Bai · Jianhua Han · Xiao-Hui Li · Weike Jin · Nian Xie · Yu Zhang · James Kwok · Hengshuang Zhao · Xiaodan Liang · Dit-Yan Yeung · Xiao Chen · Zhenguo Li · Wei Zhang · Qun Liu · Lanqing Hong · Lu Hou · Hang Xu
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn · KiYoon Yoo · Wonhyuk Ahn · Daesik Kim · Seung-Hun Nam
Learning Affine Correspondences by Integrating Geometric Constraints
Pengju Sun · Banglei Guan · Zhenbao Yu · Yang Shang · Qifeng Yu · Daniel Barath
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen · Tianshu Zhang · Shiyu Huang · Yuwei Niu · Linfeng Zhang · Lijie Wen · Xuming Hu
CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification
wenlong yu · Qilong Wang · Chuang Liu · Dong Li · Qinghua Hu
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
Shu Yang · Chengting Yu · Lei Liu · Hanzhi Ma · Aili Wang · Erping Li
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang · Srishti Yadav · Fengze Han · Ekaterina Shutova
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu · Yiming Zhao · Zhicong Tang · Ruihong Yin · Haoxing Ye · Yuhui Yuan · Dong Chen · Jianmin Bao · Sirui Zhang · Yanbin Wang · Lin Liang · Lijuan Wang · Ji Li · Xiu Li · Zhouhui Lian · Gao Huang · Baining Guo
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing · Pranavi Kolouju · Robert Pless · Abby Stylianou · Nathan Jacobs
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
Xianing Chen · Si Huo · Borui Jiang · Hailin Hu · Xinghao Chen
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal · Peiye Zhuang · Hsin-Ying Lee · Shubham Tulsiani
Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai · Ping Zhang · Cheng-Hao Tu · Hong-You Chen · Quang-Huy Nguyen · Li Zhang · Wei-Lun Chao
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud · Shaden Alshammari · Yonglong Tian · Guohao Li · Philip H.S. Torr · Yoon Kim · Marzyeh Ghassemi
Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering
Liang Chen · Zhe Xue · Yawen Li · Meiyu Liang · Yan Wang · Anton van den Hengel · Yuankai Qi
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU · Yinghao Chen · chencheng · Hui Tan · Boheng Chen · yina xie · Rui Hu · Guanxin tan · Renshou Wu · Yan Hu · Yi Zeng · Lei Wu · Liuyang Bian · Zhaoxiong Wang · Long Liu · Yanzhou Yang · Han Xiao · Aojun Zhou · Yafei Wen · Xiaoxin Chen · Shuai Ren · Hongsheng Li
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
Alex Hanson · Allen Tu · Geng Lin · Vasu Singla · Matthias Zwicker · Tom Goldstein
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition
Fei Xie · Jiahao Nie · Yujin Tang · Wenkang Zhang · Hongshen Zhao
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang · Zhimao Peng · Zhengyuan Xie · Fei Yang · Xialei Liu · Ming-Ming Cheng
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
Wei Luo · Yunkang Cao · Haiming Yao · Xiaotian Zhang · Jianan Lou · Yuqi Cheng · Weiming Shen · Wenyong Yu
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture
Kenkun Liu · Yurong Fu · Weihao Yuan · Jing Lin · Peihao Li · Xiaodong Gu · Lingteng Qiu · Haoqian Wang · Zilong Dong · Xiaoguang Han
Active Event-based Stereo Vision
Jianing Li · Yunjian Zhang · Haiqian Han · Xiangyang Ji
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis · Theodoros Semertzidis · Efstratios Gavves · Petros Daras
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang · Yunjian Zhang · Yao Zhu · Jianing Li · Zizhe Wang · Yanwei Liu · Xiangyang Ji
Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images
Wensheng Cheng · Zhenghong Li · Jiaxiang Ren · Hyomin Jeong · Congwu Du · Yingtian Pan · Haibin Ling
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu · Jianing Li · guanghui zhao · Yunjian Zhang · Xin Meng · Fei Richard Yu · Xiangyang Ji · Ming Li
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection
Jinhyung Park · Navyata Sanghvi · Hiroki Adachi · Yoshihisa Shibata · Shawn Hunt · Shinya Tanaka · Hironobu Fujiyoshi · Kris Kitani
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen · Jianwei Yang · Haiping Wu · Dianqi Li · Jianfeng Gao · Tianyi Zhou · Bin Xiao
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu · Wei Chow · Zhongqi Yue · Kaihang Pan · Yang Wu · Xiaoyang Wan · Juncheng Li · Siliang Tang · Hanwang Zhang · Yueting Zhuang
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun · Yixuan Si · Chenglu Zhu · Xuan Gong · Kai Zhang · Pingyi Chen · Ye Zhang · Zhongyi Shui · Tao Lin · Lin Yang
Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds
Huitong Chen · Yu Wang · Yan Fan · Guosong Jiang · Qinghua Hu
GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
Mengqiao Han · Liyuan Pan · Xiabi Liu
SfM-Free 3D Gaussian Splatting via Hierarchical Training
Bo Ji · Angela Yao
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek · Jiri Matas
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar · Andrii Zadaianchuk · Rabiul Awal · Maximilian Seitzer · Efstratios Gavves · Aishwarya Agrawal
Multi-modal Medical Diagnosis via Large-small Model Collaboration
Wanyi Chen · Zihua Zhao · Jiangchao Yao · Ya Zhang · Jiajun Bu · Haishuai Wang
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu · Jie Liu · Jie Tang · Gangshan Wu
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu · Shijie Yang · Xin Liu · Jie Liu · Jie Tang · Gangshan Wu
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Lingjun Mao · Zineng Tang · Alane Suhr
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon · Federico Girella · Ziyue Liu · Marco Cristani · Yiming Wang
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang · Ziming Cheng · Junting Pan · Zhaohui Hou · Mingjie Zhan
Gaussian World Model for Streaming 3D Occupancy Prediction
Sicheng Zuo · Wenzhao Zheng · Yuanhui Huang · Jie Zhou · Jiwen Lu
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan · Yandan Zhao · Shen Chen · Mingyi Guo · Xinghe Fu · Taiping Yao · Shouhong Ding · Yunsheng Wu · Li Yuan
Integral Fast Fourier Color Constancy
Wenjun Wei · Yanlin Qian · Huaian Chen · Junkang Dai · Yi Jin
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan · Md Mohaiminul Islam · Jindong Gu · Thomas Seidl · Gedas Bertasius
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou · Shuli Cheng · Jiayi Chen
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George · Karthik Nandan Dasaraju · Rutheesh Reddy Chittepu · Konda Reddy Mopuri
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin · Shizhao Sun · Danqing Huang · Ting Liu · Ji Li · Jiang Bian
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung · Andrew Luo · Gabriel Sarch · Margaret Marie Henderson · Deva Ramanan · Michael J. Tarr
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning
Yanbiao Ma · Wei Dai · Wenke Huang · Jiayi Chen
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo · Xiuwei Xu · Ziwei Wang · Jianjiang Feng · Jie Zhou · Jiwen Lu
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng · Hang Zhou · Jing Liao · Li Cheng · Wenbo Zhou
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin · Xiuwei Xu · Linqing Zhao · Ziwei Wang · Jie Zhou · Jiwen Lu
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu · Haolin Wang · Shilin Ma · Wenliang Zhao · Yansong Tang · Lei Chen · Jie Zhou
VSNet: Focusing on the Linguistic Characteristics of Sign Language
YuHao Li · Xinyue Chen · Hongkai Li · Xiaorong Pu · Peng Jin · Yazhou Ren
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji · Mehdi Noroozi · Victor Escorcia · Anestis Zaganidis · Brais Martinez · Georgios Tzimiropoulos
Progress-Aware Video Frame Captioning
Zihui Xue · Joungbin An · Xitong Yang · Kristen Grauman
Pathways on the Image Manifold: Image Editing via Video Generation
Noam Rotstein · Gal Yona · Daniel Silver · Roy Velich · David Bensaid · Ron Kimmel
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin · Chung-Ho Wu · Changhan Yeh · Shih Han Yen · Cheng Sun · Yu-Lun Liu
DreamOmni: Unified Image Generation and Editing
Bin Xia · Yuechen Zhang · Jingyao Li · Chengyao Wang · Yitong Wang · Xinglong Wu · Bei Yu · Jiaya Jia
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On
Ji Woo Hong · Tri Ton · Trung X. Pham · Gwanhyeong Koo · Sunjae Yoon · Chang D. Yoo
Dual Prompting for Image Restoration across Full-Scene with Diffusion Transformers
Dehong Kong · Fan Li · Zhixin Wang · Jiaqi Xu · Renjing Pei · Wenbo Li · Wenqi Ren
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang · Runze Zhang · Kangjun Liu · Minglun Gong · Hao Zhang · Hui Huang
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab · M. Maruf · Arka Daw · Abhilash Neog · Harish Babu Manogaran · Mridul Khurana · Zhenyang Feng · Bahadir Altintas · Yasin Bakis · Elizabeth Campolongo · Matthew Thompson · Xiaojun Wang · Hilmar Lapp · Tanya Berger-Wolf · Paula Mabee · Henry Bart · Wei-Lun Chao · Wasila Dahdul · Anuj Karpatne
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu · Yuheng Ding · Bingxuan Li · Pan Lu · Da Yin · Kai-Wei Chang · Nanyun Peng
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
Huaize Liu · WenZhang Sun · Donglin Di · Shibo Sun · Jiahui Yang · Hujun Bao · Changqing Zou
ZoomLDM: Latent Diffusion Model for multi-scale image generation
Srikar Yellapragada · Alexandros Graikos · Kostas Triaridis · Prateek Prasanna · Rajarsi Gupta · Joel Saltz · Dimitris Samaras
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan · Kebin Wu · Fatima Albreiki
AniMo: Species-aware Model for Text-driven Animal Motion Generation
Xuan Wang · Kai Ruan · Xing Zhang · Gaoang Wang
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren · Zhengyu Zhao · Chenhao Lin · Bo Yang · Lu Zhou · Zhe Liu · Chao Shen
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
Lilin Zhang · Chengpei Wu · Ning Yang
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu · Zilong Huang · Bencheng Liao · Jun Hao Liew · Hanshu Yan · Jiashi Feng · Xinggang Wang
LUMINET: Image-based Indoor Scene Relighting via Latent Intrinsics
Xiaoyan Xing · Konrad Groh · Sezer Karaoglu · Theo Gevers · Anand Bhattad
Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration
Aocheng Li · James R. Zimmer-Dauphinee · Rajesh Kalyanam · Ian Lindsay · Parker VanValkenburgh · Steven Wernke · Daniel Aliaga
Gyro-based Neural Single Image Deblurring
Heemin Yang · Jaesung Rim · Seungyong Lee · Seung-Hwan Baek · Sunghyun Cho
Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves
Yan Zaoming · Pengcheng Lei · Tingting Wang · Faming Fang · Junkang Zhang · Yaomin Huang · Haichuan Song
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe · Hanan Gani · Wenqi Zhu · Jiale Cao · Eric P. Xing · Fahad Shahbaz Khan · Salman Khan
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
Yuxuan Wang · Aming Wu · Muli Yang · Yukuan Min · Yihang Zhu · Cheng Deng
Robust Message Embedding via Attention Flow-Based Steganography
Huayuan Ye · Shenzhuo Zhang · Shiqi Jiang · Jing Liao · Shuhang Gu · Dejun Zheng · Changbo Wang · Chenhui Li
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic
Jianwei Tang · Hong Yang · Tengyue Chen · Jian-Fang Hu
Robust Multi-Object 4D Generation for In-the-wild Videos
Wen-Hsuan Chu · Lei Ke · Jianmeng Liu · Mingxiao Huo · Pavel Tokmakov · Katerina Fragkiadaki
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu · Ziyue Wang · Can Zhang · Peng Li · Yang Liu
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction
Junjie Zhou · Shouju Wang · Yuxia Tang · Qi Zhu · Daoqiang Zhang · Wei Shao
Generative Gaussian Splatting for Unbounded 3D City Generation
Haozhe Xie · Zhaoxi Chen · Fangzhou Hong · Ziwei Liu
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao · Junbao Zhuo · Jiansheng Chen · Huimin Ma
Generative Sparse-View Gaussian Splatting
Hanyang Kong · Xingyi Yang · Xinchao Wang
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo · He Zhu · Sida Peng · Haotong Lin · Yunzhi Yan · Tao Xie · Wenguan Wang · Xiaowei Zhou · Hujun Bao
On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events
Jesse Hagenaars · Yilun Wu · Federico Paredes Valles · Stein Stroobants · Guido De Croon
Ref-GS: Modeling View-Dependent Appearance with Environment Gaussian
Tao Xie · Xi Chen · Zhen Xu · Yiman Xie · Yudong Jin · Yujun Shen · Sida Peng · Hujun Bao · Xiaowei Zhou
Unified Medical Lesion Segmentation via Self-referring Indicator
Shijie Chang · Xiaoqi Zhao · Lihe Zhang · Tiancheng Wang
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
Zhihao Shi · Dong Huo · Yuhongze Zhou · Yan Min · Juwei Lu · Xinxin Zuo
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo · Juan C. Benito · Alvaro Garcia · Marcos Conde
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li · Bin Chen · Chen Zhao · Lei Zhang · Jian Zhang
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
Chensheng Peng · Chengwei Zhang · Yixiao Wang · Chenfeng Xu · Yichen Xie · Wenzhao Zheng · Kurt Keutzer · Masayoshi Tomizuka · Wei Zhan
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou · Tammy Riklin Raviv
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang · Baopeng Zhang · Zhu Teng · Wenxin Luo · Junnan Zou · Jianping Fan
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg · Akash Kumar · Yogesh S. Rawat
Blind-Spot Real-world Image Denoising via Implicit Neural Pixel Resampling
Yuhui Quan · Tianxiang Zheng · Zhiyuan Ma · Hui Ji
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
Yunzhi Yan · Zhen Xu · Haotong Lin · Haian Jin · Haoyu Guo · Yida Wang · Kun Zhan · XianPeng Lang · Hujun Bao · Xiaowei Zhou · Sida Peng
SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
Chenyu Zhang · Kunlun Xu · Zichen Liu · Yuxin Peng · Jiahuan Zhou
Test-time augmentation improves efficiency in conformal prediction
Divya M Shanmugam · Helen Lu · Swami Sankaranarayanan · John Guttag
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang · Jiansheng Li · Wenjing Chu · Junjia Hai · Renzhe Xu · Yuqing Yang · Shikai Guan · Jiazheng Xu · Liping Jing · Peng Cui
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning
Hao Zhu · Yifei Zhang · Junhao Dong · Piotr Koniusz
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian · Jian Zhao · Zechao Hu · Zhengwei Yang · Hao Li · Lei Jin · Zheng Wang · Xuelong Li
ESC: Erasing Space Concept for Knowledge Deletion
Tae-Young Lee · Sundong Park · Minwoo Jeon · Hyoseok Hwang · Gyeong-Moon Park
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
Yifan Wang · Jian Zhao · Zhaoxin Fan · Xin Zhang · Xuecheng Wu · Yudian Zhang · Lei Jin · Xinyue Li · Gang Wang · Mengxi Jia · Ping Hu · Zheng Zhu · Xuelong Li
MambaVO: Deep Visual Odometry by Sequential Matching Refinement and Training Smoothing
Shuo Wang · Wanting Li · Yongcai Wang · Zhaoxin Fan · Zhe Huang · Xudong Cai · Jian Zhao · Deying Li
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu · Yuxin Peng
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
Hyo-Jun Lee · Yeong Jun Koh · Hanul Kim · Hyunseop Kim · Yonguk Lee · Jinu Lee
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S P Sharan · Minkyu Choi · Sahil Shah · Harsh Goel · Mohammad Omama · Sandeep P. Chinchali
NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery
Reese Kneeland · Paul Scotti · Ghislain St-Yves · Jesse L Breedlove · Kendrick N Kay · Thomas Naselaris
High-quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding
Yuanqi Li · Jingcheng Huang · Hongshen Wang · Peiyuan Lv · Yansong Liu · Jiuming Zheng · Jie Guo · Yanwen Guo
Visioner: Exploring Knowledge Learning from Raw Videos
Zhongwei Ren · Yunchao Wei · Xun Guo · Yao Zhao · Bingyi Kang · Jiashi Feng · Xiaojie Jin
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging
Max Kahl · Sebastian Stricker · Lisa Hutschenreiter · Florian Bernard · Carsten Rother · Bogdan Savchynskyy
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu · Zichen Liang · Fei Yang · Qibin Hou · Xialei Liu · Ming-Ming Cheng
Category-Agnostic Neural Object Rigging
Guangzhao He · Chen Geng · Shangzhe Wu · Jiajun Wu
Open-World Amodal Appearance Completion
Jiayang Ao · Yanbei Jiang · Qiuhong Ke · Krista A. Ehinger
GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking
Hyunseop Kim · Hyo-Jun Lee · Yonguk Lee · Jinu Lee · Hanul Kim · Yeong Jun Koh
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang · Hyunje Park · Feng Yang · Heeju Ko · Euijin Choo · Sangpil Kim
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu · Filip Ilievski · Cees G. M. Snoek
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv · Yangqi Long · Congzhentao Huang · Cao Li · Chengfei Lv · Hao Ren · Dian Zheng
Descriptor-In-Pixel: Point-Feature Tracking For Pixel Processor Arrays
Laurie Bose · Piotr Dudek · Jianing Chen
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Alan Lukezic · Jovana Videnović · Matej Kristan
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
Thomas Dagès · Simon Weber · Ya-Wei Eileen Lin · Ronen Talmon · Daniel Cremers · Michael Lindenbaum · Alfred M. Bruckstein · Ron Kimmel
BWFormer: Building Wireframe Reconstruction from airborne LiDAR point clouds with Transformer
Yuzhou Liu · Lingjie Zhu · Hanqiao Ye · Shangfeng Huang · Xiang Gao · Xianwei Zheng · Shuhan Shen
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety
Andrei Dumitriu · Florin Tatui · Florin Miron · Aakash Ralhan · Radu Tudor Ionescu · Radu Timofte
Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
Yuhao Zhou · Yuxin Tian · Jindi Lv · Mingjia Shi · Yuanxi Li · Qing Ye · Shuhao Zhang · Jiancheng Lv
Style Quantization for Data-Efficient GAN Training
Jian Wang · Xin Lan · Ji-Zhe Zhou · Yuxin Tian · Jiancheng Lv
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan · Xiaoheng Jiang · Yang Lu · Jiale Cao · Dong Chen · Mingliang Xu
Hand-held Object Reconstruction from RGB Video with Dynamic Interaction
Shijian Jiang · Qi Ye · Rengan Xie · Yuchi Huo · Jiming Chen
Simpler Diffusion: 1.5 FID on ImageNet512 with pixel-space diffusion
Emiel Hoogeboom · Thomas Mensink · Jonathan Heek · Kay Lamerigts · Ruiqi Gao · Tim Salimans
H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection
Yuhang Liu · Wenjie Zhao · Yunhui Guo
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang · Hao Lu · Qingyong Hu · Yin Wang · Kaishen Yuan · Xin Liu · Kaishun Wu
Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
Anubhav Jain · Yuya Kobayashi · Takashi Shibuya · Yuhta Takida · Nasir Memon · Julian Togelius · Yuki Mitsufuji
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Hao Li · Ju Dai · Xin Zhao · Feng Zhou · Junjun Pan · Lei Li
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang · Junru Li · Kai Zhang · Li Zhang
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz · Peter Neidlinger · Marta Ligero · Georg Wölflein · Marko van Treeck · Jakob Nikolas Kather
Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model
Yuxiang Mao · Zhenfeng Fan · Zhijie Zhang · Zhiheng Zhang · Shihong Xia
See Further When Clear: Curriculum Consistency Model
Yunpeng Liu · Boxiao Liu · Yi Zhang · Xingzhong Hou · Guanglu Song · Yu Liu · Haihang You
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita · Stanislav Frolov · Brian Bernhard Moser · Takahiro Shirakawa · Ko Watanabe · Andreas Dengel · Jinjia Zhou
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin · Linwei Tao · Minjing Dong · Chang Xu
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane · Dulanga Weerakoon · Vigneshwaran Subbaraju · Sougata Sen · Sanjay Sarma · Archan Misra
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang · Reuben Tan · Qianhui Wu · Ruijie Zheng · Baolin Peng · Yongyuan Liang · Yu Gu · Mu Cai · Seonghyeon Ye · Joel Jang · Yuquan Deng · Jianfeng Gao
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution
Fei Ye · Adrian Bors
Focal Split: Untethered Snapshot Depth from Differential Defocus
Junjie Luo · John Mamish · Alan Fu · Thomas Concannon · Josiah Hester · Emma Alexander · Qi Guo
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang · Bowen Jin · Jiacheng Shen · Sirui Ding · Qiaoyu Tan · Jiawei Han
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński · Antoni Kowalczuk · Franziska Boenisch · Adam Dziedzic
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Stefan Smeu · Dragos-Alexandru Boldisor · Dan Oneata · Elisabeta Oneata
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu · Siyuan Li · Ajad Chhatkuli · Prune Truong · Luc Van Gool · Federico Tombari
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Cheng Lei · Ao Li · Hu Yao · Ce Zhu · Le Zhang
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
Xiangtao Zhang · Sheng Li · Ao Li · Yipeng Liu · Fan Zhang · Ce Zhu · Le Zhang
Detecting Open World Objects via Partial Attribute Assignment
Muli Yang · Gabriel James Goenawan · Huaiyuan Qin · Kai Han · Xi Peng · Yanhua Yang · Hongyuan Zhu
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling · Yachen Chang · Hailiang Zhao · Xinkui Zhao · Kingsum Chow · Shuiguang Deng
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner · Christoph Lippert · Aravindh Mahendran
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei · Mingming He · Li Ma · Julien Philip · Wenqi Xian · David M George · Xueming Yu · Gabriel Dedic · Ahmet Levent Taşel · Ning Yu · Vishal M. Patel · Paul Debevec
EigenGS Representation: From Eigenspace to Gaussian Image Space
Lo-Wei Tai · Ching-En Li · Cheng-Lin Chen · Chih-Jung Tsai · Hwann-Tzong Chen · Tyng-Luh Liu
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
Bikang Pan · Qun Li · Xiaoying Tang · Wei Huang · Zhen Fang · Feng Liu · Jingya Wang · Jingyi Yu · Ye Shi
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei · Feng Li · Shengeng Tang · Yao Zhao · Huihui Bai
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson · Allen Tu · Vasu Singla · Bethmage Mayuka Jayawardhana · Matthias Zwicker · Tom Goldstein
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi · Min-Cheol Sagong · SeokYeong Lee · Seung-Won Jung · Ig-Jae Kim · Junghyun Cho
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
Nicole Meng · Caleb Manicke · Ronak Sahu · Caiwen Ding · Yingjie Lao
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye · Yukang Gan · Yixiao Ge · Xiao-Ping Zhang · Yansong Tang
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen · Kien Nguyen Thanh · Akila Pemasiri · Feng Liu · Sridha Sridharan · Clinton Fookes
FluxSpace: Disentangled Image Editing in Rectified Flow Models
Yusuf Dalva · Kavana Venkatesh · Pinar Yanardag
HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
Yifang Xu · BenXiang Zhai · Yunzhuo Sun · Ming Li · Yang Li · Sidan Du
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie · Minghong Fang · Neil Zhenqiang Gong
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen · Guangcong Wang · Jiahao Zhu · Jianhuang Lai · Xiaohua Xie
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng · Jiacheng Wei · Tianrun Chen · Chi Zhang · Xiaofeng Yang · Shangzhan Zhang · Bingchen Yang · Chuan-Sheng Foo · Guosheng Lin · Qixing Huang · Fayao Liu
Estimating Body and Hand Motion in an Ego-sensed World
Brent Yi · Vickie Ye · Maya Zheng · Yunqi Li · Vickie Ye · Georgios Pavlakos · Yi Ma · Jitendra Malik · Angjoo Kanazawa
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao · Daoyuan Chen · Yilun Huang · Bolin Ding · Yaliang Li · Ying Shen
Masking meets Supervision: A Strong Learning Alliance
Byeongho Heo · Taekyung Kim · Sangdoo Yun · Dongyoon Han
Robustness Analysis: Are Optical Flow Methods Safe to Use?
Libo Long · Xiao Hu · Jochen Lang
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
R.D. Lin · Pengcheng Weng · Yinqiao Wang · Han Ding · Jinsong Han · Fei Wang
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Khanh Nguyen · Ghulam Mubashar Hassan · Ajmal Mian
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag · Xianghao Kong · Jingtao Li · Michael Spranger · Lingjuan Lyu
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang · Sicheng Xu · Cassie Lee Dai · Jianfeng Xiang · Yu Deng · Xin Tong · Jiaolong Yang
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang · Zelong Lv · Sicheng Xu · Yu Deng · Ruicheng Wang · Bowen Zhang · Dong Chen · Xin Tong · Jiaolong Yang
Towards Lossless Implicit Neural Representation via Bit Plane Decomposition
Woo Kyoung Han · Byeonghun Lee · Hyunmin Cho · Sunghoon Im · Kyong Hwan Jin
A Flag Decomposition for Hierarchical Datasets
Nathan Mankovich · Ignacio Santamaria · Gustau Camps-Valls · Tolga Birdal
VideoChat-Online: Towards Online Spatial-Temporal Video Understanding via Large Video Language Models
Zhenpeng Huang · Xinhao Li · Jiaqi Li · Jing Wang · Xiangyu Zeng · Cheng Liang · Tao Wu · Xi Chen · Liang Li · Limin Wang
4DTAM: Non-Rigid Tracking and Mapping via Surface Gaussian Splatting
Hidenobu Matsuki · Gwangbin Bae · Andrew J. Davison
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin · Fredrik Kahl · Lars Hammarstrand
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions
Qiang Li · Jian Ruan · Fanghao Wu · Yuchi Chen · Zhihua Wei · Wen Shen
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint · Diego Thomas · Jean-Sébastien Franco
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction
Yuanbo Wang · Zhaoxuan Zhang · Jiajin Qiu · Dilong Sun · Zhengyu Meng · Xiaopeng Wei · Xin Yang
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee · Gregory A. Lodygensky · Christian Desrosiers · Jose Dolz
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang · Shuting He · Xiang Fang · Jiawei Han · Zhonghang Liu · Xin Ning · Weijun Li · Prayag Tiwari
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani · Ali Bahri · Moslem Yazdanpanah · Mehrdad Noori · David Osowiechi · Gustavo Vargas Hakim · Farzad Beizaee · Milad Cheraghalikhani · Arnab Mondal · Herve Lombaert · Christian Desrosiers
PolarNeXt: Rethink Instance Segmentation with Polar Representation
Jiacheng Sun · Xinghong Zhou · Yiqiang Wu · Bin Zhu · Jiaxuan Lu · Yu Qin · Xiaomao Li
CryptoFace: End-to-End Encrypted Face Recognition
Wei Ao · Vishnu Naresh Boddeti
Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction
Ning Ni · Libao Zhang
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed · Vishnu Naresh Boddeti
Enhancing Diversity for Data-free Quantization
Kai Zhao · Zhihao Zhuang · Miao Zhang · Chenjuan Guo · Yang Shu · Bin Yang
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Zeqing Wang · Qingyang Ma · Wentao Wan · Haojie Li · Keze Wang · Yonghong Tian
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
Zhuoran Zhao · Linlin Yang · Pengzhan Sun · Pan Hui · Angela Yao
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin · Zhu Xu · Yang Liu
Learning Textual Prompts for Open-World Semi-Supervised Learning
Yuxin Fan · Junbiao Cui · Jiye Liang
Can Large Vision-Language Models Correct Grounding Errors By Themselves?
Yuan-Hong Liao · Rafid Mahmood · Sanja Fidler · David Acuna
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
Han Hu · Wenli Du · Peng Liao · Bing Wang · Siyuan Fan
Gromov–Wasserstein Problem with Cyclic Symmetry
Shoichiro Takeda · Yasunori Akagi
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner · Ruihang Zhang · Mathieu Tuli · David B. Lindell
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang · Naishan Zheng · Jie Huang · Feng Zhao
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform
Toan Nguyen · Kien Do · Duc Kieu · Thin Nguyen
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee · Erika Lu · Sarah Rumbley · Michal Geyer · Jia-Bin Huang · Tali Dekel · Forrester Cole
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng · Henry Zheng · Gao Huang
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han · Liang Ma · Kamila Zhumakhanova · Ekaterina Radionova · Jingyi Zhang · Xiaojun Chang · Xiaodan Liang · Ivan Laptev
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts
Zhaoxing Gan · Mengtian Li · Ruhua Chen · Zhongxia Ji · Sichen Guo · Huanling Hu · Guangnan Ye · Zuo Hu
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu · Zhuoqi Ma · Xiaolu Kang · Yunan Li · Kun Xie · Zhicheng Jiao · Qiguang Miao
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li · Hongli Xu · Junwen Huang · HyunJun Jung · Kuan-Ting Yu · Nassir Navab · Benjamin Busam
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Seungtae Nam · Xiangyu Sun · Gyeongjin Kang · Younggeun Lee · Seungjun Oh · Eunbyung Park
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu · Peixi Peng · Guang Tan · Yiqian Chang · Luntong Li · Yonghong Tian
Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions
Quanyuan Ruan · Jiabao Lei · Wenhao Yuan · Yanglin Zhang · Dekun Lu · Guiliang Liu · Kui Jia
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen · Chengyu Wang · Dakan Wang · Taolin Zhang · Wangyue Li · Xiaofeng He
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai · Janek Ebbers · Yu-Chiang Frank Wang · François Germain · Michael J. Jones · Moitreya Chatterjee
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li · Wenjie Nie · Yan Zhang · Runze Hu · Ke Li · Xiawu Zheng · Liujuan Cao
SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer
Chunnan Shang · Zhizhong Wang · Hongwei Wang · Xiangming Meng
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
Zezeng Li · Xiaoyu Du · Na Lei · Liming Chen · Weimin Wang
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu · Yanggeng Li · Fei Hou · Junhui Hou · Zhebin Zhang · Shengfa Wang · Na Lei · Ying He
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
Qinghe Ma · Jian Zhang · Zekun Li · Lei Qi · Qian Yu · Yinghuan Shi
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang · Zekun Li · Jian Zhang · Lei Qi · Yinghuan Shi
Mixture of Submodule for Domain Adaptive Person Search
Minsu Kim · Seungryong Kim · Kwanghoon Sohn
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang · June Suk Choi · Jaehyeong Jo · Kimin Lee · Sung Ju Hwang
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu · Zikai Zhang · Rui Hu
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu · Bingke Zhu · Guibo Zhu · Yingying Chen · Ming Tang · Jinqiao Wang
Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li · Xin Jin · Bo-Yuan Sun · Chun-Le Guo · Ming-Ming Cheng
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu · Yixin Chen · Junfeng Ni · Baoxiong Jia · Yu Liu · Diwen Wan · Gang Zeng · Siyuan Huang
UniSTD: Towards Unified Spatio-Temporal Prediction across Diverse Disciplines
Chen Tang · Xinzhu Ma · Encheng Su · Xiufeng Song · Xiaohong Liu · Wei-Hong Li · Lei Bai · Wanli Ouyang · Xiangyu Yue
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu · Xiaohang Zhan · Zizheng Yan · Zhongjin Luo · Yuxin Wen · Xiaoguang Han
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang · Yuzhe Weng · Yueyan Li · Zilu Guo · Jun Du · Shutong Niu · Jiefeng Ma · Shan He · Wu Xiaoyan · Qiming Hu · Bing Yin · Cong Liu · Qingfeng Liu
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis
Yinghao Wu · Shihui Guo · Yipeng Qin
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen · Tao Hu · Hao Zhang · Lingting Zhu · Zeyu Hu · Shengju Qian · Yugang Chen · Xin Wang · Gim Hee Lee
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma · Huachen Gao · Haoge Deng · Zhengxiong Luo · Tiejun Huang · Lulu Tang · Xinlong Wang
LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate
Haoyan Gong · Zhenrong Zhang · Yuzheng Feng · Anh Nguyen · Hongbin Liu
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao · Hao Luo · Zhiwei Zhong · Peilin Chen · Zhijiang Zhang · Shiqi Wang
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
Zhifeng Wang · Renjiao Yi · Xin Wen · Chenyang Zhu · Kai Xu
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann · Felix Krause · Michael Neumayr · Nick Stracke · Melvin Sevi · Vincent Tao Hu · Björn Ommer
GOAL: Global-local Object Alignment Learning
Hyungyu Choi · Young Kyun Jang · Chanho Eom
DTOS: Dynamic Time Object Sensing with Multimodal Large Language Model
Jirui Tian · Jinrong Zhang · Shenglan Liu · Luhao Xu · Zhixiong Huang · Gao Huang
Unlocking Video-LLM via Agent-of-Thoughts Distillation
Yudi Shi · Shangzhe Di · Qirui Chen · Weidi Xie
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
Haitao Wu · Qing Li · Changqing Zhang · Zhen He · Xiaomin Ying
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang · Longguang Wang · Zhiyuan Ma · Qibin Hu · Kai Xu · Yulan Guo
Where’s the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai · Yiyou Sun · Wei Cheng · Haifeng Chen
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess · Jeffrey J Nirschl · Laura Bravo-Sánchez · Alejandro Lozano · Sanket Rajan Gupte · Jesus G. Galaz-Montoya · Yuhui Zhang · Yuchang Su · Disha Bhowmik · Zachary Coman · Sarina M. Hasan · Alexandra Johannesson · William D. Leineweber · Malvika G Nair · Ridhi Yarlagadda · Connor Zuraski · Wah Chiu · Sarah Cohen · Jan N. Hansen · Manuel D Leonetti · Chad Liu · Emma Lundberg · Serena Yeung
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan · John Wheatley Lambert · Hong Jeon · Sakshum Kulshrestha · Yijing Bai · Jing Luo · Dragomir Anguelov · Mingxing Tan · Chiyu “Max” Jiang
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images
Jungho Lee · Suhwan Cho · Taeoh Kim · Ho-Deok Jang · Minhyeok Lee · Geonho Cha · Dongyoon Wee · Dogyoon Lee · Sangyoun Lee
DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes
Ashish Kumar · A. N. Rajagopalan
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
Jing Zhu · Yuhang Zhou · Shengyi Qian · Zhongmou He · Tong Zhao · Neil Shah · Danai Koutra
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han · Yuan Tang · Jinfeng Xu · Xianzhi Li
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds
Jinfeng Xu · Xianzhi Li · Yuan Tang · Xu Han · Qiao Yu · yixue Hao · Long Hu · Min Chen
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
Qiao Yu · Xianzhi Li · Yuan Tang · Xu Han · Long Hu · yixue Hao · Min Chen
Learning to Anticipate Table Tennis Hits from Monocular Video
Daniel Etaat · Dvij Rajesh Kalaria · Nima Rahmanian · Shankar Sastry
Realistic Test-Time Adaptation of Vision-Language Models
Maxime Zanella · Clément Fuchs · Christophe De Vleeschouwer · Ismail Ben Ayed
Open Set Label Shift with Test Time Out-of-Distribution Reference
Changkun Ye · Russell Tsuchida · Lars Petersson · Nick Barnes
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun · Shanhui Zhao · Tao Yu · Hao Wen · Samith Va · Mengwei Xu · Yuanchun Li · Chongyang Zhang
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng · Tongjia Chen · Shoubin Yu · Taojiannan Yang · Lincoln Spencer · Yapeng Tian · Ajmal Mian · Mohit Bansal · Chen Chen
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Beilin Chu · Xuan Xu · Xin Wang · Yufei Zhang · Weike You · Linna Zhou
Hyperbolic Category Discovery
Yuanpei Liu · Zhenqi He · Kai Han
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
Shangzhan Zhang · Jianyuan Wang · Yinghao Xu · Nan Xue · Christian Rupprecht · Xiaowei Zhou · Yujun Shen · Gordon Wetzstein
Spk2SRImgNet: Super-Resolve Dynamic Scene from Spike Stream via Motion Aligned Collaborative Filtering
Yuanlin Wang · Yiyang Zhang · Ruiqin Xiong · Jing Zhao · Jian Zhang · Xiaopeng Fan · Tiejun Huang
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong · Xiyao Wang · Dong Guo · Qinghao Ye · Haoqi Fan · Quanquan Gu · Heng Huang · Chunyuan Li
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng · Menglei Chai · Delio Vicini · Yuxiao Zhou · Yinghao Xu · Leonidas Guibas · Gordon Wetzstein · Thabo Beeler
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu · Songlin Du
LC-Mamba: Local and Continuous Mamba with Shifted Windows for Frame Interpolation
Min Wu Jeong · Chae Eun Rhee
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
Yuhui Liu · Liangxun Ou · Qiang Fu · Hadi Amata · Wolfgang Heidrich · YIFAN PENG
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina · Adriana Valentina Costache · Adrian-Gabriel Chifu · Josiane Mothe · Radu Tudor Ionescu
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Khalifa Afane · Gabrielle Ebbrecht · Ying Wang · Juntao Chen · Junaid Farooq
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
German Barquero · Nadine Bertsch · Manojkumar Marramreddy · Carlos Chacón · Filippo Arcadu · Ferran Rigual · Nicky Sijia He · Cristina Palmero · Sergio Escalera · Yuting Ye · Robin Kips
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
Chen Chen · Daochang Liu · Mubarak Shah · Chang Xu
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
Takeshi Noda · Chao Chen · Junsheng Zhou · Weiqi Zhang · Yu-Shen Liu · Zhizhong Han
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou · Ke Mei · Yu Lu · Tianyi Wang · Fengyun Rao
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru · Vlad Hondru · Radu Tudor Ionescu · Nicu Sebe · Mubarak Shah
Activating Sparse Part Concepts for 3D Class Incremental Learning
Zhenya Tian · Jun Xiao · Liu lupeng · Haiyong Jiang
Hybrid Concept Bottleneck Models
Yang Liu · Tianwei Zhang · Shi Gu
Human-Aligned Video Generation Benchmark
Hui Han · Siyuan Li · Jiaqi Chen · Yiwen Yuan · Yuling Wu · Yufan Deng · Chak Tou Leong · Hanwen Du · Junchen Fu · Youhua Li · Jie Zhang · Chi Zhang · Li-jia Li · Yongxin Ni
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar · Jiabin Liang · Gim Hee Lee
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
Jiamin WU · Kenkun Liu · Han Gao · Xiaoke Jiang · Yuan Yao · Lei Zhang
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot · Ievgen Redko · Anton Mallasto · Charlotte Laclau · Oliver Struckmeier · Karol Arndt · Markus Heinonen · Ville Kyrki · Samuel Kaski
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Zhiyang Guo · Jinxu Xiang · Kai Ma · Wengang Zhou · Houqiang Li · Ran Zhang
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song · weixing chen · Yang Liu · Weikai Chen · Guanbin Li · Liang Lin
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
Suyoung Lee · JAEYOUNG CHUNG · Kihoon Kim · Jaeyoo Huh · Gunhee Lee · Minsoo Lee · Kyoung Mu Lee
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li · Liansheng Zhuang · Xiao Long · MingHong Yao · Shafei Wang
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li · Zixuan Huang · Anh Thai · James Rehg
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
Cédric Vincent · Taehyoung Kim · Henri Meeß
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Xin Wang · Kai Chen · Jiaming Zhang · Jingjing Chen · Xingjun Ma
MINIMA: Modality Invariant Image Matching
Jiangwei Ren · Xingyu Jiang · Zizhuo Li · Dingkang Liang · Xin Zhou · Xiang Bai
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh · Simone Schaub-Meyer · Stefan Roth
ISMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang · Qihan Zhao · Runyi Yu · Hok Wai Tsui · Ailing Zeng · Jing Lin · Zhengyi Luo · Jiwen Yu · Xiu Li · Qifeng Chen · Jian Zhang · Lei Zhang · Ping Tan
Rashomon Sets for Prototypical-Part Models: Editing Accurate Interpretable Models in Real-Time
Jon Donnelly · Zhicheng Guo · Alina Jade Barnett · Hayden McTavish · Chaofan Chen · Cynthia Rudin
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
Ziang Li · Hongguang Zhang · Juan Wang · Meihui Chen · Hongxin Hu · Wenzhe Yi · Xiaoyang Xu · Mengda Yang · Chenjun Ma
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
Srinivasa Rao Nandam · Sara Atito · Zhenhua Feng · Josef Kittler · Muhammad Awais
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang · Feng Lin · Tong Wu · Zhenguang Liu · Zhongjie Ba · Kui Ren
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
Chen Zhao · Zhizhou Chen · Yunzhe Xu · Enxuan Gu · Jian Li · Zili Yi · qian Wang · Jian Yang · Ying Tai
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility
Yidi Li · Jun Xiao · Zhengda Lu · Yiqun Wang · Haiyong Jiang
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim · Kangwook Ko · YuJin Yang · Changick Kim
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Cheng-De Fan · Chen-Wei Chang · Yi-Ruei Liu · Jie-Ying Lee · Jiun-Long Huang · Yu-Chee Tseng · Yu-Lun Liu
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang · Cheng-De Fan · Chia-Che Chang · Yi-Chen Lo · Yu-Chee Tseng · Jiun-Long Huang · Yu-Lun Liu
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
Chris Dongjoo Kim · Jihwan Moon · Sangwoo Moon · Heeseung Yun · Sihaeng Lee · Aniruddha Kembhavi · Soonyoung Lee · Gunhee Kim · Sangho Lee · Christopher Clark
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation
Zheng Zhou · Wenquan Feng · Shuchang Lyu · Guangliang Cheng · Xiaowei Huang · Qi Zhao
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang · Haodong Lu · Lina Yao · Dong Gong
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
Vaibhav Rathore · Shubhranil B · Saikat Dutta · Sarthak Mehrotra · Zsolt Kira · Biplab Banerjee
Uni4D: Unifying Large Vision Models for 4D Modeling from a Single Video
David Yifan Yao · Albert J. Zhai · Shenlong Wang
Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection
Qi Chen · Hu Ding
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Jingyu Lin · Jiaqi Gu · Lubin Fan · Bojian Wu · Yujing Lou · Renjie Chen · Ligang Liu · Jieping Ye
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong · Chaelin Kim · Serin Yoon · Junghyun Nam · Sihun Cha · Junyong Noh
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh · Sungjune Kim · Heeju Ko · Hyunggun Chi · Jinkyu Kim · Dongwook Lee · Daehyun Ji · Sungjoon Choi · Sujin Jang · Sangpil Kim
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen · Guoqiang Gong · Tianxiang Hao · Tao He · Yifeng Zhang · Pengzhang Liu · Sicheng Zhao · Jungong Han · Guiguang Ding
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation
Junsung Park · HwiJeong Lee · Inha Kang · Hyunjung Shim
Multi-party Collaborative Attention Control for Image Customization
Han Yang · Chuanguang Yang · Qiuli Wang · Zhulin An · Weilun Feng · Libo Huang · Yongjun Xu
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man · Yichen Sheng · Jianming Zhang · Liangyan Gui · Yu-Xiong Wang
CroCoDL: Cross-device Collaborative Dataset for Localization
Hermann Blum · Alessandro Mercurio · Joshua O’Reilly · Tim Engelbracht · Mihai Dusmanu · Marc Pollefeys · Zuria Bauer
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo · Yang Liu · weixing chen · Zhen Li · Yaowei Wang · Guanbin Li · Liang Lin
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
Cong Wei · Haoxian Tan · Yujie Zhong · Yong Liu · Jie Hu · Dengjie Li · Zheng Zhao · Yujiu Yang
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
Tao Sun · Yuhao Huang · Li Shen · Kele Xu · Bao Wang
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
Wei Deng · Mengshi Qi · Huadong Ma
GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection
Dušan Malić · Christian Fruhwirth-Reisinger · Samuel Schulter · Horst Possegger
LiSu: A Dataset and Method for LiDAR Surface Normal Estimation
Dušan Malić · Christian Fruhwirth-Reisinger · Samuel Schulter · Horst Possegger
Generalizable Object Keypoint Localization from Generative Priors
Dongkai Wang · Jiang Duan · Liangjian Wen · Shiyu Xuan · Hao CHEN · Shiliang Zhang
RelationField: Relate Anything in Radiance Fields
Sebastian Koch · Johanna Wald · Mirco Colosi · Narunas Vaskevicius · Pedro Hermosilla · Federico Tombari · Timo Ropinski
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
Pascal Chang · Sergio Sancho · Jingwei Tang · Markus Gross · Vinicius C. Azevedo
SinGS: Animatable Single-Image Human Gaussian Splats with Kinematic Priors
Yufan Wu · Xuanhong Chen · Wen Li · Shunran Jia · Hualiang Wei · Kairui Feng · Jialiang CHEN · Yuhan Li · Ang He · Weimin Zhang · Bingbing Ni · Wenjun Zhang
SimLTD: Simple Semi-Supervised Long-Tailed Object Detection
Phi Vu Tran
DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
Xingjian Li · Qiming Zhao · Neelesh Bisht · Mostofa Uddin Uddin · Jin Yu Kim · Bryan Zhang · Min Xu
GenManip: A Simulation Platform for Generalizable TableTop Manipulation in the Era of MLLM
Ning Gao · Yilun Chen · Shuai Yang · Xinyi Chen · Yang Tian · Hao Li · Haifeng Huang · Hanqing Wang · Tai Wang · Jiangmiao Pang
Diffusion Model is Effectively its Own Teacher
Xinyin Ma · Runpeng Yu · Songhua Liu · Gongfan Fang · Xinchao Wang
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu · Hanqing Wang · Ye Shi · Haoyang Luo · Sibei Yang · Jingyi Yu · Jingya Wang
Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection
Ruiheng Liu · Haozhe Chen · Boyao Zhao · Kejiang Chen · Weiming Zhang
TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification
Dongyoon Yang · Jihu Lee · Yongdai Kim
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration
Jae Hyeon Park · Joo Hyeon Jeon · Jae Yun Lee · Sangyeon Ahn · MinHee Cha · Min Geol Kim · Hyeok Nam · Sung In Cho
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang · Shuangfei Zhai · Miguel Ángel Bautista · Kevin Miao · Alexander Toshev · Joshua Susskind · Jiatao Gu
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao · Zhenting Wang · Felix Juefei-Xu · Xide Xia · Miao Liu · Xiaofang Wang · Mingfu Liang · Ning Zhang · Dimitris N. Metaxas · Licheng Yu
LongDiff: Training-Free Long Video Generation in One Go
Zhuoling Li · Hossein Rahmani · Qiuhong Ke · Jun Liu
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
Mariamma Antony · Rajiv Porana · Sahil M. Lathiya · Siva Teja Kakileti · Chiranjib Bhattacharyya
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
Saad Lahlali · Sandra Kara · Hejer AMMAR · Florian Chabot · Nicolas Granger · Hervé Le Borgne · Quoc Cuong PHAM
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
Zhiqiang Yan · Zhengxue Wang · Kun Wang · Jun Li · Jian Yang
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier · Mehmet Aygun · Changjian Li · Hakan Bilen · Oisin Mac Aodha
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang · Chih-Yao Ma · Yen-Cheng Liu · Ji Hou · Tao Xu · Jialiang Wang · Felix Juefei-Xu · Yaqiao Luo · Peizhao Zhang · Tingbo Hou · Peter Vajda · Niraj Jha · Xiaoliang Dai
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI · Chenwei Wang · Zhaoke Huang · Centre for Intelligent Multidimensional Data Analysis · Hong Kong Baptist University
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang · Haoyu Ma · Zecheng He · Tingbo Hou · Ji Hou · Kunpeng Li · Xiaoliang Dai · Felix Juefei-Xu · Samaneh Azadi · Animesh Sinha · Peizhao Zhang · Peter Vajda · Diana Marculescu
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning
Kaixuan Wu · Xinde Li · Xinglin Li · Chuanfei Hu · Guoliang Wu
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
Jiayi Su · Youhe Feng · Zheng Li · Jinhua Song · Yangfan He · Botao Ren · Botian Xu
Type-R: Automatically Retouching Typos for Text-to-Image Generation
Wataru Shimoda · Naoto Inoue · Daichi Haraguchi · Hayato Mitani · Seiichi Uchida · Kota Yamaguchi
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin · Yunsheng Li · Dongdong Chen · Weijian Xu · Ronald Clark · Philip H.S. Torr
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou · Hui Ren · Yijia Weng · Shuwang Zhang · Zhen Wang · Dejia Xu · Zhiwen Fan · Suya You · Zhangyang Wang · Leonidas Guibas · Achuta Kadambi
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao · Di Liu · Jack Li · Yunsheng Li · Dongdong Chen · Mu Zhou · Dimitris N. Metaxas
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
Muhammad Hamza Mughal · Rishabh Dabral · Merel CJ Scholman · Vera Demberg · Christian Theobalt
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma · Yuxin Feng · Yan Zhang · Jinyuan Liu · Weimin Wang · Guang-Yong Chen · Chengpei Xu · Zhuo Su
UMFN: Unified Multi-Domain Face Normalization for Joint Cross-domain Prototype Learning and Heterogeneous Face Recognition
Meng Pang · Wenjun Zhang · Nanrun Zhou · Shengbo Chen · Hong Rao
DifIISR: Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
Xingyuan Li · Zirui Wang · Yang Zou · Zhixin Chen · Jun Ma · Zhiying Jiang · Long Ma · Jinyuan Liu
SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs
Guibiao Liao · Qing Li · Zhenyu Bao · Guoping Qiu · KANGLIN LIU
Segment Any Motion in Videos
Nan Huang · Wenzhao Zheng · Chenfeng Xu · Kurt Keutzer · Shanghang Zhang · Angjoo Kanazawa · Qianqian Wang
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
jiajun cao · Yuan Zhang · Tao Huang · Ming Lu · Qizhe Zhang · Ruichuan An · Ningning Ma · Shanghang Zhang
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee · Debasmit Das · Munawar Hayat · Sungha Choi · Kyuwoong Hwang · Fatih Porikli
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang · Guanlin Wu · Ling-Hao Chen · Zhuokai Zhao · Jing Lin · Xiaoke Jiang · Jiamin WU · Zhuoheng Li · Hao Frank Yang · Haoqian Wang · Lei Zhang
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun · Ana Serrano · Belen Masia · Matheus Gadelha · Yannick Hold-Geoffroy · Xin Sun · Diego Gutierrez
Order-One Rolling Shutter Cameras
Marvin Anas Hahn · Kathlén Kohn · Orlando Marigliano · Tomas Pajdla
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim · Hea In Jeong · JungHoon Sung · Yihua Cheng · Jeongmin Lee · Ju Yong Chang · Sang-Il Choi · YOUNGGEUN CHOI · Saim Shin · Jungho Kim · Hyung Jin Chang
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
zehan wang · Sashuai zhou · Shaoxuan He · Haifeng Huang · Lihe Yang · Ziang Zhang · Xize Cheng · Shengpeng Ji · Tao Jin · Hengshuang Zhao · Zhou Zhao
Black Hole-Driven Identity Absorbing in Diffusion Models
Muhammad Shaheryar · Jong Taek Lee · Soon Ki Jung
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai · Fei Yin · Dounia Hammou · Rafal Mantiuk
RoboGround: Robot Manipulation with Grounded Vision-Language Priors
Haifeng Huang · Xinyi Chen · Yilun Chen · Hao Li · Xiaoshen Han · zehan wang · Tai Wang · Jiangmiao Pang · Zhou Zhao
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
JUNSEONG KIM · GeonU Kim · Kim Yu-Ji · Yu-Chiang Frank Wang · Jaesung Choe · Tae-Hyun Oh
SeeGround: See and Ground for Zero-shot Open-Vocabulary 3D Visual Grounding
Rong Li · Shijie Li · Lingdong Kong · Xulei Yang · Junwei Liang
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Junha Hyung · Kinam Kim · Susung Hong · Min-Jung Kim · Jaegul Choo
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen · Boyang Sun · Anran Zhang · Marc Pollefeys · Stefan Leutenegger
Exploiting Temporal State Space Sharing for Video Semantic Segmentation
Hesham Syed · Yun Liu · Guolei Sun · Henghui Ding · Jing Yang · Ender Konukoglu · Xue Geng · Xudong Jiang
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao · Chaojun Ni · Xiaofeng Wang · Zheng Zhu · Xueyang Zhang · Yida Wang · Guan Huang · xinze chen · Boyuan Wang · Youyi Zhang · Wenjun Mei · Xingang Wang
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang · Xiaofeng Wang · Chaojun Ni · Guosheng Zhao · Zhiqin Yang · Zheng Zhu · Muyang Zhang · YuKun Zhou · xinze chen · Guan Huang · lihong liu · Xingang Wang
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma · jidong kuang · Hongsong Wang · Jie Gui
Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model
Yue-Hua Han · Tai-Ming Huang · Kailung Hua · Jun-Cheng Chen
Foundations of the Theory of Performance-Based Ranking
Sébastien Piérard · Anaïs Halin · Anthony Cioppa · Adrien Deliege · Marc Van Droogenbroeck
Towards Smart Point-and-Shoot Photography
Jiawan Li · Fei Zhou · Zhipeng Zhong · Jiongzhi Lin · Guoping Qiu
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
Xiaoling Zhou · Zhemg Lee · Wei Ye · Rui Xie · Wenbo Zhang · Guanju Peng · Zongze Li · Shikun Zhang
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies · Niccolò Cavagnero · Alexander Hermans · Narges Norouzi · Giuseppe Averta · Bastian Leibe · Gijs Dubbelman · Daan de Geus
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Wei Li · Pin-Yu Chen · Sijia Liu · Ren Wang
UNIALIGN: Scaling Multimodal Alignment within One Unified Model
bo zhou · Liulei Li · Yujia Wang · 刘华峰 Liu · Yazhou Yao · Wenguan Wang
Towards Generalizable Scene Change Detection
JAEWOO KIM · Ue-Hwan Kim
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
Jan Held · Renaud Vandeghen · Abdullah J Hamdi · Anthony Cioppa · Adrien Deliege · Silvio Giancola · Andrea Vedaldi · Bernard Ghanem · Marc Van Droogenbroeck
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths · Maryam Haghighat · Simon Denman · Clinton Fookes · Milad Ramezani
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
Jian Liang · Wenke Huang · Guancheng Wan · Qu Yang · Mang Ye
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim · Sungho Suh · Minsik Lee
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding
Wenbo Chen · Zhen Xu · Ruotao Xu · Si Wu · Hau San Wong
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri · Mahdieh Baghshah · Mohammad Taher Pilehvar
STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search
Yuning Qiu · Andong Wang · Chao Li · Haonan Huang · Guoxu Zhou · Qibin Zhao
Structured Artifact Removal with Scale-Adaptive Deformable Transformer
Xuyi He · Yuhui Quan · Ruotao Xu · Hui Ji
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu · Shihe Wang · Jiayi Zhang · Jiaxin Chen · Yunhong Wang
Towards General Visual-Linguistic Face Forgery Detection
Ke Sun · Shen Chen · Taiping Yao · Ziyin Zhou · Jiayi Ji · Xiaoshuai Sun · Chia-Wen Lin · Rongrong Ji
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou · Xu Gao · Zichong Chen · Hui Huang
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu · Rui Hu · Jitao Sang
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Uranik Berisha · Jens Mehnert · Alexandru Paul Condurache
T-FAKE: Synthesizing Thermal Images for Facial Landmarking
Philipp Flotho · Moritz Piening · Anna Kukleva · Gabriele Steidl
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin · Yuqiang Ren · Ke Yan · Shouhong Ding · Yongtao Hao
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu · Lu Zhang · Hang Yao · Junlong Du · Ke Yan · Shouhong Ding · Yunsheng Wu · Xiaoqiang Li
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft
Gaozhi Liu · Silu Cao · Zhenxing Qian · Xinpeng Zhang · Sheng Li · Wanli Peng
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang · Zhiwu Huang · Xiaopeng Hong
Mitigating Ambiguities in 3D Classification with Gaussian Splatting
Ruiqi Zhang · Hao Zhu · Jingyi Zhao · Qi Zhang · Xun Cao · Zhan Ma
Audio-Visual Instance Segmentation
Ruohao Guo · Xianghua Ying · Yaru Chen · Dantong Niu · Guangyao Li · Liao Qu · Yanyu Qi · Jinxing Zhou · Bowei Xing · Wenzhen Yue · Ji Shi · Qixun Wang · Peiliang Zhang · Buwen Liang
De22Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation
Yunfeng Xiao · Xiaowei Bai · Baojun Chen · Hao Su · Hao He · Liang Xie · Erwei Yin
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu · Jiangwei Lao · Deyi Ji · Junwei Luo · Kang Wu · Yingying Zhang · Lixiang Ru · Jian Wang · Jingdong Chen · Ming Yang · Dong Liu · Feng Zhao
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei · Chenyu Lin · Yu Qiu · Yaonan Wang · Hui Zhang · Ziyang Wang · Dong Dai
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang · Chang Liu · Jin Wei · Xiaomeng Yang · Yu ZHOU · Can Ma · Xiangyang Ji
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi · Biao Gong · Xi Chen · DanDan Zheng · Shuai Tan · Zizheng Yang · Yuyuan Li · Jingwen He · Kecheng Zheng · Jingdong Chen · Ming Yang · Yinqiang Zheng
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang · Jiluan Fan · Anli Yan · Erdun Gao · Xin Lin · Tao Li · Kanghua Mo · Changyu Dong
NTR-Gaussian: Nighttime Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics
Kun Yang · Yuxiang Liu · Zeyu Cui · Yu Liu · Maojun Zhang · Shen Yan · Qing Wang
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
Ming Yan · Xincheng Lin · Yuhua Luo · Shuqi Fan · Yudi Dai · Qixin Zhong · Lincai Zhong · Yuexin Ma · Lan Xu · Chenglu Wen · Siqi Shen · Cheng Wang
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong · Qi Jiang · Jingyi Yu · Yuexin Ma
PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction
Mingzhi Pei · Xu Cao · Xiangyi Wang · Heng Guo · Zhanyu Ma
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
Sinisa Stekovic · Arslan Artykov · Stefan Ainetter · Mattia Durso · Friedrich Fraundorfer
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
Wenlong Fang · Qiaofeng Wu · Jing Chen · Yun Xue
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li · Jinglin Xu · Yunzhen Zhao · Yuxin Peng
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
Cong Ruan · Yuesong Wang · Bin Zhang · Lili Ju · Tao Guan
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin · Zhenbo Yu · Yang Shen · Zhenyong Fu · Jian Yang
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto · Antonino Furnari · Yoichi Sato · Giovanni Maria Farinella
EvOcc: Accurate Semantic Occupancy for Automated Driving Using Evidence Theory
Jonas Kälble · Sascha Wirges · Maxim Tatarchenko · Eddy Ilg
Escaping Plato’s Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi · Luca Moschella · Andrea Santilli · Diego Gomez · Qixing Huang · Emanuele Rodolà · Simone Melzi · Maks Ovsjanikov
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan · Yibo Peng · Jinke Ren · Yinghong Liao · Yatong Han · Chun-Mei Feng · Hengshuang Zhao · Guanbin Li · Shuguang Cui · Zhen Li
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen · Dannong Xu · Junjie Fei · Chun-Mei Feng · Mohamed Elhoseiny
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen · Jiaxiang Tang · Yuhao Dong · Ziang Cao · Fangzhou Hong · Yushi Lan · Tengfei Wang · Haozhe Xie · Tong Wu · Shunsuke Saito · Liang Pan · Dahua Lin · Ziwei Liu
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu · Shi Qiu · ZHENGZHE LIU · Ka-Hei Hui · Qianyi Wu · Pheng-Ann Heng · Chi-Wing Fu
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang · Jisang Yoo · Jihyeon Park · Seungtae Nam · Hyeonsoo Im · Shin sangheon · Sangpil Kim · Eunbyung Park
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin · Ke Wu · Jie Li · Jun Li · Wu-Jun Li
Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
Hoang Chuong Nguyen · Wei Mao · Jose M. Alvarez · Miaomiao Liu
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong · Chao Liang · Jianwen Jiang · Gaojie Lin · Jiaqi Yang · Zhou Zhao
Supervising Sound Localization using In-the-wild Egomotion
Anna Min · Ziyang Chen · Hang Zhao · Andrew Owens
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
Xinyi Zhang · Naiqi Li · Angela Dai
LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng · Lei Li · Matthias Nießner · Angela Dai
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang · Junliang Guo · Xinyi Xie · Tianyu He · Xu Sun · Jiang Bian
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Yan Wang · Baoxiong Jia · Ziyu Zhu · Siyuan Huang
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang · Baoxiong Jia · Yan Wang · Ziyu Zhu · Xiongkun Linghu · Qing Li · Song-Chun Zhu · Siyuan Huang
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
Keyizhi Xu · Chi Zhang · Zhan Chen · Zhongyuan Wang · Chunxia Xiao · Chao Liang
GSTAR: Gaussian Surface Tracking and Reconstruction
Chengwei Zheng · Lixin Xue · Juan Jose Zarate · Jie Song
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu · HaoYu Han · Zhe Tao · Hantao Yao · Changsheng Xu
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu · Ligeng Zhu · Baifeng Shi · Zhuoyang Zhang · Yuming Lou · Shang Yang · Haocheng Xi · Shiyi Cao · Yuxian Gu · Dacheng Li · Xiuyu Li · Haotian Tang · Yunhao Fang · Yukang Chen · Cheng-Yu Hsieh · De-An Huang · An-Chieh Cheng · Jinyi Hu · Sifei Liu · Ranjay Krishna · Pavlo Molchanov · Jan Kautz · Danny Yin · Song Han · Yao Lu
Multi-layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin · Haoran Chen · Yue Fan · Yingqi Fan · Xin Jin · Hui Su · Jinlan Fu · Xiaoyu Shen
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
Kaiyue Sun · Kaiyi Huang · Xian Liu · Yue Wu · Zihan Xu · Zhenguo Li · Xihui Liu
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang · Yuqian Yuan · Wenqi Zhang · Yifei Xin · Boqiang Zhang · Long Li · Liuyi Wang · qinyang zeng · Xin Li · Lidong Bing
Towards Human-Understandable Multi-Dimensional Concept Discovery
Arne Grobrügge · Niklas Kühl · Gerhard Satzger · Philipp Spitzer
Condensing Action Segmentation Datasets via Generative Network Inversion
Guodong Ding · Rongyu Chen · Angela Yao
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
Tong Li · Lizhi Wang · Zhiyuan Xu · Lin Zhu · Wanxuan Lu · Hua Huang
Be More Specific: Evaluating Object-centric Realism in Synthetic Images
Anqi Liang · Ciprian Adrian Corneanu · Qianli Feng · Giorgio Giannone · Aleix Martinez
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression
Hsiang-Wei Huang · Fu-Chen Chen · Wenhao Chai · Che-Chun Su · Lu Xia · Sanghun Jung · Cheng-Yen Yang · Jenq-Neng Hwang · Min Sun · Cheng-Hao Kuo
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
Haotong Lin · Sida Peng · Jingxiao Chen · Songyou Peng · Jiaming Sun · Minghuan Liu · Hujun Bao · Jiashi Feng · Xiaowei Zhou · Bingyi Kang
On Denoising Walking Videos for Gait Recognition
Dongyang Jin · Chao Fan · Jingzhe Ma · Jingkai Zhou · Weihua Chen · Shiqi Yu
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
Yuheng Feng · Changsong Wen · Zelin Peng · Li jiaye · Siyu Zhu
Video Language Model Pretraining with Spatio-temporal Masking
Yue Wu · Zhaobo Qi · Junshu Sun · Yaowei Wang · Qingming Huang · Shuhui Wang
R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner
Ziyi Bai · Hanxuan Li · Bin Fu · Chuyan Xiong · Ruiping Wang · Xilin Chen
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu · Yichi Cai · Shangzhan Zhang · Hongyu Zhou · Haoji Hu · Huimin Yu · Andreas Geiger · Yiyi Liao
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu · Jintong Li · Yicheng Jiang · Niranjan Sujay · Zhicheng Yang · Juexiao Zhang · John Abanes · Jing Zhang · Chen Feng
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Zhiqiang Shen · Ammar Sherif · Zeyuan Yin · Shitong Shao
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation
Rong Qin · Xingyu Liu · Jinglei Shi · Liang Lin · Jufeng Yang
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy · Alexander Martin · Eugene Yang · Andrew Yates · Kate Sanders · Kenton Murray · Reno Kriz · Celso M. de Melo · Benjamin Van Durme · Rama Chellappa
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio · Vishaal Udandarao · Karsten Roth · Ameya Prabhu · Zeynep Akata · Samuel Albanie · Matthias Bethge
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu · Bolun Zheng · Wei Zhu · Hangjia Pan · Yuchen Yao · Ning Xu · An-An Liu · Quan Zhang · Chenggang Yan
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
Feng Zhou · Ruiyang Liu · chen liu · Gaofeng He · Yonglu Li · Xiaogang Jin · Huamin Wang
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He · Yanning Zhou · Wang Zhao · Zhongkai Wu · Kaiwen Xiao · Yang Wei · Yong-Jin Liu · Xiao Han
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Yang Wu · Yun Zhu · Kaihua Zhang · Jianjun Qian · Jin Xie · Jian Yang
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang · Xuweiyi Chen · Nikhil Madaan · Madhavan Iyengar · Shengyi Qian · David Fouhey · Joyce Chai
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction
Aishwarya Agarwal · Srikrishna Karanam · Vineet Gandhi
Composing Parts for Expressive Object Generation
Harsh Rangwani · Aishwarya Agarwal · Kuldeep Kulkarni · R. Venkatesh Babu · Srikrishna Karanam
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko · Chang Min Park
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Dmitrii M Petrov · Pradyumn Goyal · Divyansh Shivashok · Yuanming Tao · Melinos Averkiou · Evangelos Kalogerakis
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao · Zhengkai Jiang · Fu-Yun Wang · Jiangning Zhang · Hao Chen · Mingmin Chi · Yabiao Wang · Wenhan Luo
End-to-End HOI Reconstruction Transformer with Graph-based Encoding
Zhenrong Wang · Qi Zheng · Sihan Ma · Maosheng Ye · Yibing Zhan · Dongjiang Li
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi · Nacim Belkhir · Dat NGUYEN · Guoxuan Xia · Andrea Pilzer
β-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation
Ming Hu · Jianfu Yin · Zhuangzhuang Ma · Jianheng Ma · Feiyu Zhu · Wubingbing · Ya Wen · Meng Wu · C Hu · Bingliang Hu · Quan Wang
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu · Feng Xiong · Meiduo Liu · Jinzhi Zhang · Mingqi Shao · Zhaoxu Sun · Di Wang · Mu Xu
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li · Hongshan Yu · Yihao Ding · Jinhao Qiao · Basim Azam · Naveed Akhtar
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
Basim Azam · Naveed Akhtar
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao · Lujing Xie · Haowei Zhang · Guo Gan · Weiyuan Chen · Yitao Long · Tongyan Hu · Zhijian Xu · Chengye Wang · Chuhan Li · Ziyao Shangguan · Yixin Liu · Zhenwen Liang · Zhiyuan Hu · Chen Zhao · Arman Cohan
Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
Hang Shao · lei luo · Jianjun Qian · Mengkai Yan · Shuo Chen · Jian Yang
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
Yifan Yu · Shaohui Liu · Rémi Pautrat · Marc Pollefeys · Viktor Larsson
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi · Moein Heidari · Clayton Allard · Leonid Sigal · Ilker Hacihaliloglu
3D Student Splatting and Scooping
Jialin Zhu · Jiangbei Yue · Feixiang He · He Wang
Learning Extremely High Density Crowds as Active Matters
Feixiang He · Jiangbei Yue · Jialin Zhu · Armin Seyfried · Dan Casas · Julien Pettré · He Wang
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei · Jielin Jiang · Xiaolong Xu
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Zhanqiang Guo · Jiamin Wu · Yonghao Song · Jiahui Bu · Weijian Mai · Qihao Zheng · Wanli Ouyang · Chunfeng Song
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao · Feng Liu · Yue Liu · Mingxiang Liao · Chen GONG · Qixiang Ye · Fang Wan
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li · Yifei Xing · Xiangyuan Lan · Xin Li · Haifeng Chen · Dongmei Jiang
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
Dominick Reilly · RAJATSUBHRA CHAKRABORTY · A Sinha · Manish Kumar Govind · Pu Wang · Francois Bremond · Le Xue · Srijan Das
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng · Bin Fu · Jin Ye · Guoan Wang · Tianbin Tianbin · Haoyu Wang · Ruoyu Li · He Yao · Chen Junren · Jingwen Li · Yanzhou Su · Min Zhu · Junjun He
Scaling Mesh Generation via Compressive Tokenization
Haohan Weng · Zibo Zhao · Biwen Lei · Xianghui Yang · Jian Liu · Zeqiang Lai · Zhuo Chen · Liu Yuhong · Jie Jiang · Chunchao Guo · Tong Zhang · Shenghua Gao · C.L.Philip Chen
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Jinhui Yi · Syed Talal Wasim · Yanan Luo · Muzammal Naseer · Jürgen Gall
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction
Shuhang Chen · Xianliang Huang · Zhizhou Zhong · Jihong Guan · Shuigeng Zhou
Recreating 1940s Tom and Jerry with Test-Time Training
Jiarui Xu · Shihao Han · Karan Dalal · Daniel Koceja · Xinhao Li · Yue Zhao · Ka Chun Cheung · Yejin Choi · Jan Kautz · Sifei Liu · Yu Sun · Xiaolong Wang
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu · Wengen Li · Jihong Guan · Shuigeng Zhou · Yichao Zhang
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
Dian Shao · Mingfei Shi · Shengda Xu · Harold Haodong Chen · Yongle Huang · Binglu Wang
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen · Tianqi Liu · Huiqiang Sun · Jiaqi Li · Zhiguo Cao · Wei Li · Chen Change Loy
Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
Hao Yu · Xin Yang · Le Zhang · Hanlin Gu · Tianrui Li · Lixin Fan · Qiang Yang
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao · Jiapeng Wang · Hongliang Li · Chengyu Wang · Jun Huang · Lianwen Jin
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang · Weihong Pan · Chong Bao · Xiyu Zhang · Xiaojun Xiang · Hanqing Jiang · Hujun Bao
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang · Zhixin Zheng · Jiandong Shao · Yule Duan · Liang-Jian Deng
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng · Wenshuo Li · Tong Lin · Xinghao Chen
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu · Chen Jia · Fan Shi · Xu Cheng · Shengyong Chen
AdaDARE-γ: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
Jingyi Xie · Jintao Yang · Zhunchen Luo · Yunbo Cao · Qiang Gao · Mengyuan Zhang · Wenpeng Hu
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Kang Chen · Jiyuan Zhang · Zecheng Hao · Yajing Zheng · Tiejun Huang · Zhaofei Yu
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
Saad Wazir · Daeyoung Kim
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Toby Perrett · Ahmad Darkhalil · Saptarshi Sinha · Omar Emara · Sam Pollard · Kranti Kumar Parida · Kaiting Liu · Prajwal Gatti · Siddhant Bansal · Kevin Flanagan · Jacob Chalk · Zhifan Zhu · Rhodri Guerrier · Fahd Abdelazim · Bin Zhu · Davide Moltisanti · Michael Wray · Hazel Doughty · Dima Damen
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan · Dongming Wu · Wencheng Han · Junpeng Jiang · xia zhou · Kun Zhan · Cheng-Zhong Xu · Jianbing Shen
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
Tianyu Chen · Xingcheng Fu · Yisen Gao · Haodong Qian · Yuecen Wei · Kun Yan · Haoyi Zhou · Jianxin Li
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
Xuan Zhu · Jijun Xiang · Xianqi Wang · Longliang Liu · Yu Wang · Hong Zhang · Fei Guo · Xin Yang
MonSter: Marry Monodepth to Stereo Unleashes Power
JunDa Cheng · Longliang Liu · Gangwei Xu · Xianqi Wang · Zhaoxing Zhang · Yong Deng · Jinliang Zang · Yurui Chen · zhipeng cai · Xin Yang
CG-IR: Curved Gaussian Splatting for Inverse Rendering
Hanxiao Sun · Yupeng Gao · Jin Xie · Jian Yang · Beibei Wang
Sketchy Bounding-box Supervision for 3D Instance Segmentation
qian deng · Le Hui · Jin Xie · Jian Yang
RNG: Relightable Neural Gaussians
Jiahui Fan · Fujun Luan · Jian Yang · Milos Hasan · Beibei Wang
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen · Zhiyang Dou · Chen Wang · Yiming Huang · Anjun Chen · Qiao Feng · Jiatao Gu · Lingjie Liu
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin · Haisheng Su · Kai Liu · CONG MA · Wei Wu · Fei HUI · Junchi Yan
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su · Feixiang Song · CONG MA · Wei Wu · Junchi Yan
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu · Xi Guo · Weixuan TANG · Tingxuan Huang · Chiyu Wang · Chenjing Ding
Foveated Instance Segmentation
Hongyi Zeng · Wenxuan Liu · Tianhua Xia · Jinhui Chen · Ziyun Li · Sai Qian Zhang
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai · Zihao Wang · Kewei Lian · Zhancun Mu · Xiaojian Ma · Anji Liu · Yitao Liang
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang · Yu Gao · Dianyi Yang · Liqi Yan · Yufeng Yue · Yi Yang
Universal Actions for Enhanced Embodied Foundation Models
Jinliang Zheng · Jianxiong Li · Dongxiu Liu · Yinan Zheng · Zhihao Wang · Zhonghong Ou · Yu Liu · Jingjing Liu · Ya-Qin Zhang · Xianyuan Zhan
HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery
Yuto Matsubara · Ko Nishino
3DFastEdit: Training-Free Fast and Controllable 3D Editing
Ziya Erkoc · Can Gümeli · Chaoyang Wang · Matthias Nießner · Angela Dai · Peter Wonka · Hsin-Ying Lee · Peiye Zhuang
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
Zexi Jia · Chuanwei Huang · Yeshuang Zhu · Hongyan Fei · Xiaoyue Duan · Yuan Zhiqiang · Ying Deng · Jiapei Zhang · Jinchao Zhang · Jie Zhou
Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset
Minshan XIE · Jian Lin · Hanyuan Liu · Chengze Li · Tien-Tsin Wong
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed · Abdullah Al Arafat · Deniz Najafi · Akhlak Mahmood · Mamshad Nayeem Rizve · Mohaiminul Al Nahian · RANYANG ZHOU · Shaahin Angizi · Adnan Rakin Rakin
Accurate Differential Operators for Hybrid Neural Fields
Aditya Chetan · Guandao Yang · Zichen Wang · Steve Marschner · Bharath Hariharan
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
ruotian peng · Haiying He · Yake Wei · Yandong Wen · Di Hu
AIpparel: A Large Multimodal Generative Model for Digital Garments
Kiyohiro Nakayama · Jan Ackermann · Timur Levent Kesdogan · Yang Zheng · Maria Korosteleva · Olga Sorkine-Hornung · Leonidas Guibas · Guandao Yang · Gordon Wetzstein
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang · Zhendong Wang · Jianmin Bao · Wengang Zhou · Dongdong Chen · Lei Shi · Dong Chen · Houqiang Li
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
Rong Gao · Xin Liu · Zhuozhao Hu · Bohao Xing · Baiqiang XIA · Zitong YU · Heikki Kälviäinen
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki · Anabia Sohail · IYYAKUTTI IYAPPAN GANAPATHI · Basit Alawode · Asim Khan · Sajid Javed · Naoufel Werghi · Mohammed Bennamoun · Arif Mahmood
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression
Xinjie Wang · Yifan Zhang · Ting Liu · Xinpu Liu · Ke Xu · Jianwei Wan · Yulan Guo · Hanyun Wang
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie · Zequn Zeng · Hao Zhang · Yucheng Ding · Yi Wang · Zhengjue Wang · Bo Chen · Hongwei Liu
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai · Xiaodong Cun · Xiaoyu Li · Wenze Liu · Zhaoyang Zhang · Yong Zhang · Ying Shan · Xiangyu Yue
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
Jongsung Lee · HARIN PARK · Byeong-Uk Lee · Kyungdon Joo
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi · Annie N. Wang · Pieter Peers · Anand Bhattad · Roni Sengupta
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation
Chuandong Liu · Xingxing Weng · Shuguo Jiang · Pengcheng Li · Lei Yu · Gui-Song Xia
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
Lihan Jiang · Kerui Ren · Mulin Yu · Linning Xu · Junting Dong · Tao Lu · Feng Zhao · Dahua Lin · Bo Dai
HVI: A New color space for Low-light Image Enhancement
Qingsen Yan · Yixu Feng · Cheng Zhang · Guansong Pang · Kangbiao Shi · Peng Wu · Wei Dong · Jinqiu Sun · Yanning Zhang
Learning Visual Generative Priors without Text
Shuailei Ma · Kecheng Zheng · Ying Wei · Wei Wu · Fan Lu · Yifei Zhang · Chen-Wei Xie · Biao Gong · Jiapeng Zhu · Yujun Shen
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu · Yuxin Chen · Zhongang Qi · Zeke Xie · Yifan Wang · Lijun Wang · Ying Shan · Huchuan Lu
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou · Jiahao Li · Zunnan Xu · Hanhui Li · Yiji Cheng · Fa-Ting Hong · Qin Lin · qinglin lu · Xiaodan Liang
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Xiaozhong Ji · Xiaobin Hu · Zhihong Xu · Junwei Zhu · Chuming Lin · Qingdong He · Jiangning Zhang · Donghao Luo · Yi Chen · Qin Lin · qinglin lu · Chengjie Wang
ImPortrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu · Zhentao Yu · Zixiang Zhou · Jun Zhou · Xiaoyu Jin · Fa-Ting Hong · Xiaozhong Ji · Junwei Zhu · Chengfei Cai · Shiyu Tang · Qin Lin · Xiu Li · qinglin lu
Less is More: Efficient Image Vectorization with Adaptive Parameterization
Kaibo Zhao · Liang Bao · Yufei Li · Xu Su · Ke Zhang · Xiaotian Qiao
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong · Ziyi Wang · Yuexin Ma · Xiangyu Yue
VideoRepainter: Creative Video Inpainting with Keyframe Reference
Yuwei Guo · Ceyuan Yang · Anyi Rao · Chenlin Meng · Omer Bar-Tal · Shuangrui Ding · Maneesh Agrawala · Dahua Lin · Bo Dai
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv · Yuval Kirstain · Amit Zohar · Shelly Sheynin · Yaniv Taigman · Yossi Adi · Sagie Benaim · Adam Polyak
Keypoints Good for the Two-View Geometry Estimation Problem
Konstantin Pakulev · Alexander Vakhitov · Gonzalo Ferrer
Layered motion fusion: Lifting motion segmentation to 3D in egocentric videos
Vadim Tschernezki · Diane Larlus · Andrea Vedaldi · Iro Laina
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
Jiyuan Liu · Xinwang Liu · chuankun Li · Xinhang Wan · Hao Tan · Yi Zhang · Weixuan Liang · Qian Qu · Yu Feng · Renxiang Guan · KE LIANG
Move-in-2D: 2D-Conditioned Human Motion Generation
Hsin-Ping Huang · Yang Zhou · Jui-Hsien Wang · Difan Liu · Feng Liu · Ming-Hsuan Yang · Zhan Xu
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu · Taesung Park · Richard Zhang · Yang Zhou · Eli Shechtman · Feng Liu · Jia-Bin Huang · Difan Liu
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam · Soowon Son · Zhan Xu · Jing Shi · Difan Liu · Feng Liu · Seungryong Kim · Yang Zhou
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model
Haobo Jiang · Jin Xie · Jian Yang · Liang Yu · Jianmin Zheng
Apollo: An Exploration of Video Understanding in Large Multi-Modal Models
Orr Zohar · Xiaohan Wang · Yann Dubois · Nikhil Mehta · Tong Xiao · Philippe Hansen-Estruch · Licheng Yu · Xiaofang Wang · Felix Juefei-Xu · Ning Zhang · Serena Yeung · Xide Xia
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai · Felix Juefei-Xu · Miao Liu · Xiaoliang Dai · Nikhil Mehta · Chenguang Zhu · Zeyi Huang · James Rehg · Sangmin Lee · Ning Zhang · Tong Xiao
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot · Bogdan Mazoure · Omar Attia · Aleksei Timofeev · Harsh Agrawal · R Devon Hjelm · Zhe Gan · Zsolt Kira · Alexander Toshev
Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling
Nannan Li · Kevin Shih · Bryan A. Plummer
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
Weicheng Wang · Guoli Jia · Zhongqi Zhang · Liang Lin · Jufeng Yang
Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation
Lexin Fang · Yunyang Xu · Xiang Ma · Xuemei Li · Caiming Zhang
Seek Common Ground While Reserving Differences: Semi-supervised Image-Text Sentiment Recognition
Wuyou Xia · Guoli Jia · Sicheng Zhao · Jufeng Yang
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors
Changlong Shi · He Zhao · Bingjie Zhang · Mingyuan Zhou · Dandan Guo · Yi Chang
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt learning
Hairui Ren · Fan Tang · He Zhao · Zixuan Wang · Dandan Guo · Yi Chang
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
chaocan xue · Bineng Zhong · Qihua Liang · Yaozong Zheng · Ning Li · Yuanliang Xue · Shuxiang Song
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
Beier Zhu · Jiequan Cui · Hanwang Zhang · Chi Zhang
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Zehuan Huang · Yuanchen Guo · Xingqiao An · Yunhan Yang · Yangguang Li · Zi-Xin Zou · Ding Liang · Xihui Liu · Yan-Pei Cao · Lu Sheng
A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models
Keyu Tu · Mengqi Huang · Zhuowei Chen · Zhendong Mao
Multiple Object Tracking as ID Prediction
Ruopeng Gao · Ji Qi · Limin Wang
Efficient stereo depth estimation model for wearable augmented reality devices
Yongfan Liu · Hyoukjun Kwon
Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories
Susung Hong · Johanna Suvi Karras · Ricardo Martin · Ira Kemelmacher-Shlizerman
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Siyuan Bian · Chenghao Xu · Yuliang Xiu · Artur Grigorev · Zhen Liu · Cewu Lu · Michael J. Black · Yao Feng
One-Step Event-Driven High-Speed Autofocus
Yuhan Bao · Shaohua Gao · Wenyong Li · Kaiwei Wang
PointSR: Self-regularized Point Supervision for Drone-view Object Detection
Weizhuo Li · Yue Xi · Wenjing Jia · zehao zhang · Fei Li · Xiangzeng Liu · Qiguang Miao
Efficient Personalization of Quantized Diffusion Model without Backpropagation
Hoigi Seo · Wongi Jeong · Kyungryeol Lee · Se Young Chun
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu · Mingyu Liu · Zeyu Zhu · Haoen Feng · Xi Xia · Wen Wang · Kevin Qinghong Lin · Chunhua Shen · Mike Zheng Shou
Scaling up Image Segmentation across Data and Tasks
Pei Wang · Zhaowei Cai · Hao Yang · Ashwin Swaminathan · R. Manmatha · Stefano Soatto
Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu · Meng Lou · Yizhou Yu
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Yiping Wang · Xuehai He · Kuan Wang · Luyao Ma · Jianwei Yang · Shuohang Wang · Simon Shaolei Du · yelong shen
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami · Swapneel Sen · Amit K. Roy-Chowdhury · Srikanth Krishnamurthy · Basak Guler
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation
Jingyi Tian · Le Wang · Sanping Zhou · Sen Wang · lijiayi · Haowen Sun · Wei Tang
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
Jikang Cheng · Zhiyuan Yan · Ying Zhang · Li Hao · Jiaxin Ai · Qin Zou · Chen Li · Zhongyuan Wang
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Sen Wang · Le Wang · Sanping Zhou · Jingyi Tian · lijiayi · Haowen Sun · Wei Tang
Two by Two: Learning Cross-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
Yu Qi · Yuanchen Ju · Tianming Wei · Chi Chu · Lawson L.S. Wong · Huazhe Xu
iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting
Tuo Cao · Fei LUO · Jiongming Qin · Yu Jiang · Yusen Wang · Chunxia Xiao
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
Kai Mao · Ping Wei · Yiyang Lian · Yangyang Wang · Nanning Zheng
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao · Ping Wei · Shuaijia Chen · Haoxuan Wang · Ziyang Ren
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification
Yunlong Li · Xiabi Liu · Liyuan Pan · Yuchen Ren
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
Tianyu Wang · Jianming Zhang · Haitian Zheng · Zhihong Ding · Scott Cohen · Zhe Lin · Wei Xiong · Chi-Wing Fu · Luis Figueroa · Soo Ye Kim
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
Hongyu Li · Jinyu Chen · Ziyu Wei · Shaofei Huang · Tianrui Hui · Jialin Gao · Xiaoming Wei · Si Liu
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun · Weiyu Liu · Siyi Gu · Dylan Lim · Goutam Bhat · Federico Tombari · Manling Li · Nick Haber · Jiajun Wu
Extreme Rotation Estimation in the Wild
Hana Bezalel · Dotan Ankri · Ruojin Cai · Hadar Averbuch-Elor
ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation
Yushan Lai · Guowen Li · Haoyuan Liang · Juepeng Zheng · Zhiyu Ye
Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning
Huu Binh Ta · Duc Nguyen · Quyen Tran · Toan Tran · Tung Pham
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Jianwen Jiang · Gaojie Lin · Zhengkun Rong · Chao Liang · Yongming Zhu · Jiaqi Yang · Tianyun Zhong
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu · Longhao Zhang · Zhengkun Rong · Tianshu Hu · Shuang Liang · Zhipengge
RSVOS-SAM: High-Quality Interactive Segmentation for Remote Sensing Video Object
Zhe Shan · Yang Liu · Lei Zhou · Cheng Yan · Heng Wang · Xia Xie
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui · Xuangeng Chu · Tatsuya Harada
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang · Qian Yang · Aishwarya Agrawal
DFM: Differentiable Feature Matching for Anomaly Detection
Sheng Wu · Yimi Wang · Xudong Liu · Yuguang Yang · Runqi Wang · Guodong Guo · David Doermann · Baochang Zhang
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang · Yukai Shi · Jiarong Ou · Rui Chen · Ke Lin · Jiahao Wang · Boyuan Jiang · Haotian Yang · Mingwu Zheng · Xin Tao · Fei Yang · Pengfei Wan · Di ZHANG
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao · Menglu Wang · King Man Tam · Lingen Li · Tianfan Xue · Jinwei Gu
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin · Yaqi Zhao · Mingwu Zheng · Ke Lin · Jiarong Ou · Rui Chen · Victor Shea-Jay Huang · Jiahao Wang · Xin Tao · Pengfei Wan · Di ZHANG · Baoqun Yin · Wentao Zhang · Kun Gai
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
Nikaan Nikzad · YI LIAO · Yongsheng Gao · Jun Zhou
RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects
Soumyaratna Debnath · Ashish Tiwari · Kaustubh Sadekar · Shanmuganathan Raman
Seurat: From Moving Points to Depth
Seokju Cho · Gabriel Huang · Seungryong Kim · Joon-Young Lee
WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels
Hyeokjun Kweon · Kuk-Jin Yoon
Exploring Temporally-Aware Features for Point Tracking
Inès Hyeonsu Kim · Seokju Cho · Gabriel Huang · Jung Yi · Joon-Young Lee · Seungryong Kim
Building Vision Models upon Heat Conduction
Zhaozhi Wang · Yue Liu · Yunjie Tian · Yunfan Liu · Yaowei Wang · Qixiang Ye
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma · Yaohui Wang · Gengyun Jia · Xinyuan Chen · Tien-Tsin Wong · Yuan-Fang Li · Cunjian Chen
Adaptive Keyframe Sampling for Long Video Understanding
Xi Tang · Jihao Qiu · Lingxi Xie · Yunjie Tian · Jianbin Jiao · Qixiang Ye
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Chongjian GE · Chenfeng Xu · Yuanfeng Ji · Chensheng Peng · Masayoshi Tomizuka · Ping Luo · Mingyu Ding · Varun Jampani · Wei Zhan
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen · Lin Li · Yongqi Yang · Bin Wen · Fan Yang · Tingting Gao · Yu Wu · Long Chen
Animate and Sound an Image
Xihua Wang · Ruihua Song · Chongxuan Li · Xin Cheng · Boyuan Li · Yihan Wu · Yuyue Wang · Hongteng Xu · Yunfeng Wang
ProReflow: Progressive Reflow with Decomposed Velocity
Lei Ke · Haohang Xu · Xuefei Ning · Yu Li · Jiajun Li · Haoling Li · Yuxuan Lin · Dongsheng Jiang · Yujiu Yang · Linfeng Zhang
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model
Jian Zhu · He Wang · Yang Xu · Zebin Wu · Zhihui Wei
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma · Chenhui Gou · Hengcan Shi · Bin Sun · Shutao Li · Hamid Rezatofighi · Jianfei Cai
Conical Visual Concentration for Efficient Large Vision-Language Models
Long Xing · Qidong Huang · Xiaoyi Dong · Jiajie Lu · Pan Zhang · Yuhang Zang · Yuhang Cao · Conghui He · Jiaqi Wang · Feng Wu · Dahua Lin
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
Jiazi Bu · Pengyang Ling · Pan Zhang · Tong Wu · Xiaoyi Dong · Yuhang Zang · Yuhang Cao · Dahua Lin · Jiaqi Wang
OVBench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Junbo Niu · Yifei Li · Ziyang Miao · Chunjiang Ge · Zhou Yuanhang · Qihao He · Xiaoyi Dong · Haodong Duan · Shuangrui Ding · Rui Qian · Pan Zhang · Yuhang Zang · Yuhang Cao · Conghui He · Jiaqi Wang
CALICO: Multi-Image Pixel-Grounded Object Comparison by Parts with Large Language Models
Kiet A. Nguyen · Adheesh Juvekar · Tianjiao Yu · Muntasir Wahed · Ismini Lourentzou
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng · Xingyu Zhang · Yuming Li · Chenguang Ma
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Yutong Wang · Jiajie Teng · Jiajiong Cao · Yuming Li · Chenguang Ma · Hongteng Xu · Dixin Luo
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao · Bryan Hooi · Jun Liu · Kai-Wei Chang · Zi Huang · Yujun Cai
MC²: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang · Yabo Zhang · Kailai Feng · Xiaohe Wu · Wenbo Li · Renjing Pei · Fan Li · Wangmeng Zuo
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Shian Du · Menghan Xia · Chang Liu · Xintao Wang · Jing Wang · Pengfei Wan · Di ZHANG · Xiangyang Ji
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan · Shanshan Zhao · Wenjun Yan · Yinglun Li · Qing-Guo Chen · Zhao Xu · Weihua Luo · Kaifu Zhang · Mingming Gong · Gui-Song Xia
A Unified Framework for Heterogeneous Semi-supervised Learning
Marzi Heidari · Abdullah Alchihabi · Hao Yan · Yuhong Guo
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu · Rui Li · Yu Zhu · Rong Guo · Jinqiu Sun · Yanning Zhang
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
Yueru Jia · Jiaming Liu · Sixiang Chen · Chenyang Gu · Zhilve Wang · Xiaoqi Li · Longzan Luo · Pengwei Wang · Renrui Zhang · Zhongyuan Wang · Shanghang Zhang
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji · Huajie Tan · Jiayu Shi · Xiaoshuai Hao · Yuan Zhang · Hengyuan Zhang · Pengwei Wang · Mengdi Zhao · Yao Mu · Pengju An · Xinda Xue · Qinghang Su · Huaihai Lyu · Xiaolong Zheng · Jiaming Liu · Zhongyuan Wang · Shanghang Zhang
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu · Nuno Vasconcelos · Xiaolong Wang
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
Zhipeng Huang · Wangbo Yu · Xinhua Cheng · ChengShu Zhao · Yunyang Ge · Mingyi Guo · Li Yuan · Yonghong Tian
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Trong-Tung Nguyen · Quang Nguyen · Khoi Nguyen · Anh Tran · Cuong Pham
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu · Zhiqiang Gao · Zhiying Zhou · MINGJIE SUN · Yongxin Guo
SLADE: Shielding against Dual Exploits in Large Vision-Language Models
Md Zarif Hossain · AHMED IMTEAJ
Geometry Field Splatting with Gaussian Surfels
Kaiwen Jiang · Venkataram Sivaram · Cheng Peng · Ravi Ramamoorthi
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz · Kate Sanders · David Etter · Kenton Murray · Cameron Carpenter · Hannah Recknor · Jimena Guallar-Blasco · Alexander Martin · Eugene Yang · Benjamin Van Durme
Optical LEGO: An Optical Imaging Dataset and Benchmark at Deeply Subwavelength Resolution
Benquan Wang · Ruyi An · Jin-Kyu So · Sergei Kurdiumov · Eng Aik Chan · Giorgio Adamo · Yuhan Peng · Yewen Li · Bo An
GS-2DGS: Geometrically supervised 2DGS for reflective object reconstruction
Jinguang Tong · Xuesong li · Fahira Afzal Maken · Sundaram Muthu · Lars Petersson · Chuong Nguyen · Hongdong Li
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu · Hao Zhou · Benlei Cui · Wenxiang Shang · Ran Lin
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma · Jiaqi Tang · Bin Guo · Fan Dang · Sicong Liu · Zhui Zhu · Lei Wu · Cheng Fang · Ying-Cong Chen · Zhiwen Yu · Yunhao Liu
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang · Samaneh Azadi · Rohit Girdhar · Sai Saketh Rambhatla · Chen Sun · Xi Yin
Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening
Yinghui Xing · Qu Li Tao · Shizhou Zhang · Di Xu · YingkunYang · Yanning Zhang
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim · Hyunjung Shim
Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation
Qitong Yang · Mingtao Feng · Zijie Wu · Weisheng Dong · Fangfang Wu · Yaonan Wang · Ajmal Mian
Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection
Jinghao Bian · Mingtao Feng · Weisheng Dong · Fangfang Wu · Jianqiao Luo · Yaonan Wang · Guangming Shi
Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu · Xinting Hu · Yuyang Sun · Yizhou Zhou · Wenbo Zhu · Fengyun Rao · Bernt Schiele · Xu Yang
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li · Yongliang Wu · YI LU · Jiashuo Yu · Licheng Tang · Jiawang Cao · Wenqing Zhu · Yuyang Sun · Jay Wu · Wenbo Zhu
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu · Shaokang Wang · Jintao Chen · Zhe Li · Peidong Jia · Fei Zhao · Guoqing Xiang · Zhijian Hao · Shanghang Zhang · Xiaodong Xie
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim · Inyoung Jung · Dayoon Suh · Youjia Zhang · Sangmin Lee · Sungeun Hong
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen · Zizheng Huang · Yan Hong · YANSHUO WANG · Zhongcai Lyu · Zhuoer Xu · Jun Lan · Zhangxuan Gu
Coeff-Tuning: A Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao · WEI CHEN · Qiang Qiu
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho · Kim Youwang · Hunmin Yang · Tae-Hyun Oh
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
Wang Xuan · Xitong Gao · Dongping Liao · Tianrui Qin · Yu-liang Lu · Cheng-Zhong Xu
CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation
Bonan Li · Zicheng Zhang · Xingyi Yang · Xinchao Wang
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um · Jong Chul Ye
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization
Junhao Xu · Yanan Zhang · Zhi Cai · Di Huang
Multimodal Autoregressive Pre-training of Large Vision Encoders
Enrico Fini · Mustafa Shukor · Xiujun Li · Philipp Dufter · Michal Klein · David Haldimann · Sai Aitharaju · Victor Guilherme Turrisi da Costa · Louis Béthune · Zhe Gan · Alexander Toshev · Marcin Eichner · Moin Nabi · Yinfei Yang · Joshua Susskind · Alaaeldin El-Nouby
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee · Jijun jiang · Yiran Xu · Zhuofang Li · Junjie Ke · Yinxiao Li · Junfeng He · Steven Hickson · Katie Datsenko · Sangpil Kim · Ming-Hsuan Yang · Irfan Essa · Feng Yang
S³-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors
Xingyu Ren · Jiankang Deng · Yuhao Cheng · Wenhan Zhu · Yichao Yan · Xiaokang Yang · Stefanos Zafeiriou · Chao Ma
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li · Jianyu Wang · Yuhao Cheng · Yikun Zeng · Xingyu Ren · Wenhan Zhu · Weiming Zhao · Yichao Yan
MEET: Towards Memory-Efficient Temporal Delta-Sigma Deep Neural Networks
Zeqi Zhu · Ibrahim Batuhan Akkaya · Luc Waeijen · Egor Bondarev · Arash Pourtaherian · Orlando Moreira
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Jiang Yan · Hao Yu · Xu Cheng · Haoyu Chen · Zhaodong Sun · Guoying Zhao
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang · Ouxiang Li · Tingting Mu · Yanbin Hao · Kuien Liu · Xiang Wang · Xiangnan He
Towards Source-Free Machine Unlearning
Sk Miraj Ahmed · Umit Basaran · Dripta S. Raychaudhuri · Arindam Dutta · Rohit Kundu · Fahim Faisal Niloy · Basak Guler · Amit K. Roy-Chowdhury
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai · Jie Zhou · Xingjiao Wu · Qin Chen · Qingchun Bai · Zezhou · Liang He
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili · Rohun Agrawal · Yisong Yue · Georgia Gkioxari
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu · Liying Yang · Peike Li · Dadong Wang · Lincheng Li · Xin Yu
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu · Peike Li · Liying Yang · Dadong Wang · Lincheng Li · Xin Yu
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi · Saurabh · Ayush Abhay Shrivastava · Parag Singla · Vibhav Giridhar Gogate
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
Haopeng Sun · Yingwei Zhang · Lumin Xu · Sheng Jin · Ping Luo · Chen Qian · Wentao Liu · Yiqiang Chen
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin · Zilu Guo · Yifan Zhang · Shuaicheng Niu · Yafeng Li · Ruimao Zhang · Shuguang Cui · Zhen Li
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References
Ming-Feng Li · Xin Yang · Fu-En Wang · Hritam Basak · Yuyin Sun · Shreekant Gayaka · Min Sun · Cheng-Hao Kuo
EASEMVC: Efficient Dual Selection Mechanism for Deep Multi-View Clustering
Baili Xiao · Zhibin Dong · KE LIANG · Suyuan Liu · Siwei Wang · Tianrui Liu · Xingchen Hu · En Zhu · Xinwang Liu
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman · Noam Rotstein · Roy Ganz · Ron Kimmel
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson · Aviad Aberdam · Roy Ganz · Elad Ben Avraham · Alona Golts · Yair Kittenplon · Shai Mazor · Ron Litman
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar · Zeeshan Patel · Jathushan Rajasegaran · Jitendra Malik
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan · Quanwei Yang · Kaisiyuan Wang · Hang Zhou · Yingying Li · Haocheng Feng · Errui Ding · Yu Wu · Jingdong Wang
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan · Kaisiyuan Wang · Zhiliang Xu · Quanwei Yang · Yasheng SUN · Shengyi He · Borong Liang · Yukang Cao · Yingying Li · Haocheng Feng · Errui Ding · Jingdong Wang · Youjian Zhao · Hang Zhou · Ziwei Liu
TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer
Jialun Liu · Jinbo Wu · Xiaobo Gao · JiaKui Hu · Bojun Xiong · Xing Liu · Chen Zhao · Hongbin Pei · Haocheng Feng · Yingying Li · Errui Ding · Jingdong Wang
PIAD: Pose and Illumination agnostic Anomaly Detection
Kaichen Yang · Junjie Cao · Zeyu Bai · Zhixun Su · Andrea Tagliasacchi
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Woojung Han · Yeonkyung Lee · Chanyoung Kim · Kwanghyun Park · Seong Jae Hwang
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion
Haoyu Wang · Le Wang · Sanping Zhou · Jingyi Tian · Zheng Qin · Yabing Wang · Gang Hua · Wei Tang
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi · Sangkyung Kwak · Sihyun Yu · Hyungwon Choi · Jinwoo Shin
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta · Sina Azar Azar · Shuai Li · Olga Zatsarynna · Jürgen Gall
BHViT: Binarized Hybrid Vision Transformer
Tian Gao · Yu Zhang · Zhiyuan Zhang · Huajun Liu · Kaijie Yin · Cheng-Zhong Xu · Hui Kong
Hierarchical Flow Diffusion for Efficient Frame Interpolation
Yang Hai · Guo Wang · Tan Su · jerett · Yinlin Hu
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang · Zhiqiang Lou · Laiyan Ding · Rui Xu · Minglang Tan · jerett · Rui Huang
Dual Semantic Guidance for Open Vocabulary Semantic Segmentation
Wang ZhengYang · Tingliang Feng · Fan Lyu · Fanhua Shang · Wei Feng · Liang Wan
AdaCM²: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man · Ying Huang · Chengming Zhang · Bingzhe Li · Wei Niu · Miao Yin
Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
wenbing zhu · Lidong Wang · Ziqing Zhou · Chengjie Wang · Yurui Pan · Ruoyi.Zhang · Zhuhao Chen · Linjie Cheng · Bin-Bin Gao · Jiangning Zhang · Zhenye Gan · Yuxie Wang · Yulong Chen · Bruce Qian · Mingmin Chi · Bo Peng · Lizhuang Ma
One for More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li · Xin Tan · Zhuo Chen · Zhizhong Zhang · Ruixin Zhang · Rizen Guo · Guannan Jiang · Yulong Chen · Yanyun Qu · Lizhuang Ma · Yuan Xie
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs
Wangbo Zhao · Yizeng Han · Jiasheng Tang · Zhikai Li · Yibing Song · Kai Wang · Zhangyang Wang · Yang You
NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics
Chenhao Li · Taishi Ono · Takeshi Uemori · Sho Nitta · Hajime Mihara · Alexander Gatto · Hajime Nagahara · Yusuke Moriuchi
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim · YOUNGKIL SONG · Jicheol Park · Won Hwa Kim · Suha Kwak
SET: Spectral Enhancement for Tiny Object Detection
Huixin Sun · Runqi Wang · Yanjing Li · Linlin Yang · Shaohui Lin · Xianbin Cao · Baochang Zhang
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang · Fan Wu · Linke Ouyang · Zhuangcheng Gu · Rui Zhang · Renqiu Xia · Botian Shi · Bo Zhang · Conghui He
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang · Yuan Qu · Hongbin Zhou · Jiawei Zhu · Rui Zhang · Qunshu Lin · Bin Wang · Zhiyuan Zhao · Man Jiang · Xiaomeng Zhao · Jin Shi · Fan Wu · Pei Chu · Minghao Liu · Zhenxiang Li · Chao Xu · Bo Zhang · Botian Shi · Zhongying Tu · Conghui He
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
Chunyang Cheng · Tianyang Xu · Zhenhua Feng · Xiaojun Wu · Zhangyong Tang · Hui Li · Zhang Zeyang · Sara Atito · Muhammad Awais · Josef Kittler
SVFR: A Unified Framework for Generalized Video Face Restoration
Zhiyao Wang · Xu Chen · Chengming Xu · Junwei Zhu · Xiaobin Hu · Jiangning Zhang · Chengjie Wang · Yuqi Liu · Yiyi Zhou · Rongrong Ji
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian · Jing Han · Chengcheng Wang · Yuchen Liang · Chao Xu · Hanting Chen
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
Sagar Soni · Akshay Dudhane · Hiyam Debary · Mustansar Fiaz · Muhammad Akhtar Munir · Muhammad Sohail Danish · Paolo Fraccaro · Campbell D Watson · Levente Klein · Fahad Shahbaz Khan · Salman Khan
Learning on Model Weights using Tree Experts
Eliahu Horwitz · Bar Cavia · Jonathan Kahana · Yedid Hoshen
Panorama Generation From NFoV Image Done Right
Dian Zheng · Cheng Zhang · Xiao-Ming Wu · Cao Li · Chengfei Lv · Jian-Fang Hu · Wei-Shi Zheng
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang · Jianxi Huang · Zhida Sun · Yuanhao Gong · Daniel Cohen-Or · Min Lu
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong · Haoyu Tang · Qinghai Zheng · Mingzhu Xu · Yupeng Hu · Weili Guan
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang · Xiangye Jin · Jiaxu Miao · Yu Wu
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang · Yanzhao Zhang · Wen Xie · Mingxin Li · Ziqi Dai · Dingkun Long · Pengjun Xie · Meishan Zhang · Wenjie Li · Min Zhang
Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
Ruihai Wu · Ziyu Zhu · Yuran Wang · Yue Chen · Jiarui Wang · Hao Dong
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li · Lingyun Xu · Mingxu Zhang · Jiaming Liu · Yan Shen · Iaroslav Ponomarenko · Jiahui Xu · Liang Heng · Siyuan Huang · Shanghang Zhang · Hao Dong
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
Ahyun Seo · Minsu Cho
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang · Lu Chen · Guodong Zheng · Yifeng Gao · Rui Zheng · Jinlan Fu · Zhenfei Yin · Senjie Jin · Yu Qiao · Xuanjing Huang · Feng Zhao · Tao Gui · Jing Shao
Autoregressive Sequential Pretraining for Visual Tracking
Shiyi Liang · Yifan Bai · Yihong Gong · Xing Wei
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li · Shaokun Wang · Bo Qian · Yuhang He · Xing Wei · Qiang Wang · Yihong Gong
Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
Fu Feng · Yucheng Xie · Xu Yang · Jing Wang · Xin Geng
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
Fu Feng · Yucheng Xie · Jing Wang · Xin Geng
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu · Cheng Shi · Dingyou Wang · Jiajin Tang · Zhengxuan Wei · Yu Wu · Guanbin Li · Sibei Yang
SGSST: Scaling Gaussian Splatting Style Transfer
Bruno Galerne · Jianling WANG · Lara Raad · Jean-michel Morel
Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification
Zequn Zeng · Yudi Su · Jianqiao Sun · Tiansheng Wen · Hao Zhang · Zhengjue Wang · Bo Chen · Hongwei Liu · Jiawei Ma
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu · Jie Kong · Jiazhi Wang · Xiao Pan · Bo Lin · Qiang Liu
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
Yuchen Xia · Quan Yuan · Guiyang Luo · Xiaoyuan Fu · Yang Li · Xuanhan Zhu · Tianyou Luo · Siheng Chen · Jinglin Li
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan · Kepan Nan · Rui Xie · Penghao Zhou · Zhenheng Yang · Chaoyou Fu · Xiang Li · Jian Yang · Ying Tai
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
Bencheng Liao · Shaoyu Chen · haoran yin · Bo Jiang · Cheng Wang · Sixu Yan · xinbang zhang · Xiangyu Li · ying zhang · Qian Zhang · Xinggang Wang
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
Yao Mu · Tianxing Chen · Zanxin Chen · ShijiaPeng · Zhiqian Lan · Zeyu Gao · Zhixuan Liang · Qiaojun Yu · Yude Zou · Mingkun Xu · Lunkai Lin · Zhiqiang Xie · Mingyu Ding · Ping Luo
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
Tianxing Chen · Yao Mu · Zhixuan Liang · Zanxin Chen · ShijiaPeng · Qiangyu Chen · Mingkun Xu · Ruizhen Hu · Hongyuan Zhang · Xuelong Li · Ping Luo
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi · Ali Nazari · Aminreza Sefid · Mohammadali Banayeeanzade · Mohammad Rohban · Mahdieh Baghshah
Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding
Zhaoran Zhao · Peng Lu · Anran Zhang · Pei Pei Li · Xia Li · Xuannan Liu · Yang Hu · Shiyi Chen · liweiwang · Wenhao Guo
Dragin3D: Image Editing by Dragging in 3D Space
Weiran Guang · Xiaoguang Gu · Mengqi Huang · Zhendong Mao
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection
Yunshuang Yuan · Yan Xia · Daniel Cremers · Monika Sester
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
Shangjin Zhai · Zhichao Ye · Jialin Liu · Weijian Xie · Jiaqi Hu · Zhen Peng · Hua Xue · Danpeng Chen · Xiaomeng Wang · Lei Yang · Nan Wang · Haomin Liu · Guofeng Zhang
ReRAW: RAW-from-RGB Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan · Beril Besbinar · Christoph Reinders · Junji Otsuka · Daisuke Iso
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu · Peng Cui · Bingning Wang · Weipeng Chen · Yupeng Zhang · Jun Zhu · Xiaolin Hu
Efficient Motion-Aware Video MLLM
Zijia Zhao · Yuqi Huo · Tongtian Yue · Longteng Guo · Haoyu Lu · Bingning Wang · Weipeng Chen · Jing Liu
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao · Yuanyang Yin · Lin Li · Mingan Lin · Victor Shea-Jay Huang · Siwei Chen · Weipeng Chen · Baoqun Yin · Zenan Zhou · Wentao Zhang
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He · Kai Tong · Di Fang · Han Sun · Ziqian Zeng · Haoran Li · Tianyi Chen · Huiping Zhuang
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang · Zekai Li · Zhi-Qi Cheng · Samir Khaki · Ahmad Sajedi · Ramakrishna Vedantam · Konstantinos N. Plataniotis · Alexander G. Hauptmann · Yang You
CamPoint: Boosting Point Cloud Segmentation with Virtual Camera
Jianhui Zhang · Luo Yizhi · Zicheng Zhang · Xuecheng Nie · Bonan Li
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin · Zheng Wang · Tianwen Qian · Pan Mu · Sixian Chan · Cong Bai
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
Yi Yu · Weizhen Han · Libing Wu · Bingyi Liu · Enshu Wang · Zhuangzhuang Zhang
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao · Xuxin Cheng · Zhiqi Huang · Lei Li
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu · Wenshuo Wang · Yicheng Qiao · Qiannan Guo · Jiayin Zhu · Pengfei Li · Zilong Chen · Huiming Yang · Zhiwei Li · Lening Wang · Tiao Tan · Huaping Liu
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang · Hehe Fan · Yi Yang
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
Chao Wang · Hehe Fan · Huichen Yang · Sarvnaz Karimi · Lina Yao · Yi Yang
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang · Yen-Siang Wu · Hung-Kai Chung · Kai-Po Chang · Fu-En Yang · Yu-Chiang Frank Wang
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
Fangyu Wu · Yuhao Chen
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Jianhao Zheng · Zihan Zhu · Valentin Bieri · Marc Pollefeys · Songyou Peng · Iro Armeni
Adaptive Parameter Selection for Tuning Vision-Language Models
Yi Zhang · Yi-Xuan Deng · Meng-Hao Guo · Shi-Min Hu
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant · Ethan Weber · Jin Kyu Kim · Rawal Khirodkar · Zhaoen Su · Julieta Martinez · Igor Gilitschenski · Shunsuke Saito · Timur Bagautdinov
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang · Haitao Lin · Tianyu Wang · Yanwei Fu · Xiangyang Xue · Yi Zhu
Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion
Jongseong Bae · Junwoo Ha · Ha Young Kim
Maintaining Consistent Inter-Class Topology in Continual Test-Time Adaptation
Chenggong Ni · Fan Lyu · Jiayao Tan · Fuyuan Hu · Rui Yao · Tao Zhou
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu · Shih-Hong Chen · Chih Yao Hu · Hsin-Yu Wu · Kai-Hsin Chen · Yu-You Chen · Chih-Hai Su · Chih-Kuo Lee · Yu-Lun Liu
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Ziyi Wang · Yanran Zhang · Jie Zhou · Jiwen Lu
Temporally Consistent Object-Centric Learning by Contrasting Slots
Anna Manasyan · Maximilian Seitzer · Filip Radovic · Georg Martius · Andrii Zadaianchuk
Dense-To-Sparse Video Diffusion For High-fidelity Multi-View Images Synthesis
Fan Yang · Jianfeng Zhang · Jun Hao Liew · Chaoyue Song · Zhongcong Xu · Xiu Li · Jiashi Feng · Guosheng Lin
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang · Zeyi Huang · Xiangjie Kong · Guojiang Shen · Guang Dai · Jingdong Wang · Yong Liu
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song · Jianfeng Zhang · Xiu Li · Fan Yang · Yiwen Chen · Zhongcong Xu · Jun Hao Liew · Xiaoyang Guo · Fayao Liu · Jiashi Feng · Guosheng Lin
DAR: Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang · Jintai Chen · Jian liu · Dongqi Tang · Wentong Li · Weiqiang Wang · Danny Chen · Jian Wu
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li · Yuenan Hou · Xiaohan Xing · Yuexin Ma · Xiao Sun · Yanyong Zhang
GBC: Generalizable Gaussian-Based Clothed Human Digitalization
Hanzhang Tu · Zhanfeng Liao · Boyao Zhou · Shunyuan Zheng · Xilong Zhou · Liuxin ZHANG · QianYing Wang · Yebin Liu
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
Dingcheng Zhen · Shunshun Yin · Shiyang Qin · Hou Yi · Ziwei Zhang · Siyuan Liu · Gan Qi · Ming Tao
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang · Guodong Wang · yizhou jin · Di Huang
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera
Jian Huang · Chengrui Dong · Xuanhua Chen · Peidong Liu
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang · Fangfu Liu · Jiawei Chi · Yueqi Duan
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang · Jinzhao Li · Xin Fei · Hao Liu · Yueqi Duan
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang · Yuan Wang · Daming Shi · Yanzhong Wang
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection
Fan Xing · Zhuo Tian · Xuefeng Fan · Xiaoyi Zhou
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
Mingze Sun · Junting Dong · Junhao Chen · Yurun Chen · Xinyu Jiang · Shiwei Mao · Puhua Jiang · Jingbo Wang · Bo Dai · Ruqi Huang
MotionMap: Representing Multimodality in Human Pose Forecasting
Reyhaneh Hosseininejad · Megh Shukla · Saeed Saadatnejad · Mathieu Salzmann · Alex Alahi
Certified Human Trajectory Prediction
Mohammadhossein Bahari · Saeed Saadatnejad · Amirhossein Askari Farsangi · Seyed-Mohsen Moosavi-Dezfooli · Alex Alahi
Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model
Longrong Yang · Dong Shen · Chaoxiang Cai · Kaibing Chen · Fan Yang · Tingting Gao · Di ZHANG · Xi Li
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
Guannan Lai · Yujie Li · Xiangkun Wang · Junbo Zhang · Tianrui Li · Xin Yang
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang · Daqing Liu · Xinchen Liu · Xiaodong He
Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret
Yucong Dai · Shilin Gu · Ruidong Fan · Chao Xu · Chenping Hou
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou · Quan Sun · Yuang Peng · Kun Yan · Runpei Dong · Duomin Wang · Zheng Ge · Nan Duan · Xiangyu Zhang
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Möbius Spatial Augmentation
Zidong Cao · Jinjing Zhu · Weiming Zhang · Hao Ai · Haotian Bai · Hengshuang Zhao · Lin Wang
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren · Kangrui Chen · Yingqian Cui · Shenglai Zeng · Hui Liu · Yue Xing · Jiliang Tang · Lingjuan Lyu
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
Jianchuan Chen · Jingchuan Hu · Gaige Wang · Zhonghua Jiang · Tiansong Zhou · Zhiwen Chen · Chengfei Lv
Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video
Marchellus Matthew · Nadhira Noor · In Kyu Park
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira · Gabriele Facciolo · Thibaud Ehret
Rethinking Correspondence-based Category-Level Object Pose Estimation
Huan Ren · Wenfei Yang · Shifeng Zhang · Tianzhu Zhang
Structure-Aware Correspondence Learning for Relative Pose Estimation
Yihan Chen · Wenfei Yang · Huan Ren · Shifeng Zhang · Tianzhu Zhang · Feng Wu
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
Pu Cao · Feng Zhou · Lu Yang · TianruiHuang · Qing Song
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao · Shen Sang · Tiancheng Zhi · Jing Liu · Qing Yan · Linjie Luo · Bo Yuan
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang · Tiancheng Zhi · Jing Liu · Shen Sang · Liming Jiang · Qing Yan · Sijia Liu · Linjie Luo
Implicit Correspondence Learning for Image-to-Point Cloud Registration
Xinjun Li · Wenfei Yang · Jiacheng Deng · Zhixin Cheng · Xu Zhou · Tianzhu Zhang
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
Shaofei Huang · Rui Ling · Tianrui Hui · Hongyu Li · Xu Zhou · Shifeng Zhang · Si Liu · Richang Hong · Meng Wang
Generative Map Priors for Collaborative BEV Semantic Segmentation
Jiahui Fu · Yue Gong · Luting Wang · Shifeng Zhang · Xu Zhou · Si Liu
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
Yiheng Li · Yang Yang · Zichang Tan · Huan Liu · Weihua Chen · Xu Zhou · Zhen Lei
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
ZHANG LINTONG · Kang Yin · Seong-Whan Lee
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Songhao Han · Wei Huang · Hairong Shi · Le Zhuo · Xiu Su · Shifeng Zhang · Xu Zhou · Xiaojuan Qi · Yue Liao · Si Liu
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Yi Du · Zhipeng Zhao · Shaoshu Su · Sharath Golluri · Haoze Zheng · Runmao Yao · Chen Wang
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
Zhao Dong · Ka chen · Zhaoyang Lv · Hong-Xing Yu · Yunzhi Zhang · Cheng Zhang · Yufeng Zhu · Stephen Tian · Zhengqin Li · Geordie Moffatt · Sean Christofferson · James Fort · Xiaqing Pan · Mingfei Yan · Jiajun Wu · Carl Ren · Richard Newcombe
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang · Tong Zhang · Yuanzhi Wang · Yide Qiu · Xin Liu · Xu Guo · Zhen Cui
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou · Lvchang Fu · Sida Peng · Yunzhi Yan · Zhanhua Zhang · chen yong · Jiazhi Xia · Xiaowei Zhou
Large Inverse Rendering Model for Reconstruction of Shape, Materials and Realistic Radiance Field
Zhengqin Li · Dilin Wang · Ka chen · Zhaoyang Lv · Thu Nguyen-Phuoc · Milim Lee · Jia-Bin Huang · Lei Xiao · Yufeng Zhu · Carl Marshall · Carl Ren · Richard Newcombe · Zhao Dong
FreeTimeGS: Free Gaussians at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang · Peishan Yang · Zhen Xu · Jiaming Sun · Zhanhua Zhang · chen yong · Hujun Bao · Sida Peng · Xiaowei Zhou
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang · Haoxuan Li · Chunyuan Zheng · Lei Wang · Guorui Liao · Zhili Gong · Huayi Yang · Li Liu
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu · Zhengyang Liang · Yueze Wang · Xianfeng Wu · feilong tang · Muyang He · Jian Li · Zheng Liu · Harry Yang · Ser-Nam Lim · Bo Zhao
OmniGen: Unified Image Generation
Shitao Xiao · Yueze Wang · Junjie Zhou · Huaying Yuan · Xingrun Xing · Ruiran Yan · Chaofan Li · Shuting Wang · Tiejun Huang · Zheng Liu
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou · Yan Shu · Bo Zhao · Boya Wu · Zhengyang Liang · Shitao Xiao · Minghao Qin · Xi Yang · yongping xiong · Bo Zhang · Tiejun Huang · Zheng Liu
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang · Shuo Wang · Jiangning Zhang · Ke Fan · Jiafu Wu · Xuezhucun · Yong Liu
Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation
Yiftach Edelstein · Or Patashnik · Dana Cohen-Bar · Lihi Zelnik-Manor
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu · Jiangning Zhang · Ran Yi · Jieyu Weng · Yabiao Wang · Xianfang Zeng · Xuezhucun · Lizhuang Ma
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model
Yue Han · Jiangning Zhang · Junwei Zhu · Runze Hou · Xiaozhong Ji · Chuming Lin · Xiaobin Hu · Xuezhucun · Yong Liu
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
Fengyi Fu · Lei Zhang · Mengqi Huang · Zhendong Mao
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian · Zhaoyang Liu · Ruibin Yuan · Jiahao Pan · Qifeng Liu · Xu Tan · Qifeng Chen · Wei Xue · Yike Guo
DiffDNO: Diffusion Fourier Neural Operator
Xiaoyi Liu · Hao Tang
Not Just Text: Uncovering Vision Modality Threats in Image Generation Models
Hao Cheng · Erjia Xiao · Jiayan Yang · Jiahang Cao · Qiang Zhang · Jize Zhang · Kaidi Xu · Jindong Gu · Renjing Xu
GenVDM: Generating Vector Displacement Maps From a Single Image
Yuezhi Yang · Qimin Chen · Vladimir G. Kim · Siddhartha Chaudhuri · Qixing Huang · Zhiqin Chen
3D Dental Model Segmentation with Geometrical Boundary Preserving
Shufan Xi · Zexian Liu · Junlin Chang · Hongyu Wu · Xiaogang Wang · Aimin Hao
Beyond Human Perception: Understanding Multi-Object World from Monocular View
Keyu Guo · yongle huang · Shijie Sun · Xiangyu Song · Mingtao Feng · Zedong Liu · Huansheng Song · Tiantian Wang · Jianxin Li · Naveed Akhtar · Ajmal Mian
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Guoyu Lu
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
Hongkai Wei · YANG YANG · Shijie Sun · Mingtao Feng · Xiangyu Song · Qi Lei · Hongli Hu · Rong Wang · Huansheng Song · Naveed Akhtar · Ajmal Mian
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang · Tianyi Gao · Haoran Xu · Qihao Zhao · Yang Song · Zhipeng Gui · Tengchao Lv · Hao Chen · Lei Cui · Scarlett Li · Furu Wei
Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
Xingyu Qiu · Mengying Yang · Xinghua Ma · Fanding Li · Dong Liang · Gongning Luo · wei wang · Kuanquan Wang · Shuo Li
A Dataset for Semantic Segmentation in the Presence of Unknowns
Zakaria Laskar · Tomas Vojir · Matej Grcic · Iaroslav Melekhov · Shankar Gangisetty · Juho Kannala · Jiri Matas · Giorgos Tolias · C.V. Jawahar
ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos · Vladan Stojnić · Anna Manko · Pavel Suma · Nikolaos-Antonios Ypsilantis · Nikos Efthymiadis · Zakaria Laskar · Jiri Matas · Ondrej Chum · Giorgos Tolias
ActiveGAMER: Active GAussian Mapping through Efficient Rendering
Liyan Chen · Huangying Zhan · Kevin Chen · Xiangyu Xu · Qingan Yan · Changjiang Cai · Yi Xu
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi · Peihao Chen · Junyan Li · Shuailei Ma · Xinyu Sun · Tianhang Xiang · Yinjie Lei · Mingkui Tan · Chuang Gan
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
Yutao Tang · Yuxiang Guo · Deming Li · Cheng Peng
LMO: Linear Mamba Operator for MRI Reconstruction
Wei Li · jiawei jiang · Jie Wu · Kaihao Yu · Jianwei Zheng
MITracker: Multi-View Integration for Visual Object Tracking
Mengjie Xu · Yitao Zhu · Haotian Jiang · Jiaming Li · Zhenrong Shen · Sheng Wang · Haolin Huang · Xinyu Wang · Han Zhang · Qing Yang · Qian Wang
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
Van-Tin Luu · Yong-Lin Cai · Vu-Hoang Tran · Wei-Chen Chiu · Yi-Ting Chen · Ching-Chun Huang
RGBAvatar: Reduced Gaussian Blendshapes for Head Avatar Animation
Linzhou Li · Yumeng Li · Yanlin Weng · Youyi Zheng · Kun Zhou
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang · Weilong Dai · Jinlong Liu · Wanggui He · Hao Jiang · Mingli Song · Jie Song
AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
wenxin ma · Xu Zhang · Qingsong Yao · Fenghe Tang · Chenxu Wu · Yingtai Li · Rui Yan · Zihang Jiang · S Kevin Zhou
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan · Jiawen Shi · Pan Zhou · Neil Zhenqiang Gong · Lichao Sun
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li · Henry Li · Yichun Shi · Amir Barati Farimani · Yuval Kluger · Linjie Yang · Peng Wang
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di · Long Peng · Peizhe Xia · Wenbo Li · Renjing Pei · Yang Wang · Yang Cao · Zheng-Jun Zha
STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds
Zikuan Li · Honghua Chen · Yuecheng Wang · Sibo Wu · Mingqiang Wei · Jun Wang
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
Duc-Hai Pham · Tung Do · Phong Nguyen · Binh-Son Hua · Khoi Nguyen · Rang Nguyen
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Kushwaha Kushwaha · Yapeng Tian
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou · Junbin Xiao · Qingyun Li · Yicong Li · Xun Yang · Dan Guo · Meng Wang · Tat-seng Chua · Angela Yao
Towards Context-Stable and Hue-Consistent Image Inpainting
Yikai Wang · Chenjie Cao · Junqiu Yu · Ke Fan · Xiangyang Xue · Yanwei Fu
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu · Zeyi Huang · Fumio Okura · Yasuyuki Matsushita
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
Tong Wang · Ting Liu · Xiaochao Qu · WU CHENGJING · Xiaochao Qu · Xiaolin Hu
MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting
jun huang · Ting Liu · Yihang Wu · Xiaochao Qu · Xiaochao Qu · Xiaolin Hu
NTClick: Achieving Precise Interactive Segmentation With Noise-tolerant Clicks
Chenyi Zhang · Ting Liu · Xiaochao Qu · Xiaochao Qu · Yao Zhao · Yunchao Wei
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
Jiahe Li · Feiyu Wang · Xiaochao Qu · WU CHENGJING · Xiaochao Qu · Ting Liu
SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
Chongkai Yu · Ting Liu · Li Anqi · Xiaochao Qu · WU CHENGJING · Xiaochao Qu · Xiaolin Hu
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection
Ming Sun · Rui Wang · Zixuan Zhu · Lihua Jing · Yuanfang Guo
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang · hongzhen wang · Zonghao Guo · Di Wang · Yulin Wang · Mingshuo Chen · Qiang Ma · Long Lan · Wenjing Yang · Jing Zhang · Zhiyuan Liu · Maosong Sun
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie · Han Zou · Ruiqi Yu · Yan Zhang · Zhan Zhenpeng
Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions
Tianhao Ma · Han Chen · Juncheng Hu · Yungang Zhu · Ximing Li
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu · Keyang Xu · Bo Liu · Hongliang Fei · Qiang Liu
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He · Weiran Wang · Guancheng Quan · Shihao Wang · Dajiang Zhou · Yunsong Li
FedCS: Coreset Selection for Federated Learning
Chenhe Hao · Weiying Xie · Daixun Li · Haonan Qin · Hangyu Ye · Leyuan Fang · Yunsong Li
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Hmrishav Bandyopadhyay · Yi-Zhe Song
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen · Hmrishav Bandyopadhyay · Kai Zou · Yi-Zhe Song
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley · Tapas Kumar Dutta · Aneeshan Sain · Pinaki Nath Chowdhury · Ayan Kumar Bhunia · Yi-Zhe Song
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain · Subhajit Maity · Pinaki Nath Chowdhury · Subhadeep Koley · Ayan Kumar Bhunia · Yi-Zhe Song
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
Zhiyu Qu · Yunqi Miao · Zhensong Zhang · Jifei Song · Jiankang Deng · Yi-Zhe Song
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
Xin Lin · Chong Shi · Zuopeng Yang · Haojin Tang · Zhili Zhou
MVBoost: Boost 3D Reconstruction with Multi-View Refinement
Xiangyu Liu · Xiaomei Zhang · Zhiyuan Ma · Xiangyu Zhu · Zhen Lei
Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation
Libiao Chen · Dong Nie · Junjun Pan · Jing Yan · Zhenyu Tang
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang · Liang Li · Chenggang Yan · Chunshan Liu · Anton van den Hengel · Yuankai Qi
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen · Lin Gu · Liang Li · Chenggang Yan · Ying Fu
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong · Jiadong Pan · Liang Li · Yuankai Qi · Yuxin Peng · Anton van den Hengel · Jian Yang · Qingming Huang
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
Shanshan Huang · Haoxuan Li · Chunyuan Zheng · Mingyuan Ge · WeiGao · Lei Wang · Li Liu
MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
Yanyuan Chen · Dexuan Xu · Yu Huang · ZhanSongkun · Hanpin Wang · Dongxue Chen · Xueping Wang · Meikang Qiu · Hang Li
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
Weichen Dai · wu hexing · xiaoyang weng · Yuxin Zheng · Yuhang Ming · Wanzeng Kong
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
Luyuan Xie · Tianyu Luan · Wenyuan Cai · Guochen Yan · Zhaoyu Chen · Nan Xi · Yuejian Fang · Qingni Shen · Zhonghai Wu · Junsong Yuan
Population Normalization for Federated Learning
Zhuoyao Wang · Fan Yi · Peizhu Gong · Caitou He · Cheng Jin · Weizhong Zhang
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang · Lianghua Huang · Jingwu Fang · Huanzhang Dou · Wei Wang · Zhi-Fan Wu · Yupeng Shi · Junge Zhang · Xin Zhao · Yu Liu
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
Hui Li · Mingwang Xu · Qingkun Su · Shan Mu · Li jiaye · Kaihui Cheng · Chen Yuxuan · Tan Chen · Mao Ye · Jingdong Wang · Siyu Zhu
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou · Manli Tao · Chaoyang Zhao · Haiyun Guo · Honghui Dong · Ming Tang · Jinqiao Wang
No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
Rong Qin · Xin Liu · Xingyu Liu · Jiaxuan Liu · Jinglei Shi · Liang Lin · Jufeng Yang
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Yikun Liu · Yajie Zhang · jiayin cai · Xiaolong Jiang · Yao Hu · Jiangchao Yao · Yanfeng Wang · Weidi Xie
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
Xuan Shen · Weize Ma · Jing Liu · Changdi Yang · Rui Ding · Quanyi Wang · Henghui Ding · Wei Niu · Yanzhi Wang · Pu Zhao · Jun Lin · Jiuxiang Gu
Hybrid Explicit Representation for Ultra-Realistic Head Avatars
Hongrui Cai · Yuting Xiao · Xuan Wang · Jiafei Li · Yudong Guo · Yanbo Fan · Shenghua Gao · Juyong Zhang
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu · Xiaokang Chen · Zhiyu Wu · Yiyang Ma · Xingchao Liu · Zizheng Pan · Wen Liu · Zhenda Xie · Xingkai Yu · Chong Ruan · Ping Luo
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Yiyang Ma · Xingchao Liu · Xiaokang Chen · Wen Liu · Chengyue Wu · Zhiyu Wu · Zizheng Pan · Zhenda Xie · Haowei Zhang · Xingkai Yu · Liang Zhao · Yisong Wang · Jiaying Liu · Chong Ruan
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng · Guole Shen · Chen Xun · Shenghai Yuan · Tongxing Jin · Hongming Shen · Yanbo Wang · Jingchuan Wang · Hesheng Wang · Danwei Wang · Weidong Chen
ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector
Yuanwei Liu · Hui Wei · Chengyu Jia · Ruqi Xiao · Weijian Ruan · Xingxing Wei · Joey Tianyi Zhou · Zheng Wang
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang · Jiazhao Zhang · Yuqing Lan · Yulan Guo · Dezun Dong · Chenyang Zhu · Kai Xu
BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment
Runmin Jiang · Jackson Daggett · Shriya Pingulkar · Yizhou Zhao · Priyanshu Dhingra · Daniel Brown · Qifeng Wu · Xiangrui Zeng · Xingjian Li · Min Xu
Matrix-Free Shared Intrinsics Bundle Adjustment
Daniel Safari
Structure from Collision
Takuhiro Kaneko
DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region
Jianping Wu
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
Won Jun Kim · Hyungjin Chung · Jaemin Kim · Sangmin Lee · Byeongsu Sim · Jong Chul Ye
Neural Hierarchical Decomposition for Single Image Plant Modeling
Zhihao Liu · Zhanglin Cheng · Naoto Yokoya
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
Yuanhao Zou · Zhaozheng Yin
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Weixi Feng · Chao Liu · Sifei Liu · William Yang Wang · Arash Vahdat · Weili Nie
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai · Hanzhuo Huang · Yu Wu · Sibei Yang
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
Hang Zhou · Xinxin Zuo · Rui Ma · Li Cheng
Incremental Object Keypoint Learning
Mingfu Liang · Jiahuan Zhou · Xu Zou · Ying Wu
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
Ziming Huang · Xurui Li · Haotian Liu · Feng Xue · Yuzhe Wang · Yu Zhou
Auto-Encoded Supervision for Perceptual Image Super-Resolution
MinKyu Lee · Sangeek Hyun · Woojin Jun · Jae-Pil Heo
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Yousef Yeganeh · Ioannis Charisiadis · Marta Hasny · Martin Hartenberger · Björn Ommer · Nassir Navab · Azade Farshad · Ehsan Adeli
Adapting Dense Matching for Homography Estimation with Grid-based Acceleration
Kaining Zhang · Yuxin Deng · Jiayi Ma · Paolo Favaro
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang · Haoran Xu · Yong Liu · Jiaze Li · Yansong Tang
MATCHA: Towards Matching Anything
Fei Xue · Sven Elflein · Laura Leal-Taixe · Qunjie Zhou
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs
Youyi Zhan · Tianjia Shao · Yin Yang · Kun Zhou
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song · Valts Blukis · Jonathan Tremblay · Stephen Tyree · Yu Su · Stan Birchfield
HUNet: Homotopy Unfolding Network for Image Compressive Sensing
Feiyang Shen · Hongping Gan
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall · Cheng Perng Phoo · Mia Chiquier · Bharath Hariharan · Kavita Bala · Carl Vondrick
CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning
Xiaokun Li · Yaping Huang · Qingji Guan
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond
Kun Zhou · Xinyu Lin · Jiangbo Lu
GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration
Sudarshan Rajagopalan · Nithin Gopalakrishnan Nair · Jay Paranjape · Vishal M. Patel
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery
Sara Al-Emadi · Yin Yang · Ferda Ofli
Mamba as a Bridge: Where VFM Meets VLM for Domain-Generalized Semantic Segmentation
Xin Zhang · Robby T. Tan
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li · Puhao Li · Tengyu Liu · Yuyang Li · Siyuan Huang
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Separation
Tianfu Wang · Mingyang Xie · Haoming Cai · Sachin Shah · Christopher Metzler
One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
Senmao Li · Lei Wang · Kai Wang · Tao Liu · Jiehang Xie · Joost van de Weijer · Fahad Shahbaz Khan · Shiqi Yang · Yaxing Wang · Jian Yang
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi · Shubhendu Jena · Eric Marchand · Adnane Boukhayma
PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models
Minghao Chen · Roman Shapovalov · Iro Laina · Tom Monnier · Jianyuan Wang · David Novotny · Andrea Vedaldi
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen · Xiaojie Xu · Wenbo Li · Jingjing Ren · Tian Ye · Songhua Liu · Ying-Cong Chen · Lei Zhu · Xinchao Wang
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong · Tianheng Zheng · Yuanzhouhan Cao · Linbo Qing · Chao Ren
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee · Gim Hee Lee
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
Md Mostafijur Rahman · Radu Marculescu
Customized Condition Controllable Generation for Video Soundtrack
Fan Qi · KunSheng Ma · Changsheng Xu
Hearing Anywhere in Any Environment
Xiulong Liu · Anurag Kumar · Paul Calamia · Sebastia Vicenc Amengual Gari · Calvin Murdock · Ishwarya Ananthabhotla · Philip W Robinson · Eli Shlizerman · Vamsi Krishna Ithapu · Ruohan Gao
ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng · Chang Yu · Zoubin Bi · Yintong Shang · Feng Gao · Hongzhi Wu · Kun Zhou · Chenfanfu Jiang · Yin Yang
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi · Yangyang Huang · Ronghua Luo · YuQiu
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang · Yueqian Wang · Bo Chen · Tong Wu · Dongyan Zhao · Zilong Zheng
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla · Sethuraman T V · Alexander G. Schwing · Derek Hoiem
SciBench: Addressing Scientific Illusions in Image Synthesis
Jialuo Li · Wenhao Chai · XINGYU FU · Haiyang Xu · Saining Xie
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
Zhaochen Liu · Limeng Qiao · Xiangxiang Chu · Lin Ma · Tingting Jiang
Turbo3D: Ultra-fast Text-to-3D Generation
Hanzhe Hu · Tianwei Yin · Fujun Luan · Yiwei Hu · Hao Tan · Zexiang Xu · Sai Bi · Shubham Tulsiani · Kai Zhang
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li · Tao Yang · Song Guo · Lei Zhang
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin’ya Yamaguchi · Dewei Feng · Sekitoshi Kanai · Kazuki Adachi · Daiki Chijiwa
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau · Guillaume Bourmaud
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi · Lu Qi · Jianzong Wu · Jinbin Bai · Jingbo Wang · Yunhai Tong · Xiangtai Li
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
Xuesong Chen · Linjiang Huang · Tao Ma · Rongyao Fang · Shaoshuai Shi · Hongsheng Li
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh · Tushar Nagarajan · Georgios Pavlakos · Kris Kitani · Kristen Grauman
SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning
Ye Liu · Meng Yang
An Image-like Diffusion Method for Human-Object Interaction Detection
Xiaofei Hui · Haoxuan Qu · Hossein Rahmani · Jun Liu
V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy
Jiayin Zhao · Zhenqi Fu · Tao Yu · Hui Qiao
Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration
Lizheng Zu · Lin Lin · Song Fu · Na Zhao · Pan Zhou
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin · Jiao-Long Cao · Ming-Ming Cheng · Qibin Hou
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
Yizheng Xie · Viktoria Ehm · Paul Roetzer · Nafie El Amrani · Maolin Gao · Florian Bernard · Daniel Cremers
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang · Changxing Ding · Wentao Tan · Junhong Wang · JIN Tao · Xiangmin Xu
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng · 湘铭 王 · Yongyong Chen · Jingyong Su · Jie Liu
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures using Phase-Transferred Diffusion Model
Xiang Gao · Shuai Yang · Jiaying Liu
DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Zhixuan Liang · Yao Mu · Yixiao Wang · Fei Ni · Tianxing Chen · Wenqi Shao · Wei Zhan · Masayoshi Tomizuka · Ping Luo · Mingyu Ding
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona · Andrei Zanfir · Eduard Gabriel Bazavan · NIKOS KOLOTOUROS · Thiemo Alldieck · Cristian Sminchisescu
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li · Liang Wang · Chao Wang · Jing Jiang · Yan Peng · Guodong Long
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing · Juncheng Hu · Guotao Liang · Jing Zhang · Dong Xu · Qian Yu
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu · Seongyu Kim · Joon Chung · Arda Senocak
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Wenbo Hu · Xiangjun Gao · Xiaoyu Li · Sijie Zhao · Xiaodong Cun · Yong Zhang · Long Quan · Ying Shan
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata Casademunt · Michał Stypułkowski · Rodrigo Mira · Stella Bounareli · Konstantinos Vougioukas · Zoe Landgraf · Nikita Drobyshev · Maciej Zieba · Stavros Petridis · Maja Pantic
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li · BoZhong Zheng · Xiaohao Xu · Jinye Gan · Fading Lu · Xiang Li · Na Ni · Zheng Tian · Xiaonan Huang · Shenghua Gao · Yingna Wu
EnliveningGS: Active Locomotion of 3DGS
Siyuan Shen · Tianjia Shao · Kun Zhou · Chenfanfu Jiang · Yin Yang
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh · Jan Kautz
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu · Xinchao Wang
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie · Qile He · Youjia Zhu · Qiwei He · Mengtian Li
NN-Former: Rethinking Graph Structure in Neural Architecture Representation
Ruihan Xu · Haokui Zhang · Yaowei Wang · Wei Zeng · Shiliang Zhang
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu · Haoyu Wu · Zheng Ziqiang · Chen Wei · Yingqing He · Renjie Pi · Qifeng Chen
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao · Liangliang Nan · Hugo Ledoux
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang · Nicholas M Chimitt · Xijun Wang · Yu Yuan · Stanley H. Chan
MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation
Jae-Ho Choi · Soheil Hor · Shubo Yang · Amin Arbabian
Articulated Motion Distillation from Video Diffusion Models
Xuan Li · Qianli Ma · Tsung-Yi Lin · Yongxin Chen · Chenfanfu Jiang · Ming-Yu Liu · Donglai Xiang
Feature Spectrum Learning for Remote Sensing Change Detection
Qi Zang · Dong Zhao · Shuang Wang · Dou Quan · Licheng Jiao · Zhun Zhong
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez · Ismail Ben Ayed · Jose Dolz
PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution
Shuangfan Zhou · Chu Zhou · Youwei Lyu · Heng Guo · Zhanyu Ma · Boxin Shi · Imari Sato
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation
Uyoung Jeong · Jonathan Freer · S