AAAI 2025 Accepted Papers (Part 2)

| Paper ID | Title | Author(s) |
| --- | --- | --- |
| 5 | JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs | Hongyi Li, Jiawei Ye, Wu Jie, Tianjie Yan, 王楚, Zhixin Li |
| 8 | Verification of Neural Networks against Convolutional Perturbations via Parameterised Kernels | Benedikt Brückner, Alessio Lomuscio |
| 9 | Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution | Carlos Eiras-Franco, Anna Hedström, Marina MC Höhne |
| 15 | SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety | Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy |
| 18 | Is poisoning a real threat to DPO? Maybe more so than you think | Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang |
| 42 | IBAS: Imperceptible Backdoor Attacks in Split Learning with Limited Information | Peng Xi, Shaoliang Peng, Wenjuan Tang |
| 51 | On the Consideration of AI Openness: Can Good Intent Be Abused? | Yeeun Kim, Hyunseo Shin, Eunkyung Choi, Hongseok Oh, Hyunjun Kim, Wonseok Hwang |
| 58 | Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting | Joar Max Viktor Skalse, Alessandro Abate |
| 61 | Risk Controlled Image Retrieval | Kaiwen Cai, Chris Xiaoxuan Lu, Xingyu Zhao, Wei Huang, Xiaowei Huang |
| 66 | Do Transformer Interpretability Methods Transfer to RNNs? | Gonçalo Santos Paulo, Thomas Marshall, Nora Belrose |
| 81 | Aligning Large Language Models for Faithful Integrity against Opposing Argument | Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua |
| 87 | Single Character Perturbations Break LLM Alignment | Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh |
| 96 | ME: Modelling Ethical Values for Value Alignment | Eryn Rigley, Adriane Chapman, Christine Evers, Will McNeill |
| 100 | Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting | Yun-Da Tsai, Ting-Yu Yen, Keng-Te Liao, Shou-De Lin |
| 108 | ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates | Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran |
| 109 | UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models | Zihan Guan, Mengxuan Hu, Sheng Li, Anil Kumar Vullikanti |
| 113 | SMLE: Safe Machine Learning via Embedded Overapproximation | Matteo Francobaldi, Michele Lombardi |
| 119 | Increased Compute Efficiency and the Diffusion of AI Capabilities | Konstantin Friedemann Pilz, Lennart Heim, Nicholas Brown |
| 133 | Searching for Unfairness in Algorithms’ Outputs: Novel Tests and Insights | Ian Davidson, S. S. Ravi |
| 140 | Retention Score: Quantifying Jailbreak Risks for Vision Language Models | Zaitang Li, Pin-Yu Chen, Tsung-Yi Ho |
| 144 | Scaling Laws for Data Poisoning in LLMs | Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine |
| 166 | Data with High and Consistent Preference Difference Are Better for Reward Model | Qi Lin, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Huixing Jiang, Chen Wei |
| 168 | Neurons to Words: A Novel Method for Automated Neural Network Interpretability and Alignment | Lukas-Santo Puglisi, Fabio Valdés, Jakob Johannes Metzger |
| 171 | Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang |
| 173 | Strong Empowered and Aligned Weak Mastered Annotation for Weak-to-Strong Generalization | Yongqi Li, Xin Miao, Mayi Xu, Tieyun Qian |
| 189 | Dynamic Algorithm Termination for Branch-and-Bound-based Neural Network Verification | Konstantin Kaulen, Matthias König, Holger Hoos |
| 196 | Towards a Theory of AI Personhood | Francis Rhys Ward |
| 198 | MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models | Fenghua Weng, Yue Xu, Chengyan Fu, Wenjie Wang |
| 199 | Sequential Decision Making in Stochastic Games with Incomplete Preferences over Temporal Objectives | Abhishek Ninad Kulkarni, Jie Fu, Ufuk Topcu |
| 213 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang |
| 215 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling, Fazle Rabbi, Song Wang, Jinqiu Yang |
| 221 | SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models | Somnath Banerjee, Sayan Layek, Soham Tripathy, Shanu Kumar, Animesh Mukherjee, Rima Hazra |
| 222 | Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment | Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia |
| 229 | Maximizing Signal in Human-Model Preference Alignment | Margaret Kroll, Kelsey Kraus |
| 245 | Robust Multi-Objective Preference Alignment with Online DPO | Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi |
| 246 | Reinforcement Learning Platform for Adversarial Black-box Attacks with Custom Distortion Filters | Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Ricardo Luna Gutierrez, Antonio Guillen, Desik Rengarajan |
| 250 | DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately | Huiwen Wu, Deyi Zhang, Xiaohan Li, Xiaogang Xu, Jiafei Wu, Zhe Liu |
| 260 | Quantifying Misalignment Between Agents | Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen |
| 266 | MAPLE: A Framework for Active Preference Learning Guided by Large Language Models | Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein |
| 267 | Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks | Wenhao Liao, Sineng Yan, Youqian Zhang, Xinwei Zhai, Yuanyuan Wang, Eugene Fu |
| 268 | Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning | Karl Elbakian, Samuel Carton |
| 272 | Political Bias Prediction Models Focus on Source Cues, Not Semantics | Selin Chun, Daejin Choi, Taekyoung Kwon |
| 280 | Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets | Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei |
| 281 | Sequential Preference Optimization: Multi-Dimensional Preference Alignment With Implicit Reward Modeling | Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang |
| 282 | AI Emergency Preparedness: Examining the federal government’s ability to detect and respond to AI-related national security threats | Akash Wasil, Everett Thornton Smith, Corin Katzke, Justin Bullock |
| 294 | In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search | Emir Demirović, Christian Schilling, Anna Lukina |
| 328 | Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through f-divergence Minimization | Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang |
### About the AAAI 2025 Conference

The organisers have not yet released the full details of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2025). Large academic events of this kind usually publish more specifics, including the venue, dates, and call for papers, towards the end of the preceding year or early in the conference year.

Researchers who intend to take part should keep a close eye on the official website for the latest updates. Pages from previous editions provide a wealth of historical material and reference information, which helps in anticipating how the next edition will be organised. Subscribing to the mailing list or joining community forums is another way to receive announcements as soon as they go out.

#### How to prepare for an upcoming top-tier AI conference

To get the most out of a forthcoming high-level venue, thorough advance preparation is essential:

- **Follow official announcements**: visit the [AAAI website](https://aaai.org/) regularly to check for new notices.
- **Strengthen your research skills**: build a solid professional and technical foundation, particularly in areas such as machine learning and deep learning.
- **Write high-quality papers**: pick a suitable direction within the scope of previous editions, carry out original work, and write it up to rigorous academic standards.
- **Build a professional network**: take an active part in domestic and international community platforms, get to know peers, and look for opportunities to collaborate.

```python
import requests
from bs4 import BeautifulSoup


def get_conference_info(url):
    """Fetch a page and collect paragraphs that mention conference information."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    info = []
    for paragraph in soup.find_all("p"):
        text = paragraph.get_text()
        # Keep only paragraphs that contain both keywords; the original
        # condition ('"conference" and "information" in ...') tested just the second one.
        if "conference" in text.lower() and "information" in text.lower():
            info.append(text.strip())
    return "\n".join(info)


url = "https://aaai.org/conference/aaai/"
print(get_conference_info(url))
```

This Python snippet can be used to scrape the conference-related text from a given page as a first look at the material; when using it in practice, make sure to comply with the site's copyright terms and its robots.txt restrictions.
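Since the note above asks readers to honour robots.txt, here is a minimal sketch (not part of the original post) of how such a check could be run with Python's standard `urllib.robotparser` before calling `get_conference_info`; the robots.txt location and the `"*"` user-agent below are assumptions for illustration, not values taken from the post.

```python
from urllib import robotparser

# Assumed locations; adjust to the pages you actually intend to fetch.
ROBOTS_URL = "https://aaai.org/robots.txt"
TARGET_URL = "https://aaai.org/conference/aaai/"
USER_AGENT = "*"  # placeholder user-agent; replace with your crawler's name

parser = robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # download and parse robots.txt

if parser.can_fetch(USER_AGENT, TARGET_URL):
    print("robots.txt allows fetching this page; proceed with get_conference_info().")
else:
    print("robots.txt disallows fetching this page; skip it.")
```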