- Paper translation: arXiv-2023 PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
- Paper translation: arXiv-2023 Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
- Paper translation: ICML-2024 TrustLLM: Trustworthiness in Large Language Models (Chapter 7)
- Paper translation: NeurIPS-2023 Jailbroken: How Does LLM Safety Training Fail?
- Paper translation: EMNLP-2023 CCF-B Multi-step Jailbreaking Privacy Attacks on ChatGPT
- Paper translation: ACL-2024 CCF-A GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
- Paper translation: ACL-2024 CCF-A (Zeng Y.) How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
LLM Prompt Security