preprint
Oct 9, 2025 · Eric Hanchen Jiang, Weixuan Ou, Run Liu, Shengyuan Pang, Guancheng Wan, Ranjie Duan, Wei Dong, Kai-Wei Chang, XiaoFeng Wang, Ying Nian Wu, Xinfeng Li
Safety alignment of large language models (LLMs) faces a key challenge: current alignment techniques often focus only on improving safety against harmful prompts, causing LLMs to b…