When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
Paper • 2602.08235 • Published
Natural language processing, language models, language agents
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents