from Hacker News

Shade-Arena: Evaluating Sabotage and Monitoring in LLM Agents [pdf]

by JnBrymn on 6/18/25, 11:10 PM with 0 comments