from Hacker News

Vending-Bench: Testing long-term coherence in agents

by vector_spaces on 4/22/25, 5:56 PM with 0 comments