by summarity on 5/25/25, 4:50 PM with 1 comment
by Genego on 5/28/25, 4:48 AM
So if I saw that something went wrong, I would say: "Next time don't do that, please do this instead." The Architect agent then reviews the entire tool and agent call chain and makes a new adaptation to each agent (if necessary).
I was calling this "Poor man's RLHF"; it has been quite fun to interact with. I ended up making it so that this is stored as a JSON file that I could (potentially) use later for finetuning. But I was always wondering if there was a name for this. Is it similar to DPO? I called it "behavioral adaptation". For a small system it was quite effective, but I also didn't bother to research it.
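
For anyone curious what such a JSON log might look like, here is a rough sketch. Everything in it is made up for illustration: the `Correction` schema, the `apply_corrections` and `to_preference_pair` helpers, and the example agent names are assumptions, not the actual system. It stores each "don't do that, do this instead" correction, appends the relevant ones to an agent's prompt, and reshapes a correction into a prompt/chosen/rejected record, which is the kind of preference pair DPO-style finetuning trains on.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Correction:
    """One piece of human feedback on an agent run (hypothetical schema)."""
    agent: str        # which agent the feedback targets
    call_chain: list  # tool/agent calls that were reviewed
    rejected: str     # what the agent did ("don't do that")
    preferred: str    # what it should do ("do this instead")


def apply_corrections(base_prompt: str, corrections: list, agent: str) -> str:
    """Append the stored corrections for one agent to its system prompt."""
    notes = [
        f"- Instead of: {c.rejected}\n  Do: {c.preferred}"
        for c in corrections
        if c.agent == agent
    ]
    if not notes:
        return base_prompt
    return base_prompt + "\n\nLearned corrections:\n" + "\n".join(notes)


def to_preference_pair(c: Correction, prompt: str) -> dict:
    """Reshape a correction into a prompt/chosen/rejected record."""
    return {"prompt": prompt, "chosen": c.preferred, "rejected": c.rejected}


# Example log with a single made-up correction.
log = [
    Correction(
        agent="researcher",
        call_chain=["planner", "researcher", "web_search"],
        rejected="Searched the web before checking the local cache.",
        preferred="Check the local cache first, then search the web.",
    )
]

# Round-trip through JSON, as if the log had been saved to and loaded from a file.
serialized = json.dumps([asdict(c) for c in log], indent=2)
restored = [Correction(**d) for d in json.loads(serialized)]

adapted = apply_corrections("You are a research agent.", restored, "researcher")
pair = to_preference_pair(restored[0], "Find the latest release notes.")
```

The prompt-patching half changes no model weights, so it is closer to prompt adaptation than to RLHF or DPO proper; only the exported pairs would feed an actual finetuning run.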