from Hacker News

Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning

by KhoomeiK on 3/10/24, 12:40 PM with 28 comments

by kayson on 3/11/24, 1:21 AM
I want to make a Discord bot that impersonates all my friends and continues to refine the model as the conversations continue. Basically this [1] post, but with a more modern model and, ideally, reinforcement learning. Seems like this would fit the bill. Is there anything else that would make this easier?
[1] https://www.izzy.co/blogs/robo-boys.html
by katzenversteher on 3/11/24, 9:23 AM
From the title I misunderstood what it does. However, now I'm wondering if what I thought it was (don't ask me why I thought it) is possible:
I have a PC that is able to run e.g. Mistral Instruct 7B Q4 inference at around 30 tokens/s.
How expensive (in computation and memory) would it be to also run backpropagation in addition to inference?
I'm aware that models are typically trained on much more and much better data than what normal conversations provide, but on the other hand, if I could fine-tune my local model a teeny tiny bit during or after each conversation I have with it anyway, it would after a while be perfectly customized for me.
I'm also aware that this could be problematic for models that are used by multiple users but my intended use case would be personal use by a single user.
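A rough back-of-envelope sketch of the memory side of that question, using common rules of thumb rather than measurements. Everything here is an assumption for illustration: a round 7e9 parameter count, exactly 4 bits per weight for Q4, mixed-precision Adam at 16 bytes per trainable parameter, and a hypothetical adapter-style setup where only 0.5% extra parameters are trained on top of a frozen 4-bit base. Activations and KV cache are ignored, so real usage would be higher.

```python
# Back-of-envelope memory estimates; all constants are assumptions, not measurements.

N_PARAMS = 7e9  # assumed round figure for a "7B" model

def inference_q4_bytes(n_params):
    # 4-bit weights: 0.5 bytes per parameter (quantization overhead ignored)
    return n_params * 0.5

def full_finetune_bytes(n_params):
    # Mixed-precision Adam rule of thumb, per trainable parameter:
    # fp16 weight (2) + fp16 gradient (2) + fp32 master weight (4)
    # + Adam first moment (4) + Adam second moment (4) = 16 bytes
    return n_params * (2 + 2 + 4 + 4 + 4)

def adapter_finetune_bytes(n_params, adapter_fraction=0.005):
    # Frozen 4-bit base model plus a small set of trainable adapter weights
    # (adapter_fraction is a hypothetical value chosen for illustration)
    frozen = n_params * 0.5
    trainable = n_params * adapter_fraction * 16
    return frozen + trainable

def train_to_inference_flops_ratio():
    # Rule of thumb: forward pass ~2 FLOPs per parameter per token,
    # backward pass ~4, so a training step costs ~3x a forward pass.
    forward, backward = 2, 4
    return (forward + backward) / forward

if __name__ == "__main__":
    for name, fn in [
        ("Q4 inference", inference_q4_bytes),
        ("full fine-tune", full_finetune_bytes),
        ("adapter fine-tune", adapter_finetune_bytes),
    ]:
        print(f"{name}: {fn(N_PARAMS) / 1e9:.1f} GB")
    print(f"training step vs. forward pass: ~{train_to_inference_flops_ratio():.0f}x FLOPs")
```

Under these assumptions, full fine-tuning needs on the order of 30x the weight memory of Q4 inference, while training only a small adapter over a frozen quantized base stays in the same ballpark as inference, before counting activations.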
by internet101010 on 3/10/24, 5:45 PM
Thank you for making this. Simplifying any aspect of RL is always welcome.
by potatoman22 on 3/11/24, 7:38 AM
Could someone help me understand the kinds of things you can build with this? Is this like RLHF?
by dennisy on 3/10/24, 7:29 PM
Can this be used outside of OpenAI Gym environments? If yes, I think an example would be great!
by KhoomeiK on 3/10/24, 5:32 PM
Twitter thread: https://x.com/khoomeik/status/1766805213644800011?s=46
by adawg4 on 3/10/24, 10:54 PM
Thanks for making this! Helps simplify it nicely
by zeroq on 3/10/24, 11:24 PM
When 150 lines of boilerplate can land you on the first page of HN, maybe it is, in fact, the end of programming?
by 3abiton on 3/10/24, 3:18 PM
Interesting project, basically a wrapper around OpenAI Gym-like functionality that can handle open LLMs.
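For readers unfamiliar with what "Gym-like functionality" means, here is a minimal dependency-free sketch of the reset/step loop that such a wrapper sits on top of. This is not LlamaGym's API: `GuessNumberEnv`, `binary_search_policy`, and `run_episode` are hypothetical names invented for illustration, the 5-tuple returned by `step` follows the Gymnasium convention, and the hand-written policy merely stands in for where an LLM agent would read the text observation and choose an action.

```python
import random

class GuessNumberEnv:
    """Hypothetical toy environment following the Gym-style reset/step convention."""

    def __init__(self, low=1, high=10, max_steps=5, seed=0):
        self.low, self.high, self.max_steps = low, high, max_steps
        self.rng = random.Random(seed)
        self.target = None
        self.steps = 0

    def reset(self):
        # Start a new episode and return (observation, info)
        self.target = self.rng.randint(self.low, self.high)
        self.steps = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def step(self, action):
        # Return (observation, reward, terminated, truncated, info)
        self.steps += 1
        if action == self.target:
            return "Correct.", 1.0, True, False, {}
        hint = "higher" if action < self.target else "lower"
        truncated = self.steps >= self.max_steps
        return f"Try {hint}.", 0.0, False, truncated, {}

def binary_search_policy(observation, state):
    """Stand-in for an LLM agent: narrows the range using the text hint."""
    if "higher" in observation:
        state["low"] = state["last"] + 1
    elif "lower" in observation:
        state["high"] = state["last"] - 1
    guess = (state["low"] + state["high"]) // 2
    state["last"] = guess
    return guess

def run_episode(env):
    """Run one episode and return the total reward."""
    obs, _ = env.reset()
    state = {"low": env.low, "high": env.high}
    total_reward, done = 0.0, False
    while not done:
        action = binary_search_policy(obs, state)
        obs, reward, terminated, truncated, _ = env.step(action)
        # An online RL setup would record (obs, action, reward) here
        # and periodically update the agent from those records.
        total_reward += reward
        done = terminated or truncated
    return total_reward

if __name__ == "__main__":
    print(run_episode(GuessNumberEnv(seed=42)))
```

The point of the convention is that the loop in `run_episode` does not care what the environment is, which is what lets a thin wrapper connect a language model to many different tasks.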
by raidicy on 3/10/24, 3:19 PM
Thanks for creating this!
by ponderchan on 3/21/24, 8:07 AM
llamagym.com for sale
by neodypsis on 3/10/24, 7:51 PM
Very interesting!
by SuhanaJabin on 3/11/24, 3:32 PM
Simplified the concept. Nicely done!