by quantisan on 4/12/23, 9:48 PM with 55 comments
by tinco on 4/12/23, 10:52 PM
Not that reproducing GPT-4 is going to be easy with this, but it'll definitely get rid of some major hurdles. I read a report about the difficulties HuggingFace had in producing their Bloom model, and a lot of it was the sort of straightforward systems engineering that goes into tooling like this.
Is the Bloom model considered a failure by the community? If you read the introduction, it was supposed to include improvements over GPT-3, but it performs much worse, I guess because of lower-quality training data? I wonder what sort of company would have high enough quality data that they could use this project to fine-tune a public model to the point where it would be better in some scenario than plain old GPT-4 would be. Especially when you can just inject extra info into the GPT-4 prompt, like Phind does, for example. What even is the use of fine-tuning given GPT-4 exists?
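The prompt-injection alternative mentioned above can be sketched in a few lines: retrieved context is simply prepended to the user's question before calling the model. This is a minimal illustration, not Phind's actual pipeline; `retrieve_docs` is a hypothetical stand-in for whatever search or retrieval backend is used.

```python
# Sketch of "injecting extra info into the prompt" instead of fine-tuning.
# retrieve_docs() is hypothetical; in practice it would query a search index
# or vector store for passages relevant to the question.

def retrieve_docs(question: str) -> list[str]:
    # Hypothetical canned result standing in for real retrieval.
    return ["DeepSpeed-Chat trains a 13B ChatGPT-style model in 13.6 hours on one DGX node."]

def build_prompt(question: str) -> str:
    # Prepend the retrieved context so a general model can answer with
    # domain-specific information it was never fine-tuned on.
    context = "\n".join(retrieve_docs(question))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("How long does DeepSpeed-Chat take to train a 13B model?")
```

The appeal over fine-tuning is that the base model stays untouched: updating the knowledge means updating the retrieval index, not retraining.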
by summarity on 4/12/23, 10:21 PM
> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model under 9 hours. Finally, it enables 15X faster training over the existing RLHF systems.
> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT
(disclaimer: MSFT/GH employee, not affiliated with this project)
by brofallon on 4/12/23, 10:35 PM
by lxe on 4/12/23, 11:02 PM
by burtonator on 4/13/23, 2:50 AM
I know there was a summary, but the point is that ChatGPT really accelerates a LOT of bulk work we used to have to do manually.
It's an amazing time to be alive!
by teruakohatu on 4/12/23, 11:00 PM
by sebzim4500 on 4/13/23, 12:15 PM
EDIT: Is the idea that the critic model learns via the PPO process and gives a value estimate to prefixes of the responses?
by hahnchen on 4/13/23, 5:40 AM
by scottydog51834 on 4/12/23, 11:48 PM
I am hoping that someone makes a very simple Jupyter notebook where I can upload my RLHF data file, select a few other settings, and just run it (on AWS or Azure; willing to pay, say, $100-$500 per fine-tuned model for cloud credits + notebook access).