by quantisan on 4/12/23, 9:48 PM with 55 comments
by tinco on 4/12/23, 10:52 PM
Not that reproducing GPT-4 is going to be easy with this, but it'll definitely get rid of some major hurdles. I read a report about the difficulties HuggingFace had in producing their Bloom model, and a lot of it was the sort of straightforward systems engineering that goes into tooling like this.
Is the Bloom model considered a failure by the community? If you read the introduction, it was supposed to include improvements over GPT-3, but it performs much worse, I guess because of lower-quality training data? I wonder what sort of company would have high enough quality data that they could use this project to fine-tune a public model to the point where it would be better in some scenario than plain old GPT-4 would be. Especially when you can just inject extra info into the GPT-4 prompt, like Phind does, for example. What even is the use of fine-tuning given GPT-4 exists?
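The prompt-injection alternative mentioned above can be sketched in a few lines: retrieved context is simply prepended to the user's question before calling the model. This is a minimal illustration, not Phind's actual pipeline; `retrieve_docs` is a hypothetical stand-in for whatever search or retrieval backend is used.

```python
# Sketch of "injecting extra info into the prompt" instead of fine-tuning.
# retrieve_docs() is hypothetical; in practice it would query a search index
# or vector store for passages relevant to the question.

def retrieve_docs(question: str) -> list[str]:
    # Hypothetical canned result standing in for real retrieval.
    return ["DeepSpeed-Chat trains a 13B ChatGPT-style model in 13.6 hours on one DGX node."]

def build_prompt(question: str) -> str:
    # Prepend the retrieved context so a general model can answer with
    # domain-specific information it was never fine-tuned on.
    context = "\n".join(retrieve_docs(question))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("How long does DeepSpeed-Chat take to train a 13B model?")
```

The appeal over fine-tuning is that the base model stays untouched: updating the knowledge means updating the retrieval index, not retraining.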
by summarity on 4/12/23, 10:21 PM
> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model under 9 hours. Finally, it enables 15X faster training over the existing RLHF systems.
> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT
(disclaimer: MSFT/GH employee, not affiliated with this project)
by brofallon on 4/12/23, 10:35 PM
by lxe on 4/12/23, 11:02 PM
by burtonator on 4/13/23, 2:50 AM
I know there was a summary, but the point is that ChatGPT really accelerates a LOT of bulk work we used to have to do manually.
It's an amazing time to be alive!
by teruakohatu on 4/12/23, 11:00 PM
by sebzim4500 on 4/13/23, 12:15 PM
EDIT: Is the idea that the critic model learns via the PPO process and gives a value estimate to prefixes of the responses?
by hahnchen on 4/13/23, 5:40 AM
by scottydog51834 on 4/12/23, 11:48 PM
I am hoping that someone makes a very simple Jupyter notebook where I can upload my RLHF data file, select a few other settings, and just run it (on AWS or Azure; willing to pay, say, $100-$500 per fine-tuned model for cloud credits + notebook access).