by nijfranck on 6/9/23, 3:02 PM with 36 comments
by HarHarVeryFunny on 6/9/23, 3:48 PM
As people start to use these API's in production, there needs to be stricter version control, especially given how complex (impossible unless you are only using a fixed set of prompts) it is for anyone to test for backwards compatibility. Maybe something like Ubuntu's stable long-term releases vs bleeding edge ones would work. Have some models that are guaranteed not to change for a specified amount of time, and others that will be periodically updated for people who want cutting edge behavior and care less about backwards compatibility.
by beepbooptheory on 6/9/23, 6:40 PM
by nijfranck on 6/9/23, 3:02 PM
by brucethemoose2 on 6/9/23, 6:07 PM
This sounds like making diffusion backwards compatible with ESRGAN. Technically they are both upscaling denoisers (with finetunes for specific tasks), and you can set up objective tests compatible with both, but actual way they are used is so different that its not even a good performance measurement.
The same thing applies to recent LLMs, and the structural changes are only going to get more drastic and fundamental. For instance, what about LLMs with seperate instruction and data context? Or multimodal LLMs with multiple inputs/outputs? Or LLMs that finetune themselves during inference? That is just scratching the surface.
by netruk44 on 6/9/23, 4:09 PM
It's mentioned earlier in the article, but I'd like to emphasize that if you go down this route that you should either do multiple evaluations per prompt and come up with some kind of averaged result, or set the temperature to 0.
FTA:
> LLMs are stochastic – there’s no guarantee that an LLM will give you the same output for the same input every time.
> You can force an LLM to give the same response by setting temperature = 0, which is, in general, a good practice.
by ITB on 6/9/23, 4:40 PM
by lachlan_gray on 6/9/23, 4:07 PM
I’m expecting there will be more examples soon, but you can check out my tree of thoughts implementation below to see what I mean
by aldousd666 on 6/9/23, 5:35 PM