by SchwKatze on 3/1/25, 1:09 AM with 8 comments
by xg15 on 3/1/25, 12:37 PM
GPT-3 and ChatGPT were sort of the ultimate "...how?!" moments when they came out, and many of the early high- and medium-level descriptions of how they worked tended to strengthen the feeling of magic rather than clear things up. (Models generalizing across tasks? Wat? Input size is a hard limit, but inside that limit you can just throw whatever at the network and it will magically work? Wat? Word order is important, so let's tell the model about it by using a completely insane-looking positional encoding? Wat? And what the hell does "zero-shot learning" even mean?)
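That "insane-looking" positional encoding is less magical up close: in the original Transformer design it is just sine and cosine waves of geometrically spaced wavelengths, one pair per embedding dimension. A minimal plain-Python sketch (function name is my own, not from any library):

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal positional encoding: even dimensions use sin, odd
    dimensions use cos, with wavelengths growing geometrically from
    2*pi up to 10000*2*pi across the embedding dimensions."""
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

# Position 0 encodes as sin(0)=0 in even dims and cos(0)=1 in odd dims.
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Because each dimension oscillates at a different frequency, every position gets a distinct vector, and relative offsets correspond to fixed linear transformations of it, which is what makes it usable by attention layers.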
But even here, a lot of the mystery vanishes if you "go down the stack" and take the time to understand the individual components and how they work together. (What does a transformer do, and how does it process the input sequence? How does the token generation loop work? What are encoders and decoders, and what do they take as inputs and outputs? Etc.)
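The token generation loop in particular is surprisingly small once written out. A sketch, assuming greedy decoding and a hypothetical `model` callable that maps the current token sequence to next-token logits (real decoders sample instead of taking the argmax, but the shape of the loop is the same):

```python
def generate(model, tokens, max_new_tokens, eos_id):
    """Greedy autoregressive decoding: feed the whole sequence to the
    model, take the most likely next token, append it, and repeat
    until an end-of-sequence token or the length budget is hit."""
    for _ in range(max_new_tokens):
        logits = model(tokens)                # one score per vocab id
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens = tokens + [next_id]
        if next_id == eos_id:
            break
    return tokens

# Toy "model" over a 3-token vocabulary that always favors token 2.
toy = lambda toks: [0.1, 0.2, 0.7]
print(generate(toy, [1], 5, eos_id=2))  # [1, 2] — token 2 is EOS, so it stops
```

Seen this way, the model itself is a pure function from a sequence to a score distribution; everything that feels like "the model writing text" lives in this outer loop.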
I think Hugging Face, and the ability to just step through the inference of small models in Python, also helped tremendously with that.
by kjellsbells on 3/1/25, 2:41 PM
It wasn't, of course. Just creativity and synthesis. James Burke's Connections series is an entertaining way to pick the synthesis apart. The episode on the machine loom is especially interesting, since that is where punched cards came from.
by Supermancho on 3/1/25, 7:12 AM
https://web.archive.org/web/20250301045820/https://pthorpe92...
by andrewstuart on 3/1/25, 5:02 AM
You might surprise yourself.