by neximo64 on 12/19/23, 9:15 AM
If this one is from first principles, I wonder what the others are, since they're all straight out of the transformers paper and from first principles too. It would be impossible to make a model using a layer of abstraction without understanding it from first principles.