from Hacker News

Transformers from Scratch

by stablemap on 8/23/19, 1:58 AM with 28 comments

  • by cgearhart on 8/23/19, 4:06 AM

    This is a _great_ article. One of the things I enjoy most is finding new ways to understand or think about things I already feel like I know, and this article helped me do both with transformer networks. I especially liked how explicitly and simply it explains things like queries, keys, and values; permutation equivariance; and even the distinction between learned model parameters and parameters derived from the data (such as the attention weights).

    The author quotes Feynman, and I think this is a great example of his concept of explaining complex subjects in simple terms.
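
    The distinction the comment draws (learned projection matrices vs. attention weights derived from the data) can be sketched in a few lines of NumPy. This is an illustrative toy, not code from the article; all names here are made up:

    ```python
    import numpy as np

    def self_attention(X, W_q, W_k, W_v):
        """X: (seq_len, d) inputs; W_q/W_k/W_v: (d, d) *learned* parameters."""
        Q = X @ W_q                                  # queries
        K = X @ W_k                                  # keys
        V = X @ W_v                                  # values
        scores = Q @ K.T / np.sqrt(X.shape[1])       # scaled dot products
        # Attention weights are computed *from the data* (row-wise softmax),
        # not learned directly.
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        return w @ V                                 # data-dependent mix of values

    rng = np.random.default_rng(0)
    d, n = 4, 3
    X = rng.normal(size=(n, d))
    Ws = [rng.normal(size=(d, d)) for _ in range(3)]
    out = self_attention(X, *Ws)
    print(out.shape)  # (3, 4): one output vector per input vector
    ```

    Without positional encodings this operation is permutation-equivariant: permuting the rows of X permutes the rows of the output in the same way.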

  • by dusted on 8/23/19, 8:24 AM

    And here I was, excited to learn something about actual transformers, something involving wire and metal...

  • by yamrzou on 8/23/19, 8:15 AM

    This is the best article I have read so far explaining the transformer architecture. The clear and intuitive explanation can’t be praised enough.

    Note that the author has a machine learning course with video lectures on YouTube that he references throughout the article: http://www.peterbloem.nl/teaching/machine-learning

  • by Gallactide on 8/23/19, 8:41 AM

    This man was my professor at the VU.

    Honestly, his lectures were fun and easy to look forward to; I'm really glad his post is getting traction.

    If you find his video lectures, they are a really graceful introduction to most ML concepts.

  • by isoprophlex on 8/23/19, 6:54 AM

    Stellar article. I never understood self-attention; this makes it so very clear in a few concise lines, with little fluff.

    The author has a gift for explaining these concepts.

  • by NHQ on 8/23/19, 3:27 PM

    This is sweet. I've written conv, dense, and recurrent networks from scratch. Transformers next!

    Plug: I just published this demo that uses gradient descent to find control points for Bézier curves: http://nhq.github.io/beezy/public/

  • by ropiwqefjnpoa on 8/23/19, 3:47 PM

    Ah yes, machine learning architecture transformers, I knew that.

  • by siekmanj on 8/23/19, 7:31 AM

    Wow. I have been looking for a good resource on implementing self-attention/transformers on my own for the last week - can't wait to read this through.

  • by ccccppppp on 8/23/19, 2:55 PM

    Noob question: I have a 1D conv net for financial time series prediction. Could a transformer architecture be better for this task? Is it worth a try?

  • by gwbas1c on 8/23/19, 3:36 AM

    The title is deceiving. I thought this was an article about building your own electrical transformer, or building your own version of the 1980s toy.