by HardikVala on 1/26/25, 5:20 AM with 56 comments
I'm not interested in hands-on guides (e.g. how to train a DNN classifier in TensorFlow) or LLM-centric resources.
So far, I've put together the following curriculum:
1. Artificial Intelligence: A Modern Approach (https://aima.cs.berkeley.edu/) - Great for learning the breadth of foundational concepts, e.g. local search algorithms, building up to modern AI.
2. Probabilistic Machine Learning: An Introduction (https://probml.github.io/pml-book/book1.html) - Going more in-depth into ML.
3. Dive into Deep Learning (https://d2l.ai/) - Going deep into DL, including contemporary ideas like Transformers and diffusion models.
4. Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) could also be a great resource, but the content probably overlaps significantly with 3.
Would anybody add/update/remove anything? (Don't have to limit recommendations to textbooks. Also open to courses, papers, etc.)
Sorry for the semi-redundant post.
by noduerme on 1/26/25, 7:15 AM
Without reading about how it's done now, just think about how you think a neural network should function. It ostensibly has input, output, and something in the middle. Maybe its input is a 64x64 pixel handwritten character, and its output is a unicode number. In between the input pixels (a 64x64 array) and the output, are a bunch of neurons. Layers of neurons. That talk to each other and learn or un-learn (are rewarded or punished).
Build that. Build a cube where one side is a pixel grid and the other side delivers a number. Decide how the neurons influence each other and how they train their weights to deliver the result at the other end. However you think it should go. Just raw-code it with arrays in whatever dimensions you want and make it work; you can do it in JavaScript or BASIC. Link them however you want. Don't worry about performance, because you can assume that whatever marginally works can be tested on a massive scale and show "impressive" results.
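For instance, a minimal sketch of that "cube" in Python (toy sizes, random weights, layer shapes of my own choosing, and no training yet):

    import numpy as np

    # Toy "cube": a 64x64 pixel grid in, one score per possible character out.
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))           # stand-in for a handwritten character

    x = image.reshape(-1)                  # flatten to a 4096-long input vector
    W1 = rng.normal(0, 0.01, (128, 4096))  # input -> hidden-layer weights
    W2 = rng.normal(0, 0.01, (10, 128))    # hidden -> output weights (10 classes here)

    h = np.maximum(0, W1 @ x)              # each hidden "neuron" sums its inputs, then a nonlinearity
    scores = W2 @ h                        # one score per output class
    print("predicted class:", scores.argmax())

    # "Learning" would mean nudging W1 and W2 whenever the argmax is wrong
    # (reward/punish); everything else is deciding how to do that nudging.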
by InkCanon on 1/26/25, 6:14 AM
The other big question is why you want to learn it. If you want to learn ML in itself, then anything you mentioned, including the search algorithms (which used to be considered core to ML a long time ago), is part of that. But if you want to learn ML to contribute to modern developments like LLMs, then search algorithms are virtually useless. If you aren't going to be engineering any ML or ML products, what you want is to gain some insight into its future and the business of it. So learning things like the transformer architecture is going to be far less helpful than, say, reading about the economics of compute clusters.
Given the empirical/engineering nature of current ML, I'd say building it from scratch is really good for getting at the handful of first principles there are (the fundamental functions involved, data cleaning, training, etc.).
by CamperBob2 on 1/27/25, 1:00 AM
If you want a historical perspective, which is very worthwhile, start by reading about the mid-century work of McCulloch and Pitts, and Minsky, Papert, and their colleagues at the MIT AI Lab after that.
There will be a dry spell after Minsky and Papert because of their conclusion that the OG neural-network topology that everyone was familiar with, the so-called "perceptron", was a dead end. That conclusion was premature to say the least, but in any event the hardware and training techniques weren't available to support any serious progress.
Adding hidden layers and nonlinear activation functions to the perceptron network seemed promising, in that they worked around some of Minsky's technical objections. The multi-layer perceptron was now a "universal approximator," capable of modeling essentially any linear or nonlinear function. In retrospect that should have been considered a bigger deal than it was, but the MLP was still a pain to train, and it didn't seem very useful at the scales achievable in hardware at the time. Anything a neural net could do, specialized code could usually do better and cheaper.
Then, in the circa-2012 timeframe, AlexNet dusted off some of the older ideas and used them to win image-recognition benchmark competitions, not by a small margin but by blowing everybody else into the weeds. That brought the multi-layer perceptron back into vogue, and almost everything that has happened since can be traced back to that work.
The Karpathy videos are the best intro to the MLP concept I've run across. Understanding the MLP is the key prereq if you want to understand current-gen AI from first principles.
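To make the hidden-layer point concrete, here is a rough sketch (plain numpy, with hyperparameters of my own choosing) of a one-hidden-layer MLP learning XOR, the classic function a single-layer perceptron cannot represent:

    import numpy as np

    # XOR: not linearly separable, so no single-layer perceptron can fit it.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # one hidden layer of 4 units
    W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    lr = 1.0
    for _ in range(10000):
        h = np.tanh(X @ W1 + b1)              # nonlinear hidden layer
        p = sigmoid(h @ W2 + b2)              # output probability
        d_out = (p - y) / len(X)              # cross-entropy gradient at the output logit
        d_hid = (d_out @ W2.T) * (1 - h ** 2) # chain rule back through tanh
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0)
        W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(0)

    print(np.round(p.ravel(), 2))  # approaches [0, 1, 1, 0]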
by grepLeigh on 1/27/25, 4:14 PM
There's also a world of statistics and machine learning outside of deep learning. I think the best way to get started on that end is an undergrad survey course like CS189: https://people.eecs.berkeley.edu/~jrs/189/
by jmholla on 1/27/25, 6:50 PM
by andyjohnson0 on 1/27/25, 9:45 AM
Looks like the course has turned into a multi-course "specialization" and I have no idea if any of it is the same as the course I did. But it might be a place to start.
by riwsky on 1/27/25, 2:04 AM
by Maro on 1/27/25, 8:06 AM
If you don't have a solid background in math, then that's what you should improve upon (calculus, linear algebra, discrete math, probability theory, information theory). Some of the books you mention do cover this at the beginning, but most people take separate courses on these topics at University, with lots of homework, etc.
Also, the first book on your list is the classic textbook by Russell and Norvig, but I don't think it's actually very good. I remember reading it in my college AI course 25 years ago and it was painful even back then (anybody remember "wumpus"?). It's a big book that covers too much; it's like printing out a lot of Wikipedia pages. You're better off finding books with a smaller scope that focus on something you actually care about / that's relevant to the way the field has developed.
by ipnon on 1/27/25, 2:09 AM
I prefer the a16z AI canon for this purpose. It’s useful and historical. It’s structured to begin with no prerequisites and work up to cutting edge research papers. And best of all it’s free and open source.
by talles on 1/27/25, 12:38 PM
1. Linear algebra. Be comfortable with vector transformations in a vector space. This is the framework for understanding how data is represented and what is going on inside the model.
2. Calculus. Specifically derivatives, up to partial derivatives and the chain rule. This is needed later to understand backpropagation, i.e. the learning. It's fine to skip integrals.
3. Vanilla neural network. Study how a simple feed-forward, fully connected neural network works, in detail. Every single bit of it.
I wouldn't worry or plan anything ahead until having those. After number 3 you'll have different branches to follow and will be better equipped to pick a path.
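As a rough illustration of how 1-3 fit together (a toy example of my own, not from any particular resource): one neuron's forward pass is linear algebra plus a nonlinearity, and the gradient that drives learning is just the chain rule, which you can sanity-check numerically:

    import numpy as np

    # One neuron: y = sigmoid(w . x + b), squared-error loss against a target t.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.3])
    b, t = 0.2, 1.0

    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    def loss(w, b):
        return 0.5 * (sigmoid(w @ x + b) - t) ** 2

    # Chain rule, step by step: dL/dw = dL/dy * dy/dz * dz/dw
    y = sigmoid(w @ x + b)
    dL_dy = y - t          # derivative of the squared error
    dy_dz = y * (1 - y)    # derivative of the sigmoid
    dz_dw = x              # derivative of the dot product w.r.t. w
    grad_w = dL_dy * dy_dz * dz_dw

    # Sanity check against a finite-difference estimate
    eps = 1e-6
    numeric = np.array([(loss(w + eps * np.eye(3)[i], b) - loss(w, b)) / eps
                        for i in range(3)])
    print(grad_w, numeric)  # the two should agree closely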
by __alexander on 1/26/25, 11:00 PM
How AI Works - https://nostarch.com/how-ai-works
+
Why Machines Learn: The Elegant Math Behind Modern AI - https://www.penguinrandomhouse.com/books/677608/why-machines...
by cdicelico on 1/27/25, 1:36 PM
by Bjartr on 1/27/25, 1:34 PM
by gamblor956 on 1/27/25, 7:37 PM
Step 2: The steps above are a good plan for learning about traditional AI, and the traditional approaches, which were based on an attempt to model human thought processes. Machine learning was what the industry turned to in the early 2000s because we didn't have the hardware capabilities then to meaningfully model neural networks. We do now, but machine learning has taken over so there's very little research into modeling neural networks...about the same as there was when I was an undergrad.
by 3abiton on 1/26/25, 11:59 PM
by nosioptar on 2/2/25, 6:41 PM
by meltyness on 1/28/25, 3:13 AM
by null_investor on 1/30/25, 10:05 AM
It will show you the maths; you'll build simple neural nets (from maths!) that can read digits, and scale up from there.
While doing it, you may struggle with some of the maths. Just take a deep breath, invest some time to fill the gaps you need, and continue.
Learn about CNNs and everything else step-by-step. It's awesome.
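For the CNN step, a rough sketch (plain numpy, a toy example of my own) of the core operation, a convolution written as a sliding dot product:

    import numpy as np

    def conv2d(image, kernel):
        # "Valid" 2D cross-correlation: slide the kernel and take dot products.
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0                    # a vertical edge
    kernel = np.array([[1.0, -1.0]])      # responds to horizontal changes
    print(conv2d(image, kernel))          # nonzero response exactly at the edge column

A CNN layer is just many such kernels, learned rather than hand-written, followed by a nonlinearity.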
by exe34 on 1/27/25, 7:29 AM
by syxp on 1/26/25, 11:08 PM
It covers ground on getting from theory to practice that I don't know where else to find.
by virginwidow on 2/5/25, 3:34 AM
When a notion is disturbing, before dismissing it I must know why (MY why). 90% of the time it boils down to fear (mine) from lacking information, and hence failing to understand.
The discussion & references found here aided my understanding a great deal.
by Mr-Frog on 1/26/25, 5:40 AM
by wodenokoto on 1/27/25, 6:26 AM
I think that is "first principles of AI". Like, what does it even mean when we ask an algorithm to "learn" from data?
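One minimal way to make that concrete (a toy example of my own): "learning" here means choosing parameters that minimize a loss over data, e.g. recovering a line's slope and intercept by gradient descent:

    import numpy as np

    # Data from a hidden rule y = 3x + 1, plus noise; "learning" = recovering w and b.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 100)
    y = 3 * x + 1 + rng.normal(0, 0.1, 100)

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        pred = w * x + b
        # gradient of the mean squared error with respect to w and b
        dw = 2 * np.mean((pred - y) * x)
        db = 2 * np.mean(pred - y)
        w, b = w - lr * dw, b - lr * db

    print(w, b)  # close to 3 and 1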
by romperstomper on 1/29/25, 8:45 AM
by hnarayanan on 1/26/25, 10:20 PM
by animesh on 1/27/25, 8:25 AM
by charlieyu1 on 1/27/25, 3:44 AM
by rcarr on 1/27/25, 5:13 PM
by crimsoneer on 1/27/25, 4:07 PM
by markus_zhang on 1/27/25, 2:37 PM
by swah on 1/28/25, 11:34 AM