by sumo43 on 5/1/24, 3:30 AM with 142 comments
by GistNoesis on 5/1/24, 12:25 PM
https://github.com/GistNoesis/FourierKAN/
The core is really just a few lines.
In the paper they use spline interpolation to represent the 1D functions that they sum. Their code seemed aimed at smaller sizes. Instead I chose a different representation: Fourier coefficients that are used to interpolate the functions of the individual coordinates.
It should give an idea of the representation power of Kolmogorov-Arnold networks. It should probably converge more easily than their spline version, although the spline version needs fewer operations.
Of course, if my code doesn't work, it doesn't mean theirs doesn't.
Feel free to experiment and publish a paper if you want.
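A minimal sketch of that Fourier representation in PyTorch, as I read the description above (the class name, coefficient shapes, and scaling are my own choices, not necessarily what the repo does):

    import torch
    import torch.nn as nn

    class FourierKANLayer(nn.Module):
        # Each edge (input i -> output o) carries a learnable 1D function of x_i,
        # parameterized by Fourier coefficients; outputs are summed over inputs.
        def __init__(self, in_dim, out_dim, num_freqs=8):
            super().__init__()
            self.num_freqs = num_freqs
            # cos/sin coefficients, shape (2, out_dim, in_dim, num_freqs)
            self.coeffs = nn.Parameter(
                torch.randn(2, out_dim, in_dim, num_freqs) / (in_dim * num_freqs) ** 0.5
            )

        def forward(self, x):  # x: (batch, in_dim)
            k = torch.arange(1, self.num_freqs + 1, device=x.device, dtype=x.dtype)
            arg = x.unsqueeze(-1) * k          # (batch, in_dim, num_freqs)
            y = torch.einsum("bif,oif->bo", torch.cos(arg), self.coeffs[0])
            y = y + torch.einsum("bif,oif->bo", torch.sin(arg), self.coeffs[1])
            return y

Because the cos/sin basis expansion is shared across edges, the whole layer still reduces to a couple of einsum/matmul calls.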
by krasin on 5/1/24, 6:40 AM
It works as advertised with the parameters selected by the authors, but if we modify the network shape in the second half of the tutorial [1] (the Classification formulation) from (2, 2) to (2, 2, 2), it fails to generalize. The training loss gets down to 1e-9, while the test loss stays around 3e-1. Moving to larger network sizes does not help either.
I would really like to see a bigger example with many more parameters and more data complexity, and whether it could be trained at all. MNIST would be a good start.
Update: I increased the training dataset size 100x, and that helps with the overfitting, but now I can't get the training loss below 1e-2. Still iterating on it; GPU acceleration would really help - right now, my progress is limited by the speed of my CPU.
1. https://github.com/KindXiaoming/pykan/blob/master/tutorials/...
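For reference, a rough sketch of the change described above, assuming pykan's tutorial-era API (the dataset dict keys, train() signature, and loss_fn argument are my best guesses, not verified against the notebook):

    import torch
    from sklearn.datasets import make_moons
    from kan import KAN

    # A two-moons classification dataset in the dict format pykan appears to expect
    # (key names are an assumption based on the tutorials).
    X, y = make_moons(n_samples=1000, noise=0.1)
    X = torch.tensor(X, dtype=torch.float32)
    y = torch.tensor(y)
    dataset = {
        "train_input": X[:800], "train_label": y[:800],
        "test_input": X[800:], "test_label": y[800:],
    }

    # The tutorial uses shape (2, 2); the deeper (2, 2, 2) variant is the one
    # that drives train loss to ~1e-9 while test loss stays around 3e-1.
    model = KAN(width=[2, 2, 2], grid=3, k=3)
    model.train(dataset, opt="LBFGS", steps=50,
                loss_fn=torch.nn.CrossEntropyLoss())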
by esafak on 5/2/24, 1:09 AM
GLMs in turn generalize logistic, linear, and other popular regression models.
Neural GAMs with learned basis functions have already been proposed, so I'm a bit surprised that the prior art is not mentioned in this new paper. Previous applications focused more on interpretability.
by montebicyclelo on 5/1/24, 8:04 AM
It's not clear from the paper how well this algorithm will scale, both in terms of the algorithm itself (does it still train well with more layers?) and its ability to make use of hardware acceleration (e.g., it's not clear to me that the structure, with its per-weight activation functions, can make use of fast matmul acceleration).
It's an interesting idea that seems to work well and have nice properties at a smaller scale, but whether it's a good architecture for ImageNet, LLMs, etc. is not clear at this stage.
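To make the structural difference concrete, here is schematic code (my own illustration, not from the paper or any KAN implementation):

    import torch

    def mlp_layer(x, W, b):
        # One dense matmul followed by a single shared nonlinearity:
        # exactly the shape GPUs are optimized for.
        return torch.relu(x @ W.T + b)

    def kan_layer(x, edge_funcs):
        # edge_funcs[o][i] is a separate learned 1D function on the edge i -> o,
        # so y_o = sum_i phi_{o,i}(x_i); written naively there is no single
        # dense matmul to hand to the hardware.
        outs = []
        for funcs in edge_funcs:
            outs.append(sum(f(x[:, i]) for i, f in enumerate(funcs)))
        return torch.stack(outs, dim=-1)

Implementations can recover matmuls by expanding each coordinate into a shared basis (splines or Fourier features) and contracting with a coefficient tensor, but the per-edge structure is what raises the question here.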
by cs702 on 5/1/24, 1:58 PM
The best thing about this new work is that it's not an either/or proposition. The proposed "learnable spline interpolations as activation functions" can be used in conventional DNNs, to improve their expressivity. Now we just have to test the stuff to see if it really works better.
Very nice. Thank you for sharing this work here!
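As a toy illustration of that "not either/or" point, here's a sketch of a learnable piecewise-linear (degree-1 spline) activation dropped into an ordinary MLP; this is my own simplification, not the paper's B-spline parameterization:

    import torch
    import torch.nn as nn

    class LearnableSplineActivation(nn.Module):
        # Learnable linear spline: a * x + sum_k c_k * relu(x - t_k),
        # shared across all units in the layer.
        def __init__(self, num_knots=8, x_min=-3.0, x_max=3.0):
            super().__init__()
            self.register_buffer("knots", torch.linspace(x_min, x_max, num_knots))
            self.coeffs = nn.Parameter(torch.zeros(num_knots))  # starts as identity
            self.slope = nn.Parameter(torch.ones(1))

        def forward(self, x):
            hinges = torch.relu(x.unsqueeze(-1) - self.knots)   # (..., num_knots)
            return self.slope * x + hinges @ self.coeffs

    # Drop-in replacement for a fixed activation in a conventional DNN.
    mlp = nn.Sequential(nn.Linear(16, 64), LearnableSplineActivation(), nn.Linear(64, 1))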
by mxwsn on 5/1/24, 6:32 AM
by ubj on 5/1/24, 1:09 PM
by reynoldss on 5/1/24, 1:20 PM
by Lichtso on 5/1/24, 6:21 PM
1. Note how close in time the two underlying ideas are:
1957: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Arnold_repr...
1958: https://en.wikipedia.org/wiki/Multilayer_perceptron
2. Another advantage of this approach is that it has only one class of parameters (the coefficients of the local activation functions), as opposed to an MLP, which has three classes of parameters (weights, biases, and the globally uniform activation function).
3. Everybody is talking about transformers. I want to see diffusion models with this approach.
by cbsmith on 5/1/24, 5:42 AM
by adityang5 on 5/4/24, 4:49 AM
- PyTorch Module of the KAN GPT
- Deployed to PyPI
- MIT Licence
- Test Cases to ensure forward-backward passes work as expected
- Training script
I am currently working on training it on the WebText dataset to compare it to the original GPT-2. I'm facing a few out-of-memory issues at the moment. Perhaps the vocab size (50257) is too large?
I'm open to contributions and would love to hear your thoughts!
by cloudhan on 5/2/24, 12:59 AM
by apolar on 5/5/24, 4:38 PM
Seminar 2021: https://warwick.ac.uk/fac/sci/maths/research/events/seminars...
Article on arXiv 2023: https://arxiv.org/abs/2305.08194
Video 2021: https://www.youtube.com/watch?v=eS_k6L638k0
Extension to stochastic models where KAN builds the distribution 2023: https://www.youtube.com/watch?v=0hhJIpzxPR0
by yobbo on 5/1/24, 9:41 AM
At the end of this example, they recover the symbolic formula that generated their training set: exp(x₂² + sin(3.14x₁)).
It's like a computation graph with a library of "activation functions" that is optimised, and then pruned. You can recover good symbolic formulas from the pruned graph.
Maybe not meaningful for MNIST.
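A hedged sketch of that recover-the-formula workflow, assuming the pykan API shown in the tutorials (create_dataset, train, prune, auto_symbolic, symbolic_formula); exact names and signatures may have changed:

    import torch
    from kan import KAN, create_dataset

    # Target function from the example: exp(sin(pi * x1) + x2^2)
    f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
    dataset = create_dataset(f, n_var=2)

    model = KAN(width=[2, 5, 1], grid=5, k=3)
    model.train(dataset, opt="LBFGS", steps=20)

    model = model.prune()            # drop edges/nodes with negligible contribution
    model.auto_symbolic()            # snap each learned 1D function to a symbolic form
    print(model.symbolic_formula())  # ideally recovers exp(sin(pi*x1) + x2^2)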
by diwank on 5/1/24, 4:53 AM
by SpaceManNabs on 5/1/24, 3:14 PM
by phpkar on 5/5/24, 2:21 PM
by ALittleLight on 5/1/24, 5:48 AM
by erwincoumans on 5/2/24, 6:42 PM
by mipt98 on 5/3/24, 6:07 PM
by kevmo314 on 5/1/24, 8:33 PM
by syassami on 5/3/24, 4:04 PM
by Maro on 5/1/24, 4:37 AM
Would this approach (with non-linear learning) still be able to utilize GPUs to speed up training?
by coderenegade on 5/2/24, 2:28 AM
by renonce on 5/5/24, 1:27 PM
I mean, it's great, but in its current state it seems better suited for tasks where an explicit formula exists (though it is not known) and the goal is to predict it at unknown points (and to be able to interpret the formula as a side effect). Deep learning tasks are more statistical in nature (think models with a cross-entropy loss: they statistically predict the frequency of different choices of the class/next token); this approach requires a specialized training procedure and is designed to fit 100% rather than somewhat close (think linear algebra: a statistical model won't be good at that). It would very likely take a radically different idea to apply it to deep learning tasks. The recently updated "Author's note" also mentions this: "KANs are designed for applications where one cares about high accuracy and/or interpretability."
It's great but let's be patient before we see this improve LLM accuracy or be used elsewhere.
by nico on 5/1/24, 4:36 AM
I wonder how many more new architectures are going to be found in the next few years
by ComplexSystems on 5/1/24, 7:26 PM
by nu91 on 5/2/24, 5:59 AM
by brrrrrm on 5/1/24, 7:21 PM
by arianvanp on 5/1/24, 8:03 AM
by keynesyoudigit on 5/1/24, 2:51 PM
by yza on 5/2/24, 12:21 PM
by WithinReason on 5/1/24, 8:09 AM