by lauradhamilton on 7/18/14, 5:58 PM with 15 comments
by gamegoblin on 7/18/14, 7:33 PM
If your layers are relatively small (not hundreds or thousands of nodes), dropout is usually detrimental; a more traditional regularization method such as weight decay works better.
For networks the size Hinton et al. are playing with nowadays (thousands of nodes in a layer), dropout is good, though.
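To make the contrast concrete, here is a rough numpy sketch (my illustration, not gamegoblin's) of the two regularizers applied to one hidden layer; the layer sizes, keep probability, and decay rate are arbitrary placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, batch = 20, 50, 32          # a "small" layer by the comment's standard
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    x = rng.normal(size=(batch, n_in))

    # --- Dropout: randomly zero hidden units during training (inverted scaling) ---
    keep_prob = 0.5
    h = np.maximum(0, x @ W)                    # ReLU hidden activations
    mask = rng.random(h.shape) < keep_prob
    h_dropped = h * mask / keep_prob            # rescale so the expected activation is unchanged

    # --- Weight decay: shrink weights toward zero in the SGD update instead ---
    lr, weight_decay = 0.01, 1e-4
    grad_W = rng.normal(size=W.shape)           # stand-in for the real loss gradient
    W = W - lr * (grad_W + weight_decay * W)    # the L2 penalty just adds lambda * W to the gradient

Dropout injects noise into the activations, which small layers have too little capacity to absorb; weight decay only shrinks the weights, which is why it tends to be the safer default at that scale.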
by vundervul on 7/18/14, 10:00 PM
by agibsonccc on 7/18/14, 7:09 PM
https://news.ycombinator.com/item?id=7803101
I will also add that looking into Hessian-free optimization for training feed-forward nets, over conjugate gradient/L-BFGS/SGD, has proven to be amazing [1].
Recursive nets I'm still playing with, but based on the work by Socher [2], L-BFGS worked just fine for them.
[1]: http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf
[2]: http://socher.org/
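Hessian-free training is essentially truncated Newton: it never forms the Hessian, it only feeds Hessian-vector products to conjugate gradient. A minimal numpy/scipy sketch of that primitive (my illustration, not code from the paper) on a toy logistic-regression loss:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = (rng.random(100) < 0.5).astype(float)

    def grad(w):
        # Gradient of the logistic loss (plus a small L2 term) at w.
        p = 1.0 / (1.0 + np.exp(-X @ w))
        return X.T @ (p - y) / len(y) + 1e-3 * w

    def hess_vec(w, v, eps=1e-5):
        # Hessian-vector product via a finite difference of gradients;
        # the full Hessian is never built.
        return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

    # One damped Newton step: solve H d = -g with CG, using only Hessian-vector products.
    w = np.zeros(5)
    g = grad(w)
    H = LinearOperator((5, 5), matvec=lambda v: hess_vec(w, v) + 1e-2 * v)  # small damping
    d, _ = cg(H, -g, maxiter=50)
    w = w + d

The real method adds pieces this sketch omits (the Gauss-Newton approximation, CG warm starts, adaptive damping), but the CG-on-Hessian-vector-products loop above is the core idea.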
by prajit on 7/18/14, 7:41 PM
by TrainedMonkey on 7/18/14, 6:47 PM
by ivan_ah on 7/18/14, 6:38 PM