from Hacker News

Reproducing the deep double descent paper

by stpn on 6/5/25, 6:34 PM with 5 comments

  • by davidguetta on 6/5/25, 10:05 PM

    Is this not because the longer you train, the more neurons 'die' (no longer utilized because the gradient is flat on the dataset), so you effectively get a smaller model as training goes on? (See the sketch below the comments.)
  • by lcrmorin on 6/6/25, 4:11 PM

    Do you change regularisation?
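
The dead-neuron hypothesis above is something one could check empirically. Below is a minimal sketch (not from the original post) of how to estimate the fraction of ReLU units that never activate over a dataset, assuming a PyTorch model; the function name `dead_relu_fraction` and the use of forward hooks are illustrative choices, not anything described in the paper or the thread.

    import torch
    import torch.nn as nn

    def dead_relu_fraction(model, loader, device="cpu"):
        """Estimate the fraction of ReLU units that never fire on `loader`.

        A unit whose output is zero for every input passes no gradient,
        so it no longer contributes to the effective capacity of the model.
        """
        model.eval().to(device)
        stats = {}   # layer name -> bool tensor: "was this unit ever active?"
        hooks = []

        def make_hook(name):
            def hook(module, inputs, output):
                # Collapse every dim except the channel/feature dim, then
                # record which units produced a strictly positive activation.
                flat = output.detach().transpose(0, 1).reshape(output.shape[1], -1)
                active = (flat > 0).any(dim=1).cpu()
                stats[name] = stats.get(name, torch.zeros_like(active)) | active
            return hook

        for name, module in model.named_modules():
            if isinstance(module, nn.ReLU):
                hooks.append(module.register_forward_hook(make_hook(name)))

        with torch.no_grad():
            for x, _ in loader:
                model(x.to(device))

        for h in hooks:
            h.remove()

        total = sum(v.numel() for v in stats.values())
        dead = sum((~v).sum().item() for v in stats.values())
        return dead / max(total, 1)

Running this at several checkpoints during training would show whether the share of dead units actually grows with training time, which is what the comment's "effectively smaller model" explanation would predict.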