from Hacker News

Reproducing the deep double descent paper

by stpn on 6/5/25, 6:34 PM with 5 comments

  • by davidguetta on 6/5/25, 10:05 PM

    Is this not because the longer you train, the more neurons 'die' (no longer utilized because the gradient is flat on the dataset), so you effectively get a smaller model as training goes on? (See the sketch below the comments.)
  • by lcrmorin on 6/6/25, 4:11 PM

    Do you change regularisation?
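
The dead-neuron hypothesis above is something one could check empirically. Below is a minimal sketch (not from the original post) of how to estimate the fraction of ReLU units that never activate over a dataset, assuming a PyTorch model; the function name `dead_relu_fraction` and the use of forward hooks are illustrative choices, not anything described in the paper or the thread.

    import torch
    import torch.nn as nn

    def dead_relu_fraction(model, loader, device="cpu"):
        """Estimate the fraction of ReLU units that never fire on `loader`.

        A unit whose output is zero for every input passes no gradient,
        so it no longer contributes to the effective capacity of the model.
        """
        model.eval().to(device)
        stats = {}   # layer name -> bool tensor: "was this unit ever active?"
        hooks = []

        def make_hook(name):
            def hook(module, inputs, output):
                # Collapse every dim except the channel/feature dim, then
                # record which units produced a strictly positive activation.
                flat = output.detach().transpose(0, 1).reshape(output.shape[1], -1)
                active = (flat > 0).any(dim=1).cpu()
                stats[name] = stats.get(name, torch.zeros_like(active)) | active
            return hook

        for name, module in model.named_modules():
            if isinstance(module, nn.ReLU):
                hooks.append(module.register_forward_hook(make_hook(name)))

        with torch.no_grad():
            for x, _ in loader:
                model(x.to(device))

        for h in hooks:
            h.remove()

        total = sum(v.numel() for v in stats.values())
        dead = sum((~v).sum().item() for v in stats.values())
        return dead / max(total, 1)

Running this at several checkpoints during training would show whether the share of dead units actually grows with training time, which is what the comment's "effectively smaller model" explanation would predict.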