from Hacker News

GradientFlow: Training ImageNet in 1.5 Minutes on 512 GPUs

by gavinuhma on 2/25/19, 5:40 PM with 44 comments

  • by frankchn on 2/25/19, 9:35 PM

    The authors trained ResNet-50 in 7.3 minutes at 75.3% accuracy.

    As a comparison, a Google TPUv3 pod with 1024 chips got ResNet-50 to 75.2% accuracy in 1.8 minutes, and to 76.2% in 2.2 minutes with an optimizer change and distributed batch normalization [1] (a toy sketch of the latter follows the link below).

    [1]: https://arxiv.org/abs/1811.06992
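
    A minimal sketch of the distributed batch normalization mentioned above, simulated in NumPy rather than the paper's actual TPU code: each replica holds a shard of the global batch, and the normalization statistics are reduced across replicas so every replica normalizes with the global mean and variance instead of its small local ones.

      import numpy as np

      def distributed_batch_norm(shards, eps=1e-5):
          # shards: one activation array per replica, shape (local_batch, features)
          total = sum(s.shape[0] for s in shards)
          # the two sums below stand in for all-reduce operations across replicas
          mean = sum(s.sum(axis=0) for s in shards) / total
          var = sum(((s - mean) ** 2).sum(axis=0) for s in shards) / total
          # every replica normalizes with the shared global statistics
          return [(s - mean) / np.sqrt(var + eps) for s in shards]

      out = np.concatenate(distributed_batch_norm(
          [np.random.default_rng(i).normal(size=(8, 4)) for i in range(4)]))
      print(out.mean(axis=0), out.std(axis=0))  # ~0 and ~1: matches BN on the full batch

    At very large scale the per-replica batch gets tiny, so syncing the statistics this way keeps BN behaving as if the batch were still large.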

  • by gambler on 2/25/19, 8:07 PM

    512 GPUs on a 56 Gbps network? I'd rather see researchers explore potentially more efficient alternatives to traditional neural nets, like XNOR-Nets, or different architectures like gcForests or probabilistic and logistic circuits, or maybe listen to Vapnik and invest in general statistical learning efficiency.
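
    Back-of-envelope on why that 56 Gbps link matters (my numbers, not the paper's): a bandwidth-optimal ring all-reduce moves roughly twice the gradient buffer over each link per step.

      # assumed figures: ~25.5M fp32 parameters for ResNet-50, 512 workers
      params, bytes_per_param, n, link_gbps = 25.5e6, 4, 512, 56
      size_gb = params * bytes_per_param / 1e9      # ~0.102 GB of gradients
      traffic_gb = 2 * (n - 1) / n * size_gb        # ring all-reduce traffic per link
      seconds = traffic_gb / (link_gbps / 8)        # Gbps -> GB/s
      print(f"~{traffic_gb*1e3:.0f} MB and ~{seconds*1e3:.0f} ms per step")  # ~204 MB, ~29 ms

    Tens of milliseconds of pure communication per step is why work at this scale leans so heavily on the interconnect.
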
  • by nl on 2/26/19, 8:55 AM

    For all those complaining about the cost:

    FastAI trained ResNet-50 to 93% top-5 accuracy in 18 minutes for $48 [1], using the same code anyone can run on their own GPU machine.

    If you want it cheaper and faster, you can do the same in 9 minutes for $12 on Google's (publicly available) TPUv2s (quick cost arithmetic after the link below).

    This isn't a monopolization of AI, it is the opposite.

    [1] https://dawn.cs.stanford.edu/benchmark/
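
    Quick arithmetic on those numbers (implied hourly burn rates; my derivation, not figures from the benchmark page):

      for name, (dollars, minutes) in {"FastAI GPU run": (48, 18),
                                       "TPUv2 run": (12, 9)}.items():
          print(f"{name}: ${dollars} / {minutes} min -> ${dollars / minutes * 60:.0f}/hour")
      # FastAI GPU run: $160/hour; TPUv2 run: $80/hour -- pricey per hour,
      # cheap per run, which is the point about accessibility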

  • by bitL on 2/25/19, 10:34 PM

    Oh well, this is the death of democratic AI and the end of independent researchers :-( There goes any hope of a single Titan RTX producing meaningful commercial models.

  • by IshKebab on 2/25/19, 8:37 PM

    I thought we were optimising for cost now?
  • by RhysU on 2/26/19, 1:17 AM

    Why? What deep problems have been solved? How will this make our children better?
  • by thro_awayz_days on 2/25/19, 9:07 PM

    Silicon is inferior to chemical energy: the human brain is orders of magnitude more energy-efficient than today's best GPUs. However, speed != efficiency. Classifying ImageNet by hand would take around 1,000 hours.
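
    The 1,000-hour figure checks out under reasonable assumptions (mine, not the commenter's):

      train_images = 1.28e6        # ImageNet-1k training set size
      seconds_per_image = 2.8      # assumed human classification pace
      print(f"~{train_images * seconds_per_image / 3600:.0f} hours")  # ~996 hours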