from Hacker News

Nvidia open source LLM Nemotron 4 340B at top of the charts [pdf]

by moondistance on 6/15/24, 9:56 AM with 1 comment

  • by bigyikes on 6/15/24, 1:52 PM

    >We use synthetic data heavily to create Nemotron-4-340B-Instruct: over 98% of our training data has been synthetically generated throughout our alignment process.

    Very interesting to see synthetic data used so heavily during alignment.

    Are there any known models that make heavy use of synthetic data during pretraining?