from Hacker News

Kai-Fu Lee's 01.ai Releases 9B, Long Context 34B Models

by brucethemoose2 on 3/7/24, 4:34 PM with 2 comments

  • by brucethemoose2 on 3/7/24, 4:34 PM

    Yi 9B has been released with quite impressive-looking benchmarks. While probably not a SOTA coding model, it looks like a strong competitor to leading-edge "small" models like Mistral 0.2 and Solar.

    Meanwhile, Yi-34B-200K has been updated:

    > In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pretrain the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance.

    I am particularly excited for this, as Yi 34B 200K finetunes are already my favorite non-coding models, even including 70Bs. And it's just about perfect for a 24GB consumer card (onto which I can cram about 75K of context without too much compromise).
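    The "75K on a 24GB card" claim can be sanity-checked with a back-of-envelope VRAM estimate. This sketch assumes Yi-34B's published config (60 layers, GQA with 8 KV heads, head dim 128) and illustrative quantization levels (~4 bpw weights, quantized KV cache); exact numbers vary by runtime and overhead.

    ```python
    def kv_cache_gib(tokens, layers=60, kv_heads=8, head_dim=128, bytes_per_elem=1):
        # K and V tensors across all layers; GQA means only kv_heads (not the
        # full 56 attention heads) are cached per layer.
        return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 2**30

    def weights_gib(params_b=34, bits_per_weight=4.0):
        # Quantized weight footprint, ignoring runtime overhead.
        return params_b * 1e9 * bits_per_weight / 8 / 2**30

    ctx = 75_000
    print(f"KV cache, 8-bit: {kv_cache_gib(ctx, bytes_per_elem=1):.1f} GiB")   # ~8.6 GiB
    print(f"KV cache, 4-bit: {kv_cache_gib(ctx, bytes_per_elem=0.5):.1f} GiB") # ~4.3 GiB
    print(f"Weights at 4 bpw: {weights_gib():.1f} GiB")                        # ~15.8 GiB
    ```

    With a 4-bit KV cache the total lands around 20 GiB, which is consistent with 75K context fitting on a 24GB card; an 8-bit cache would already be borderline.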

  • by gardenfelder on 3/7/24, 9:08 PM

    "yi" is pronounced as a long E. Famously, King Yi and his wife are associated with the Chinese Moon Festival.