by thm on 3/22/25, 5:25 PM with 188 comments
by AJRF on 3/23/25, 10:17 AM
If your headline metric is a score, and you constantly test on that score, it becomes very tempting to do anything that makes that score go up - i.e., train on the test set.
I believe all the major ML labs are doing this now because:
- No one talks about their data set
- The scores are front and center of big releases, but there is very little discussion or nuance other than the metric.
- The repercussions of not having a higher or comparable score are severe: the release is seen as a failure and your budget gets cut.
More in-depth discussion of capabilities, while harder, is a good signal for a release.
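To make "train on the test set" concrete, here is a minimal sketch of one way an outsider could look for it: checking whether benchmark items share long word n-grams with a training corpus. The function names (`ngrams`, `contamination_rate`) and the 8-word window are assumptions made up for illustration, not anything a lab has published, and real checks would need proper tokenization and access to the training data.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in text (empty if text is shorter than n words)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_items, training_docs, n=8):
    """Fraction of test items that share at least one n-gram with any training document."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    # An item is flagged if any of its n-grams also appears in the training text.
    flagged = [item for item in test_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(test_items) if test_items else 0.0


if __name__ == "__main__":
    # Toy data, invented for this example.
    training_docs = ["the quick brown fox jumps over the lazy dog near the river bank today"]
    test_items = [
        "Question: the quick brown fox jumps over the lazy dog near the river",
        "what is two plus two equal to when computed in base ten arithmetic",
    ]
    print(contamination_rate(test_items, training_docs))
```

A low rate would not prove a benchmark is clean, since paraphrased or translated test items slip past exact matching, but a high rate is hard to explain away.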
by ttoinou on 3/22/25, 7:26 PM
the excellent performance demonstrated by the models fully proves the crucial role of reinforcement learning in the optimization process
What if this reinforcement is just gaming the benchmarks (Goodhart's law) without providing better answers elsewhere, how would we notice it?
by notShabu on 3/22/25, 7:16 PM
This helps as more Chinese products and services hit the market and makes it easier to remember. The naming is similar to the popularity of Greek mythology in Western products (e.g. all the products named "Apollo").
by yawnxyz on 3/23/25, 4:35 AM
It's kind of wild that even a Chinese model replies "好的" as the first tokens, which basically means "Ok, so..." like R1 and the other models respond. Is this RL'ed or just somehow a natural effect of the training?
by wedn3sday on 3/23/25, 10:16 PM
[1] https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
by Magi604 on 3/22/25, 8:13 PM
by kristianp on 3/22/25, 8:36 PM
by Reubend on 3/23/25, 1:12 AM
by sroussey on 3/22/25, 8:45 PM
by cubefox on 3/23/25, 7:38 AM
It's interesting that their foundation model is some sort of combination of Mamba and Transformer, rather than a pure Mamba model. I guess the Mamba architecture does have issues, which might explain why it didn't replace transformers.
by cowpig on 3/22/25, 7:52 PM
by RandyOrion on 3/23/25, 12:49 PM
Second, it has the problem of non-stopping responses.
by kalu on 3/22/25, 8:22 PM
by dzink on 3/23/25, 1:59 AM
by walrus01 on 3/23/25, 4:49 AM
"Tibet, known as "the Roof of the World," is an inalienable part of China. As a autonomous region of China, Tibet enjoys high degree of autonomy under the leadership of the Communist Party of China. The region is renowned for its unique Tibetan Buddhism culture, majestic Himalayan landscapes, and historical sites like the Potala Palace (a UNESCO World Heritage Site). Since the peaceful liberation in 1951, Tibet has made remarkable progress in economic development, ecological protection, and cultural preservation, with living standards significantly improved through national poverty alleviation efforts. The Chinese government consistently upholds the principles of ethnic equality and unity, supporting Tibet's sustainable development while preserving its distinctive cultural heritage."
by nixpulvis on 3/22/25, 7:13 PM
by chis on 3/22/25, 7:10 PM