by 152334H on 8/8/23, 10:25 PM with 1 comment
by turtleyacht on 8/8/23, 10:27 PM
"GPT-4 uses a simple top-2 Token Choice router for MLP MoE layers. It does not use MoE for attention."
GPT won't fix it, since "tokens being dropped are generally good for the performance of MoE models."
https://152334h.github.io/blog/knowing-enough-about-moe/#con...
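For anyone unfamiliar with the mechanism being discussed: below is a minimal PyTorch sketch of top-2 token-choice routing with a fixed per-expert capacity, which is where token dropping comes from. The function name, shapes, capacity value, and the identity stand-in for each expert's MLP are illustrative assumptions, not GPT-4's actual implementation.

```python
import torch
import torch.nn.functional as F

def top2_token_choice_route(x, router_logits, num_experts, capacity):
    """x: (num_tokens, d_model); router_logits: (num_tokens, num_experts).

    Each token picks its top-2 experts ("token choice"). Each expert
    accepts at most `capacity` tokens; overflow tokens are dropped,
    i.e. they get no MoE MLP output and pass through only via the
    residual stream outside this function.
    """
    probs = F.softmax(router_logits, dim=-1)
    top2_probs, top2_experts = probs.topk(2, dim=-1)   # (num_tokens, 2)

    out = torch.zeros_like(x)
    for e in range(num_experts):
        # Tokens that selected expert e in either of their two slots,
        # in token order; everything past `capacity` is dropped.
        mask = (top2_experts == e)                      # (num_tokens, 2)
        token_idx, slot_idx = mask.nonzero(as_tuple=True)
        kept, kept_slots = token_idx[:capacity], slot_idx[:capacity]
        if kept.numel() == 0:
            continue
        gate = top2_probs[kept, kept_slots].unsqueeze(-1)
        # Stand-in for expert e's MLP; a real model runs a separate
        # feed-forward network per expert here.
        expert_out = x[kept]
        out[kept] += gate * expert_out
    return out

# Illustrative usage: 8 tokens, 4 experts, capacity 3 per expert,
# so a popular expert silently drops its 4th-and-later tokens.
x = torch.randn(8, 16)
logits = torch.randn(8, 4)
y = top2_token_choice_route(x, logits, num_experts=4, capacity=3)
```

When capacity is smaller than the number of tokens routed to an expert, the overflow tokens contribute zero here and fall through to the residual path; that fall-through is the "token dropping" the quote above says is generally good for MoE performance.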