from Hacker News

Tencent Hunyuan-Large

by helloericsf on 11/5/24, 6:52 PM with 103 comments

  • by mrob on 11/5/24, 8:25 PM

    Not open source. Even if we accept model weights as source code, which is highly dubious, this clearly violates clauses 5 and 6 of the Open Source Definition. It discriminates between users (clause 5) by refusing to grant any rights to users in the European Union, and it discriminates between uses (clause 6) by requiring agreement to an Acceptable Use Policy.

    EDIT: The HN title, which previously made the claim, has since been changed. But as HN user swyx pointed out, Tencent itself is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".

  • by a_wild_dandan on 11/5/24, 8:45 PM

    The model meets or beats Llama 3.1-405B despite using nearly an order of magnitude fewer active parameters (52B vs 405B). Absolutely bonkers. AI is moving so fast with these breakthroughs -- synthetic data, distillation, alternative architectures (e.g. MoE/SSM), LoRA, RAG, curriculum learning, etc.

    We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.

  • by 1R053 on 11/5/24, 8:31 PM

    the paper with details: https://arxiv.org/pdf/2411.02265

    They use

    - 16 experts, of which one is activated per token

    - 1 shared expert that is always active

    In summary, that makes around 52B active parameters per token, instead of the 405B of the dense Llama 3.1-405B.
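
    A minimal sketch of that routing pattern (my own illustration, not Tencent's code): one shared expert runs for every token, a gate picks one of the 16 specialized experts, and their outputs are summed. The names and sizes below are made up for illustration.

      import numpy as np

      HIDDEN, FFN, N_EXPERTS = 64, 256, 16   # toy sizes, nothing like the real model

      def make_expert():
          # a toy two-layer feed-forward "expert"
          w1 = np.random.randn(HIDDEN, FFN) * 0.02
          w2 = np.random.randn(FFN, HIDDEN) * 0.02
          return lambda x: np.maximum(x @ w1, 0.0) @ w2

      shared_expert = make_expert()                        # always active
      experts = [make_expert() for _ in range(N_EXPERTS)]  # specialized experts
      gate = np.random.randn(HIDDEN, N_EXPERTS) * 0.02     # router weights

      def moe_layer(token):
          scores = token @ gate            # one routing score per specialized expert
          top1 = int(np.argmax(scores))    # only one specialized expert fires
          return shared_expert(token) + experts[top1](token)

      out = moe_layer(np.random.randn(HIDDEN))
      # Per token, only the shared expert plus 1 of the 16 specialized experts run,
      # which is why roughly 52B of the 389B total parameters are "active".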

  • by the_duke on 11/5/24, 8:33 PM

    > Territory” shall mean the worldwide territory, excluding the territory of the European Union.

    Anyone have some background on this?

  • by helloericsf on 11/5/24, 6:52 PM

    - 389 billion total parameters and 52 billion active parameters, capable of handling up to 256K tokens.

    - Outperforms Llama 3.1-70B and exhibits comparable performance to the significantly larger Llama 3.1-405B model.

  • by eptcyka on 11/5/24, 8:21 PM

    Definitely not trained on Nvidia or AMD GPUs.

  • by Tepix on 11/5/24, 10:50 PM

    I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters". Do hobbyists stand a chance of running this model (quantized) at home? For example on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
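
    Rough back-of-the-envelope arithmetic (my own estimate; the 4-bit assumption and the neglect of KV cache and runtime overhead are mine, the parameter counts are from the release):

      total_params  = 389e9                          # all experts must be resident,
      active_params = 52e9                           # but only these are computed per token
      bytes_per_param = 0.5                          # 4-bit quantization

      print(total_params  * bytes_per_param / 1e9)   # ~195 GB of weights to hold
      print(active_params * bytes_per_param / 1e9)   # ~26 GB actually used per token

    By that estimate, 128GB RAM plus two 24GB 3090s falls short even at 4 bits, while 512GB RAM could at least hold the weights; since different experts fire on different tokens, the full 389B effectively has to stay loaded, so expect heavy CPU offloading.
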
  • by adt on 11/5/24, 9:51 PM

  • by iqandjoke on 11/6/24, 2:45 AM

    How does it compare with Llama 3.2?