from Hacker News

InternLM2

by milliondreams on 3/31/24, 11:51 PM with 24 comments

  • by zone411 on 4/1/24, 4:00 AM

    We really need better long context benchmarks than needle-in-a-haystack. There is LV-Eval (https://arxiv.org/abs/2402.05136) with multi-hop QA that's better but still pretty basic.
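    The needle-in-a-haystack test mentioned above can be sketched in a few lines: hide a "needle" fact at a chosen depth inside filler text, then ask the model to retrieve it. Everything here (the filler sentences, the passphrase, the function name) is illustrative, not from any specific benchmark implementation.

    ```python
    # Minimal needle-in-a-haystack probe (sketch). The model under test
    # would receive `prompt` and be scored on whether it returns the needle.

    FILLER = "The grass is green. The sky is blue. The sun is warm. "
    NEEDLE = "The secret passphrase is 'periwinkle-42'. "

    def build_haystack_prompt(context_chars: int, depth: float) -> str:
        """Place NEEDLE at `depth` (0.0 = start, 1.0 = end) of ~context_chars of filler."""
        haystack = (FILLER * (context_chars // len(FILLER) + 1))[:context_chars]
        pos = int(len(haystack) * depth)
        haystack = haystack[:pos] + NEEDLE + haystack[pos:]
        return haystack + "\n\nWhat is the secret passphrase?"

    prompt = build_haystack_prompt(context_chars=2000, depth=0.5)
    ```

    Sweeping `context_chars` and `depth` yields the familiar retrieval heatmap; the criticism above is that passing this test only shows verbatim recall, not the multi-hop reasoning that benchmarks like LV-Eval probe.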
  • by esha_manideep on 4/1/24, 3:12 AM

    Pretty amazing to see training data being discussed more openly
  • by milliondreams on 3/31/24, 11:51 PM

    TL;DR:

    1. InternLM2 is an open-source large language model that improves on previous models, particularly in long-context modeling.

    2. It combines conventional pre-training with Supervised Fine-Tuning and Conditional Online Reinforcement Learning from Human Feedback.

    3. It releases a variety of model sizes and training-stage checkpoints to the community, demonstrating significant advancements in AI research and application.
  • by barsonme on 4/1/24, 4:39 AM

    Is it normal for papers to have that many authors?
  • by ilaksh on 4/1/24, 2:37 AM

    Does anyone know how the free commercial license works? Do they usually grant it? There appears to be an application form at https://wj.qq.com/s2/12727483/5dba/.

    Apache 2 code, free commercial license with application form for weights.

  • by Kwpolska on 4/1/24, 4:01 PM

    The name suggests this is interns posing as a chatbot, especially considering today’s date.
  • by pilotneko on 4/1/24, 1:17 PM

    I experimented with this model and vLLM around a month ago. The long context length is attractive, but it was incredibly slow on a g5.12xlarge (4 NVIDIA A10G GPUs). I actually could not get it to respond for single examples longer than 50K tokens.
  • by viraptor on 4/1/24, 3:32 AM

  • by dannyw on 4/1/24, 3:31 AM

    How good is the base (non-instruction-tuned) model? Everyone is trying to make chat bots, but for my use cases, I find base models more suitable.