from Hacker News

Groq surpasses 1,200 tokens/sec with Llama 3 8B

by YourCupOTea on 5/30/24, 6:52 PM with 31 comments

  • by LorenDB on 5/30/24, 6:57 PM

    Groq is an insane company. SambaNova (discussed yesterday[0]) is also very promising. However, what I really want to see is local AI accelerator chips a la Tenstorrent Grayskull that can boost local generation to hundreds of tokens per second while being more efficient than GPUs.

    [0]: https://news.ycombinator.com/item?id=40508797

  • by windowshopping on 5/30/24, 7:07 PM

    Is Groq related to Twitter's Grok, or is that just a very unfortunate naming coincidence?

  • by andy_xor_andrew on 5/30/24, 7:14 PM

    When reading Hacker News you develop a signal/noise filter, where lots of headlines make bold claims but you filter them out as embellishment or exaggeration.

    My bullshit detector went off when I first saw Groq posted on HN - a startup is making their own chips (doubt) that perform faster than anything Nvidia has for inference (doubt) and accelerate LLMs to hundreds/thousands of tokens per second?? Mega doubt.

    But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.

  • by behnamoh on 5/30/24, 7:18 PM

    They're not responsive to my questions on Twitter, so I'm asking here:

        When will Groq support a real API (not experimental beta preview)?
    
        When will Groq support logprobs?!
    
        When will Groq actually tell us what their rate limit is?!
    
    
    Until these are answered, many of us can't actually build on Groq.

    Edit: It seems I'm getting downvoted by Groq employees...