from Hacker News

GLM-4-9B: open-source model with superior performance to Llama-3-8B

by marcelsalathe on 6/5/24, 6:26 PM with 17 comments

  • by ilaksh on 6/6/24, 1:14 AM

    Looks like terrific technology. However, the translation says that it's an "irrevocable revocable" non-commercial license with a form to apply for commercial use.
  • by great_psy on 6/5/24, 11:55 PM

    I’m excited to hear work is being done on models that support function calling natively (a sketch of what that looks like follows this comment).

    Does anybody know if performance could be greatly increased if only a single language were supported?

    I suspect there’s a high demand for models that are maybe smaller and can run faster if the tradeoff is support for only English.

    Is this available in Ollama?
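
    A minimal sketch of what native function calling with GLM-4-9B might look like, assuming the Hugging Face `apply_chat_template(tools=...)` convention (transformers >= 4.42); the tool schema and prompt here are illustrative, and whether the model's own chat template honors the `tools` argument depends on the repo version:

    ```python
    # Sketch: OpenAI-style function calling with GLM-4-9B via transformers.
    # Model ID is the published THUDM repo; the tool definition is hypothetical.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/glm-4-9b-chat",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )

    # Hypothetical tool; a model with native function calling should emit a
    # structured call to it rather than free-form prose.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Zurich?"}]
    inputs = tokenizer.apply_chat_template(
        messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
    ```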

  • by abrichr on 6/6/24, 4:38 PM

    > GLM-4V-9B possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

    But according to their own evaluation further down, gpt-4o-2024-05-13 outperforms GLM-4V-9B on every task except OCRBench.

  • by norwalkbear on 6/6/24, 12:37 PM

    Isn't Llama-3-70B so good that Reddit llamaers are saying people should buy hardware to run it?

    Llama-3-8B was garbage for me, but damn, 70B is good enough.

  • by oarth on 6/6/24, 7:17 AM

    If those numbers are true, then it's very impressive. Hoping for llama.cpp support.
  • by nubinetwork on 6/6/24, 2:47 PM

    1M context, but does it really deliver? I've been hit with 32K models that crap out after 10K before...
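
    One way to check a context-length claim like this is a needle-in-a-haystack test: bury a unique fact at several depths in filler text and ask the model to retrieve it. A minimal sketch, assuming an OpenAI-compatible local endpoint; the base URL, model name, and passphrase are illustrative:

    ```python
    # Needle-in-a-haystack probe of effective context length.
    # Assumes an OpenAI-compatible server is running locally (illustrative URL).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    NEEDLE = "The secret passphrase is 'teal-7-harbor'."
    FILLER = "The quick brown fox jumps over the lazy dog. " * 4000  # very roughly 40K tokens

    for depth in (0.1, 0.5, 0.9):  # where in the haystack the needle is buried
        cut = int(len(FILLER) * depth)
        haystack = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
        resp = client.chat.completions.create(
            model="glm-4-9b-chat",
            messages=[{"role": "user",
                       "content": haystack + "\n\nWhat is the secret passphrase?"}],
            max_tokens=32,
        )
        answer = resp.choices[0].message.content
        print(f"depth={depth:.0%}: {'PASS' if 'teal-7-harbor' in answer else 'FAIL'}")
    ```

    A model that "craps out after 10K" will typically pass at shallow depths and fail once the needle sits past its effective window.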
  • by fragmede on 6/6/24, 6:23 AM

    model available, not open source.
  • by refulgentis on 6/6/24, 10:31 PM

    Ehhh man, this is frustrating; 7B was a real sweet spot for hobbyists. 8B... doable. I've been joking to myself, and simultaneously worrying, that Llama 3 8B and Phi-3 "3B" (3.8B) would start an "ehhh, +1, might as well be a rounding error" thing. It's a big deal! I measure a 33% decrease in speed just going from 3B to 3.8B when inferencing on CPU.
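
    For context on that figure, a back-of-envelope check, under the assumption (not from the thread) that CPU decoding is memory-bandwidth bound, so tokens/s scales roughly with 1/params:

    ```python
    # If decoding speed ~ 1/parameter_count, the predicted throughput drop when
    # growing a model is 1 - (old_params / new_params). Assumption for
    # illustration only; the 33% above is the commenter's own measurement.
    def predicted_slowdown(params_from_b: float, params_to_b: float) -> float:
        """Fractional tokens/s drop when growing from params_from_b to params_to_b."""
        return 1.0 - params_from_b / params_to_b

    print(f"3B -> 3.8B: {predicted_slowdown(3.0, 3.8):.0%} predicted drop")  # ~21%
    print(f"7B -> 8B:   {predicted_slowdown(7.0, 8.0):.0%} predicted drop")  # ~12%
    ```

    A measured 33% against a naive ~21% estimate would suggest overheads beyond raw weight streaming, but again, this is only a sketch under that bandwidth-bound assumption.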