by asdfasdf1 on 3/13/24, 4:13 PM with 86 comments
by cs702 on 3/13/24, 6:10 PM
Each parameter is a connection between artificial neurons. For example, inside an AI model, a linear layer that transforms an input vector with 1024 elements into an output vector with 2048 elements has 1024×2048 = ~2M parameters in a weight matrix. Each parameter specifies how much each element in the input vector contributes to or subtracts from each element in the output vector. Each output vector element is a weighted sum (AKA a linear combination) of the input vector elements.
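To make that concrete, here's a minimal sketch in C++ with plain loops (toy sizes of 4 inputs and 2 outputs instead of 1024 and 2048):

    #include <array>
    #include <cstdio>

    int main() {
        // Toy "linear layer": 4 inputs -> 2 outputs, so the weight
        // matrix holds 4 x 2 = 8 parameters (1024 x 2048 = ~2M in the
        // example above).
        constexpr int IN = 4, OUT = 2;
        std::array<double, IN> x = {1.0, 2.0, 3.0, 4.0};
        std::array<std::array<double, IN>, OUT> W = {{
            {0.1, -0.2, 0.3, 0.0},   // weights feeding output element 0
            {0.5, 0.5, -0.1, 0.2},   // weights feeding output element 1
        }};
        for (int o = 0; o < OUT; ++o) {
            double sum = 0.0;
            for (int i = 0; i < IN; ++i)
                sum += W[o][i] * x[i];  // each parameter scales one input's contribution
            std::printf("output[%d] = %f\n", o, sum);  // a weighted sum of all inputs
        }
    }

Real frameworks fuse this into a single matrix multiply, but the per-element view shows exactly where the 1024×2048 parameters go.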
A human brain has an estimated 100-500 trillion synapses connecting biological neurons. Each synapse is quite a complicated biological structure[a], but if we oversimplify and assume that every synapse can be modeled as a single parameter in a weight matrix, then the largest AI models in use today (~0.5T parameters) have approximately 100T÷0.5T = 200x to 500T÷0.5T = 1000x fewer connections between neurons than the human brain. If the company's claims prove true, this new chip will enable training of AI models that have only 4x to 20x fewer connections than the human brain.
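Spelled out, the arithmetic is (the ~0.5T parameter count for today's largest models being the assumption):

    #include <cstdio>

    int main() {
        double synapses_low = 100e12, synapses_high = 500e12;  // estimated human synapse count
        double model_params = 0.5e12;  // ~largest current models (assumption)
        std::printf("%.0fx to %.0fx fewer connections\n",
                    synapses_low / model_params,    // 200x
                    synapses_high / model_params);  // 1000x
    }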
We sure live in interesting times!
---
by brucethemoose2 on 3/13/24, 4:49 PM
https://web.archive.org/web/20230812020202/https://www.youtu...
(Vimeo/Archive because the original video was taken down from YouTube)
by fxj on 3/13/24, 5:25 PM
https://www.cerebras.net/blog/whats-new-in-r0.6-of-the-cereb...
"CSL allows for compile time execution of code blocks that take compile-time constant objects as input, a powerful feature it inherits from Zig, on which CSL is based. CSL will be largely familiar to anyone who is comfortable with C/C++, but there are some new capabilities on top of the C-derived basics."
by RetroTechie on 3/13/24, 6:31 PM
How many early supercomputers, workstations, etc. would that include? How much progress did humanity make using all those early machines (or any transistorized device!) combined?
by ortusdux on 3/13/24, 4:38 PM
by imbusy111 on 3/13/24, 4:50 PM
by asdfasdf1 on 3/13/24, 4:13 PM
by Rexxar on 3/13/24, 6:10 PM
by modeless on 3/13/24, 10:05 PM
In the days before LLMs, 44 GB of SRAM sounded like a lot, but these days it's practically nothing. It's possible that novel architectures could be built for Cerebras that leverage its unique capabilities, but the inaccessibility of the hardware is a problem. So few people will ever get to play with one that it's unlikely new architectures will be developed for it.
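For scale, a quick back-of-the-envelope (assuming fp16 weights at 2 bytes per parameter and ignoring activations, optimizer state, and KV cache):

    #include <cstdio>

    int main() {
        double sram_bytes = 44e9;      // on-chip SRAM
        double bytes_per_param = 2.0;  // fp16
        // ~22B parameters fit, so a 70B-parameter model (~140 GB of
        // weights) already spills far beyond on-chip memory.
        std::printf("~%.0fB params fit in SRAM\n",
                    sram_bytes / bytes_per_param / 1e9);
    }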
by imtringued on 3/13/24, 8:35 PM
by marmaduke on 3/13/24, 9:57 PM
by tivert on 3/13/24, 4:56 PM
Maybe it wouldn't be as powerful as one of these, due to their less capable fabs, but something that's good enough to get the job done in spite of the embargoes.
by asdfasdf1 on 3/13/24, 5:24 PM
https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20B...
by asdfasdf1 on 3/13/24, 5:00 PM
- non-sparse fp16 in WSE-2 was 7.5 pflops (about 8 H100s, 10x worse performance per dollar)
Does anyone know the WSE-3 numbers? The datasheet seems to be lacking lots of details.
Also, it's 2.5 million USD for 1x WSE-3, so why just 44 GB?
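A rough sanity check of the "about 8 H100s, 10x worse performance per dollar" numbers above (the H100 throughput and price here are ballpark assumptions, not quoted figures):

    #include <cstdio>

    int main() {
        double wse_pflops = 7.5, wse_usd = 2.5e6;    // figures from the comment above
        double h100_pflops = 0.99, h100_usd = 30e3;  // ballpark per-GPU assumptions
        std::printf("~%.1f H100s of dense fp16 compute\n",
                    wse_pflops / h100_pflops);
        std::printf("~%.0fx worse per dollar\n",
                    (wse_usd / wse_pflops) / (h100_usd / h100_pflops));
    }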
by holoduke on 3/13/24, 6:37 PM
by TradingPlaces on 3/13/24, 10:05 PM
by api on 3/13/24, 9:49 PM
by beautifulfreak on 3/13/24, 9:30 PM
by tedivm on 3/13/24, 9:35 PM
* Power Usage
* Rack Size (last one I played with was 17u)
* Cooling requirements
by tibbydudeza on 3/13/24, 10:04 PM
by pgraf on 3/13/24, 9:29 PM
by hashtag-til on 3/13/24, 9:28 PM
by wizardforhire on 3/13/24, 5:56 PM
by AdamH12113 on 3/13/24, 4:40 PM