by tbragin on 4/22/25, 4:03 PM with 122 comments
by tmoertel on 4/22/25, 4:39 PM
SELECT *
FROM Population
WHERE weight > 0
ORDER BY -LN(1.0 - RANDOM()) / weight
LIMIT 100 -- Sample size.
Can anyone from ClickHouse verify that the lazy-materialization optimization speeds up queries like this one? (I want to make sure the randomization in the ORDER BY clause doesn't prevent the optimization.)by jurgenkesker on 4/22/25, 5:52 PM
by simonw on 4/22/25, 4:38 PM
"this query sorts all 150 million values in the helpful_votes column (which isn’t part of the table’s sort key) and returns the top 3, in just 70 milliseconds cold (with the OS filesystem cache cleared beforehand) and a processing throughput of 2.15 billion rows/s"
I clearly need to update my mental model of what might be a slow query against modern hardware and software. Looks like that's so fast because in a columnar database it only has to load that 150 million value column. I guess sorting 150 million integers in 70ms shouldn't be surprising.
(Also "Peak memory usage: 3.59 MiB" for that? Nice.)
This is a really great article - very clearly explained, good diagrams, I learned a bunch from it.
by kwillets on 4/23/25, 5:21 AM
https://dspace.mit.edu/bitstream/handle/1721.1/34929/MIT-CSA...
by mmsimanga on 4/22/25, 9:04 PM
by Onavo on 4/22/25, 7:02 PM
https://clickhouse.com/blog/chdb-embedded-clickhouse-rocket-...
by justmarc on 4/22/25, 9:37 PM
by skeptrune on 4/23/25, 2:00 AM
What a nice touch. Technical information and diagrams in this were top notch, but the fact there was also some kind of narrative threaded in really put it over the top for me.
by xiasongh on 4/23/25, 6:29 AM
by vjerancrnjak on 4/22/25, 7:51 PM
I'm pretty sure they did not even bother to properly compress the dataset, with some tweaking, could have probably been much smaller than 30GBs. The speed shows that reading the data is slower than decompressing it.
Reminds me of that Cloudflare article where they had a similar idea about encryption being free (slower to read than to decrypt) and finding a bug, that when fixed, materialized this behavior.
The compute engine (chdb) is a wonder to use.
by simianwords on 4/22/25, 5:41 PM
But credit where it is due, obviously clickhouse is an industry leader.
by ohnoesjmr on 4/22/25, 6:21 PM
by hexo on 4/23/25, 11:46 PM
by apwell23 on 4/23/25, 3:45 AM
by higeorge13 on 4/23/25, 5:14 AM
by meta_ai_x on 4/22/25, 6:55 PM
by jangliss on 4/23/25, 11:00 AM
by dangoodmanUT on 4/22/25, 9:14 PM
The CH contributors are really stellar, from multiple companies (Altinity, Tinybird, Cloudflare, ClickHouse)
by tnolet on 4/23/25, 9:03 AM
But we chose ClickHouse and now we just pump in data with little to no optimization.