from Hacker News

Optimizing ClickHouse: Tactics that worked for us

by podoman on 5/14/24, 2:57 PM with 25 comments

  • by Syntaf on 5/14/24, 11:23 PM

    We've been using Highlight for our bootstrapped member management platform[1], and I gotta say I'm super impressed with the session replay feature; it's really helpful for understanding user behavior at a fraction of the price of competitors.

    I remember wanting to use Heap's session replay only to realize they want hundreds of dollars per _month_; my last bill with Highlight was $2.38, if I recall.

    That's all to say that I'm glad Highlight is figuring out how to scale while still offering their features to the small players of the world.

    [1] https://embolt.app

  • by jkercher on 5/15/24, 2:40 AM

    clickhouse-local is pretty slick as well. You can operate directly on text files as if they were tables. I made my own toy text file database thing and thought I was cool because I could outrun similar programs like q, textql, sqlite, etc. But clickhouse-local had me by a factor of 10 easy in every kind of query with every type of data. Those guys know stuff.
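    As a rough sketch of what that looks like in practice (the file name and columns here are made up for illustration; `file()` with the `CSVWithNames` format is the standard way clickhouse-local reads delimited text):

    ```shell
    # Aggregate a CSV directly, no server or schema setup required.
    clickhouse-local --query "
        SELECT status, count() AS hits
        FROM file('access_log.csv', CSVWithNames)
        GROUP BY status
        ORDER BY hits DESC"
    ```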

  • by Dachande663 on 5/15/24, 8:34 AM

    We hit the "lots of small inserts" issue too, and fixed it by just using the Buffer table engine[0]. You can create it with the same structure as the destination table, and it holds inserts in memory until they cross a threshold and are written out. Super simple and took 5 minutes.

    [0] https://clickhouse.com/docs/en/engines/table-engines/special...
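    For anyone curious, the setup is roughly this (table names and thresholds are illustrative, not from the comment; the Buffer engine's parameters are num_layers, then min/max time, rows, and bytes before a flush):

    ```sql
    -- Writes go to events_buffer; ClickHouse flushes to events once
    -- any max_* threshold is exceeded or all min_* thresholds are met.
    CREATE TABLE events_buffer AS events
    ENGINE = Buffer(currentDatabase(), events, 16,
                    10, 100,            -- min_time, max_time (seconds)
                    10000, 1000000,     -- min_rows, max_rows
                    10000000, 100000000 -- min_bytes, max_bytes
    );
    ```

    Reads against events_buffer transparently combine buffered and already-flushed rows, so applications can point at the buffer table for both inserts and queries.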

  • by banditelol on 5/15/24, 2:32 AM

    > We opted to use the ClickHouse Kafka Connect Sink that implements batched writes and exactly-once semantics achieved through ClickHouse Keeper.

    Just a heads up, you've got a repeated line there.

  • by ople on 5/15/24, 7:37 AM

    Very interesting observations! Merge performance tuning often seems overlooked, even though it's a key aspect of sustained ClickHouse performance.

    I also like that the blog is quite compact and gets the points across without getting too much into the weeds.

    One thing I've also noticed is that bloom filter index types can be quite costly to merge. In many cases that's acceptable, though, given the massive benefit they provide for text queries. One just has to be mindful of the overhead when adding them.
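    For reference, adding one looks something like this (table and column names are hypothetical; tokenbf_v1's parameters are filter size in bytes, number of hash functions, and a seed, and they directly drive the per-part index size that has to be rebuilt on every merge):

    ```sql
    -- Token-based bloom filter skip index for substring/token searches.
    ALTER TABLE logs
        ADD INDEX body_tokens_idx body TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

    -- Build the index for existing parts; otherwise only new parts get it.
    ALTER TABLE logs MATERIALIZE INDEX body_tokens_idx;
    ```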

  • by JosephRedfern on 5/15/24, 12:13 PM

    Thanks for sharing! I'm curious as to your approach to changing the ORDER BY key for such large tables without significant downtime, since AFAIK this can't be done in place (see: https://kb.altinity.com/altinity-kb-schema-design/change-ord...). Are you able to share any details?
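    (The workaround I've usually seen, for what it's worth, is the rebuild-and-swap pattern rather than anything in place. Names below are hypothetical, and EXCHANGE assumes the database uses the Atomic engine:

    ```sql
    -- Recreate the table with the new sorting key...
    CREATE TABLE events_new AS events
    ENGINE = MergeTree
    ORDER BY (tenant_id, timestamp);

    -- ...backfill it (batching by partition for very large tables)...
    INSERT INTO events_new SELECT * FROM events;

    -- ...then swap the two atomically.
    EXCHANGE TABLES events_new AND events;
    ```

    The downtime question is really about keeping writes flowing during the backfill, which this sketch doesn't solve on its own.)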

  • by misiek08 on 5/15/24, 11:20 AM

    What size is the cluster? Just curious how much hardware is needed to handle such traffic :)