by mulander on 7/18/23, 6:22 PM with 45 comments
by mslot on 7/18/23, 7:32 PM
The high-level is: You enable a setting and every CREATE SCHEMA creates a new shard. All the tables in the schema will be co-located so you can have efficient joins & foreign keys between the tables.
On top of that, you can also have reference tables that are replicated to all nodes, again for fast joins & foreign keys with all schemas.
Everything else is about making every PostgreSQL feature work as seamlessly as if there was no sharding. You can still do things like transactions across schemas, create and use custom types, access controls, work with other extensions, use procedures, etc.
by metadat on 7/18/23, 7:46 PM
The promise behind this approach to DB sharding has great potential. Simultaneously impressive, novel, and badass.
I wish this had been available ten years ago at a few of my startups!
by pickledish on 7/18/23, 7:21 PM
My definition is "the columns and column types of a table", but, that doesn't seem to make sense with what they're talking about here ("large" and "small" schemas probably aren't referring to wide and narrow tables for example, and I don't see how sharding by my definition of "schema" could even make sense anyways)
by skunkworker on 7/18/23, 10:26 PM
by stevefan1999 on 7/18/23, 11:33 PM
by asah on 7/18/23, 7:40 PM
one gotcha: schemas are a weird old thing in SQL that are kinda the worst of all worlds, basically more like prefixing your SQL object names than a real level of indirection. Schema objects can't be manipulated as a batch, they don't provide the isolation / access control of databases, and can't be manipulated in DML and require stored procedures to manipulate.
by binwiederhier on 7/20/23, 4:05 AM
Back when it happened we tried to buy their product and they were not sure if the Citus standalone product was even going to exist, and they refused to demo it even. Odd timing possibly, but it's a data point.
On the positive side of things, the shard rebalancing was not open source back then iirc, which made the open source version pretty useless. Now it seems to be open source: https://www.citusdata.com/product/comparison -- pretty cool.
I'd still be careful to bank on it as a Citus only customer or open source user.
by mariocesar on 7/18/23, 8:48 PM
by __s on 7/18/23, 10:55 PM
Seems like article only offers ease of use. Guess I've never used microservices enough to consider that use case
Couldn't the microservice case be handled by having distributed tables with no distribution column? ie today I'd create a distribution column & fill it with the same value on every row
Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function
by potamic on 7/19/23, 6:30 PM