by pauldix on 5/28/24, 7:28 PM with 7 comments
by appplication on 5/29/24, 1:22 PM
One thing I have wondered: would it make sense to reduce file size? The general advice I’ve seen is to keep files at around 250 MB to 1 GB, but if you’re leaning heavily on bloom filters, it feels like it could make sense to use smaller files, so that less data has to be read each time a per-file filter reports a possible match.
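To make the trade-off concrete, here is a toy sketch and not how Parquet or any particular engine implements it. It builds one small Bloom filter per "file" and counts how many rows a point lookup would still have to scan at two file sizes. The `BloomFilter` class, the `rows_scanned` helper, the 16-bits-per-key sizing, and the `user-N` keys are all assumptions made up for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bitset (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def rows_scanned(keys, rows_per_file, lookup_key):
    """Split keys into fixed-size files, build one filter per file, and
    return how many rows sit in files whose filter cannot rule out the key."""
    scanned = 0
    for start in range(0, len(keys), rows_per_file):
        chunk = keys[start:start + rows_per_file]
        bloom = BloomFilter(num_bits=16 * len(chunk))  # assumed sizing
        for key in chunk:
            bloom.add(key)
        if bloom.might_contain(lookup_key):
            scanned += len(chunk)
    return scanned


keys = [f"user-{i}" for i in range(10_000)]
large = rows_scanned(keys, 5_000, "user-1234")  # 2 large files
small = rows_scanned(keys, 500, "user-1234")    # 20 small files
print(f"rows scanned with large files: {large}")
print(f"rows scanned with small files: {small}")
```

In this sketch the smaller files mean a filter hit drags in fewer rows, at the cost of building and checking more filters. Whether that pays off in a real system depends on per-file overhead such as metadata and request latency, which this toy does not model.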
by darkflame91 on 5/29/24, 5:30 AM
With large datasets, wouldn't partitioning the data on low-cardinality columns give the same benefit without the space overhead?
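A minimal sketch of the partition pruning this question refers to, assuming an in-memory list of dicts stands in for the dataset. The `partition_rows` and `query` helpers and the `region` and `user_id` columns are hypothetical names for illustration only.

```python
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows by the value of a low-cardinality column (one bucket per value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def query(partitions, partition_value, predicate):
    """Read only the partition that matches, then apply the remaining predicate.
    Returns the matching rows and the number of rows that had to be read."""
    candidates = partitions.get(partition_value, [])
    return [row for row in candidates if predicate(row)], len(candidates)


regions = ["us", "eu", "ap"]
rows = [{"region": regions[i % 3], "user_id": i} for i in range(9_000)]
partitions = partition_rows(rows, "region")

matches, rows_read = query(partitions, "eu", lambda row: row["user_id"] == 4)
print(f"read {rows_read} of {len(rows)} rows, found {len(matches)} match(es)")
```

The pruning here needs no extra per-file structure, but it only helps when the query filters on the partition column. The lookup on `user_id` within the chosen partition still scans every row in that partition.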