from Hacker News

Using Parquet's Bloom Filters

by pauldix on 5/28/24, 7:28 PM with 7 comments

  • by appplication on 5/29/24, 1:22 PM

    One thing I have wondered: would it make sense to reduce file size? The general advice I’ve seen is to keep files between roughly 250 MB and 1 GB, but if you’re leaning heavily on bloom filters, it feels like it could make sense to reduce the number of files to reduce the amount that would trigger the per-file filter.
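
One rough way to reason about the size question is the textbook Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n distinct values at false-positive rate p. The sketch below is only that formula; the function name, the 1% target, and the distinct-value counts are illustrative assumptions, not numbers from the thread, and a real Parquet writer may size its filters differently.

```python
import math


def bloom_filter_bytes(distinct_values: int, false_positive_rate: float) -> int:
    """Estimate the size in bytes of an optimally sized Bloom filter.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2 bits.
    This is an approximation, not what any specific Parquet writer does.
    """
    bits = -distinct_values * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Hypothetical distinct-value counts covered by a single filter.
for n in (100_000, 1_000_000, 10_000_000):
    size = bloom_filter_bytes(n, 0.01)
    print(f"{n:>10,} distinct values -> ~{size / 1024:,.0f} KiB")
```

Under this formula the filter grows linearly with the number of distinct values it covers, so splitting the same data into smaller files probably does not shrink the total filter space by much. What changes is how much data a single filter hit, true or false positive, forces a reader to scan.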
  • by darkflame91 on 5/29/24, 5:30 AM

    With large datasets, wouldn't partitioning the data on low-cardinality columns give the same benefit without the space overhead?
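
To make the comparison in this question concrete, here is a minimal, standard-library-only sketch. It assumes a hypothetical dataset of (country, user_id) rows written as one file per country, with a toy Bloom filter on user_id per file. `TinyBloom`, the sample rows, and the helper names are all made up for illustration, and the toy filter is not Parquet's actual split-block Bloom filter format.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter for illustration only (not Parquet's format)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical rows: a low-cardinality column and a high-cardinality one.
rows = [("US", "u1"), ("US", "u2"), ("DE", "u3"), ("DE", "u4"), ("JP", "u5")]

# One "file" per partition value, each carrying a filter on user_id.
files = {}
for country, user_id in rows:
    part = files.setdefault(country, {"user_ids": [], "bloom": TinyBloom()})
    part["user_ids"].append(user_id)
    part["bloom"].add(user_id)


def files_for_country(country):
    """Partition pruning: the partition key alone selects the file."""
    return [key for key in files if key == country]


def files_for_user(user_id):
    """Filter pruning: keep only files whose filter might contain the value."""
    return [key for key, f in files.items() if f["bloom"].might_contain(user_id)]


print(files_for_country("DE"))
print(files_for_user("u3"))
```

In this sketch, partitioning prunes files for free when the predicate is on the partition column, but it does nothing for a lookup on user_id. Partitioning on user_id itself would mean one tiny file per user, which is roughly the case where a per-file filter becomes the alternative, at the cost of the extra space.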