by francoismassot on 1/15/24, 12:13 PM with 6 comments
by MrPowers on 1/17/24, 2:30 PM
Parquet files store row-group metadata in the file footer. Delta Lake adds file-level metadata in the transaction log, so it can skip whole files before even opening any of the Parquet files to read their row-group metadata.
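To make the file-skipping idea concrete, here is a minimal sketch in plain Python. It does not use the Delta Lake API: `FileStats`, `files_to_scan`, the `ts` column, and the file names are all hypothetical, and the only assumption is that the log records a min and max value per file for the column you filter on.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical log entry: min/max of a timestamp column for one data file."""
    path: str
    min_ts: int
    max_ts: int


def files_to_scan(log, lo, hi):
    """Return the paths whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in log if f.max_ts >= lo and f.min_ts <= hi]


# Made-up log contents: three files with non-overlapping timestamp ranges.
log = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A query for 150 <= ts <= 220 is planned from the log alone;
# no Parquet footer has been read at this point.
selected = files_to_scan(log, 150, 220)
print(selected)
```

Here only part-1.parquet and part-2.parquet would be opened; part-0.parquet is skipped without touching its footer. The tighter and less overlapping the per-file ranges are, the more files a filter like this can drop.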
Delta Lake allows you to rearrange your data to improve file skipping. For example, you can Z-order by timestamp for time-series analyses.
Delta Lake also supports schema evolution, so you can change a table's schema over time.
This company may have a cool file format, but is it closed source? It seems like enterprises don't want to be locked into closed formats anymore.
by speedgoose on 1/17/24, 11:42 AM
Thanks but I will stay with Parquet for now.
by stargrazer on 1/15/24, 1:09 PM
On reading, the reverse happens.
This becomes the compute/space conundrum: space is reduced with column-based regularity, but time is increased due to the extra overhead of columnar compression.
by jononor on 1/18/24, 10:07 PM