from Hacker News

Pg_lakehouse: Query Any Data Lake from Postgres

by landingunless on 5/13/24, 1:29 PM with 72 comments

  • by nathanwallace on 5/13/24, 8:57 PM

    Readers may also enjoy Steampipe [1], an open source tool to live query 140+ services with SQL (e.g. AWS, GitHub, CSV, Kubernetes, etc). It uses Postgres Foreign Data Wrappers under the hood and supports joins etc with other tables. (Disclaimer - I'm a lead on the project.)

    1 - https://github.com/turbot/steampipe

  • by whalesalad on 5/14/24, 2:53 AM

    How many folks here struggle to adopt tooling like this because it isn’t possible to add Postgres extensions to places like RDS?

  • by arduanika on 5/13/24, 9:21 PM

    The name seems to be an allusion to the author P.G. Wodehouse, creator of the character Jeeves.

    https://en.wikipedia.org/wiki/P._G._Wodehouse

    Very clever naming!

  • by ahachete on 5/14/24, 9:20 AM

    The (internal) use of DataFusion to create new, powerful extensions for Postgres is a very clever idea. Very good work by the ParadeDB team.

    I like this one very much. It is a very simple way to avoid having to use a different set of tools and query languages (or more limited query languages) to query lakes.

  • by kiwicopple on 5/13/24, 7:38 PM

    Neat that you plan to support both Delta Lake and Apache Iceberg.

    I'm curious about HN's position on these two formats. I'm having a hard time deciphering which might be the industry winner (or perhaps they both have a place, no "winner" necessary).

  • by tehlike on 5/13/24, 7:29 PM

    ParadeDB is doing a lot of good work with Postgres. pg_analytics, and now pg_lakehouse...

  • by jeadie on 5/13/24, 9:45 PM

    This looks functionally similar to using http://github.com/spiceai/spiceai with a PostgreSQL data accelerator.

  • by yrashk on 5/13/24, 7:36 PM

    As somebody who writes a lot of Postgres extensions, I can say this is quite interesting!

    I think I can see some parallels to Supabase's wrappers project.

    Keep up the good work!

  • by mcdonje on 5/13/24, 8:15 PM

    Looks like pg as a replacement for Databricks SQL, which is already a query engine for data lakes. It's not a lakehouse, but it calls itself one. Seems like a cool and useful project, but the name is problematic.

  • by nikita on 5/14/24, 1:16 AM

    I have another question. So far on the ClickBench leaderboard it's 15x slower than baseline. The number 1 place is 1.67x slower than baseline.

    I assume that's DataFusion speed. What's the plan to improve upon it?

  • by nikita on 5/13/24, 9:01 PM

    This is great work! Could you please comment on your choice of license? Most Postgres extensions that achieve wide adoption use the Postgres, MIT, or Apache license.

  • by mustafabal on 5/14/24, 2:03 AM

    Very nice addition! Do you plan to support Snowflake as an object store in the near future? It's not currently in pg_lakehouse's README.

  • by tarasglek on 5/14/24, 6:01 AM

    I am not up to date on the various lakes. Is this read-only? Are you able to init a lake from scratch?

    What's the model to feed such a lake from some queue?

  • by epsilonic on 5/13/24, 11:13 PM

    How does this compare to Hydra? https://www.hydra.so/

  • by sdairs on 5/13/24, 8:04 PM

    Very cool!

    Could you share the key difference between this and the previous pg_analytics, and the motivation for making it a separate plugin?

  • by samber on 5/13/24, 8:22 PM

    It seems very promising!

    2 questions:

    - do you distribute query processing over multiple pg nodes?

    - do you store the metadata in PG, instead of a traditional metastore?

  • by brunoqc on 5/13/24, 7:46 PM

    Nice. I wish TimescaleDB open-sourced their S3 storage thing.

  • by hardwaresofton on 5/14/24, 3:02 AM

    Yet another amazing Postgres plugin made possible by pgrx (https://github.com/pgcentralfoundation/pgrx)

    It's really crazy how some projects just instantly enable a whole generation of new possibilities.

    If you are impressed by this and want to build something like it -- check out pgrx, it's a pretty great experience.

  • by q9tE6uHb7yKq on 5/14/24, 1:34 AM

    looks interesting!