from Hacker News

Show HN: pg-bulk-ingest – Now with multi-table support

by michalc on 2/11/24, 6:58 AM with 6 comments

I (with others) made a Python utility for loading data into PostgreSQL incrementally from ETL-ish pipelines. Its API supported ingesting into multiple tables at a sort of "structural" level more or less from the beginning, but it didn't actually work if you tried to do it. And I've been umming and ahhing over how best to do it. No way seemed perfect...

... but pushed by an actual use case, I finally made a decision and did it.

  • by mortallywounded on 2/12/24, 5:38 PM

    I have always used pg_bulkload. It's a bit of a pain to compile but the tool is really fast... I love how you can define what to do with constraint conflicts, etc.

    Recently I used it to bulk import a billion rows and dedupe on a single column/constraint by throwing out the rows that conflicted. It did it in about two hours.

    see: https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

  • by mrAssHat on 2/12/24, 12:21 PM

    Thanks for sharing your tool!

    I think it would be best to illustrate the readme with a problem it tries to solve: currently I don't understand why a simple set of INSERT operations wouldn't suffice.

    Also, the phrase "ingest data into Postgres" sounds wrong: it would be Postgres, not your tool, that ingests the data (if it could); your tool should be described as the one "putting data into Postgres". And so you have probably named your tool wrong...

  • by _boffin_ on 2/12/24, 9:57 PM

    Anyone use json_to_recordset? I’ve used it in the past to insert a few billion rows and was pretty happy with it
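
For readers unfamiliar with the json_to_recordset approach _boffin_ mentions, here is a minimal sketch via psycopg2. json_to_recordset is a standard Postgres function; the idea is to send one JSON document per round trip and let Postgres expand it into rows server-side. The connection string, table, and columns below are hypothetical and would need to match your own schema.

```python
import json
import psycopg2

# Hypothetical connection details and target table; adjust to your environment.
conn = psycopg2.connect("dbname=mydb user=postgres")

rows = [{"id": 1, "value": "apple"}, {"id": 2, "value": "banana"}]

# One parameterised statement per batch: the driver sends a single JSON
# document and json_to_recordset expands it into typed rows server-side.
sql = """
    INSERT INTO my_table (id, value)
    SELECT t.id, t.value
    FROM json_to_recordset(%s::json) AS t(id int, value text)
"""

with conn, conn.cursor() as cur:
    cur.execute(sql, (json.dumps(rows),))

conn.close()
```

Batching a few thousand dicts per call keeps statement counts and round trips low, which is where most of the win over row-by-row INSERTs comes from.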
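On mortallywounded's pg_bulkload comment above: pg_bulkload itself bypasses much of the normal write path, so this is not how it works internally, but the "throw out the rows that conflicted" dedupe can be approximated in plain Postgres with a staging table, COPY, and INSERT ... ON CONFLICT DO NOTHING. A rough sketch via psycopg2 follows; the events table, the event_id constraint, and the CSV file are all hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")

with conn, conn.cursor() as cur:
    # Staging table matching the target; dropped when the transaction commits.
    cur.execute("""
        CREATE TEMP TABLE staging (LIKE events INCLUDING DEFAULTS)
        ON COMMIT DROP
    """)
    # Fast bulk load into staging; assumes the CSV column order matches events.
    with open("events.csv") as f:
        cur.copy_expert("COPY staging FROM STDIN WITH (FORMAT csv)", f)
    # Requires a unique constraint or index on events(event_id); conflicting
    # rows (including duplicates within the batch itself) are silently skipped.
    cur.execute("""
        INSERT INTO events
        SELECT * FROM staging
        ON CONFLICT (event_id) DO NOTHING
    """)

conn.close()
```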