from Hacker News

Ask HN: Storing Images in PostgreSQL vs. Object Storage for Large Datasets

by atif089 on 2/28/24, 8:29 PM with 3 comments

Is it advisable to store images directly in PostgreSQL for a dataset of 100 million records, each with a 200KB image, or should I use object storage with references from the start? My primary and only use case involves creating multimodal embeddings for search and relevance purposes.
  • by throwaway38375 on 2/29/24, 10:31 AM

    If you are storing 100 million images at 200KB each, that comes out at 20TB!

    I would calculate the costs of something like S3 versus buying five 4TB HDDs and running a network file server.

    You're going to save a ton of money hosting this yourself. I would go with two powerful used desktop PCs: one as a DB server and the other as the file server.

    Store the images on the file server and store the image's path in the database server.
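
A quick check of the arithmetic in the comment above, as a minimal sketch. The image count and size come from the question; decimal units (1 KB = 1000 bytes, 1 TB = 1000^4 bytes) are an assumption, as is treating each drive as exactly 4 TB of usable space.

```python
NUM_IMAGES = 100_000_000       # from the question
IMAGE_BYTES = 200 * 1000       # 200 KB per image (assumed decimal units)
DRIVE_BYTES = 4 * 1000**4      # one 4 TB HDD (assumed fully usable)

total_bytes = NUM_IMAGES * IMAGE_BYTES
total_tb = total_bytes / 1000**4      # decimal terabytes
total_tib = total_bytes / 1024**4     # binary tebibytes, as many tools report

# Ceiling division: the smallest number of drives whose raw capacity fits the data.
drives_needed = (total_bytes + DRIVE_BYTES - 1) // DRIVE_BYTES

print(f"total: {total_tb:.0f} TB ({total_tib:.2f} TiB)")
print(f"4 TB drives needed for raw capacity: {drives_needed}")
```

Five drives cover the raw 20 TB exactly, so this count leaves no headroom for filesystem overhead, growth, or redundancy; any of those would push the number up.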

  • by speedgoose on 2/29/24, 5:48 AM

    You could run quick tests using bytea columns (stored via TOAST) or large objects.

    But an object store may be more convenient overall.

    When I did something similar, I stored the embeddings and the image UUID in a table, and the images in an object store with the same UUIDs as filenames. That made it simpler to upload the images and make them available through a CDN.
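
The layout described in the comment above could look roughly like this. It is only a sketch under stated assumptions: `sqlite3` stands in for PostgreSQL and a plain dict stands in for the object store so that the example runs anywhere; the table name, the `images/<uuid>.jpg` key layout, and the helper functions are all hypothetical, not the commenter's actual code.

```python
import sqlite3
import uuid

# Stand-in for a bucket; a real setup would call an object-store client here.
object_store = {}

# Stand-in for PostgreSQL; the table holds only the UUID and the embedding.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id TEXT PRIMARY KEY, embedding BLOB NOT NULL)")


def object_key(image_id):
    """Hypothetical key layout: the UUID doubles as the filename."""
    return f"images/{image_id}.jpg"


def add_image(image_bytes, embedding):
    """Store the image bytes in the object store and the embedding in the table."""
    image_id = str(uuid.uuid4())
    object_store[object_key(image_id)] = image_bytes
    db.execute(
        "INSERT INTO images (id, embedding) VALUES (?, ?)",
        (image_id, embedding),
    )
    db.commit()
    return image_id


def load_image(image_id):
    """Fetch the image bytes using only the UUID from the database row."""
    return object_store[object_key(image_id)]


# Placeholder bytes, not a real JPEG or a real embedding.
image_id = add_image(b"fake-image-bytes", b"\x00\x01\x02\x03")
row = db.execute("SELECT id FROM images WHERE id = ?", (image_id,)).fetchone()
print(row[0] == image_id, len(load_image(image_id)))
```

Because the object key is derived from the UUID, the database never needs a separate path column, and the same key can be exposed through a CDN.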

  • by reactor on 3/2/24, 9:25 AM

    Use something like SeaweedFS or MinIO.