from Hacker News

Show HN: Pg_jsonschema – A Postgres extension for JSON validation

by oliverrice on 7/21/22, 2:31 PM with 56 comments

pg_jsonschema is a solution we're exploring to allow enforcing more structure on json and jsonb typed postgres columns.

We initially wrote the extension as an excuse to play with pgx, the rust framework for writing postgres extensions. That let us lean on existing rust libs for validation (jsonschema), so the extension's implementation is only 10 lines of code :)

https://github.com/supabase/pg_jsonschema/blob/fb7ab09bf6050...

happy to answer any questions!

by hardwaresofton on 7/21/22, 3:47 PM
This is awesome -- really excited that Supabase is picking this up with their commitment to open source in general and PG in particular.
Some prior art:
- https://github.com/gavinwahl/postgres-json-schema (mentioned in the repo)
- https://github.com/furstenheim/is_jsonb_valid
pgx[0] is going to be pretty revolutionary for the postgres ecosystem I think -- there is so much functionality that would benefit from happening inside the database and I can't think of a language I want to use at the DB level more than Rust.
[0]: https://github.com/tcdi/pgx
by dmitry_dygalo on 7/22/22, 8:54 AM
The `jsonschema` crate author here.
First of all, this is an exciting use case, I didn't even anticipate it when started `jsonschema` (it was my excuse to play with Rust). I am extremely pleased to see such a Postgres extension :)
At the moment it supports Drafts 4, 6, and 7 + partially supports Draft 2019-09 and 2020-12. It would be really cool if we can collaborate on finishing support for these partially supported drafts! What do you think?
If you'll have any bug reports on the validation part, feel free to report them to our issue tracker - https://github.com/Stranger6667/jsonschema-rs/issues.
Re: performance - there are a couple of tricks I've been working on, so if anybody is interested in speeding this up, feel free to join here - https://github.com/Stranger6667/jsonschema-rs/pull/373
P.S. As for the "Prior Art" section, I think that https://github.com/jefbarn/pgx_json_schema should be mentioned there, as it is also based on `pgx` and `jsonschema`.
by rubenfiszel on 7/21/22, 6:43 PM
This is absolutely brilliant.
In windmill, https://github.com/windmill-labs/windmill (self-hostable AWS Lambda, OSS AGPLv3) we infer the jsonschema of your script by doing static analysis but so far we were not doing validation of the payload itself, if your script failed because of incorrect payload that was your problem. Now without any additional effort I will be able to add validation and great error reporting "for free".
by paulmd on 7/21/22, 4:30 PM
Nice work, however, I am structurally dubious of putting too much functionality onto a classical centralized RDBMS since it can't be scaled out if performance becomes a problem. It's CPU load and it's tying up a connection which is a large memory load (as implemented in postgres, connections are "expensive") and since this occurs inside a transaction it's holding locks/etc as well. I know it's all compiled native code so it's about as fast as it can be, but, it's just a question of whether it's the right place to do that as a general concern.
I'd strongly prefer to have the application layer do generic json-schema validation since you can spawn arbitrary containers to spread the load. Obviously some things are unavoidable if you want to maintain foreign-key constraints or db-level check constraints/etc but people frown on check constraints sometimes as well. Semantic validity should be checked before it gets to the DB.
I was exploring a project with JSON generation views inside the database for coupling the DB directly to SOLR for direct data import, and while it worked fine (and performed fine with toy problems) that was just always my concern... even there where it's not holding write locks/etc, how much harder are you hitting the DB for stuff that, ultimately, can. be done slower but more scalably in an application container?
YAGNI, I know, cross the bridge when it comes, butjust as a blanket architectural concern that's not really where it belongs imo.
In my case at least, probably it's something that could be pushed off to followers in a leader-follower cluster as a kind of read replica, but I dunno if that's how it's implemented or not. "Read replicas" are something that are a lot more fleshed out in Citus, Enterprise, and the other commercial offerings built on raw Postgres iirc.
by banashark on 7/21/22, 3:21 PM
Very cool!
I remember when kicking the tires on postgrest/postgraphile that I found validation and error handling to be one of the less intuitive areas. Not the actual field-level constraints, but how to adapt it to fit a fast-fail vs slow-fail model.
When I had attempted before, the only ergonomic option was fast-fail (the first check constraint violated would bubble the error upward) rather than slow-fail (collect all invalid fields and return the collection of errors, which IME is more common on average web forms or api requests).
Looking at the single code file and tests, I see only singular field errors. Has a more ergonomic approach to validation-error collection been developed other than writing a large function to iterate the new record fieldwise against the schema?
by BrandiATMuhkuh on 7/21/22, 5:06 PM
This is really cool. This will make it much simpler to convert from firestore to supabase. I think that's the last missing feature that firestore provided which supabase didn't.
We are already running a sync process between firestore and postgres. So we can do aggregations on JSON data. At this point it's only a matter of time before we move to superbase
by mdaniel on 7/21/22, 4:00 PM
It would be valuable to know which JSON-Schema it supports, since there are currently 4 different versions that differ in their capabilities (as one might expect). Related to that, does it honor the "$schema" key allowing the schema to declare which version it is?
The postgres-json-schema alternative that's mentioned in the repo also ships with what appears to be a conformance test suite; does this carry the same, or was the focus more on speed?
by edmundsauto on 7/21/22, 7:29 PM
I have a very dumb question: why would you use this instead of a traditional schema? I thought the value of json columns was to be partially schemaless and flexible
What is the use case for this versus normal column definitions, if you’re looking to enforce schemas?
by dividuum on 7/21/22, 5:59 PM
Not experienced with Postgres and its ecosystem unfortunately, but all those Postgres extensions popping up on hn lately certainly make me envious. To someone with more insight: How risky is it to rely on those extensions? I guess rust handles the 'accidental data corruption or crashes' aspect. How difficult is it to continue to use such an extension once the original author walks away? Is the extension API somewhat (or perfectly?) stable? Given that this example probably mostly used in CHECK contraints, I guess it could be fairly easy removed or replaced from a running installation?
by cpursley on 7/21/22, 5:36 PM
I just love the work both Supabase & Hasura have done making people aware of how powerful Postgres is.
by shroompasta on 7/21/22, 10:45 PM
How does this validate data with a variable amount of keys with the same value type for example a to-do list
my day to day to do list varies in the number of tasks, but the completion will always be in boolean
```
    [
        {
         "task": "do Foo", 
         "completed": False, 
        }, 
        {
         "task": "do Bar", 
         "completed": False, 
        }, 
        {
         "task": "do Baz", 
         "completed": False, 
        }, 
        ...
    ]
```
Also, what is the issue of schema validation before inserting into the json column, as this is what I'm doing with a small microservice with Redis.
by tsujp on 7/22/22, 4:23 AM
This is awesome! I remember playing with Hasura (which uses Postgres) a few years ago. Hasura is great but one of the things I really wanted was JSON schema validation at a database level so I could keep all the "raw" types together if that makes sense. I was then, and do now, do public-facing schema validation but that doesn't necessarily validate how it's going to be persisted into the database so being able to have that closer to the database level now is great.
by mekoka on 7/21/22, 7:48 PM
As an aside, I'm a long time backend developer writing my first mobile app with Dart/Flutter. I tried the popular backend recommendation in that ecosystem. After weeks wasted on it, I googled "{PopularBackend} alternatives" out of frustration and this thing called "Supabase" appeared in the results. What a breath of fresh air it's been. It uses Postgres as a backend (with postgREST), which means I can put all those skills to good use (you can go far with all the batteries Postgres comes equipped with: row-level security, triggers, functions, etc). It's open source and the free tier lets me do pretty much all I need for development (and probably for a production MVP). Which means I don't have to worry about "after questions" until after.
Supabase team keep doing what you're doing!
by klysm on 7/22/22, 8:17 PM
I was just looking into this for handling typescript integration with postgres. I think there's a lot of opportunity to make that work really well. Zapatos and pg-typegen are good steps but it could be even better.
by warmwaffles on 7/21/22, 3:35 PM
If adding this check to an existing table with millions of records, will it scan the entire table checking all of the records, or just check when the records are inserted or updated.
by pella on 7/21/22, 3:24 PM
Thank you!
Some suggestion for the next roadmap:
- a Dockerfile ( The dockerfile helps me a lot in trying out new technologies )
- info about the compatibility with new PG15
- CI/CD
by legalcorrection on 7/21/22, 3:40 PM
So we’ve come full circle and now JSON is just XML with lighter syntax.