by mccanne on 5/17/22, 2:19 PM with 46 comments
by simonw on 5/17/22, 9:52 PM
I've been experimenting with this approach against SQLite for a few years now, and I really like it.
My sqlite-utils package does exactly this. Try running this on the command line:
brew install sqlite-utils
echo '[
{"id": 1, "name": "Cleo"},
{"id": 2, "name": "Azy", "age": 1.5}
]' | sqlite-utils insert /tmp/demo.db creatures - --pk id
sqlite-utils schema /tmp/demo.db
It outputs the generated schema:

  CREATE TABLE [creatures] (
     [id] INTEGER PRIMARY KEY,
     [name] TEXT,
     [age] FLOAT
  );
When you insert more data you can use the --alter flag to have it automatically create any missing columns. Full documentation here: https://sqlite-utils.datasette.io/en/stable/cli.html#inserti...
It's also available as a Python library: https://sqlite-utils.datasette.io/en/stable/python-api.html
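The Python library offers the same flow in code. A minimal sketch using sqlite_utils (file path and table name taken from the CLI example above; alter=True mirrors the CLI's --alter flag):

  import sqlite_utils

  # Open (or create) the database file.
  db = sqlite_utils.Database("/tmp/demo.db")

  # Insert rows; the table and its schema are inferred on the fly.
  # alter=True adds columns automatically when later inserts carry
  # keys the table hasn't seen yet.
  db["creatures"].insert_all(
      [
          {"id": 1, "name": "Cleo"},
          {"id": 2, "name": "Azy", "age": 1.5},
      ],
      pk="id",
      alter=True,
  )

  # Prints the generated CREATE TABLE statement.
  print(db["creatures"].schema)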
by mamcx on 5/17/22, 4:07 PM
Despite the claims, SQL is NOT "schema-fixed".
You can 100% create new schemas, alter them and modify them.
What actually happens is that if you have a CENTRAL repository of data (aka a "source of truth"), then you bet you wanna "freeze" your schemas (because it is like an API, where you need to fulfill contracts).
--
SQL has limitations in its lack of composability, and the biggest reason "NoSQL" works is this: a JSON value is composable. A "stringy" SQL statement is not. If SQL were really built around "relations, tuples" like (stealing from my project, TablaM):
[Customer id:i32, name:Str; 1, "Jhon"]
then developers would have less reason to go elsewhere.
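(A rough Python illustration of the composability point, mine rather than the comment's: a JSON value nests inside another value directly, while composing SQL means splicing strings.)

  import json

  # Composable: one value embeds in another as an ordinary value.
  customer = {"id": 1, "name": "Jhon"}
  order = {"order_id": 99, "customer": customer}
  print(json.dumps(order))
  # {"order_id": 99, "customer": {"id": 1, "name": "Jhon"}}

  # Not composable: a "stringy" subquery is spliced in by hand,
  # and nothing checks that the result is still well-formed SQL.
  inner = "SELECT id FROM customer WHERE name = 'Jhon'"
  outer = f"SELECT * FROM orders WHERE customer_id IN ({inner})"
  print(outer)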
by CharlesW on 5/17/22, 4:49 PM
Instead of those words I'd suggest something like "schema on write" vs. "schema on read", or "persisted structured" vs. "persisted unstructured". "Document" vs. "relational" doesn't quite capture it, since unstructured data can have late-binding relations applied at read time, and structured data doesn't have to be relational.
And of course, modern relational databases can store unstructured data as easily as structured data.
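(A quick sketch of that last point, assuming a SQLite build with the JSON1 functions, which most modern builds ship: raw documents live in an ordinary relational table and structure is applied at read time.)

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE docs (body TEXT)")  # schema-less payload column
  con.execute("INSERT INTO docs VALUES (?)", ('{"id": 1, "name": "Cleo"}',))

  # "Schema on read": structure is imposed by the query, not the table.
  row = con.execute("SELECT json_extract(body, '$.name') FROM docs").fetchone()
  print(row[0])  # Cleo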
by anentropic on 5/17/22, 6:21 PM
Eventually we get to the meat:
> For example, the JSON value
{"s":"foo","a":[1,"bar"]}
> would traditionally be called “schema-less” and in fact is said to have the vague type “object” in the world of JavaScript or “dict” in the world of Python. However, the super-structured interpretation of this value’s type is instead:
> type record with field s of type string and field a of type array of type union of types integer and string
> We call the former style of typing a “shallow” type system and the latter style of typing a “deep” type system. The hierarchy of a shallow-typed value must be traversed to determine its structure whereas the structure of a deeply-typed value is determined directly from its type.
This is a bit confusing, since JSON data commonly has an implicit schema, or "deep type system" as this post calls it, and if you consume data in any statically-typed language you will materialise the implicit "deep" types in your host language.
So it seems that ZSON is sort of like a TypeScript-ified version of JSON, where the implicit types are made explicit.
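(To make that concrete, here is my own sketch, not the article's code, of how a "deep" type can be inferred mechanically from any JSON value:)

  import json

  def deep_type(value):
      """Infer a ZSON-style "deep" type description for a decoded JSON value."""
      if isinstance(value, bool):  # check before int: bool subclasses int
          return "bool"
      if isinstance(value, int):
          return "integer"
      if isinstance(value, float):
          return "float"
      if isinstance(value, str):
          return "string"
      if value is None:
          return "null"
      if isinstance(value, list):
          elems = sorted({deep_type(v) for v in value})
          inner = elems[0] if len(elems) == 1 else "union[" + ",".join(elems) + "]"
          return "array[" + inner + "]"
      if isinstance(value, dict):
          fields = ",".join(k + ":" + deep_type(v) for k, v in value.items())
          return "record{" + fields + "}"
      raise TypeError(value)

  print(deep_type(json.loads('{"s":"foo","a":[1,"bar"]}')))
  # record{s:string,a:array[union[integer,string]]}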
It seems the point is not to have an external schema that documents must comply with, so I guess at the end of the day it has a similar aim to other "self-describing" message formats like https://amzn.github.io/ion-docs/ ? i.e. each message has its own schema
So the interesting part is perhaps the new data tools to work with large collections of self-describing messages?
by troelsSteegin on 5/17/22, 2:59 PM
[0] https://zed.brimdata.io/docs/language/overview/ [1] https://docs.confluent.io/platform/current/schema-registry/i...
by kmerroll on 5/17/22, 8:27 PM
Suggest looking into JSON-LD, which was intended to solve many of these type and schema validation use-cases.
by natemcintosh on 5/17/22, 7:18 PM
And it seems like the newer "zed lake" format is like a large blob managed by a server. Can you also convert data to and from the file formats and the lake format? What is the lake's main use case?
by bthomas on 5/17/22, 4:45 PM
> EdgeDB is essentially a new data silo whose type system cannot be used to serialize data external to the system.
I think this implies that serializing external data to ZSON is easier than writing an INSERT into EdgeDB, but I'm not sure why that would be.
by ccleve on 5/17/22, 3:12 PM
Ok, fine. But I'm not sure how this helps if you have six different systems with six different definitions of a customer, and more importantly, different relationships between customers and other objects like orders or transactions or locations or communications.
I don't see their approach as ground-breaking, but it is definitely worthy of discussion.
by feoren on 5/17/22, 10:24 PM
Anyway this article is crap and gets everything wrong, just like all of you do. Whatever, nothing to see here I guess.