by mccanne on 4/26/22, 1:02 PM with 223 comments
by mccanne on 4/26/22, 4:24 PM
I've learned a lot from your comments and pointers.
The Zed project is broader than "a jq alternative" and my bad for trying out this initial positioning. I do know there are a lot of people out there who find jq really confusing, but it's clear if you become an expert, my arguments don't hold water.
We've had great feedback from many of our users who are really productive with the blend of search, analytics, and data discovery in the Zed language, and who find manipulating eclectic data in the ZNG format to be really easy.
Anyway, we'll write more about these other aspects of the Zed project in the coming weeks and months, and in the meantime, if you find any of this intriguing and want to kick the tires, feel free to hop on our slack with questions/feedback or file GitHub issues if you have ideas for improvements or find bugs.
Thanks a million!
https://github.com/brimdata/zed https://www.brimdata.io/join-slack/
by weinzierl on 4/26/22, 1:53 PM
* jq (a great JSON-wrangling tool)
* jc (convert various tools’ output into JSON)
* jo (create JSON objects)
* yq (like jq, but for YAML)
* fq (like jq, but for binary)
* htmlq (like jq, but for HTML)
List shamelessly stolen from Julia Evans[1]. For live links see her page.
Just a few days ago I needed to quickly extract all JWT token expiration dates from a network capture. This is what I came up with:
fq 'grep("Authorization: Bearer.*" ) | print' server.pcap | grep -o 'ey.*$' | sort | uniq | \
jq -R '[split(".") | select(length > 0) | .[0],.[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson]' | \
jq '.[1]' | jq '.exp' | xargs -n1 -I! date '+%Y-%m-%d %H:%M:%S' -d @!
It's not a beauty but I find the fact that you can do it in one line, with proper parsing and no regex trickery, remarkable.
[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...
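The decoding step in the middle of that pipeline (split the token on dots, undo the URL-safe base64 alphabet, parse the payload) can also be sketched in Python. This is a minimal sketch rather than the commenter's method: `jwt_expiry` is a hypothetical helper name, the token is assumed to be an unencrypted three-part JWT, and no signature verification is done.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token: str) -> datetime:
    """Return the `exp` claim of an unverified JWT as a UTC datetime."""
    # A signed JWT is header.payload.signature; the claims live in the payload.
    payload_b64 = token.split(".")[1]
    # JWTs strip base64 padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    # urlsafe_b64decode handles the "-" and "_" alphabet directly.
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Using the URL-safe decoder avoids the two `gsub` calls that the jq version needs to map `-` and `_` back to the standard alphabet.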
by msluyter on 4/26/22, 1:53 PM
by psacawa on 4/26/22, 1:48 PM
The touted claim that jq is fundamentally stateless is not true. jq is also stateful in the sense that it has variables. If you want, you can write regular procedural code this way. Some examples [1]
The real problem of jq is that it is currently lacking a maintainer to assess a number of PRs that have accumulated since 2018.
[0] https://github.com/stedolan/jq/wiki/jq-Language-Description
[1] https://github.com/fadado/JBOL/blob/master/fadado.github.io/...
by eatonphil on 4/26/22, 1:53 PM
Incidentally there are many tools that help you do this like dsq [0] (which I develop), q [1], textql [2], etc.
[0] https://github.com/multiprocessio/dsq
by algesten on 4/26/22, 1:46 PM
To me they look similarly complicated, and the examples stress certain aggregation operations that are harder to do in jq (due to it being stateless).
by knome on 4/26/22, 2:15 PM
I can see where jq might confuse someone new to it, but their replacement is irregular, stateful, still difficult, and I don't even see variable binding or anything.
jq requires you to understand that `hello|world` will run world for each hello, passing the values world outputs to the next piped expression or the wrapping value-collecting list, or printing them to stdout.
it's a bit unintuitive if you come in thinking of them as regular pipelines, but it's a constant in the language that once learned always applies.
this zed thing has what appears to be a series of workarounds for its own awkwardness, where they kept tacking on new forms to try to bandaid those that came before.
additionally, since they made attribute selectors barewords where jq would require a preceding reference to a variable or the current value (.), I'm not sure where they'll go for variables should they add them.
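The rule that `hello|world` runs world once for each value hello emits can be modelled with Python generators. This is only a sketch of the semantics, not how jq is implemented; `pipe`, `hello`, and `world` are illustrative names.

```python
from typing import Any, Callable, Iterator

# A filter takes one input value and emits zero or more output values.
Filter = Callable[[Any], Iterator[Any]]


def pipe(left: Filter, right: Filter) -> Filter:
    """Model `left | right`: run `right` once per value emitted by `left`."""
    def piped(value: Any) -> Iterator[Any]:
        for intermediate in left(value):
            yield from right(intermediate)
    return piped


def hello(value: Any) -> Iterator[Any]:
    # Stand-in for `.[]`: emit each element of the input list.
    yield from value


def world(value: Any) -> Iterator[Any]:
    # Stand-in for `. * 10`: emit one transformed value per input.
    yield value * 10


# Wrapping in list() plays the role of jq's value-collecting `[...]`.
collected = list(pipe(hello, world)([1, 2, 3]))
```

Here `world` never sees more than one value at a time, which is the constant the comment refers to.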
by micimize on 4/26/22, 3:47 PM
expand_vals_into_independent_records='
.name as $name | .vals[] | { name: $name, val: . }
'
echo '{"name":"foo","vals":[1,2,3]} {"name":"bar","vals":[4,5]}' |
jq "$expand_vals_into_independent_records"
Also, generally, not a fan of the tone of this article.
by diehunde on 4/26/22, 2:23 PM
by brushfoot on 4/26/22, 1:47 PM
Get-Content cars.json | ConvertFrom-Json | ? { $_.color -eq 'red' }
The beauty of this is that the query syntax applies not just to JSON but to every type of collection, so you don't have to learn a specific syntax for JSON and another for another data type. You can use Get-Process on Linux to get running processes and filter them in the same way. The same for files, HTML tags, etc. I think nushell is doing something similar, though I haven't tried it yet.
I prefer this approach to another domain-specific language, as interesting as jq's and zq's are.
by AcerbicZero on 4/26/22, 3:30 PM
Sometimes jq -r '.[]' works, but it's all just trial and error. I use plenty of jq in my scripts, but I can never seem to visualize how jq looks at the data. I just have to toss variations of '.[whateveriwant].whatever[.want.]' until something works...
I suppose the root of my complaint is that jq does not do a good job of teaching you to use jq. It either works, or gives you nothing, and while I've learned to work around that, I'll try anything that claims to be even 1% better than jq.
by abledon on 4/26/22, 2:11 PM
which follows the jmespath standard
by hbbio on 4/26/22, 2:38 PM
Or rather the pure Go rewrite https://github.com/itchyny/gojq which is a better faster implementation, with bugs fixed
by politelemon on 4/26/22, 6:14 PM
Please do not recommend HomeBrew for Linux. A binary download is safer compared to how HomeBrew clobbers a Linux machine. If you do not wish to use a Linux package manager, simply point at the binary download. It is much safer and less intrusive.
by sfink on 4/26/22, 9:43 PM
I wrote a tool to do this -- https://github.com/hotsphink/sfink-tools/blob/master/bin/jso... -- but I do not recommend it to anyone other than as perhaps a source of inspiration. It's slow and buggy, the syntax is cryptic and just matches whatever I came up with when I had a new need, etc. It probably wouldn't exist if I had heard of jq sooner.
But for what it does, it's awesome. I can do things like:
% json somefile.json
> ls
0/
1/
2/
> cd 0
> ls
info/
files/
timings/
version
> cat version
1.2b
> cat timings/*/mean
timings/firstPaint/mean = 51
timings/loadEventEnd/mean = 103
timings/timeToContentfulPaint/mean = 68
timings/timeToDomContentFlushed/mean = 67
timings/timeToFirstInteractive/mean = 658
timings/ttfb/mean = 6
There are commands for searching, modifying data, aggregating, etc., but those would be better done in a more principled, full-featured syntax like jq's.
I see ijq, and it looks really nice. But it doesn't have the context and restriction of focus that I'm looking for.
by lichtenberger on 4/26/22, 3:21 PM
The language itself borrows a lot of concepts from functional languages, such as higher-order functions, closures... You can also develop modules with functions for easy reuse...
A simple join for instance looks like this:
let $stores :=
[
{ "store number" : 1, "state" : "MA" },
{ "store number" : 2, "state" : "MA" },
{ "store number" : 3, "state" : "CA" },
{ "store number" : 4, "state" : "CA" }
]
let $sales := [
{ "product" : "broiler", "store number" : 1, "quantity" : 20 },
{ "product" : "toaster", "store number" : 2, "quantity" : 100 },
{ "product" : "toaster", "store number" : 2, "quantity" : 50 },
{ "product" : "toaster", "store number" : 3, "quantity" : 50 },
{ "product" : "blender", "store number" : 3, "quantity" : 100 },
{ "product" : "blender", "store number" : 3, "quantity" : 150 },
{ "product" : "socks", "store number" : 1, "quantity" : 500 },
{ "product" : "socks", "store number" : 2, "quantity" : 10 },
{ "product" : "shirt", "store number" : 3, "quantity" : 10 }
]
let $join :=
for $store in $stores, $sale in $sales
where $store=>"store number" = $sale=>"store number"
return {
"nb" : $store=>"store number",
"state" : $store=>state,
"sold" : $sale=>product
}
return [$join]
Of course you can also group by, count, order by, nest FLWOR clauses...
by arwineap on 4/26/22, 1:49 PM
by cosmiccatnap on 4/26/22, 1:58 PM
by xg15 on 4/26/22, 8:21 PM
The post links to the tutorial "An Introduction to JQ" at [1].
Somewhere inside the tutorial, array operators are introduced like this:
> jq lets you select the whole array [], a specific element [3], or ranges [2:5] and combine these with the object index if needed.
This is not supposed to be criticism on this particular tutorial (I've seen this kind of description quite often), but I could imagine this to be a typical "eyes glaze over" moment, where people subtly lose track of what is happening.
It appears to make sense on first glance, but leaves open the question what "selecting the whole array" actually means - especially, since you can write both ".myarray" and ".myarray[]" and both will select the whole array in a sense.
I think this is the point where one would really need to learn about sequences and about jq's processing model to not get frustrated later.
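The difference between ".myarray" and ".myarray[]" is easier to see when the output is treated as a sequence of values. A rough Python analogy follows; the function names are made up for illustration, and this is not how jq works internally.

```python
from typing import Any, Iterator


def select_array(doc: dict) -> Iterator[Any]:
    """Analogue of `.myarray`: emits ONE value, the array itself."""
    yield doc["myarray"]


def iterate_array(doc: dict) -> Iterator[Any]:
    """Analogue of `.myarray[]`: emits each element as a SEPARATE value."""
    yield from doc["myarray"]


doc = {"myarray": [1, 2, 3]}
one_value = list(select_array(doc))      # a sequence holding a single array
many_values = list(iterate_array(doc))   # a sequence of three numbers
```

In this model, a filter piped after the first form would run once, on the whole array; after the second form it would run three times, once per number.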
by knowsuchagency on 4/26/22, 5:46 PM
by qmacro on 5/3/22, 11:30 AM
One thing that seems to be perhaps a misconception amongst some is that jq invocations are short and only 'one-liners', and that a 'real script' (in a 'real language') would be better in many cases. I think this lack of larger program examples probably helps to perpetuate this misunderstanding too.
Anyway, I was inspired enough by the article in question to write up some of my own thoughts on jq and statelessness: https://qmacro.org/blog/posts/2022/05/02/some-thoughts-on-jq...
by 29athrowaway on 4/26/22, 4:26 PM
I also have never seen jq as a performance bottleneck.
jq is stable, I have never encountered a bug with it and I have never seen it getting stuck after years of usage. It is dependable and practical.
jq has helped me put out countless fires throughout my career. I should donate to it one day.
by pm90 on 4/26/22, 3:55 PM
I do like tools that complement/supplement jq though, like jid: https://github.com/simeji/jid
by ilyash on 4/26/22, 2:53 PM
by eru on 4/26/22, 3:00 PM
Why would you change that?
by gcmeplz on 4/26/22, 2:56 PM
- https://github.com/thisredone/rb is a widely used ruby version of this idea
- https://github.com/KelWill/nq#readme is something similar that I wrote for my own use
by kaliszad on 4/27/22, 7:34 AM
by ducaale on 4/26/22, 4:54 PM
[1] https://github.com/antonmedv/fx
[2] https://twitter.com/antonmedv/status/1515429017582809090
by phibz on 4/27/22, 1:04 AM
echo '1 2 3' | jq ....
as creating three separate json documents, each with a single number as their top level "document", body, or content.
So of course you can't sum them. They are fed as separate documents to the jq pipeline as if you processed three separate jq commands.
Perhaps by stateless you mean no mutable global state? But it certainly maintains state from the location in the input document to the output of each selector/functor.
IMO it helps if you have a background in some of the concepts of functional programming.
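The point that '1 2 3' is three documents rather than one can be illustrated by parsing the stream by hand. This is a sketch using the standard library's `json.JSONDecoder.raw_decode`; `iter_documents` is a hypothetical helper, and gathering the documents into one list is only an analogue of slurping the input in jq, not its implementation.

```python
import json
from typing import Any, Iterator


def iter_documents(text: str) -> Iterator[Any]:
    """Yield each top-level JSON document found in a whitespace-separated stream."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between documents; raw_decode rejects leading whitespace.
        if text[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        yield value


stream = "1 2 3"
# One pass per document: each pass sees a single number, so there is nothing to sum.
per_document = [doc for doc in iter_documents(stream)]
# Gather the documents into one array first; then a single pass sees them all.
slurped_sum = sum(list(iter_documents(stream)))
```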
by ilyash on 4/26/22, 3:17 PM
good_data = fetch("openlibrary.json").docs.filter({"author_name": Arr, "publish_year":Arr})
good_data.map({{"title": A.title, "author_name": A.author_name[0], "publish_year": A.publish_year[0]}}).group("author_name").mapv(len).sortv((>=)).limit(3)
by taude on 4/26/22, 4:06 PM
[1] https://github.com/dflemstr/rq [2] https://news.ycombinator.com/item?id=13090604
by gzapp on 4/26/22, 5:44 PM
Also prob not the first to create a project for personal use that just wraps evals in another language haha: https://www.npmjs.com/package/jsling
by bradwood on 4/26/22, 6:42 PM
That plus good old fashioned sed/grep/awk give me everything I need to do on the cli.
If I want more, it's python or node.
by quotemstr on 4/26/22, 2:35 PM
by henrydark on 4/26/22, 9:59 PM
Zq looks cool, but the fact that this piece doesn't contain a single instance of the word "map" tells me the authors still haven't gotten jq. Especially with the running strawman example of adding numbers.
by stblack on 4/26/22, 3:21 PM
I feel the author makes his case clearly, then presents an alternative. Underneath all this is a ton of work, for which I applaud OP.
It may not scratch your particular itch, but come on!
Being an ass on HN is a choice. It happens far too often, and I wish everyone would just dial it back.
by dimensionc132 on 4/26/22, 2:20 PM
The question for me is this: can I do with JSON files, using zq, what I can do with Python?
by jrm4 on 4/26/22, 4:24 PM
Seems to me that if you're in a shell, then you should be "shell-like." There should not be much of a learning curve at all, and when in doubt, try to behave like other shell tools, in a Unix way. Make pipe behavior generally predictable, especially for those who aren't deep into json et al.
And if you're not going to do that, say so on "the box?"
(Disclaimer, it could be that I'm an idiot when it comes to all of this and I'm missing something big. Kind of feels that way, and I welcome correction)
by caymanjim on 4/27/22, 2:12 AM
I almost gave up before I got to the first mention of zq, and then wished I had.
by pygar on 4/26/22, 11:05 PM
by tus666 on 4/26/22, 6:00 PM
by spiralx on 4/26/22, 11:44 PM
https://docs.microsoft.com/en-us/archive/msdn-magazine/2003/...
Anyway, I've installed ZQ and will look to use it, even my simple usage of JQ had already led to thoughts of writing my own, better version :)
Quick bug report: On the Aggregate Functions page the link to _countdistinct_ goes to the page for _count_, and there actually isn't a page at https://zed.brimdata.io/docs/language/aggregates/countdistin....
by kryptozinc on 4/26/22, 5:41 PM
by harbor11012 on 4/26/22, 4:42 PM
by tzury on 4/26/22, 7:40 PM
by Aeolun on 4/27/22, 1:49 AM
by xg15 on 4/26/22, 8:52 PM
I think jq has a pretty elegant data model, but the syntax is often very clunky to work with.
So here is a half thought-out idea how you might improve the syntax for the "stateful operations" usecase the OP outlined:
I think it's not quite true that different elements of a sequence can never interact. The OP mentioned reduce/foreach, but it's also what any function that takes arguments does:
If you have an expression 'foo | bar', then bar is called once for every element foo emits. However, foo could also be a function that takes arguments. Then you can specify bar as an argument of foo like this: 'foo(bar)'. In this situation, execution of bar is completely controlled by foo. In particular, foo gets to see all elements that bar emits, not just one each. I believe this is how e.g. [x] can collect all elements of x into an array.
In the same way, you could write a function 'add_all(x)' which calls x and adds up all emitted elements to a sum.
However, this wouldn't help you with collecting all input lines, as there is nothing for your function to "wrap around". Or at least, there used to be nothing, but I think in one of the recent builds, an "inputs" function was added, which emits all remaining inputs. So now, you can write e.g. '[., inputs]' to reimplement slurp. In the same way, you could sum up all input lines by writing 'add_all(., inputs)'.
However, this is still ugly and unintuitive to write, so I think introducing some syntactic sugar for this would be useful. E.g., you could imagine a "collect operator", e.g. '>>' which treats everything left of it as the first argument to the function to the right of it.
e.g., writing 'a >> b' would desugar to 'b(a)'.
Writing 'a | b >> c' would desugar to 'c(a | b)'.
Any steps further to the right are not affected:
'a | b >> c | d' would desugar to 'c(a | b) | d'.
Scope to the left could be controlled with parentheses:
'a | (b >> c)' would desugar to 'a | c(b)'.
To make this more useful for aggregating on input lines, you could add a special rule that, if the operator is used with no parentheses, it will implicitly prepend '(., inputs)' as the first step.
So if the entire top-level expression is 'a | b >> c', it would desugar to 'c((., inputs) | a | b)'.
This would make many usecases that require keeping state much more straight-forward. E.g. collecting all the "baz" fields into an array could be written as '.baz >> []' which would desugar to '[(., inputs) | .baz]'
Summing up all the bazzes could be written as '.baz >> add_all' which would desugar to 'add_all((., inputs) | .baz)'
...and so on.
On the other hand, this could also lead to new confusion, as you could also write stuff like '... | (.baz >> map) | ...' which would really mean 'map(.baz)' or 'foo >> bar >> baz' which would desugar to the extremely cryptic expression 'baz((., inputs) | bar((., inputs) | foo))'. So I'm not quite sure.
Any thoughts about the idea?
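The desugaring rules proposed above are mechanical enough to prototype. The toy rewriter below handles only the top-level, parenthesis-free case with the implicit '(., inputs)' prefix. It works on raw strings, so it would be fooled by '|' or '>>' inside string literals or parentheses. The '>>' operator is the hypothetical syntax from this comment, not something jq supports, and `desugar` is an illustrative name.

```python
def desugar(expr: str) -> str:
    """Rewrite a top-level `a | b >> c | d` into `c((., inputs) | a | b) | d`."""
    stages = [stage.strip() for stage in expr.split(">>")]
    result = stages[0]
    for stage in stages[1:]:
        # The collector is the first step after `>>`; later steps stay outside it.
        collector, _, rest = stage.partition("|")
        collector, rest = collector.strip(), rest.strip()
        argument = f"(., inputs) | {result}"
        if collector == "[]":
            # `x >> []` is the array-collecting form `[x]`.
            result = f"[{argument}]"
        else:
            result = f"{collector}({argument})"
        if rest:
            result = f"{result} | {rest}"
    return result
```

Running it on 'foo >> bar >> baz' reproduces the nested expression the comment calls cryptic, which is one way to see the readability concern.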
by marmada on 4/26/22, 8:08 PM
The purpose of life is not to know JQ. I just want to process the JSON so I can move on and do whatever is actually important. Ideally, I'd just be able to tell GPT-codex to do what I want to do to the JSON in English.
We're not there yet, but in the meantime if there's another tool that allows me to know less in exchange for doing more, I'll gladly use it.
by ctur on 4/26/22, 1:41 PM
jq had a tough learning curve so you should switch to zq which is a (closed source?) wrapper around an obscure language you’ve never heard of that we promise is easier because reasons. Also coincidentally it’s the language of an ecosystem we were funded to build.
Edit: mea culpa, turns out you can download the source (revealed half way through the article).