by substation13 on 4/5/23, 8:05 AM with 4 comments
Often, these formats are not a good fit for the problem at hand.
However, no one seems to be writing parsers for their own custom formats. If you did, it would certainly get a few strange looks in code review!
But why is this? Why is working with JSON etc. still so much easier than writing quick parsers?
Why haven't common parsing techniques been streamlined to the point where this is the easiest path?
by DemocracyFTW2 on 4/5/23, 1:41 PM
Personally I too think parsing should be easier in this day and age; I believe Raku (Perl 6) has made meaningful strides in that direction. Other than that, I feel parsing is somewhat over- and lexing is somewhat underrated, if anything. In my experience lexing is really the step you want most to get data out of a byte sequence, and I agree that it should and could be much easier. FWIW JavaScript's regexes recently gained the sticky ('y') flag, which is ultra-beneficial for lexing. Not sure why that bit took so long.
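Roughly, sticky-flag lexing looks like this: setting `lastIndex` before each `exec` forces the match to start exactly at the cursor, so tokens must be contiguous. (A minimal sketch; the token names and rules here are just illustrative.)

```typescript
type Token = { kind: string; text: string };

// Illustrative token rules; each regex carries the sticky "y" flag.
const rules: [string, RegExp][] = [
  ["number", /\d+/y],
  ["ident", /[A-Za-z_]\w*/y],
  ["op", /[+\-*\/=]/y],
  ["ws", /\s+/y],
];

function lex(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [kind, re] of rules) {
      re.lastIndex = pos; // sticky: match must begin exactly here
      const m = re.exec(input);
      if (m) {
        if (kind !== "ws") tokens.push({ kind, text: m[0] });
        pos = re.lastIndex; // advance the cursor past the match
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`unexpected character at position ${pos}`);
  }
  return tokens;
}
```

Without `y`, a plain `exec` would happily skip ahead to the next match, silently swallowing garbage between tokens; the sticky flag is what makes regexes usable as a lexer.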
by surprisetalk on 4/5/23, 1:26 PM
People use JSON because they need to send information over a wire, and JSON serializers are abundant. I agree that this is problematic for a bunch of different reasons:
[1] https://taylor.town/json-considered-harmful
But note that this problem has been solved many, many times:
[2] https://en.wikipedia.org/wiki/Comparison_of_data-serializati...
Formats like MessagePack and Cap'n Proto have a lot of nice properties.
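For a taste of why MessagePack is so compact: its spec packs small values into one or two bytes. Here's a hand-rolled sketch of the two smallest encodings (positive fixint and fixstr); in practice you'd use an off-the-shelf msgpack library rather than this.

```typescript
// Encode a small positive integer or short string per MessagePack's
// "positive fixint" (0x00-0x7f) and "fixstr" (0xa0 | length) formats.
// Everything else is out of scope for this sketch.
function packSmall(value: number | string): number[] {
  if (typeof value === "number") {
    if (!Number.isInteger(value) || value < 0 || value > 0x7f)
      throw new Error("only positive fixint (0..127) in this sketch");
    return [value]; // positive fixint: the byte IS the value
  }
  const bytes = Array.from(new TextEncoder().encode(value));
  if (bytes.length > 31)
    throw new Error("only fixstr (<= 31 bytes) in this sketch");
  return [0xa0 | bytes.length, ...bytes]; // tag byte, then raw UTF-8
}
```

So the integer 42 is a single byte, and a two-character string is three bytes, versus JSON's quoted, escaped, delimiter-heavy text.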
Writing parsers is not easy. And it's especially not easy when you have a custom format that different people want to do different things with.
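To illustrate the point: even a "quick" parser for a trivial custom key=value format takes real care around errors and edge cases. A minimal hypothetical sketch in TypeScript (the format and helper names are made up for illustration):

```typescript
// A parse result either succeeds with a value and the remaining input,
// or fails; a Parser is just a function over the input string.
type Result<T> = { ok: true; value: T; rest: string } | { ok: false };
type Parser<T> = (input: string) => Result<T>;

// Build a parser that matches a regex at the start of the input.
const regex = (re: RegExp): Parser<string> => (input) => {
  const m = re.exec(input);
  return m && m.index === 0
    ? { ok: true, value: m[0], rest: input.slice(m[0].length) }
    : { ok: false };
};

// Parse lines like "name=value" into a record, with per-line errors.
function parsePairs(input: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of input.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const key = regex(/[A-Za-z_]\w*/)(line);
    if (!key.ok) throw new Error(`bad key in line: ${line}`);
    const eq = regex(/=/)(key.rest);
    if (!eq.ok) throw new Error(`expected '=' in line: ${line}`);
    out[key.value] = eq.rest;
  }
  return out;
}
```

And this still ignores escaping, duplicate keys, line/column reporting, and every other thing real users will throw at it, which is exactly why hand-rolled formats get strange looks in code review.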
---
Btw, I've tried out pretty much every parsing library in Rust, Typescript, and Haskell.
Elm's parsing library is the only one I enjoy using:
[3] https://package.elm-lang.org/packages/elm/parser/latest
I think nearley.js has a cool interface but poor execution:
by pestatije on 4/5/23, 8:47 AM