by hyzyla on 9/29/23, 9:07 PM with 26 comments
by svat on 10/2/23, 8:15 AM
It is also inspiring to me, because I have had the same idea and been working on something like this on-and-off since early 2022, but the contrast between your project and the state of mine[1] is like a textbook example of how to ship vs how not to ship:
• Instead of using an existing parser like pdf.js as you do, I started writing my own parser from scratch, in the process learning Rust and its Nom library for parsing, its integration with WebAssembly, etc.
• I wrote not just a straightforward parser, but a crazy one that preserves all the details like whitespace etc. (what a typical parser is supposed to ignore), so that I can test whether it round-trips successfully.
• After I got it working, I didn't stop at "works on almost all PDFs in practice" (the same as with PDF.js or any other PDF implementation) but actually chased down and investigated every single failure, checking whether they work in any other PDF application/library (Preview, Chrome, qpdf, Adobe Reader, etc), until I could prove to my satisfaction that it's not a fault in the parser. (This is still not complete…)
• When I returned to this project again after several months, instead of making further progress I spent time starting to document the code, making minor improvements and tweaks, etc.
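[Editor's sketch: the round-trip idea from the second bullet, in a few lines of Python. This is not the code from the linked project. The three-way token grammar (whitespace, comments, everything else) and the names `tokenize`, `serialize`, and `round_trips` are assumptions made for illustration; real PDF syntax needs many more rules for strings, names, and streams.]

```python
import re

# Hypothetical, heavily simplified grammar: whitespace runs, comments,
# and "everything else". Every input byte falls into exactly one group,
# so nothing is ever dropped.
TOKEN_RE = re.compile(
    rb"(?P<ws>[\x00\t\n\x0c\r ]+)"
    rb"|(?P<comment>%[^\r\n]*)"
    rb"|(?P<other>[^\x00\t\n\x0c\r %]+)"
)

def tokenize(data: bytes):
    """Split data into (kind, raw_bytes) tokens, keeping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(data):
        m = TOKEN_RE.match(data, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at byte offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def serialize(tokens) -> bytes:
    """Concatenate the raw bytes of every token back together."""
    return b"".join(raw for _, raw in tokens)

def round_trips(data: bytes) -> bool:
    """True if parsing and re-serializing reproduces the input byte for byte."""
    return serialize(tokenize(data)) == data

sample = b"1 0 obj  % a comment\r\n<< /Type /Page >>\nendobj\n"
print(round_trips(sample))
```

Because the parser keeps the bytes a typical parser throws away, any mismatch between input and output points at a parser bug rather than at a formatting difference.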
So the end result is that my project basically does nothing still, while you have a working PDF debugger. :) This is the difference between a project that intends to actually produce something and one that ends up being mostly for learning/fun with the goal mostly forgotten… not that I have any regrets :)
[Meta: Something similar is true of this comment too, which I started two days ago but left as a draft… until I finally had a burst of energy and posted just now.]
Returning to your project, a couple of feature requests:
- Provide a shortcut to jump directly to the node for page N, for any user-provided page number N.
- (Where possible) Some annotation of the page content stream operators — the Tj, Td, etc.
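[Editor's sketch for the first request: one way a "jump to page N" shortcut could walk the page tree, assuming the tree has already been parsed. Plain dicts with "Type", "Kids", and "Count" keys stand in for PDF page-tree nodes, and `find_page` is a hypothetical helper, not part of the tool being discussed. It relies on each intermediate node's Count being the number of leaf pages beneath it.]

```python
def find_page(node: dict, n: int) -> dict:
    """Return the leaf page dict for 1-based page number n under node."""
    if node["Type"] == "Page":
        if n == 1:
            return node
        raise IndexError("page number out of range")
    if not 1 <= n <= node["Count"]:
        raise IndexError("page number out of range")
    for kid in node["Kids"]:
        # A leaf counts as one page; a subtree contributes its Count.
        size = 1 if kid["Type"] == "Page" else kid["Count"]
        if n <= size:
            return find_page(kid, n)
        n -= size  # skip the whole subtree without descending into it
    raise ValueError("Count does not match the number of pages under Kids")

def page(label: str) -> dict:
    return {"Type": "Page", "label": label}

# A small made-up tree: five pages, two of them nested one level down.
root = {
    "Type": "Pages",
    "Count": 5,
    "Kids": [
        {"Type": "Pages", "Count": 2, "Kids": [page("p1"), page("p2")]},
        page("p3"),
        {"Type": "Pages", "Count": 2, "Kids": [page("p4"), page("p5")]},
    ],
}
print(find_page(root, 4)["label"])
```

Skipping subtrees by their Count means the lookup visits only one path from the root to the requested page.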
(Do consider making it an open-source project, whatever the quality of the code…)
[1]: https://github.com/shreevatsa/pdf-explorer / https://shreevatsa.net/pdf-explorer/
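[Editor's sketch for the operator-annotation request above: pair each content stream operator with a short description. The operator meanings come from the PDF specification, but the table is deliberately partial, and the tokenizer is a simplification that does not handle nested parentheses, inline images, or dictionaries. `OPERATORS` and `annotate` are names chosen for illustration.]

```python
import re

# Partial table of content stream operators (descriptions paraphrased).
OPERATORS = {
    "BT": "begin a text object",
    "ET": "end a text object",
    "Tf": "set the text font and size",
    "Td": "move to the start of the next line, offset by (tx, ty)",
    "Tm": "set the text matrix",
    "Tj": "show a text string",
    "TJ": "show text strings with individual positioning",
    "q": "save the graphics state",
    "Q": "restore the graphics state",
    "cm": "concatenate a matrix to the current transformation matrix",
}

# Simplified tokens: literal string, array, hex string, or any other run.
TOKEN_RE = re.compile(
    r"\((?:\\.|[^\\()])*\)|\[[^\]]*\]|<[0-9A-Fa-f\s]*>|[^\s\[\]()<>]+"
)

def annotate(content: str):
    """Return (operands, operator, description) for each operator found."""
    result = []
    operands = []
    for tok in TOKEN_RE.findall(content):
        if tok[0] in "/([<+-." or tok[0].isdigit():
            operands.append(tok)  # names, strings, arrays, numbers
        else:
            result.append((operands, tok, OPERATORS.get(tok, "unknown operator")))
            operands = []
    return result

stream = "BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"
for operands, op, description in annotate(stream):
    print(" ".join(operands + [op]), " % ", description)
```

Operands precede their operator in a content stream, which is why the sketch collects operands until it sees an operator token.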
by Uptrenda on 9/30/23, 12:45 PM
When I was trying to improve my resume I added some custom JavaScript to the PDF using Adobe Reader, and what I learned is that even Adobe's product makes it painful. Basically the process was something like this:
1. You add a script that loads at certain sections in the document. Let's say this is the equivalent of document.load.
2. To do this there's a field to add the full script, which must be typed up correctly beforehand. Only after it's added do you know whether it works, and every syntax error requires you to edit your previous script, delete the script you added, and hope your new version works.
3. There's really no interactive way to work with the scripts. Their 'debugger' has almost no features at all or hints of syntax errors. Even getting a script to run in it requires finding the right combination of key strokes in a 1000+ line document on PDF scripting.
The programming itself, though, is quite simple. It's just JavaScript with a different DOM and security model. You can still do event-based programming and write powerful programs, all running inside a PDF. But it will only run in Firefox (using PDF.js, I think) and Adobe Reader (for the JS support). I just thought I'd tell you that writing these JS programs in PDFs (1) actually seems to have a lot of unrealized potential and (2) has terrible tooling. So better JS support would be useful.
by pixelgeek on 9/30/23, 3:54 PM
by phonon on 10/1/23, 4:41 AM
by qingcharles on 9/30/23, 10:08 PM
Is there a way to view the stream content in ASCII/Unicode instead of Base64/Hex?
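[Editor's sketch: one way to turn a hex dump of stream bytes into readable text. It assumes the bytes are already decoded; streams in real PDFs are often compressed (for example with FlateDecode) and need decompressing first. `hex_to_display` is a hypothetical helper, not a feature of the tool.]

```python
def hex_to_display(hex_text: str) -> str:
    """Decode a hex dump to text, escaping bytes that are not readable."""
    data = bytes.fromhex(hex_text)  # ASCII whitespace between bytes is skipped
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: show printable ASCII, escape everything else.
        return "".join(
            chr(b) if 32 <= b < 127 or b in (9, 10) else f"\\x{b:02x}"
            for b in data
        )
    # Valid UTF-8: still escape control characters other than tab/newline.
    return "".join(
        ch if ch.isprintable() or ch in "\t\n" else f"\\x{ord(ch):02x}"
        for ch in text
    )

print(hex_to_display("42 54 0a 28 48 69 29 20 54 6a"))
```

Falling back to escapes rather than failing keeps binary streams such as images viewable without pretending they are text.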
by Gys on 9/30/23, 12:43 PM
by anymoonus on 9/30/23, 7:47 AM
1. Is the source available anywhere? I'm curious to see how it works.
2. Is there a way to connect the structure displayed here, to the rendered version in the PDF? To visually display the subcomponents?
by vendiddy on 9/30/23, 2:04 PM
by mkl on 9/30/23, 8:57 AM
I spotted a typo, which led me to a bug. When I click on a "stream contents" node, the right panel says "It's a actual content" (instead of "an"), and there is some mouse handling issue that prevents me from selecting the text in the right panel.
by nraynaud on 9/30/23, 1:10 PM
by davedx on 9/30/23, 10:25 AM