by diskmuncher on 7/31/22, 3:40 PM with 92 comments
by neilv on 7/31/22, 7:05 PM
(But maybe it wasn't as much on people's radars, with all the lower-hanging fruit of other technology choices and practices going on, outside of PDF.)
New code for a large spec is also interesting for potential vulns, but maybe easier to get confidence about.
One neat direction they could go is to be considered more trustworthy than the Adobe products. For example, if one is thinking of a PDF engine as (among other purposes) supporting the use case of a PDF viewer that's an agent of the interests of that individual human user, then I suspect you're going to end up with different attention and decisions affecting security (compared to implementations from businesses focused on other goals).
(I say agent of the individual user, but that can also be aligned with enterprise security, as an alternative to risk management approaches that, e.g., ultimately will decide they're relying on gorillas not to make it through the winter.)
by kisamoto on 7/31/22, 7:52 PM
But anyway - I understand why they have changed their interpreter; however, the lack of a major version bump threw me off. I use ps2pdf to optimize PDFs (long story short: it makes them smaller) and was alarmed when my PDFs suddenly ended up without their JPEG backgrounds. Instead they were purely black (although this did result in a very small file size so who knows... :) )
Thankfully you can add `-dNEWPDF=false` to your command to use the old parser. I have yet to submit a bug report, but it would be nice if it were backwards compatible...
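For anyone scripting this, here is a minimal sketch of how a wrapper might add that fallback switch only when asked. The function name `ps2pdf_command` and the file names are made up for illustration, and the switch is written without a space, which is the usual form of Ghostscript's `-d` options. It only builds the argument list; it does not run Ghostscript.

```python
import shlex


def ps2pdf_command(src, dst, use_old_parser=False):
    """Build a ps2pdf argument list (hypothetical helper, not part of Ghostscript)."""
    cmd = ["ps2pdf"]
    if use_old_parser:
        # Switch mentioned in the comment above: asks for the old PDF interpreter.
        # Assumption: the installed Ghostscript release still honours it.
        cmd.append("-dNEWPDF=false")
    # ps2pdf expects its options before the input and output file names.
    cmd += [src, dst]
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(ps2pdf_command("in.pdf", "out.pdf", use_old_parser=True)))
```

Passing the list to `subprocess.run` would execute it; keeping the switch behind a parameter makes it easy to drop once the new interpreter handles the files correctly.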
by hnick on 8/1/22, 12:58 AM
Anyone who has done PDF composition for a "print ready" job (what a lie) from a client has run into this so many times. All we have to do is rearrange the pages in the right sorted order, add some barcodes, and print, right? Acrobat can open the file, so why is your printer crashing? Ironically, some of those printers used an Adobe RIP in the toolchain, and the PDF->PS conversion on the printer was where things went wrong. I once tracked down a crash where a font's glyph name definition in the dict was OK in PDF but invalid syntax in PS, because a // resolved into an immediately evaluated name that didn't exist. That's not something a technician could help with.
It was so bad that Ghostscript was one of many tools - we'd throw a PDF through various toolchains and hope one of them saved it in a format that was well behaved. Anyway I'm almost sad I've moved on from that job now so I can't try it out with some real world files. But in the end most of the issues came down to fonts and people using workflows that involve generating single-document PDFs and merging them, resulting in things like 1000 subset fonts which are nearly identical and consume all the printer memory, so I'm not sure how much this would help.
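To make the subset-font problem concrete: subset fonts in a PDF are conventionally named with a six-uppercase-letter tag, a plus sign, and then the base font name, so near-duplicate subsets can be spotted by stripping the tag and counting. The sketch below assumes the font names have already been pulled out of the file by some other tool; the helper `count_subsets` and the sample names are hypothetical.

```python
import re
from collections import Counter

# Subset tag convention: six uppercase letters followed by "+", e.g. "ABCDEF+Helvetica".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def count_subsets(font_names):
    """Count how many subset-tagged fonts share each base font name."""
    counts = Counter()
    for name in font_names:
        if SUBSET_TAG.match(name):
            # Strip the tag so different subsets of one font group together.
            counts[SUBSET_TAG.sub("", name)] += 1
    return counts


# Hypothetical names, as might be collected from a merged PDF's font dictionaries.
names = ["ABCDEF+Helvetica", "GHIJKL+Helvetica", "MNOPQR+Helvetica-Bold", "Times-Roman"]
print(count_subsets(names).most_common())
```

A base name with a count in the hundreds would be a strong hint that the file was built by merging many single-document PDFs.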
by toddm on 7/31/22, 5:33 PM
Kudos and thank you to those who maintain it and the associated packages!
by mkl on 7/31/22, 8:18 PM
Does anyone know of a collection of malformed PDF files? It would be useful for testing PDF processing programs.
by lordfosco on 7/31/22, 4:56 PM
While progress is always nice to see, I am also pleased that we don't necessarily need to update all the scripts that depend on Ghostscript at once but can keep them running in their current state.
by vivegi on 7/31/22, 6:25 PM
Even if the application was fine, you would always encounter PS/PDF files in the wild that kept stress-testing the application's memory safety.
by mepian on 7/31/22, 7:53 PM
Isn't C, their chosen replacement for PostScript, also particularly bad at this?
by aidos on 7/31/22, 4:54 PM
They seem to be the kings of working with PDFs. I’ve not really looked at the Ghostscript code (and I’m surprised to hear their interpreter was still in PostScript), but I’ve looked through the MuPDF code and what I saw was really nice.
In any case, I appreciate the work they’ve done in providing fantastic tools to the world for decades now.
by 3ace on 8/1/22, 6:40 AM
I was grinning widely while reading this.
Until last year I had the opportunity to help maintain a PDF tool written in Go. Clients often sent us PDF documents that did not conform to the standard but could still be opened in Acrobat, though not in other PDF readers (including Ghostscript), so I had to find ways to read and extract the content with minimal issues.
by rcarmo on 8/1/22, 7:15 AM
PDF became such a weird mess that I’m not surprised PostScript is now just a subset of it (to a degree), but writing an entirely new interpreter must have been a hefty chunk of work.
by vintagedave on 7/31/22, 4:50 PM
The post has no explanation of this choice. Does anyone know?
by vfclists on 7/31/22, 6:15 PM
Not good!!
by forgotpwd16 on 7/31/22, 4:42 PM
by diskmuncher on 7/31/22, 3:40 PM