by bilater on 3/8/25, 3:15 AM with 40 comments
by sbarre on 3/8/25, 4:25 AM
So are you actually doing OCR, or are you just extracting the embedded text?
by setnone on 3/8/25, 4:35 AM
by simonw on 3/8/25, 5:42 AM
Honestly, the vibes aren't great. Gemini is a lot more flexible for handling PDFs (you can prompt it to do a bunch of other things), and Mistral OCR appears to hallucinate when it can't correctly read handwriting, a common problem with OCR tools based on vision LLMs.
The way Mistral OCR handles images within the text is disappointing - it doesn't attempt to interpret them, just extracts them out as binary blobs. A vision LLM can usually do a great job of describing an image, but with Mistral OCR you have to manually run that as a separate step.
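A minimal sketch of that separate step, assuming the OCR result is markdown that references extracted images as `![name](name)` and that you already hold the image bytes keyed by that name. `inline_image_descriptions` and `describe_image` are hypothetical names; `describe_image` stands in for whatever vision LLM call you would actually make.

```python
import re
from typing import Callable, Dict

# Matches markdown image references such as ![img-0.jpeg](img-0.jpeg).
# Assumption: the OCR output embeds images this way; adjust the pattern if not.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def inline_image_descriptions(
    markdown: str,
    images: Dict[str, bytes],
    describe_image: Callable[[bytes], str],
) -> str:
    """Replace each image reference with a text description.

    `images` maps the reference target (e.g. "img-0.jpeg") to raw bytes.
    `describe_image` is a hypothetical hook that would call a vision LLM.
    References with no matching bytes are left unchanged.
    """

    def replace(match: re.Match) -> str:
        target = match.group(2)
        data = images.get(target)
        if data is None:
            return match.group(0)  # keep the original reference as-is
        description = describe_image(data).strip()
        return f"[Image {target}: {description}]"

    return IMAGE_REF.sub(replace, markdown)


if __name__ == "__main__":
    page = "Intro text\n\n![img-0.jpeg](img-0.jpeg)\n\nMore text"
    fake_images = {"img-0.jpeg": b"\xff\xd8..."}

    def fake_describe(data: bytes) -> str:
        # Stand-in for a real vision LLM call.
        return f"a {len(data)}-byte image"

    print(inline_image_descriptions(page, fake_images, fake_describe))
```

Keeping the description step as a pluggable callable means the OCR pass and the image-captioning pass stay independent, which matches the two-step workflow described above.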
by bilater on 3/8/25, 3:15 AM
So cool, in fact, that I got distracted and ended up building an open source PDF parser and chat app!
Presenting Auntie PDF - your all-knowing guide that unpacks every PDF into clear, actionable insights.
You can upload a PDF or point to a public link, parse it, and then ask questions. It's all open source and free.
by jbaudanza on 3/8/25, 5:21 AM
by foundzen on 3/8/25, 6:39 AM
by t-3 on 3/8/25, 6:06 AM
by elanning on 3/8/25, 4:00 AM
by JoelJacobson on 3/8/25, 6:06 AM
It would be nice to have a [Download Combined Rendered] button that downloads the rendered combined page as a self-contained .html web page.
by shnpln on 3/8/25, 4:44 AM
by daft_pink on 3/9/25, 7:02 AM
by mjyoon on 3/8/25, 4:35 AM
by yannis on 3/8/25, 4:57 AM
by ab_testing on 3/8/25, 4:56 AM
by triyambakam on 3/8/25, 6:23 AM
by n8m8 on 3/8/25, 4:11 AM
by throwaway81348 on 3/8/25, 4:11 AM
by eastoeast on 3/8/25, 5:10 AM