from Hacker News

Auntie PDF – an open source app built using Mistral OCR

by bilater on 3/8/25, 3:15 AM with 40 comments

by sbarre on 3/8/25, 4:25 AM
I find it hard to accept something that advertises "OCR" when I upload a PDF with text in images, query the document after upload, and get a message that says "I can't interpret images".
So are you actually doing OCR, or are you just extracting embedded text?
by setnone on 3/8/25, 4:35 AM
Sweet branding! Grandma told me she's not happy with the lack of a privacy policy.
by simonw on 3/8/25, 5:42 AM
I built a CLI tool for experimenting with Mistral OCR here: https://simonwillison.net/2025/Mar/7/mistral-ocr/
Honestly, the vibes aren't great. Gemini is a lot more flexible for handling PDFs: you can prompt it to do a bunch of other things. Mistral OCR also appears to hallucinate when it can't correctly read handwriting, a common problem with OCR tools based on vision LLMs.
The way Mistral OCR handles images within the text is disappointing - it doesn't attempt to interpret them, just extracts them out as binary blobs. A vision LLM can usually do a great job of describing an image, but with Mistral OCR you have to manually run that as a separate step.
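The separate image-description step described above can be sketched as a post-processing pass. This is a minimal illustration, not Mistral's API: it assumes the OCR output is markdown in which each image appears as a reference like `![img-0.jpeg](img-0.jpeg)`, and that the extracted image blobs are available in a dict keyed by that same id (check the actual response format against Mistral's documentation). `inline_image_descriptions` and the `describe_image` callback are hypothetical names; `describe_image` stands in for whatever vision LLM call you would make.

```python
import re
from typing import Callable, Dict

# Matches markdown image references such as ![img-0.jpeg](img-0.jpeg);
# group 1 is the alt text, group 2 is the image id/path.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def inline_image_descriptions(
    markdown: str,
    images: Dict[str, bytes],
    describe_image: Callable[[bytes], str],
) -> str:
    """Replace each image reference with a text description.

    Assumption: `images` maps the id used in the markdown reference to the
    raw image bytes. References with no matching blob are left untouched.
    """

    def substitute(match: re.Match) -> str:
        image_id = match.group(2)
        data = images.get(image_id)
        if data is None:
            return match.group(0)  # no blob for this reference; keep it as-is
        # describe_image is a hypothetical hook, e.g. a call to a vision LLM
        return f"[Image: {describe_image(data)}]"

    return IMAGE_REF.sub(substitute, markdown)
```

Passing the describer in as a callback keeps the text-substitution logic independent of any particular vision model, so the same pass works whether the descriptions come from Gemini, a local model, or a stub in tests.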
by bilater on 3/8/25, 3:15 AM
OK, I've been critical of Mistral AI, but credit where credit is due. Mistral OCR seems cool.
So cool in fact, I got distracted and ended up building an open source PDF parser and chat app!
Presenting Auntie PDF - your all-knowing guide that unpacks every PDF into clear, actionable insights.
You can upload a PDF or point to a public link, parse it, and then ask questions. All open source and free.
by jbaudanza on 3/8/25, 5:21 AM
I have a question about Mistral OCR. If I give the model a PDF that is 90% text, is it actually performing OCR on an image representation of the text? Or is it smart enough to extract the text directly and only use OCR on images?
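Nothing in this thread confirms what Mistral OCR does internally, but the "extract directly, OCR only when needed" strategy the question describes can be sketched as a per-page fallback. This is a hypothetical illustration: the `Page` model, the `ocr_page` callback, and the `min_chars` threshold are all assumptions, and deciding per page (rather than per image) means a page with a real text layer plus text baked into an embedded image would still miss the image text, which is the failure reported at the top of the thread.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    # Hypothetical page model: text from the PDF's embedded text layer,
    # plus a rendered image of the page to fall back on for OCR.
    embedded_text: str
    rendered_image: bytes


def extract_document(
    pages: List[Page],
    ocr_page: Callable[[bytes], str],
    min_chars: int = 50,
) -> List[str]:
    """Use the embedded text layer when it looks usable; otherwise OCR the page image.

    `ocr_page` is a hypothetical hook standing in for any OCR call.
    `min_chars` is an arbitrary heuristic for "this page has a real text layer".
    """
    results = []
    for page in pages:
        text = page.embedded_text.strip()
        if len(text) >= min_chars:
            results.append(text)  # cheap path: the text layer is already there
        else:
            # scanned page or text-as-image: pay for OCR only here
            results.append(ocr_page(page.rendered_image))
    return results
```

The character-count threshold is crude; a real pipeline might also check whether the page contains large images or whether the extracted text is mostly garbage before trusting the text layer.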
by foundzen on 3/8/25, 6:39 AM
Love the creativity in the branding, but it didn't work in my case either: the raw content was gibberish, and it returned an error when I asked any question.
by t-3 on 3/8/25, 6:06 AM
What are people using these OCR programs for? Are there really that many PDFs being made without embedded text these days?
by elanning on 3/8/25, 4:00 AM
It looks great, nice work. I’m impressed at the quick development too.
by JoelJacobson on 3/8/25, 6:06 AM
Thanks for creating, really useful!
It would be nice to have a [Download Combined Rendered] button that downloads the rendered combined page as a self-contained .html file.
by shnpln on 3/8/25, 4:44 AM
I would like it if my chat session didn't clear when I go to Document Content and back to Chat. Or I wish I could see my document while chatting.
by daft_pink on 3/9/25, 7:02 AM
Is there a way to use Mistral OCR on US servers so our data never leaves our borders?
by mjyoon on 3/8/25, 4:35 AM
Unfortunate that Mistral OCR can't tell me details presented in charts and graphs.
by yannis on 3/8/25, 4:57 AM
Pretty impressive, and it did a good job on an academic PDF I uploaded. Nice UI also.
by ab_testing on 3/8/25, 4:56 AM
This is amazing. Could you share the prompts that were used for this product?
by triyambakam on 3/8/25, 6:23 AM
The coolest thing about this is the short, easy-to-pronounce .com.
by n8m8 on 3/8/25, 4:11 AM
I'm on mobile and don't have a PDF to test it with, but I love your styling and text copy.
by throwaway81348 on 3/8/25, 4:11 AM
What about privacy?
by eastoeast on 3/8/25, 5:10 AM
Awesome UI!