from Hacker News

Auntie PDF – an open source app built using Mistral OCR

by bilater on 3/8/25, 3:15 AM with 40 comments

  • by sbarre on 3/8/25, 4:25 AM

    I find it hard to accept something that advertises "OCR" when I upload a PDF with text in images, query the document afterward, and get a message that says "I can't interpret images".

    Then are you actually doing OCR, or are you just extracting embedded text?

  • by setnone on 3/8/25, 4:35 AM

    Sweet branding! Grandma told me she's not happy with the lack of a privacy policy.
  • by simonw on 3/8/25, 5:42 AM

    I built a CLI tool for experimenting with Mistral OCR here: https://simonwillison.net/2025/Mar/7/mistral-ocr/

    Honestly, the vibes aren't great. Gemini is a lot more flexible for handling PDFs - you can prompt it to do a bunch of other things - and Mistral OCR appears to hallucinate if it can't correctly read handwriting, a common problem with vision-LLM-based OCR tools.

    The way Mistral OCR handles images within the text is disappointing - it doesn't attempt to interpret them, just extracts them out as binary blobs. A vision LLM can usually do a great job of describing an image, but with Mistral OCR you have to manually run that as a separate step.

  • by bilater on 3/8/25, 3:15 AM

    OK I've been critical of Mistral AI but credit where credit is due. Mistral OCR seems cool.

    So cool in fact, I got distracted and ended up building an open source PDF parser and chat app!

    Presenting Auntie PDF - your all-knowing guide that unpacks every PDF into clear, actionable insights.

    You can upload a PDF or point to a public link, parse it, and then ask questions. All open source and free.

  • by jbaudanza on 3/8/25, 5:21 AM

    I have a question about Mistral OCR. If I give the model a PDF that is 90% text, is it actually performing OCR on an image representation of the text? Or is it smart enough to extract the text directly and only use OCR on images?
  • by foundzen on 3/8/25, 6:39 AM

    Love the creativity in the branding, but it did not work in my case either. The raw content was gibberish, and it errored when answering any question.
  • by t-3 on 3/8/25, 6:06 AM

    What are people using these OCR programs for? Are there really that many PDFs being made without embedded text these days?
  • by elanning on 3/8/25, 4:00 AM

    It looks great, nice work. I’m impressed at the quick development too.
  • by JoelJacobson on 3/8/25, 6:06 AM

    Thanks for creating this, really useful!

    It would be nice to have a [Download Combined Rendered] button that downloads the rendered combined page as a self-contained .html file.

  • by shnpln on 3/8/25, 4:44 AM

    I would like it if my chat session did not clear when I go to Document Content and back to chat. Or I wish I could see my document while chatting.
  • by daft_pink on 3/9/25, 7:02 AM

    Is there a way to use Mistral OCR on US servers so your data never leaves our borders?
  • by mjyoon on 3/8/25, 4:35 AM

    Unfortunate that Mistral OCR can't tell me details presented in charts and graphs.
  • by yannis on 3/8/25, 4:57 AM

    Pretty impressive, and it did a good job on an academic PDF I uploaded. Nice UI also.
  • by ab_testing on 3/8/25, 4:56 AM

    This is amazing. Could you share the prompts that were used for this product?
  • by triyambakam on 3/8/25, 6:23 AM

    The coolest thing about this is the short, easy-to-pronounce .com
  • by n8m8 on 3/8/25, 4:11 AM

    I'm on mobile and don’t have a PDF to test it with, but I love your styling and text copy.
  • by throwaway81348 on 3/8/25, 4:11 AM

    What about privacy?
  • by eastoeast on 3/8/25, 5:10 AM

    Awesome UI!