from Hacker News

Ask HN: What's the best document parsing tool/SDK that you've heard of?

by voiceclonr on 11/4/18, 2:46 PM with 1 comment

I am looking to parse various document types (docx, ppt, pdf, pst, etc.) and extract metadata, text, etc. for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better in the long term. Can anyone recommend tools/SDKs they've used or heard are successful?
  • by mindcrime on 11/4/18, 6:58 PM

    Tika is what we use. It's not perfect, but it works pretty well for our purposes.
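To show what "extract metadata, text" involves for just one of the formats in the question, here is a minimal stdlib-only sketch. It is not Tika and says nothing about how Tika works internally. It relies only on the fact that a .docx file is an Office Open XML ZIP package: body text sits in `<w:t>` runs inside `<w:p>` paragraphs in `word/document.xml`, and core metadata (title, creator) sits in Dublin Core elements in `docProps/core.xml`. The helpers `extract_docx` and `make_sample_docx` are hypothetical names, and the sample document content is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside Office Open XML (.docx) packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_docx(data: bytes) -> dict:
    """Hypothetical helper: return plain text plus title/creator of a .docx."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        # Each <w:p> is a paragraph; its visible text is split across <w:t> runs.
        for p in body.iter(f"{{{W_NS}}}p"):
            paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
        metadata = {}
        # Core properties are optional, so only read them if the part exists.
        if "docProps/core.xml" in zf.namelist():
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for tag in ("title", "creator"):
                el = core.find(f"{{{DC_NS}}}{tag}")
                if el is not None and el.text:
                    metadata[tag] = el.text
    return {"text": "\n".join(paragraphs), "metadata": metadata}


def make_sample_docx() -> bytes:
    """Hypothetical helper: build a tiny in-memory .docx for testing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W_NS}"><w:body>'
            "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
            "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            "</w:body></w:document>",
        )
        zf.writestr(
            "docProps/core.xml",
            f'<cp:coreProperties xmlns:cp="{CP_NS}" xmlns:dc="{DC_NS}">'
            "<dc:title>Demo</dc:title><dc:creator>Example Author</dc:creator>"
            "</cp:coreProperties>",
        )
    return buf.getvalue()


if __name__ == "__main__":
    print(extract_docx(make_sample_docx()))
```

This handles only .docx. The other formats mentioned (ppt, pdf, pst) use entirely different container layouts, so each would need its own parser. That is the practical argument for a single multi-format toolkit such as Tika over hand-rolled extraction.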