by spapas82 on 8/11/24, 10:14 AM with 6 comments
You can index all kinds of files (e.g. doc, docx, xls, ppt, pdf, txt, html, even OCR'd PDFs) and then search them using very advanced queries like "always contain X", "never contain X", "X near Y", wildcard search, proper stemming support, etc.
We're using it at my work, where we have hundreds of thousands of doc/docx/pdf files, and it works flawlessly!
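To make those query types concrete, here is a rough toy sketch in Python of what "always contain X", "never contain X", "X near Y" and wildcard matching mean for a single document. This is not how the tool itself works internally; the function names (`tokenize`, `positions`, `matches`) and the `window` parameter are made up for illustration, and stemming is left out entirely.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    # Lowercase the text and split it into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, pattern):
    # Indices of tokens that match a term; the term may use * and ? wildcards.
    pattern = pattern.lower()
    return [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern)]


def matches(text, must=(), must_not=(), near=(), window=5):
    # Hypothetical helper: True if the text satisfies every constraint.
    #   must     - terms that must all appear       ("always contain X")
    #   must_not - terms that must never appear     ("never contain X")
    #   near     - (a, b) pairs that must occur within `window` tokens
    tokens = tokenize(text)
    if any(not positions(tokens, term) for term in must):
        return False
    if any(positions(tokens, term) for term in must_not):
        return False
    for a, b in near:
        pos_a, pos_b = positions(tokens, a), positions(tokens, b)
        if not any(abs(i - j) <= window for i in pos_a for j in pos_b):
            return False
    return True


doc = "The invoice was sent to the customer in March"
print(matches(doc, must=["invoic*"], must_not=["draft"]))
print(matches(doc, near=[("invoice", "customer")], window=3))
```

A real full-text engine answers the same questions from an inverted index with stored token positions rather than by scanning each document, which is what keeps it fast across hundreds of thousands of files.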
by unstatusthequo on 8/12/24, 7:00 PM
by namanyayg on 8/12/24, 12:41 PM
one thing that caught my eye was the mention of 'proper stemming support' - can you elaborate on how you're handling stemming? are you using a specific library or rolling your own implementation? also, have you considered adding any sort of faceting/search filtering to the results?
by compressedgas on 8/15/24, 1:25 PM