by dbuxton on 11/24/21, 8:32 AM with 10 comments
by thaumasiotes on 11/26/21, 5:36 PM
It's been OCRed, and the Greek has been mangled beyond belief. Sometimes the OCR will even split a single character in two, putting a word break in the middle of it.
No real point to the story, but it feels relevant here. I see Rescribe has already encountered the problem: "In the second step we run the OCR on the preprocessed files, using our specifically trained packages and adapting language and character settings to the document at hand."
(I'm only complaining to a very small degree. Having a low-quality OCRed ebook available is much better than having no ebook available. And what is normally displayed is the image of the text, not the OCRed nonsense, so it doesn't matter that the Greek has been transformed into gibberish until you encounter the odd mid-character word break.)
by raybb on 11/26/21, 4:15 PM
by IshKebab on 11/26/21, 5:00 PM