by vikp on 2/13/24, 2:07 PM with 1 comment
In my benchmarks, it's more accurate than Tesseract in every language except one (see the repo for the benchmarking method).
Since it can run on a GPU, speed is about equal to Tesseract when cost-matched (1x Lambda A6000 vs. 28 DigitalOcean CPU cores).
It's built on a modified Donut architecture - I added an MoE layer, GQA for faster decoding, and UTF-16 decoding (it can represent any character, and it's faster than UTF-8 since adjacent bytes are combined into 16-bit code units).
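A minimal sketch of the UTF-16 code-unit idea (not the repo's actual tokenizer, just an illustration): every character maps to one or two 16-bit IDs, so the output vocabulary is fixed at 65,536 entries and sequences are shorter than byte-level UTF-8 for most non-Latin scripts.

```python
# Illustration of decoding over UTF-16 code units instead of UTF-8 bytes.
# Not the repo's actual tokenizer; names here are made up for the sketch.

def to_utf16_ids(text: str) -> list[int]:
    """Encode text as a sequence of UTF-16 code units (0..65535)."""
    data = text.encode("utf-16-le")
    # Combine each pair of adjacent bytes into one 16-bit code unit.
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def from_utf16_ids(ids: list[int]) -> str:
    """Decode a sequence of UTF-16 code units back into text."""
    data = b"".join(i.to_bytes(2, "little") for i in ids)
    return data.decode("utf-16-le")

print(to_utf16_ids("héllo"))                   # [104, 233, 108, 108, 111]
print(from_utf16_ids(to_utf16_ids("नमस्ते")))    # round-trips any character
```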
I theorized that character-level decoding would be an optimal compute allocation, and that a large embedding matrix (relative to UTF-8 decoding) would store language-specific information.
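To make the embedding-size point concrete, a back-of-the-envelope parameter count under an assumed hidden size (the real model's dimension may differ): the UTF-16 table is 256x larger than a byte-level one, which is the extra capacity being bet on to store language-specific information.

```python
# Rough comparison of embedding matrix sizes.
# hidden_size is an arbitrary example value, not the model's real dimension.
hidden_size = 1024

utf8_vocab = 256      # byte-level decoding: one row per byte value
utf16_vocab = 65536   # code-unit decoding: one row per 16-bit value

print(f"UTF-8 embedding params:  {utf8_vocab * hidden_size:,}")   # 262,144
print(f"UTF-16 embedding params: {utf16_vocab * hidden_size:,}")  # 67,108,864
```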
I trained it using 4x A6000s for about 2 weeks.
You can run surya via the Python API, from the CLI, or via an interactive app in the repo.
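For orientation, a hedged sketch of what Python API usage might look like; the module paths and function names below are assumptions, so check the repo README for the real entry points.

```python
# Hypothetical usage sketch -- load_model, load_processor, and run_ocr below
# are assumed names; see the surya repo README for the actual API.
from PIL import Image

from surya.model.recognition.model import load_model          # assumed module path
from surya.model.recognition.processor import load_processor  # assumed module path
from surya.ocr import run_ocr                                 # assumed entry point

image = Image.open("page.png")
model, processor = load_model(), load_processor()

# Language hints steer the recognizer; exact argument names may differ.
predictions = run_ocr([image], [["en"]], model, processor)
print(predictions[0])
```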