by mci on 3/6/22, 10:15 AM with 32 comments
by LordGrey on 3/6/22, 2:51 PM
LookupCompound and WordSegmentation, algorithms built on Symmetric Delete/Deletion Neighborhoods, are pretty interesting.
[1] https://fastss.csg.uzh.ch/ifi-2007.02.pdf [2] https://arxiv.org/abs/1008.1191v2
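For readers who have not met the idea before, here is a minimal sketch of a Symmetric Delete lookup. It is not SymSpell's actual code: the function names (`deletes`, `build_index`, `lookup`) are made up for illustration, it uses plain Levenshtein distance for verification, and it ignores word frequencies entirely.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_dist):
    """Return `word` plus every string obtained by deleting up to `max_dist` characters."""
    out = {word}
    for k in range(1, min(max_dist, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def build_index(words, max_dist=2):
    """Map every delete variant of every dictionary word back to that word."""
    index = defaultdict(set)
    for w in words:
        for d in deletes(w, max_dist):
            index[d].add(w)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(index, query, max_dist=2):
    """Collect candidates sharing a delete variant with the query, then verify them."""
    candidates = set()
    for d in deletes(query, max_dist):
        candidates |= index.get(d, set())
    scored = [(edit_distance(query, w), w) for w in candidates]
    return sorted((dist, w) for dist, w in scored if dist <= max_dist)
```

The point of the trick is that only deletions are generated, on both the dictionary side and the query side, so inserts, substitutions and transpositions never have to be enumerated over the alphabet. The final distance check is still needed because sharing a delete variant only makes a word a candidate.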
by kevincox on 3/6/22, 1:33 PM
I wonder how difficult it would be to adapt this to work on sounds or other frequent typos and misspelling sources instead of just characters. It seems it should be possible if you can define a decent "normalization" function.
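A rough sketch of what such a "normalization" function could look like, purely as an assumption: the rewrite rules below are hypothetical and deliberately crude, and a real system would more likely use something like Soundex, Metaphone, or a keyboard-adjacency model. The index is keyed by the normalized form, so sound-alike spellings collide on the same key.

```python
import re
from collections import defaultdict

# Hypothetical rewrite rules, applied in order. They are illustrative only
# and are nowhere near a complete phonetic model of English.
RULES = [
    (r"ph", "f"),
    (r"ck", "k"),
    (r"c(?=[eiy])", "s"),  # soft c before e, i, y
    (r"c", "k"),           # any remaining c is treated as hard
    (r"(.)\1+", r"\1"),    # collapse repeated letters
]


def normalize(word):
    """Reduce a word to a crude sound-alike key."""
    s = word.lower()
    for pattern, repl in RULES:
        s = re.sub(pattern, repl, s)
    return s


def build_normalized_index(words):
    """Group dictionary words by their normalized key."""
    index = defaultdict(set)
    for w in words:
        index[normalize(w)].add(w)
    return index


def lookup(index, query):
    """Return dictionary words whose normalized key equals the query's."""
    return sorted(index.get(normalize(query), set()))
```

In principle the normalized keys could then be fed into the delete-based index instead of the raw spellings, so that both sound-alike and near-miss inputs are caught. Whether that works well in practice depends heavily on the quality of the rules.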
by cb321 on 3/6/22, 6:54 PM
That said, the technique is not wholly without merit, but it does carry certain "average-worst case" trade-offs related to latency in the memory/storage system because of SymSpell's reliance upon large hash tables. For details see https://github.com/c-blake/suggest
EDIT: Also - I am unaware of other implementations even saving the large, slow-to-compute index. The others I am aware of seem to rebuild the large index every time, which seems kind of lame. EDIT2 - I guess there is a recent Rust one that is persistent as well as the "mmap & go" Nim one. Still, what should be standard is very rare.
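To make the "save the index instead of rebuilding it" point concrete, here is a small sketch that writes the delete index to an on-disk SQLite file once and queries it on later runs. This is an assumption-laden illustration, not the mmap layout of the Nim implementation or the format of any other implementation mentioned above; the table name `idx` and the function names are invented.

```python
import sqlite3
from itertools import combinations


def deletes(word, max_dist):
    """Return `word` plus every string obtained by deleting up to `max_dist` characters."""
    out = {word}
    for k in range(1, min(max_dist, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def build_index_file(path, words, max_dist=2):
    """Compute the delete index once and persist it at `path`."""
    con = sqlite3.connect(path)
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS idx "
            "(key TEXT, word TEXT, PRIMARY KEY (key, word))"
        )
        con.executemany(
            "INSERT OR IGNORE INTO idx VALUES (?, ?)",
            ((d, w) for w in words for d in deletes(w, max_dist)),
        )
    con.close()


def candidates(path, query, max_dist=2):
    """Fetch unverified candidate words from the saved index, with no rebuild."""
    keys = sorted(deletes(query, max_dist))
    # One placeholder per key; fine for ordinary word lengths, but very long
    # queries could exceed SQLite's bound-parameter limit.
    marks = ",".join("?" * len(keys))
    con = sqlite3.connect(path)
    rows = con.execute(
        f"SELECT DISTINCT word FROM idx WHERE key IN ({marks})", keys
    ).fetchall()
    con.close()
    return sorted(r[0] for r in rows)
```

The slow step (`build_index_file`) runs once; later processes only call `candidates`. The returned words are still only candidates and would need an edit-distance check, and every probe is a disk-backed lookup, which is exactly where the latency trade-offs mentioned above come in.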
by jamra on 3/6/22, 3:42 PM
How about comparing it to a Levenshtein automaton or another state-of-the-art approach?
by injidup on 3/6/22, 9:28 PM
*EDIT* I just found a solution to this problem. https://support.google.com/assistant/thread/559644?hl=en&msg... You can supply a phonetic name, and this helps Google match. This seems a bit low-tech though.
by danielscrubs on 3/6/22, 12:48 PM
https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/S...
https://github.com/cbeav/symspell/blob/master/src/SymSpell.h...
by nicoburns on 3/6/22, 4:01 PM
Does anyone have any insight into what's holding this back?
by tgv on 3/6/22, 1:20 PM
by wodenokoto on 3/6/22, 1:13 PM
by tootie on 3/6/22, 2:51 PM
by amelius on 3/6/22, 6:36 PM
Or sounds (like Shazam does)?