by mntruell on 1/8/25, 9:27 AM with 15 comments
by kcarnold on 1/11/25, 5:45 PM
A related topic is "token healing", although some implementations (unfortunately including the one in HuggingFace Transformers) make some big assumptions that aren't always true (like treating spaces as special).
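For context, token healing in its simple form means backing up to the last complete token and constraining the next sample to tokens whose text starts with the leftover characters. A minimal sketch of that constraint in Python, assuming a hypothetical vocab list mapping token ids to strings:

    import math

    def token_healing_mask(logits, vocab, leftover):
        # Keep only tokens whose text starts with `leftover`, the text
        # typed past the last full token boundary; mask out the rest.
        return [
            logit if vocab[tok_id].startswith(leftover) else -math.inf
            for tok_id, logit in enumerate(logits)
        ]

Note that this simple form only admits tokens that extend the leftover text, which is exactly the gap amanrs points out below.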
by viraptor on 1/11/25, 9:37 PM
I feel like I'm missing some issue here... Can't you query stopping at the last full token boundary, then reject any results that don't match the character prefix, and continue the completion from there? Kind of like masking invalid actions when doing reinforcement learning on games? Or does that lose too much information?
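Spelling out the rejection half of that idea as a sketch (the names are hypothetical; generate() is assumed to return one candidate completion starting from the last full token boundary):

    def sample_with_prefix(generate, prefix, max_tries=50):
        # Naive rejection sampling: draw completions until one is
        # consistent with the user's character prefix.
        for _ in range(max_tries):
            completion = generate()
            if completion.startswith(prefix):
                return completion
        raise RuntimeError("no candidate matched the prefix")

This does sample from the right conditional distribution, but if the prefix is unlikely under the model's preferred tokenization, most draws get rejected, which is the efficiency problem raised further down the thread.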
by teaearlgraycold on 1/11/25, 6:44 PM
by do_not_redeem on 1/11/25, 7:09 PM
Is there some reason it isn't alphabetical? (More specifically, lexically sorted by codepoint) If you had a model with sorted tokens, you'd be able to solve this by constraining output to tokens with the desired prefix, probably with some mechanism similar to how this works: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...
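With a codepoint-sorted vocab, that constraint becomes a range lookup, since all tokens sharing a prefix sit in one contiguous slice. A minimal sketch (sorted_vocab is a hypothetical list of token strings in codepoint order):

    import bisect

    def tokens_with_prefix(sorted_vocab, prefix):
        # One binary search finds the start of the slice; a short scan
        # finds its end, since matching tokens are contiguous.
        lo = bisect.bisect_left(sorted_vocab, prefix)
        hi = lo
        while hi < len(sorted_vocab) and sorted_vocab[hi].startswith(prefix):
            hi += 1
        return range(lo, hi)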
by amanrs on 1/12/25, 12:32 AM
First "token-healing" doesn't work. Consider the case "app" where the most likely options are "ap|praisal" or "apple|sauce". You can't just sample all tokens that start with app, or you'd miss appraisal.
Second, it's easy to come up with a naive algorithm that samples from the true distribution. It's very difficult to make this algorithm efficient.
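To make that concrete, the naive exact algorithm marginalizes over every tokenization of the prefix. A hypothetical sketch (model, vocab, and context_ids are assumed interfaces; model returns next-token log-probs for a context):

    import math

    def logaddexp(a, b):
        # Numerically stable log(exp(a) + exp(b)).
        if a == -math.inf: return b
        if b == -math.inf: return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def prefix_logprob(model, context_ids, prefix, vocab):
        # Total log-probability that generated text starts with `prefix`,
        # summed over all token sequences that can spell it.
        if not prefix:
            return 0.0  # log(1): prefix fully consumed
        logprobs = model(context_ids)
        total = -math.inf
        for tok_id, tok in enumerate(vocab):
            if tok.startswith(prefix):
                # Token covers the rest of the prefix in one step.
                total = logaddexp(total, logprobs[tok_id])
            elif tok and prefix.startswith(tok):
                # Token consumes part of the prefix; recurse on the rest,
                # which costs another full forward pass per branch.
                rest = prefix_logprob(model, context_ids + [tok_id],
                                      prefix[len(tok):], vocab)
                total = logaddexp(total, logprobs[tok_id] + rest)
        return total

Sampling from the true distribution then means weighting each consistent first token by its branch's total mass. This is exact but exponential in the worst case, with a model call at every recursion step, which is why making it efficient is the hard part.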
by yorwba on 1/11/25, 6:20 PM
by _andrei_ on 1/12/25, 10:33 PM