by zepearl on 4/30/24, 9:53 PM with 4 comments
1) A user sees some character on his/her screen => that's a "grapheme", which is a collection of...
2) ...1 to N "Unicode code points", where a single "Unicode code point" can use...
3) ...1 to 4 "UTF-8" bytes (the original UTF-8 design allowed up to 6, but RFC 3629 limits it to 4).
Is that right (in the case of UTF-8 storage)?
(I feel like I'm missing an intermediate step...)
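To make the question concrete, here is a rough Python sketch of the code point and byte layers. The helper name `describe` and the sample string are just made up for illustration. It does not split text into graphemes (as far as I know the standard library has no function for that), so the sample is hand-picked to be one on-screen character built from two code points.

```python
def describe(text):
    """Return a (code point label, UTF-8 byte count) pair for each code point."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]

# One user-perceived character: "e" followed by a combining acute accent.
e_acute = "e\u0301"
print(describe(e_acute))                # [('U+0065', 1), ('U+0301', 2)]
print(len(e_acute.encode("utf-8")))     # 3 bytes in total

# Byte counts at the edges of each UTF-8 length class.
boundaries = (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF)
for cp in boundaries:
    print(f"U+{cp:04X}", len(chr(cp).encode("utf-8")))
```

So in this sample, one character on screen is 2 code points and 3 bytes, and the highest code point, U+10FFFF, takes 4 bytes.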
(indirectly related to "You can't just assume UTF-8" https://news.ycombinator.com/item?id=40195009 , comment https://news.ycombinator.com/item?id=40206149 , link mentioned in that comment being https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ )
Thx :o)
by nuc1e0n on 5/1/24, 9:04 PM
by nuc1e0n on 5/3/24, 3:41 PM