by feldrim on 2/24/25, 12:19 PM with 62 comments
by dwdz on 2/26/25, 11:18 AM
ls -l win32/
total 0
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\277\237''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\267\213''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\240\220''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\274\273''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\251\205''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\255\223''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\272\257''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\264\207''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\261\246''.exe'
-rw-r--r-- 1 dawid dawid 0 Feb 26 12:13 ''$'\355\254\266''.exe'
...
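For anyone puzzled by the octal escapes in that listing, here is a rough sketch of how to read them. It assumes the three-byte sequences are UTF-8-style encodings of lone surrogates, which is what the leading \355 (0xED) byte suggests. The names `escaped_names` and `decode_name` are made up for illustration.

```python
# Octal byte triples copied from the first three names in the listing above.
escaped_names = [
    (0o355, 0o277, 0o237),
    (0o355, 0o267, 0o213),
    (0o355, 0o240, 0o220),
]

def decode_name(octets):
    """Decode the bytes as UTF-8 while letting lone surrogates through."""
    return bytes(octets).decode("utf-8", errors="surrogatepass")

for octets in escaped_names:
    ch = decode_name(octets)
    # Surrogate code points occupy U+D800 through U+DFFF.
    is_surrogate = 0xD800 <= ord(ch) <= 0xDFFF
    print(f"U+{ord(ch):04X} surrogate={is_surrogate}")
```

A strict `decode("utf-8")` on the same bytes raises `UnicodeDecodeError`, which would explain why `ls` falls back to escapes instead of printing a character.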
by account42 on 2/26/25, 12:22 PM
> Windows was an early adopter of Unicode, and its file APIs use UTF‑16 internally since Windows 2000
Wrong. Windows uses WTF-16 [0] despite what the documentation says.
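To make the distinction concrete, here is a small sketch. It assumes "WTF-16" means 16-bit code units that may contain unpaired surrogates. Python has no WTF-16 codec, so the `surrogatepass` error handler stands in for it, and the helper name `encodes_strictly` is hypothetical.

```python
lone = "\ud800"  # an unpaired high surrogate

def encodes_strictly(text):
    """Return True if text is encodable as well-formed UTF-16."""
    try:
        text.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False

# Strict UTF-16 refuses the lone surrogate.
strict_ok = encodes_strictly(lone)

# With surrogatepass the code unit is written out as-is,
# roughly what a filename API that does not validate would store.
wtf16_units = lone.encode("utf-16-le", errors="surrogatepass")

# A properly paired surrogate (any character outside the BMP) is fine either way.
paired_ok = encodes_strictly("\U0001F600")

print(strict_ok, wtf16_units, paired_ok)
```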
by feldrim on 2/26/25, 10:38 AM
1. On Windows, accessed by WSL
2. On Linux (WSL), using UTF-8 locale
3. On Linux (WSL), using POSIX locale
The difference is weird to me as a user. I'd like to know the reasoning behind these decisions. If anyone has information, please let me know.
by kzrdude on 2/26/25, 8:12 AM
by mofeien on 2/26/25, 11:37 AM
I was a bit confused by the detour via UTF-8 to arrive at the code points and had to look up UTF-8 encoding first to understand how they relate. Then I tried out the following:
candidate = chr(0xD800)
candidate2 = bytes([0xED, 0xA0, 0x80]).decode('utf-8', errors='surrogatepass')
print(candidate == candidate2) # True
and it seems that you could just iterate over code points directly with the `chr()` function.
by qingcharles on 2/26/25, 8:28 AM
by somewhereoutth on 2/26/25, 10:54 AM
by n_plus_1_acc on 2/26/25, 7:56 AM
by ooterness on 2/26/25, 3:44 PM
It seems obvious that attempts to create files with such filenames ought to be blocked.
by ge96 on 2/26/25, 4:02 PM
by rob74 on 2/26/25, 7:42 AM
by Devasta on 2/26/25, 9:33 AM
The real solution is to force the entire world population to use the Rotokas language of Papua New Guinea.
by theiebrjfb on 2/26/25, 8:54 AM
Ext4 filenames have a maximum length of 255 bytes. That is the only legacy limit you have to deal with as a Linux user. And even that can be avoided by using more modern filesystems.
And we get filesystem-level snapshots, etc.
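One caveat on the 255 limit: as far as I know it is counted in bytes, not characters, so non-ASCII names hit it sooner. A quick sketch, assuming UTF-8 filenames and a 255-byte limit (`NAME_MAX` and `fits` are names I made up here):

```python
NAME_MAX = 255  # assumed per-name limit in bytes

def fits(name, limit=NAME_MAX):
    """Check whether a filename fits once encoded as UTF-8."""
    return len(name.encode("utf-8")) <= limit

ascii_name = "a" * 255          # 255 characters, 255 bytes
accented_name = "\u00e9" * 200  # 200 characters, 2 bytes each in UTF-8

print(fits(ascii_name), fits(accented_name))
```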