by _Microft on 3/14/25, 8:05 PM with 86 comments
by petertodd on 3/17/25, 6:56 PM
$ hexdump -C foo.ots
00000000 00 4f 70 65 6e 54 69 6d 65 73 74 61 6d 70 73 00 |.OpenTimestamps.|
00000010 00 50 72 6f 6f 66 00 bf 89 e2 e8 84 e8 92 94 01 |.Proof..........|
0) Magic is at the beginning of the file.
1) Starts with a null-byte to make it clear this is binary, not text.
2) Includes a human-readable part to make it easy to figure out what the file is in hex dumps.
3) 8 randomly chosen bytes, all of which are greater than 0x7F to ensure they're not ASCII.
4) Finally, a one-byte major version number.
5) Total length (including major version) is 32 bytes to fit nicely in a hex dump.
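For anyone who wants to poke at this, here is a minimal Python sketch of checking that header. The byte values are copied from the hexdump above; the names `MAGIC`, `MAJOR_VERSION`, and `parse_header` are made up for illustration and are not the OpenTimestamps library's API.

```python
# Header bytes copied from the hexdump above: 31 bytes of magic + 1 version byte.
MAGIC = (
    b"\x00OpenTimestamps\x00"              # NUL, readable name, NUL
    b"\x00Proof\x00"                       # NUL, readable name, NUL
    b"\xbf\x89\xe2\xe8\x84\xe8\x92\x94"    # 8 random bytes, all > 0x7F
)
MAJOR_VERSION = 1  # the final byte in the hexdump


def parse_header(data: bytes) -> int:
    """Return the major version if data starts with the magic, else raise ValueError."""
    if len(data) < len(MAGIC) + 1:
        raise ValueError("file too short to contain the header")
    if not data.startswith(MAGIC):
        raise ValueError("magic mismatch: not an OpenTimestamps proof")
    return data[len(MAGIC)]  # the byte right after the magic is the major version


header = MAGIC + bytes([MAJOR_VERSION])
print(len(header), parse_header(header))
```

How a reader should react to an unknown major version is not covered in the comment above, so the sketch just returns the byte and leaves that decision to the caller.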
by conaclos on 3/17/25, 10:21 AM
> SHOULD include a zero byte
I guess it is expected to be at the end of the magic number to act as a null-terminated string?
> MUST include a byte sequence that is invalid UTF-8
I guess it is to differentiate a text file from a specific format?
> MUST include at least one byte with the high bit set
Any reason?
by gardaani on 3/17/25, 5:05 PM
by shagie on 3/17/25, 6:10 PM
The magic file itself has a format that can test a file and identify it (and possibly report more useful information); it is read by the file command.
# Various dictionary images used by OpenFirware FORTH environment
0 lelong 0xe1a00000
>8 lelong 0xe1a00000
# skip raspberry pi kernel image kernel7.img by checking for positive text length
>>24 lelong >0 ARM OpenFirmware FORTH Dictionary,
>>>24 lelong x Text length: %d bytes,
>>>28 lelong x Data length: %d bytes,
>>>32 lelong x Text Relocation Table length: %d bytes,
>>>36 lelong x Data Relocation Table length: %d bytes,
>>>40 lelong x Entry Point: %#08X,
>>>44 lelong x BSS length: %d bytes
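To make the rule above easier to follow, here is a rough Python equivalent of its tests: two little-endian 32-bit words at offsets 0 and 8 must equal 0xe1a00000, the word at offset 24 must be positive, and the remaining fields are only printed. The function name `describe` and the sample field values are invented for illustration, and the real file command handles more details than this sketch does.

```python
import struct

MAGIC_WORD = 0xE1A00000  # value the rule expects at offsets 0 and 8


def describe(data: bytes):
    """Apply the same tests as the magic rule; return a description or None."""
    if len(data) < 48:
        return None
    unsigned = struct.unpack_from("<12I", data)  # little-endian 32-bit words
    signed = struct.unpack_from("<12i", data)    # same words, read as signed
    if unsigned[0] != MAGIC_WORD or unsigned[2] != MAGIC_WORD:
        return None
    if signed[6] <= 0:  # offset 24: text length must be positive
        return None
    return (
        "ARM OpenFirmware FORTH Dictionary, "
        f"Text length: {signed[6]} bytes, "
        f"Data length: {signed[7]} bytes, "
        f"Text Relocation Table length: {signed[8]} bytes, "
        f"Data Relocation Table length: {signed[9]} bytes, "
        f"Entry Point: {unsigned[10]:#08X}, "
        f"BSS length: {signed[11]} bytes"
    )


# Synthetic 48-byte header with made-up field values, just to exercise the checks.
sample = struct.pack(
    "<12I", MAGIC_WORD, 0, MAGIC_WORD, 0, 0, 0, 4096, 512, 16, 8, 0x8000, 64
)
print(describe(sample))
```

Each level of `>` in the rule corresponds to one nested condition here: the deeper lines only run if the test on the line above them matched.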
by ajross on 3/17/25, 5:39 PM
This isn't a solvable problem. File formats evolve in messy ways, they always have and always will, and "magic numbers" just aren't an important enough part of the solution to be worth freaking out about.
Just make it unique; read some bytes out of /dev/random, whatever. Arguments like the one here about making them a safe nul-terminated string that is guaranteed to be utf-8 invalid are not going to help anyone in the long term.
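In case it helps, picking random bytes for a magic number is a one-liner. This is a small Python sketch that uses `os.urandom` as the random source; the helper name `random_magic` and the 8-byte default are just assumptions for the example.

```python
import os


def random_magic(n: int = 8) -> bytes:
    """Return n random bytes from the operating system's random source."""
    return os.urandom(n)


# Run once, then paste the printed constant into your format's source code.
magic = random_magic()
print(f"MAGIC = bytes.fromhex({magic.hex()!r})")
```

The point is to generate the value once and hard-code it; a magic number that changes between runs would identify nothing.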
by badmintonbaseba on 3/17/25, 4:38 PM
by weinzierl on 3/17/25, 9:24 AM
7F 45 4C 46
- MUST be the very first N bytes in the file -> check
- MUST be at least four bytes long, eight is better -> check, but only four
- MUST include at least one byte with the high bit set -> nope
- MUST include a byte sequence that is invalid UTF-8 -> nope
- SHOULD include a zero byte -> nope
So, just 1.5 out of 5. Not good.
By the way, does anyone know the reason it starts with DEL (7F) specifically?
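The checklist above is easy to automate. Here is a Python sketch that tests a candidate magic against four of the five rules; the "very first N bytes" rule is about file layout, so it cannot be checked from the bytes alone. The function name `check_magic` and the dictionary keys are my own, and the second test value is the OpenTimestamps magic from the hexdump earlier in the thread.

```python
def check_magic(magic: bytes) -> dict:
    """Check a candidate magic number against the byte-level rules quoted above."""
    try:
        magic.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "at least four bytes": len(magic) >= 4,
        "eight or more bytes": len(magic) >= 8,
        "has high-bit byte": any(b >= 0x80 for b in magic),
        "invalid UTF-8": not valid_utf8,
        "has zero byte": 0 in magic,
    }


ELF = bytes.fromhex("7f454c46")
OTS = b"\x00OpenTimestamps\x00\x00Proof\x00\xbf\x89\xe2\xe8\x84\xe8\x92\x94"

for name, magic in (("ELF", ELF), ("OTS", OTS)):
    print(name, check_magic(magic))
```

For ELF this agrees with the tally above: only the minimum-length rule passes, because all four bytes are plain ASCII.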
by xg15 on 3/18/25, 7:32 AM
> MUST include a byte sequence that is invalid UTF-8
Making the magic number UTF-8 (or ASCII, which would still break the rule) would effectively turn it into a "magic string". Isn't that the better method for distinguishability? It's easier to pick unique memorable strings than unique memorable numbers, and you can also read it in a hex editor.
What would be the downsides?
Or is the idea of the requirement to distinguish the format from plaintext files? I'd think that the version number or the rest of the format already likely contained some invalid UTF-8 to ensure that.
by secondcoming on 3/17/25, 9:36 PM
by kazinator on 3/18/25, 4:24 PM
#!/usr/bin/whatever^@^@^@^@^@[HDR]
A hash bang path terminated by a null, followed by some (aligned) binary material with version information and whatnot, all fitting into around 32 bytes. The header format could allow for variability in the path; the #! and [HDR] parts could be enough to identify it.
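One way this could look in practice, sketched in Python. The exact layout is an assumption on my part, not something specified above: a 24-byte NUL-padded `#!` field, the 5-byte `[HDR]` tag, and three version bytes, for 32 bytes in total. The names `build_header` and `parse_header` are hypothetical.

```python
import struct

PATH_FIELD = 24   # assumed width of the NUL-padded "#!" field
TAG = b"[HDR]"    # marker taken from the example above
HEADER_LEN = 32   # assumed total header size
LAYOUT = "<24s5s3B"  # path field, tag, then major/minor/patch as single bytes


def build_header(interpreter: str, major: int, minor: int, patch: int) -> bytes:
    line = b"#!" + interpreter.encode("ascii")
    if len(line) >= PATH_FIELD:  # keep room for at least one terminating NUL
        raise ValueError("interpreter path too long for the header")
    # The "24s" field pads the path with NUL bytes up to 24 bytes.
    return struct.pack(LAYOUT, line, TAG, major, minor, patch)


def parse_header(data: bytes):
    """Return (interpreter_path, version_tuple), or None if the header does not match."""
    if len(data) < HEADER_LEN or not data.startswith(b"#!"):
        return None
    line, tag, major, minor, patch = struct.unpack_from(LAYOUT, data)
    if tag != TAG or b"\x00" not in line:
        return None
    path = line[2:line.index(b"\x00")].decode("ascii", errors="replace")
    return path, (major, minor, patch)


header = build_header("/usr/bin/whatever", 1, 0, 0)
print(len(header), parse_header(header))
```

With `/usr/bin/whatever` the path field ends in exactly five NULs, matching the example line. The sketch only covers building and recognizing the bytes; whether an operating system would actually execute such a file is a separate question.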
by hgomersall on 3/17/25, 10:22 AM
by ks2048 on 3/17/25, 11:30 PM
by eternityforest on 3/17/25, 8:35 PM
Maybe a zero, the UUID as ASCII, then another zero, then a human readable description for debugging and search, or a structured metadata header.
But first, ask yourself why you are designing a binary format, unless maybe it's a new media container.
When would someone ever want a binary file that's not zip, SQLite, or version controllable text?
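If you do end up with a custom binary format, the zero, UUID, zero, description layout suggested above could be sketched in Python like this. The UUID below is a made-up placeholder, the NUL terminator after the description is my own assumption, and `build_header` and `read_description` are hypothetical names.

```python
import uuid

# Placeholder format ID; generate your own once with uuid.uuid4() and hard-code it.
FORMAT_ID = uuid.UUID("3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b")
PREFIX = b"\x00" + str(FORMAT_ID).encode("ascii") + b"\x00"  # 1 + 36 + 1 bytes


def build_header(description: str) -> bytes:
    # Assumption: the description is NUL-terminated so a reader knows where it ends.
    return PREFIX + description.encode("utf-8") + b"\x00"


def read_description(data: bytes):
    """Return the description if data starts with the format prefix, else None."""
    if not data.startswith(PREFIX):
        return None
    end = data.find(b"\x00", len(PREFIX))
    if end == -1:
        return None
    return data[len(PREFIX):end].decode("utf-8")


header = build_header("Example format, version 1")
print(len(PREFIX), read_description(header))
```

Because the UUID is stored as ASCII, it shows up in hex dumps and can be found with a plain text search, which is the debugging benefit mentioned above.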
by baggy_trough on 3/17/25, 1:54 PM
by _ce5e on 3/17/25, 7:25 PM
Sometimes I like to have fun and encode a 1337-code easter egg in the hexadecimal representation