by qsantos on 8/25/24, 4:52 PM with 166 comments
by koverstreet on 8/25/24, 11:08 PM
The idea is a syscall that returns a ringbuffer for any supported file descriptor, including pipes. For pipes, if both ends support using the ringbuffer, they'll map the same one: zero-copy IO, potentially without calling into the kernel at all.
Would love to find collaborators for this one :)
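The comment describes a proposal, not an existing API. The following is only a minimal Python sketch of the kind of single-producer/single-consumer ring such a syscall might hand out. The class name, the power-of-two capacity, and the head/tail layout are assumptions for illustration. It runs in a single process and has no memory barriers, so it shows the indexing logic but not the zero-copy sharing between processes.

```python
class RingBuffer:
    """Single-producer/single-consumer byte ring (hypothetical, illustrative sketch).

    head and tail are free-running byte counters. The capacity must be a power
    of two so a counter can be turned into a buffer index with a bitmask.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of two")
        self.buf = bytearray(capacity)
        self.mask = capacity - 1
        self.head = 0  # total bytes written; only the producer advances it
        self.tail = 0  # total bytes read; only the consumer advances it

    def used(self) -> int:
        return self.head - self.tail

    def write(self, data: bytes) -> int:
        """Copy as much of data as fits and return the number of bytes written."""
        n = min(len(data), len(self.buf) - self.used())
        start = self.head & self.mask
        first = min(n, len(self.buf) - start)  # bytes that fit before the wrap point
        self.buf[start:start + first] = data[:first]
        self.buf[:n - first] = data[first:n]   # the remainder wraps to the front
        # Publish only after the bytes are in place. A real cross-process ring
        # would need a release store here and an acquire load in read().
        self.head += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes bytes, oldest first."""
        n = min(max_bytes, self.used())
        start = self.tail & self.mask
        first = min(n, len(self.buf) - start)
        out = bytes(self.buf[start:start + first]) + bytes(self.buf[:n - first])
        self.tail += n
        return out
```

For example, an 8-byte ring that receives `b"hello"`, gives back 3 bytes, and then receives `b"world!"` wraps around internally and later returns `b"loworld!"`. Keeping `head` and `tail` as free-running counters lets a full ring be told apart from an empty one without wasting a slot.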
by fatcunt on 8/26/24, 11:31 AM
This is caused by the CONFIG_RETHUNK option. In the disassembly from objdump you are seeing the result of RET being replaced with JMP __x86_return_thunk.
https://github.com/torvalds/linux/blob/v6.1/arch/x86/include...
https://github.com/torvalds/linux/blob/v6.1/arch/x86/lib/ret...
> The NOP instructions at the beginning and at the end of the function allow ftrace to insert tracing instructions when needed.
These are from the ASM_CLAC and ASM_STAC macros, which make space for the CLAC and STAC instructions (both of them three bytes in length, same as the number of NOPs) to be filled in at runtime if X86_FEATURE_SMAP is detected.
https://github.com/torvalds/linux/blob/v6.1/arch/x86/include...
https://github.com/torvalds/linux/blob/v6.1/arch/x86/include...
https://github.com/torvalds/linux/blob/v6.1/arch/x86/kernel/...
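A toy Python model of the runtime patching described above may help. The on-disk code carries NOP padding the same size as the instruction, and a boot-time pass overwrites it only when the CPU reports SMAP. The function name `patch_smap_sites`, the site table, and the byte image are hypothetical, and the kernel's real alternatives mechanism is more involved. The CLAC (`0F 01 CA`) and STAC (`0F 01 CB`) encodings are the real ones.

```python
NOP = 0x90                          # one-byte x86 NOP used as padding
STAC = bytes([0x0F, 0x01, 0xCB])    # STAC: set EFLAGS.AC
CLAC = bytes([0x0F, 0x01, 0xCA])    # CLAC: clear EFLAGS.AC


def patch_smap_sites(text: bytearray, sites: list[tuple[int, bytes]], cpu_has_smap: bool) -> None:
    """Hypothetical patcher: replace NOP placeholders if the CPU has SMAP.

    Each site is (offset, replacement). Without SMAP the NOPs are left in place.
    """
    if not cpu_has_smap:
        return
    for offset, replacement in sites:
        end = offset + len(replacement)
        # The placeholder must be exactly as long as the instruction it reserves room for.
        if text[offset:end] != bytes([NOP]) * len(replacement):
            raise ValueError(f"no NOP placeholder at offset {offset}")
        text[offset:end] = replacement


# Hypothetical function image: STAC placeholder, a 3-byte body, CLAC placeholder, ret.
image = bytes([NOP] * 3) + bytes([0x48, 0x89, 0xF8]) + bytes([NOP] * 3) + bytes([0xC3])
SITES = [(0, STAC), (6, CLAC)]

patched = bytearray(image)
patch_smap_sites(patched, SITES, cpu_has_smap=True)
```

Because objdump reads the kernel image from disk, it shows the unpatched bytes, which is why the padding appears there as plain NOPs.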
by 0xbadcafebee on 8/25/24, 11:13 PM
by JoshTriplett on 8/25/24, 11:31 PM
by donaldihunter on 8/26/24, 2:20 PM
[1] https://www.intel.com/content/dam/www/central-libraries/us/e...
[2] https://www.intel.com/content/www/us/en/developer/articles/t...
by nitwit005 on 8/25/24, 10:06 PM
by qsantos on 8/26/24, 6:55 AM
by RevEng on 8/25/24, 11:01 PM
by rwmj on 8/26/24, 12:30 PM
by stabbles on 8/26/24, 9:26 AM
by Borg3 on 8/26/24, 8:15 AM
Anyway, nice article, it's good to know what's going on under the hood.
by faizshah on 8/26/24, 3:59 PM
In my experience in data engineering, it’s very unlikely you can exceed 500 MB/s of throughput in your business logic, as most libraries you’re using are not optimized to that degree (SIMD etc.). That being said, I think it’s a good technique to try out.
I’m trying to think of other applications this could be useful for. Maybe video workflows?
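To compare the business-logic throughput figure in this comment against raw pipe speed, a rough Python measurement of a single pipe (one writer thread, one reader) could look like the sketch below. The function name, chunk size, and total are arbitrary assumptions, and interpreter overhead means the result is a floor for what the pipe can do rather than the kernel's ceiling.

```python
import os
import threading
import time


def pipe_throughput_mb_s(total_bytes: int = 256 * 1024 * 1024, chunk: int = 64 * 1024) -> float:
    """Push total_bytes through an os.pipe() and return the observed MB/s (10**6 bytes/s)."""
    r, w = os.pipe()
    payload = memoryview(bytes(chunk))  # zero-filled buffer reused for every write

    def writer() -> None:
        remaining = total_bytes
        while remaining > 0:
            # os.write may write fewer bytes than requested; subtract what it reports.
            remaining -= os.write(w, payload[:min(chunk, remaining)])
        os.close(w)  # closing the write end signals EOF to the reader

    start = time.perf_counter()
    t = threading.Thread(target=writer)
    t.start()
    received = 0
    while data := os.read(r, chunk):  # os.read returns b"" at EOF
        received += len(data)
    elapsed = time.perf_counter() - start
    t.join()
    os.close(r)
    if received != total_bytes:
        raise RuntimeError(f"expected {total_bytes} bytes, got {received}")
    return received / elapsed / 1e6


if __name__ == "__main__":
    print(f"{pipe_throughput_mb_s():.0f} MB/s")
```

`os.read` and `os.write` release the GIL while they block, so the two threads can actually overlap.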
by sixthDot on 8/26/24, 10:22 AM
The jump seems to be generated by the expansion of the `ASM_CLAC` macro, which is supposed to clear the AC flag in the EFLAGS register ([1], [2]). However, in this case the expansion looks like it does nothing (maybe because of the target?). I'd be interested to know more about that. Call to the wild.
[1]: https://github.com/torvalds/linux/blob/master/arch/x86/inclu...
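For context on what `ASM_CLAC` is for: CLAC clears the AC flag (bit 18 of EFLAGS) and STAC sets it. With SMAP enabled (bit 21 of CR4), the kernel may only make explicit accesses to user-mode pages while AC is set. The simplified Python model below encodes just that rule. The helper function names are hypothetical, and the model ignores implicit supervisor accesses and other architectural details.

```python
X86_EFLAGS_AC = 1 << 18   # AC flag, bit 18 of EFLAGS
X86_CR4_SMAP = 1 << 21    # SMAP enable, bit 21 of CR4


def clac(eflags: int) -> int:
    """Model of CLAC: clear EFLAGS.AC."""
    return eflags & ~X86_EFLAGS_AC


def stac(eflags: int) -> int:
    """Model of STAC: set EFLAGS.AC."""
    return eflags | X86_EFLAGS_AC


def supervisor_may_touch_user_page(cr4: int, eflags: int) -> bool:
    """Simplified SMAP rule for explicit kernel accesses to user-mode pages:
    allowed if SMAP is off, or if SMAP is on and EFLAGS.AC is set."""
    return not (cr4 & X86_CR4_SMAP) or bool(eflags & X86_EFLAGS_AC)
```

In these terms, a user-copy routine brackets its access with `stac()` and `clac()`, so any stray kernel access to user memory outside that window faults.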
by yencabulator on 8/26/24, 6:43 PM
by nyanpasu64 on 8/26/24, 4:40 AM
by jvanderbot on 8/26/24, 3:57 PM
I think you need to recompile your compiler, or disable those explicitly via linker/compiler flags. IMHO, it's fairly hard to coax compilers into emitting SIMD instructions, or to dissuade them from doing so.
by arendtio on 8/26/24, 12:15 PM
by up2isomorphism on 8/26/24, 3:21 PM
by jeremyscanvic on 8/26/24, 5:11 PM
by goodpoint on 8/26/24, 10:14 AM
by mparnisari on 8/26/24, 3:32 PM
by cowsaymoo on 8/26/24, 6:23 AM
by djaouen on 8/25/24, 10:29 PM
by jheriko on 8/25/24, 9:18 PM
the only time I've used them was because of external constraints. they are just not useful.