by jcapote on 3/7/17, 6:24 PM with 96 comments
by brendangregg on 3/7/17, 7:40 PM
Summarized details here:
https://www.slideshare.net/brendangregg/performance-tuning-e...
by drewg123 on 3/7/17, 7:22 PM
Roughly 10 years ago, when I was the driver author for one of the first full-speed 10GbE NICs, we'd get complaints from customers who were sure our NIC could not do 10Gb/s, because iperf showed it was limited to 3Gb/s or less. I would ask them to re-try with netperf, and they'd see full bandwidth. I eventually figured out that the complaints were coming from customers running distros without vDSO support, and/or running other OSes which (at the time) didn't support it (Mac OS, FreeBSD). It turns out that the difference was that iperf would call gettimeofday() around every socket write to measure bandwidth, while netperf would just issue gettimeofday() calls at the start and the end of the benchmark, so iperf was effectively gettimeofday-bound. Ugh.
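A rough sketch of the difference (hypothetical sender loops, not the actual iperf or netperf source):

/* Hypothetical sender loops -- NOT the real iperf/netperf code -- showing
   why per-write timestamping hurts when gettimeofday() is a true syscall.
   Assumes an already-connected socket fd and a buffer to send. */
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>

/* iperf-style: timestamp around every write, so without a vDSO the run
   becomes gettimeofday-bound rather than NIC-bound. */
long send_timed_per_write(int fd, const char *buf, size_t len, long iters)
{
    struct timeval t0, t1;
    long bytes = 0;
    for (long i = 0; i < iters; i++) {
        gettimeofday(&t0, NULL);         /* one syscall...            */
        bytes += send(fd, buf, len, 0);
        gettimeofday(&t1, NULL);         /* ...and another, per write */
    }
    return bytes;
}

/* netperf-style: timestamp only at the start and end of the benchmark. */
long send_timed_once(int fd, const char *buf, size_t len, long iters)
{
    struct timeval start, end;
    long bytes = 0;
    gettimeofday(&start, NULL);
    for (long i = 0; i < iters; i++)
        bytes += send(fd, buf, len, 0);
    gettimeofday(&end, NULL);
    return bytes;
}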
by nneonneo on 3/7/17, 8:54 PM
This is a big speed hit. Some programs call gettimeofday() extremely frequently - for example, many programs call timing functions when logging, performing sleeps, or even constantly during computations (e.g. to implement a poor man's computation timeout).
The article suggests changing the clock source to tsc as a workaround, but also warns that it could cause unwanted backwards time warps - making it dangerous to use in production. I'd be curious to hear from those who are using it in production how they avoided the "time warp" issue.
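For anyone wanting to poke at that workaround: the current and available clock sources are exposed through sysfs on Linux. A quick C snippet to print them - the paths are the standard Linux ones, but check your kernel:

/* Print the kernel clocksources relevant to the tsc workaround above. */
#include <stdio.h>

static void dump(const char *path)
{
    char line[128];
    FILE *f = fopen(path, "r");
    if (f && fgets(line, sizeof line, f))
        printf("%s:\n  %s", path, line);
    if (f)
        fclose(f);
}

int main(void)
{
    dump("/sys/devices/system/clocksource/clocksource0/available_clocksource");
    dump("/sys/devices/system/clocksource/clocksource0/current_clocksource");
    /* Switching would mean writing "tsc" into current_clocksource as root;
       the backwards-time-warp caveat from the article still applies. */
    return 0;
}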
by binarycrusader on 3/7/17, 7:53 PM
1) first, by eliminating the need for a context switch for libc calls such as gettimeofday(), gethrtime(), etc. (there is no public/supported interface on Solaris for syscalls, so libc would be used)
2) by providing additional, specific interfaces with certain guarantees:
https://docs.oracle.com/cd/E53394_01/html/E54766/get-sec-fro...
This was accomplished with a shared page, created during system startup, in which the kernel keeps the current time updated. At process exec time that page is mapped into every process's address space.
Solaris' libc was of course updated to simply read the time directly from this memory page. This is more practical on Solaris because libc and the kernel are tightly integrated, and because system calls are not public interfaces, but it seems greatly preferable to the VDSO mechanism.
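For readers unfamiliar with the idea, here's a toy illustration of a kernel-updated shared time page - emphatically NOT Solaris's actual layout or interface, just the general seqlock-style pattern such pages tend to use:

/* Toy shared-time-page sketch (not Solaris internals).  The kernel would
   update a page like this on every tick; libc just reads it, no syscall
   needed.  The sequence counter guards against a half-updated read. */
#include <stdint.h>

struct shared_time_page {
    volatile uint32_t seq;   /* odd while the kernel is mid-update */
    volatile int64_t  sec;   /* seconds since the epoch            */
    volatile int64_t  nsec;  /* nanoseconds within the second      */
};

/* Userspace read path: retry if the writer raced with us. */
void read_shared_time(const struct shared_time_page *p,
                      int64_t *sec, int64_t *nsec)
{
    uint32_t s;
    do {
        s = p->seq;
        *sec  = p->sec;
        *nsec = p->nsec;
    } while ((s & 1) || s != p->seq);
}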
by jdamato on 3/7/17, 7:23 PM
[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...
by JoshTriplett on 3/7/17, 8:28 PM
- You have a stable hardware TSC (you can check this in /proc/cpuinfo on the host, but all reasonably recent hardware should support this).
- The host has the host-side bits of the KVM pvclock enabled.
As long as you meet those two conditions, KVM should support fast vDSO-based time calls.
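If you want to sanity-check the first condition, something like this (run on the host) looks for the usual Linux TSC flags in /proc/cpuinfo - flag names and output vary a bit by CPU and kernel, so treat it as a sketch:

/* Check /proc/cpuinfo for the constant_tsc / nonstop_tsc flags. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    int constant = 0, nonstop = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            if (strstr(line, "constant_tsc")) constant = 1;
            if (strstr(line, "nonstop_tsc"))  nonstop  = 1;
        }
    }
    fclose(f);
    printf("constant_tsc: %s, nonstop_tsc: %s\n",
           constant ? "yes" : "no", nonstop ? "yes" : "no");
    return 0;
}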
by masklinn on 3/7/17, 7:20 PM
by andygrunwald on 3/7/17, 8:17 PM
by chillydawg on 3/7/17, 8:42 PM
I expect there are many such patches that you could use to narrow down the version range of the host kernel. Once you have that information, you may be in a better position to exploit it, knowing which bugs are and are not patched.
by nodesocket on 3/8/17, 7:51 AM
blog ~ touch test.c
blog ~ nano test.c
blog ~ gcc -o test test.c
blog ~ strace -ce gettimeofday ./test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0       100           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   100           total
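(The test.c source isn't shown, but something along these lines would produce that summary - 100 gettimeofday syscalls, i.e. none of the calls were satisfied by the vDSO on this instance:)

/* Guess at test.c, based only on the strace summary above. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval tv;
    for (int i = 0; i < 100; i++)
        gettimeofday(&tv, NULL);
    /* On a host/guest with a working vDSO time path, strace -ce gettimeofday
       would report 0 calls here; 100 means each call fell back to a syscall. */
    printf("%ld\n", (long)tv.tv_sec);
    return 0;
}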
by gtirloni on 3/7/17, 7:17 PM
by amluto on 3/8/17, 4:16 AM
There are patches floating around to support vDSO timing on Xen.
But isn't AWS moving away from Xen or are they just moving away from Xen PV?
by apetresc on 3/7/17, 8:20 PM
by anonymous_iam on 3/7/17, 8:31 PM
by xenophonf on 3/7/17, 8:08 PM
I ran the test program on a Hyper-V VM running CentOS 7 and got the same result: 100 calls to the gettimeofday syscall. Conversely, I tested a vSphere guest (also running CentOS 7), which didn't make the gettimeofday syscall at all.
by MayeulC on 3/8/17, 8:02 AM
https://news.ycombinator.com/item?id=13697555
It seems very closely related, unless I am mistaken.
by pgaddict on 3/7/17, 8:49 PM
by teddyuk on 3/7/17, 8:37 PM
I've worked on quite a few systems and can't think of a time when an API for getting the time would have been called so often that it would affect performance.
by westbywest on 3/7/17, 9:02 PM
by peterwwillis on 3/8/17, 8:22 AM
Or, instead, you could just not do that. Then you could go back to being productive, instead of wasting time tracking down unstable small tweaks for edge cases that you can barely notice after looping the same syscall 5 million times in a row.
When will people learn not to micro-optimize?
by known on 3/8/17, 3:34 AM
by damagednoob on 3/8/17, 9:02 PM