by bilalhusain on 7/10/21, 3:55 PM with 57 comments
by aeyes on 7/10/21, 6:48 PM
CloudSQL Postgres is running with a misconfigured OS OOM killer that randomly crashes the postmaster even when memory use is below the instance spec. GCP closed this bug report as "Won't fix".
This is a priority 1 issue. Seeing a wontfix for this has completely destroyed my trust in their judgement. The bug report states that they have been in contact with support since February.
Unbelievable attitude towards fixing production-critical problems on their platform that affect all customers.
by Winsaucerer on 7/11/21, 12:52 AM
by thyrsus on 7/10/21, 7:08 PM
* I had some compute servers that were up for 200 days. The customers noticed that they were half as fast as identical hardware just booted. Dropping the file system cache ("echo 3 | sudo dd of=/proc/sys/vm/drop_caches") brought the speed back up to that of the newly deployed servers. WTF? File system caches are supposed to be zero-cost discards as soon as processes ask for RAM, but something else is going on. I suspect the kernel is behaving badly with overpopulated RAM management data (TLB entries?), but I don't know how to measure that.
* If that is actually the problem, then a solution might be to shrink that management data by configuring a non-zero number of hugepages ("cat /proc/sys/vm/nr_hugepages" shows the current setting). I'd love to see recommendations on when to use that.
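For anyone wanting to put numbers on the guess above, here is a minimal sketch that pulls a few counters out of /proc/meminfo so a long-running host can be compared with a freshly booted one. The choice of fields (Cached, Slab, PageTables, HugePages_Total) is only an assumption about what might be relevant; it does not measure TLB behaviour directly, and the function names are made up for illustration.

```python
from pathlib import Path

# Fields assumed to be worth comparing between an old and a fresh host.
# Most values are reported in kB; HugePages_Total is a plain page count.
FIELDS = ("MemTotal", "MemFree", "Cached", "Slab", "PageTables", "HugePages_Total")


def parse_meminfo(text):
    """Parse 'Name:   value [kB]' lines into a {name: int} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not 'Name: <integer> ...'.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize(values, fields=FIELDS):
    """Keep only the fields of interest that are actually present."""
    return {f: values[f] for f in fields if f in values}


if __name__ == "__main__":
    path = Path("/proc/meminfo")  # Linux only
    if path.exists():
        for name, value in summarize(parse_meminfo(path.read_text())).items():
            print(f"{name}: {value}")
```

Running it on both the 200-day host and a newly booted one, before and after dropping caches, would at least show whether PageTables or Slab differ much between the two.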
by mnahkies on 7/10/21, 8:24 PM
It surprised me because, up until that point, I had never executed a query that caused the whole host to crash. Now I'm wondering if this misconfiguration is the cause.
by renewiltord on 7/11/21, 4:11 AM
by zingar on 7/11/21, 6:45 AM
by shdh on 7/10/21, 9:53 PM
by yjftsjthsd-h on 7/10/21, 5:35 PM
by dkersten on 7/10/21, 7:25 PM