by nacc on 3/31/17, 2:38 PM with 20 comments
by xorblurb on 3/31/17, 8:48 PM
Also, I have no proof that some crazy thing cannot happen, but there is no reason for single-bit errors not to be corrected regardless of the OS. The worst that should happen is that they go unreported.
IMO, if you have the opportunity (the category of hardware you want supports it), you would be crazy not to use ECC RAM. Non-ECC RAM is basically the only component in a PC that is not protected, which makes it an obvious weak point. I've been bitten at least twice (two defective components, and far more than two errors before I figured out what was happening), and that is only on computers I directly owned or used at work (about a dozen computers in total). I don't want to lose my time anymore, so I always use ECC memory when possible. (I'm not going to pay twice the price for a computer just for that, so it is a "little" difficult with laptops, which also have a plethora of other selection criteria, but it is very easy to get affordable workstation desktops with ECC.)
No modern digital communication bus would be designed without some form of protection, so it does not make much sense to have computers without ECC RAM. I would even like to have it on smartphones, but unfortunately I doubt this will happen soon.
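To illustrate why single-bit correction needs no help from the OS, here is a toy sketch of the idea using a Hamming(7,4) code. This is an assumption for illustration only: real ECC DIMMs typically use a wider SECDED code (64 data bits plus 8 check bits) implemented in the memory controller, and the function names below are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome is 0 if no error was seen,
    otherwise the 1-based position of the bit that was flipped back."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome


# Simulate a single bit flip in "memory" and read the word back.
word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # flip the bit at position 5
recovered, position = hamming74_decode(stored)
print(recovered == word, position)
```

The correction happens entirely in the decode step; whether the event is then logged or reported is a separate concern, which matches the point above that the worst case for a single-bit error should be a missing report.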
by mjevans on 3/31/17, 8:39 PM
I would much prefer that the affected program(s) get a chance to react, or that only they be killed, as a subset of the system. If the error occurred in a filesystem context, there may be other ways of correcting the issue (particularly if it's merely in the read cache rather than the write cache).
Obviously, unhandled exceptions should cascade until they are either contained or the entire system halts.
by zkms on 3/31/17, 11:28 PM
I really want to see someone get some radioisotopes and place them next to both ECC and non-ECC RAM (while forcing reads and writes to the affected memory) to see what sort of soft errors / SEUs happen.
by angry_octet on 4/1/17, 3:44 PM