One thing I've learned about lock-free structures: DWCAS[1] is your friend. Using DWCAS, the Michael and Scott Queue algorithm is trivial to implement. Not really sure why the author insists on using plain CAS, especially when implementing a DWCAS-based algorithm.
[1] http://en.wikipedia.org/wiki/Double_compare-and-swap