by diminoten on 11/23/13, 7:06 AM
I'm actually looking into a segfault issue deep in the bowels of a C++ addon we have in node.js (anyone in #node.js will have seen me over the past few weeks ask about it), but what reading this makes me realize is how woefully underequipped I am to hunt for problems of this nature.
My problem is likely in one of our addons, but this kind of debugging, this whole genre of problem solving is entirely beyond me. How do I get to this level? What do I need to learn? To study?
It's just a little depressing to read something like this and see how far the road ahead goes, despite how far I've already traveled...
by davidw on 11/23/13, 11:59 AM
I looked at node.js for a system I'm involved with creating, but ultimately we went with Erlang just because it's been around a lot longer and is more stable in terms of things like this. We're working on a semi-embedded system that will not always be on-line or accessible for debugging. We also considered Go, which probably would have been more familiar to C++ guys, but it was also deemed a bit immature even if it seems like a very pleasant language to work with.
Cool writeup though!
by ambirex on 11/23/13, 3:10 AM
Thank you, I really enjoy detailed write-ups like this. It is fascinating to see how an engineer approaches an elusive problem.
by jzwinck on 11/23/13, 8:18 AM
I'd like to read more about how we can prevent this class of error going forward. Could stronger typing or RAII or some other feature or trick have made the bug apparent at compile time?
I made a very basic Node.js module in C++ with V8 and it was surprisingly difficult to make a good (idiomatic JS behaviour, believably bug-free) wrapper for a straightforward class and factory method. I say this coming from Boost Python and Luabind, where there are some tricky parts to bind complex classes, but simple ones are easy enough, and once written, obviously correct.
by city41 on 11/23/13, 5:44 AM
I've been running an extremely simple Node application on 0.10.18 for a while now and it has a very gradual memory leak. My code is just a few dozen lines, and it all seems pretty innocent. I am also using Hapi, so I thought maybe Hapi has a leak in it somewhere. Now I wonder if I have the same leak as Walmart here. I just now upgraded to 0.10.22 and am curious to see where I end up. If the leak goes away then hot damn, I got lucky :)
by ryanseys on 11/23/13, 3:41 AM
And a one-line fix. Damn that must be satisfying.
by charlieflowers on 11/23/13, 5:31 AM
FYI, a typo -- "illusive" -> "elusive". (haven't read further yet, just wanted to let you know).
by aaronbrethorst on 11/23/13, 8:47 AM
Wonderful blog post; major props for the engineering time expenditure. But why do you have an Olark chat widget that says "Contact Sales"? I don't want to have anything to do with those schlubs! If anything, I want to talk to serious engineers like you!
Perhaps a better call to action would be:
* Talk to us about how we can solve your problems
* Chat with us
* We can help you too
* What's up?
by rcthompson on 11/23/13, 4:07 AM
Ironically, this page hangs Chrome indefinitely when I try to load it. Luckily it only hangs the tab so I can still close it. I guess I'll fire up Firefox to see if I can actually read the article.
Edit: Actually, it loads fine in a private browsing tab, so it must be a bad interaction with some extension. Oh well.
by patrickg_zill on 11/23/13, 5:16 AM
That is pretty impressive - I love how they could use DTrace to scope out what was going on.
by retr0h on 11/23/13, 6:32 AM
I've always loved the debugging tools in Solaris (SmartOS or whatever now).
by batbomb on 11/23/13, 3:03 AM
Can anyone tell me if there is a reason for this in bash?
DEST=~~/public/walmart.graphs
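For reference, here is a minimal sketch of what bash itself does with that assignment. It assumes no account is literally named "~" on the machine, and HOME_DEST is a hypothetical variable added only for comparison. It does not explain the author's intent; it only suggests that bash leaves the double tilde alone, so any special meaning would have to come from whatever program later receives the value.

```bash
#!/usr/bin/env bash
# Tilde expansion treats the characters between the leading "~" and the
# first "/" as a login name. In "~~/..." that name would be "~", which
# normally does not exist, so bash leaves the word unchanged.
DEST=~~/public/walmart.graphs

# Hypothetical comparison: a single tilde expands to the home directory.
HOME_DEST=~/public/walmart.graphs

printf '%s\n' "$DEST"       # stays literal unless a user named "~" exists
printf '%s\n' "$HOME_DEST"  # begins with the current user's home directory
```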
by atomical on 11/23/13, 6:43 PM
I assume that they can restart the server at intervals or use load balancing. A few months of developer time for something like this seems excessive unless he was working on something else as well.
by ilaksh on 11/23/13, 5:38 AM
I think there are still quite a few C and C++ programmers out there. To me this is a great example of why it is better software engineering to write a server in something like Node.js. Because rather than having a million code bases with potential memory leaks like this one, there is just the Node code. In ordinary JavaScript code it's impossible to cause a problem just like that.
by joeblau on 11/23/13, 4:03 PM
Excellent details on the sleuthing that went on to find this error. I think it's great that there are great tools available to debug errors like this, and your write-up helps me learn more about how to go about properly debugging my Node apps.
by jnazario on 11/24/13, 6:06 PM
cool writeup. while not a node.js user, i love these sorts of tours of system internals - i always learn a lot, both specific tools and also processes of using them.
thanks for the details, very articulate and useful stuff.
by jokoon on 11/23/13, 2:19 PM
we know that node.js is a bad piece of software, you don't need to remind us about it all the time
(down vote me)