from Hacker News

Building Sentry, a service to process native crash reports and minidumps

by daniel_levine on 6/14/19, 12:47 AM with 32 comments

  • by etaioinshrdlu on 6/14/19, 9:11 AM

    Sentry is one of the nicest services I've had the pleasure of using. Having our errors centrally logged and managed is invaluable.

    Source: a happy user.

  • by robocat on 6/14/19, 9:29 AM

    We have used Sentry for a long time with JavaScript. The main issues for us are:

    * Obese JavaScript code. We had to write our own custom code to log events.

    * Aimed at large scale companies. We only have 1000s of users, and we care about each individual exception, but I think it is really aimed at consolidating large numbers of events.

    * Meaningless percentages on data. Tagged data is processed, but the end percentage value has little meaning e.g. send through 1000 similar events, with 1 event with a tag with value X, and 1 event with a tag with value Y, and 998 with no value. Sentry reports 50% X and 50% Y!
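    To make the arithmetic in that last point concrete, here is a minimal sketch. It only reproduces the numbers from the example above; the function name and data layout are made up for illustration, and this is not how Sentry itself computes the figure.

```python
from collections import Counter


def tag_shares(values):
    """Return {tag_value: (percent_of_tagged_events, percent_of_all_events)}.

    `values` holds one entry per event; None means the event carries no
    value for the tag. (Hypothetical layout, for illustration only.)
    """
    counts = Counter(v for v in values if v is not None)
    tagged = sum(counts.values())
    total = len(values)
    return {
        value: (100.0 * n / tagged, 100.0 * n / total)
        for value, n in counts.items()
    }


# 1000 similar events: one tagged X, one tagged Y, 998 with no value.
events = ["X", "Y"] + [None] * 998
shares = tag_shares(events)
for value, (of_tagged, of_all) in shares.items():
    print(f"{value}: {of_tagged:.1f}% of tagged events, {of_all:.1f}% of all events")
```

    Dividing by the tagged events alone gives the 50% / 50% split; dividing by all events gives 0.1% each, which is closer to what we would want to see.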

    But they have given us really excellent service, especially given we are not paying enterprise rates.

    Edit: also we are not in a US timezone, which makes the UI weird. And I do love the email integration: have a bug, get an email, fix it.

  • by xvilka on 6/14/19, 6:10 AM

    They might want to check out radare2[1] for processing crash dumps, since it supports all 3 major platforms (Windows, Linux, OS X) and lets you work with stripped files as well.

    [1] https://github.com/radare/radare2

  • by rtpg on 6/14/19, 3:09 PM

    Sentry has been very good to us, and it’s a generally great business model to boot! Overall great for the community and for ourselves.

    I am going to whine a bit that the recent move over to the unified SDK has been less than ideal for us. The fact that the raven docs would point us to the unified SDK but not to a “how to migrate” page made me super unsure about whether we were doing the right thing (esp. when it came to the logging integrations on Python)

    It’s kind of an interesting problem, providing SDKs for each language. Sentry went with unifying the API across language boundaries and I’m not super happy with the results but I don’t have like 30 packages to maintain

  • by Operyl on 6/14/19, 8:20 AM

    The title cuts off “Symbolicator,” the specific name of the component here, which is slightly confusing.

  • by sciurus on 6/14/19, 2:31 PM

    This is cool stuff! It's nice to see what Sentry can develop in this space with the focus and resources that they have.

    I handle ops for Mozilla's crash reporting pipeline for Firefox [0] and our symbol server [1], among other things. I know our respective development teams stay in touch, and I hope we can find a way to use symbolic/symbolicator to simplify our stack.

    [0] https://socorro.readthedocs.io/en/latest/ [1] https://tecken.readthedocs.io/en/latest/

  • by SEJeff on 6/17/19, 4:44 PM

    I've used Sentry since Dave Cramer (Sentry's original author) was working back at Disqus years ago. It's excellent software that fills a really important niche. It is wonderful to see he managed to build a solid team and company around it.

  • by larrik on 6/14/19, 3:42 PM

    I really like Sentry, but I'm sad that the URL scheme changed (from sentry.io/<org name>/<project name>/ to sentry.io/organizations/<org>/issues/?project=<meaningless int>).

  • by scardine on 6/14/19, 11:51 AM

    Hey @the_mitsuhiko, any plans to support Django Channels (daphne) out of the box? Debugging async stuff is tough.

  • by js2 on 6/14/19, 1:44 PM

    I built Yahoo's in-house mobile app crash reporting tool a few years ago (still in use). I used an on-premise install of Sentry as the UI. At the time, Sentry didn't really support mobile error reporting, so I built something much like what's detailed in this post and called it the Processor.

    I regret never having made the time to open-source what I built. The Processor is written in Python, takes reports from mobile devices, unwinds, symbolicates, retraces, unminifies, etc. as needed, then generates a Sentry "event" and forwards that to our on-prem Sentry instance.

    I also built the SDKs. For iOS, I used PLCrashReporter. These days I'd probably use KSCrash. An important point here: on iOS, the unwinding is done on the device, so all you have to do on the backend is symbolicate it. Another point: it's relatively easy to get iOS system symbols. Plug an iOS device into a Mac running Xcode and the symbols are transferred from the device to the Mac. You can then harvest them however you need. In fact, Apple has apparently stopped encrypting OTA updates, so you no longer need an iOS device to get the symbols:
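    For anyone unfamiliar with what "just symbolicate it" means on the backend, here is a rough sketch, assuming the device already sent a list of absolute return addresses plus the load address of each binary image. The class, function, and sample symbols are all hypothetical; this is not the Processor's code or Sentry's.

```python
import bisect


class SymbolTable:
    """Maps offsets within one binary image to (function name, offset into function)."""

    def __init__(self, symbols):
        # symbols: iterable of (start_offset, size, name); sample data is made up
        self._symbols = sorted(symbols)
        self._starts = [start for start, _, _ in self._symbols]

    def lookup(self, offset):
        # Find the last symbol that starts at or before `offset`.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        if offset >= start + size:
            return None  # offset falls in a gap between symbols
        return name, offset - start


def symbolicate(addresses, images, tables):
    """Turn absolute addresses from an on-device unwind into readable frames.

    images: list of (load_address, size, image_id)
    tables: dict of image_id -> SymbolTable
    """
    frames = []
    for addr in addresses:
        frame = hex(addr)  # fallback when no image or symbol matches
        for load, size, image_id in images:
            if load <= addr < load + size and image_id in tables:
                hit = tables[image_id].lookup(addr - load)
                if hit:
                    frame = f"{image_id}: {hit[0]} + {hit[1]}"
                break
        frames.append(frame)
    return frames


tables = {"MyApp": SymbolTable([(0x1000, 0x40, "main"), (0x1040, 0x80, "-[Foo bar]")])}
images = [(0x100000000, 0x4000, "MyApp")]
frames = symbolicate([0x100001010, 0x100001050, 0x200000000], images, tables)
print(frames)
```

    The address with no matching image is left as raw hex, which is roughly what you end up with when you are missing system symbols.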

    https://github.com/Zuikyo/iOS-System-Symbols

    For Android NDK crashes I've tried a few approaches and still don't have a satisfying solution. Originally I went with breakpad + minidumps on the device. On the backend, the Processor runs the breakpad stackwalker on the minidump. Another important point: the unwinding is occurring on the backend in this case, unlike iOS where it's done on the phone. (A minidump is basically just a snapshot of all the thread stack memory, plus some extra diagnostic info.) But to unwind reliably off-device you need the Android system symbols (in addition to the app's symbols, obviously). Well, good luck with that. Google makes the original Nexus Android OS images available so you can harvest those, but you'll never get symbols for all the various Android devices. I built a tool that can harvest symbols off a device and tried to crowdsource them from Yahoo's developers, but it's not been very successful (there are a lot of flavors of Android).

    Another issue is that minidumps are relatively large to deal with. So my second approach was two-fold. I'm still using breakpad's crash handler on the device, but I now have it generating the much smaller microdump format. In addition, I've added libunwind to our Android SDK so that after capturing the microdump, I attempt to unwind on the device (also collecting function names during unwinding) and add that info to the report. The Processor then only needs to unwind the microdump if the unwinding on the device failed. Otherwise it just needs to symbolicate. This hasn't been wildly successful though. Unwinding on an Android device is trickier than on an iOS device. Also, it's almost impossible (well, I haven't figured out how) to unwind through the ART/Java frames that called into the native code.
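    The fallback described above boils down to something like the following sketch. The report field names (`device_frames`, `microdump`) and the two callables are placeholders invented for illustration, not the Processor's real interface.

```python
def resolve_frames(report, unwind_microdump, symbolicate):
    """Prefer the stack unwound on the device; unwind the microdump only if that failed.

    report: dict with hypothetical keys "device_frames" (list, possibly empty)
            and "microdump" (raw bytes captured by the crash handler).
    unwind_microdump, symbolicate: callables supplied by the caller.
    """
    frames = report.get("device_frames")
    source = "device"
    if not frames:
        # On-device unwinding failed or produced nothing: fall back to the backend.
        frames = unwind_microdump(report["microdump"])
        source = "backend"
    return {"frames": symbolicate(frames), "unwound_on": source}


def fake_unwind(microdump):
    # Stand-in for running a stackwalker over the microdump.
    return [0x10, 0x20]


def fake_symbolicate(frames):
    # Stand-in for real symbolication.
    return [f"frame@{addr:#x}" for addr in frames]


ok = resolve_frames({"device_frames": [0x30], "microdump": b""}, fake_unwind, fake_symbolicate)
failed = resolve_frames({"device_frames": [], "microdump": b""}, fake_unwind, fake_symbolicate)
print(ok)
print(failed)
```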

    Of course, the vast majority of Android crashes are in Java code, and these are much easier to deal with. They are unwound just fine on the device, so on the backend you only need to deobfuscate the ProGuard minification, which is easily done using the mapping file generated by ProGuard.
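    As a rough illustration of the deobfuscation step, here is a simplified sketch that reads a ProGuard-style mapping and rewrites class names (and method names, when unambiguous) in a stack trace. In practice you would use ProGuard's own ReTrace tool, which also uses line numbers to resolve overloads and inlined methods; the sample mapping and trace below are made up.

```python
import re


def parse_mapping(text):
    """Parse mapping text into {obfuscated_class: (original_class, {obf_method: {original_names}})}."""
    classes = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            # Class line: "original.Name -> obfuscated.name:"
            original, obfuscated = line.strip().rstrip(":").split(" -> ")
            current = (original, {})
            classes[obfuscated] = current
        elif "(" in line and current is not None:
            # Method line: "[lines:]returnType name(args)[:lines] -> obfuscated"
            left, obfuscated = line.strip().split(" -> ")
            name = left.split("(")[0].split(" ")[-1]
            current[1].setdefault(obfuscated, set()).add(name)
    return classes


FRAME = re.compile(r"^(\s*at\s+)([\w.$]+)\.([\w$<>]+)(\(.*\))$")


def deobfuscate(trace, classes):
    """Rewrite 'at cls.method(...)' lines whose class appears in the mapping."""
    out = []
    for line in trace.splitlines():
        m = FRAME.match(line)
        if m and m.group(2) in classes:
            prefix, cls, method, rest = m.groups()
            original, methods = classes[cls]
            candidates = methods.get(method, set())
            if len(candidates) == 1:
                # Only substitute the method name when there is a single candidate.
                method = next(iter(candidates))
            line = f"{prefix}{original}.{method}{rest}"
        out.append(line)
    return "\n".join(out)


MAPPING = """\
com.example.app.MainActivity -> a.b:
    int counter -> a
    void onCreate(android.os.Bundle) -> a
    void render() -> b
"""

TRACE = """\
java.lang.NullPointerException
    at a.b.a(Unknown Source)
    at android.app.Activity.performCreate(Activity.java:7136)"""

result = deobfuscate(TRACE, parse_mapping(MAPPING))
print(result)
```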

    What's really annoying with native mobile crashes is that both Android and iOS have their own services for both capturing crashes and unwinding on the device. And because these are integrated with the OS and work out-of-process, they are much more reliable than anything you can do in-process using something like PLCR, KSCrash, libunwind, etc.

    But neither OS gives an app access to its own system-generated reports. All you get is the lame reports the devices upload to Google Play Console / iTunes Connect.

    Anyway, thank you to Sentry for providing such a great product and I'm sorry again I wasn't able to contribute more. I'm not sure what I built would work at your scale. It's interesting we ended up with similar designs.