from Hacker News

Reverse Engineering Protobuf Definitions from Compiled Binaries

by arkadiyt on 3/9/24, 8:21 PM with 12 comments

  • by dunham on 3/10/24, 4:40 AM

    I've used protodump before, but have found that some applications don't embed the .proto data.

    It may be bit-rotted by now, but a couple of years ago I hacked together a Python script that extracts a proto definition from Objective-C apps by scanning the assembly output for the code patterns generated by the protoc compiler. I put it on GitHub in case it is useful to somebody.

    https://gist.github.com/dunhamsteve/224e26a7f56689c33cea4f0f...

    Compiled Objective-C is a bit simpler than compiled C/C++: you can read the method invocations out of it. I haven't looked into how hard the output of Swift is to read.

    I've also analyzed protobuf data (Apple Notes) by writing code that decodes the data in a generic fashion and outputs a guess at the schema. I would run it on about 100 samples to help distinguish binary data from sub-objects, to detect optional fields, and to detect 'repeated' fields. Then you have to go through and figure out what all of the fields are.

    I succeeded, but later learned that the Notes web app embedded the plain text .proto file, which would have made things a lot easier.
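
    For anyone curious what "decode generically, then guess the schema from many samples" can look like, here is a minimal sketch. It is not the script mentioned above; the function names (`decode_fields`, `candidate_kinds`, `guess_schema`), the labels, and the preference order "message, then string, then bytes" are all assumptions made for illustration. It ignores groups and packed repeated fields.

```python
from collections import defaultdict

SCALAR_KINDS = {0: "varint", 1: "fixed64", 5: "fixed32"}
# Assumed tie-break order when several interpretations survive every sample.
PREFERENCE = ("varint", "fixed64", "fixed32", "message", "string", "bytes")


def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf) or shift > 63:
            raise ValueError("truncated or oversized varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf):
    """Split one message into (field_number, wire_type, value) triples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if number == 0:
            raise ValueError("field number 0 is not valid")
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type (groups are ignored here)")
        if pos > len(buf):
            raise ValueError("field runs past the end of the buffer")
        fields.append((number, wire_type, value))
    return fields


def candidate_kinds(wire_type, value):
    """Return every interpretation that one occurrence of a field allows."""
    if wire_type != 2:
        return {SCALAR_KINDS[wire_type]}
    kinds = {"bytes"}  # length-delimited data is always valid as raw bytes
    try:
        value.decode("utf-8")
        kinds.add("string")
    except UnicodeDecodeError:
        pass
    try:
        decode_fields(value)
        kinds.add("message")
    except ValueError:
        pass
    return kinds


def guess_schema(samples):
    """Intersect per-sample guesses; return {field_number: (label, kind)}."""
    kinds, seen_in, max_count = {}, defaultdict(int), defaultdict(int)
    for sample in samples:
        per_sample = defaultdict(int)
        for number, wire_type, value in decode_fields(sample):
            per_sample[number] += 1
            cand = candidate_kinds(wire_type, value)
            kinds[number] = kinds[number] & cand if number in kinds else cand
        for number, count in per_sample.items():
            seen_in[number] += 1
            max_count[number] = max(max_count[number], count)
    schema = {}
    for number in sorted(kinds):
        if max_count[number] > 1:
            label = "repeated"
        elif seen_in[number] < len(samples):
            label = "optional"
        else:
            label = "singular"  # seen exactly once in every sample
        kind = next((k for k in PREFERENCE if k in kinds[number]), "unknown")
        schema[number] = (label, kind)
    return schema
```

    The point of running many samples is the intersection step: a short string can happen to parse as a valid sub-message in one sample, but it is unlikely to do so in all of them.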

  • by sandermvanvliet on 3/10/24, 7:48 AM

    Definitely a useful tool. Decoding protobuf (and message formats in general) can be such a pain and fun at the same time.

    I’ve written ProtobufDecoder, which takes a different approach: it analyzes the structure of actual messages to help you figure out the protobuf structure of a message.

    https://github.com/sandermvanvliet/ProtobufDecoder

  • by jobmeplease on 3/10/24, 4:53 AM

    Back at Google there was a really nice extension to protobufs where servers had a side channel that let you query all the services on the endpoint along with their full proto descriptors. That's probably internal only though (but I haven't played with gRPC enough to know).

  • by choppaface on 3/10/24, 6:42 AM

    For at least 4 years protobuf has had decent support for self-describing messages (very similar to Avro) as well as reflection.

    https://github.com/protocolbuffers/protobuf/blob/main/src/go...

    Ex-Googlers trying to make do on the cheap will just create a union of all their messages and include the message def in a self-describing message pattern. Super-sensitive network I/O can elide the message def (empty buffer), and for any RecordIO clone, file compression takes care of the definition.

    Definitely useful to be able to dig out old defs, but the protobuf maintainers have, surprisingly, added useful features so you don’t have to.

    Bonus points tho for extracting the protobuf defs that e.g. Apple bakes into their binaries.
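
    As a rough illustration of how baked-in defs can be located: a serialized FileDescriptorProto normally starts with its `name` field (field 1, length-delimited), so the bytes begin with 0x0A, a length, and a filename ending in ".proto". The sketch below scans a blob for that pattern. The function name `find_descriptor_candidates` is made up for this example, it assumes filenames under 128 bytes of printable ASCII, and it only finds candidate start offsets; a real extractor still has to parse forward to find where each descriptor ends.

```python
SUFFIX = b".proto"


def find_descriptor_candidates(blob):
    """Return [(offset, filename)] for likely FileDescriptorProto starts."""
    results = []
    start = 0
    while True:
        idx = blob.find(SUFFIX, start)
        if idx == -1:
            break
        start = idx + 1
        end = idx + len(SUFFIX)
        # Walk backwards: try each possible filename length that still fits
        # in a one-byte varint (hence the 128 limit).
        for name_len in range(len(SUFFIX) + 1, 128):
            tag_pos = end - name_len - 2
            if tag_pos < 0:
                break
            # 0x0A is the key for field 1 with wire type 2 (length-delimited).
            if blob[tag_pos] != 0x0A or blob[tag_pos + 1] != name_len:
                continue
            name = blob[tag_pos + 2:end]
            # Reject matches whose "filename" contains non-printable bytes.
            if all(0x20 < b < 0x7F for b in name):
                results.append((tag_pos, name.decode("ascii")))
                break
    return results
```

    Usage would be something like `find_descriptor_candidates(open(path, "rb").read())`, where `path` points at the binary under inspection. Plain ".proto" strings that are not preceded by a matching tag and length are skipped.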

  • by mkl on 3/10/24, 8:57 AM

    I have used this other tool to good effect to reverse engineer a file format based on Protobuf: https://github.com/mildsunrise/protobuf-inspector

  • by davedx on 3/10/24, 8:42 AM

    Useful! Wish I had this when I started reverse engineering pbf map tiles a few months back.