from Hacker News

Reverse Engineering TikTok's VM Obfuscation

by hazebooth on 12/23/22, 7:36 PM with 122 comments

  • by noduerme on 12/24/22, 11:39 AM

    This is really awesome work.

    I spent a lot of time in the early 2000s coming up with nasty obfuscation techniques to protect certain IP that inherently needed to be run client-side in casino games. Up to and including inserting bytecode that was custom crafted to intentionally crash off-the-shelf decompilers that had to run the code to disassemble it (and forcing them to phone home in the process where possible!)

    My view on obfuscation is that since it's never a valid security practice, it's only admissible for hiding machinery from the general public. For instance, if you have IP you want to protect from average script kiddies. Any serious IP can be replicated by someone with deep pockets anyway. Most other uses of code obfuscation are nefarious, and obfuscated code should always be assumed to be malicious until proven otherwise. I'm not a reputable large company, but no reputable large company should be going to these lengths to hide their process from the user, because doing so serves no valid security purpose.

  • by codedokode on 12/24/22, 8:42 AM

    It is interesting that while technologies like canvas, WebGL or WebRTC were intended for other purposes, their main usage became fingerprinting. For example, WebGL provides valuable information about the GPU model and its drivers.

    This shows how browser developers race to provide new features ignoring privacy impact.

    I don't understand why features that allow fingerprinting (reading back canvas pixels or GPU buffers) are not hidden behind a permission.

  • by TobyTheDog123 on 12/24/22, 11:27 AM

    TikTok changes this algorithm about once every three months. I've reverse-engineered it twice, and have since given up and decided to run a headless browser to do it for me. I'd love to see a tool that automates solving this so I can sign requests in a more limited context (à la Cloudflare Workers / C@E).
  • by thih9 on 12/24/22, 9:52 AM

    I've seen some of these techniques elsewhere; e.g. javascript-obfuscator supports replacing variable names with hex values [1] or transforming call structure into something more complex [2]. Bytecode generation is new to me; is there an existing JS obfuscation tool, preferably open source, that supports it?

    [1]: https://github.com/javascript-obfuscator/javascript-obfuscat...

    [2]: https://github.com/javascript-obfuscator/javascript-obfuscat...

  • by derefr on 12/24/22, 5:21 PM

    FYI, most CAPTCHA and anti-DDoS services (e.g. Cloudflare) do something very similar: they send the user an obfuscated program implemented on top of an obfuscated JS VM, which the user effectively has to execute as-is, in a real browser, to get back the results the gateway is looking for. This is done to keep simple scraping scripts (the Scrapy type) from scraping the site. If you want to scrape, you have to pay the extra overhead of driving a real browser. (And not even a headless one; they have tricks to detect that, too.)
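
    To make the "program on top of a JS VM" idea concrete, here is a toy sketch of a stack-based bytecode interpreter in JavaScript. The opcode names, their numbering, and the flat-array encoding are all invented for illustration; they are not taken from Cloudflare, TikTok, or any real obfuscator, and real ones are far more elaborate.

```javascript
// Toy stack machine. Opcode names and numbering are hypothetical,
// chosen only for this illustration.
const OP = { PUSH: 0, ADD: 1, MUL: 2, HALT: 3 };

function run(bytecode) {
  const stack = [];
  let pc = 0; // program counter: index of the next value to read
  while (pc < bytecode.length) {
    const op = bytecode[pc++];
    switch (op) {
      case OP.PUSH: // the next array element is an immediate operand
        stack.push(bytecode[pc++]);
        break;
      case OP.ADD: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case OP.MUL: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      case OP.HALT:
        return stack.pop();
      default:
        throw new Error(`unknown opcode ${op} at index ${pc - 1}`);
    }
  }
  return stack.pop(); // ran off the end without an explicit HALT
}

// Encodes (2 + 3) * 4 as a flat array of numbers.
const program = [OP.PUSH, 2, OP.PUSH, 3, OP.ADD, OP.PUSH, 4, OP.MUL, OP.HALT];
const answer = run(program);
console.log(answer); // 20
```

    Someone who only sees the `program` array has to recover the opcode table before the logic becomes legible, which is the basic reason this style slows down analysis.
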
  • by antiviral on 12/24/22, 8:24 PM

    This is excellent work.

    It also shows how TikTok may be in violation of several US/EU privacy laws. I really wonder now who this data is shared with. Perhaps someone should bring this article to the FTC’s attention for further review.

  • by KirillPanov on 12/24/22, 1:20 PM

    Awesome, really awesome work. However:

    > If that is something you are interested in, keep an eye out for the second part of this series :)

    Your site is missing an RSS/Atom feed, so I can't do that. ::sad face::

  • by wiml on 12/24/22, 6:03 PM

    Given that the beginning of the "weird string" has a magic number and a version field, I wonder if the point of this is not so much obfuscation as transpilation? The magic number corresponds to ASCII "HNOJ" "@?RC", or perhaps "JONH" "CR?@", which doesn't turn anything up on Google but it seems odd to include that redundant header if your main goal is minification or obfuscation.
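
    The byte-order ambiguity can be checked with a few lines of JavaScript. The two 32-bit words below are reconstructed from the ASCII codes of the strings quoted above, not copied from the article's payload, and `wordToAscii` is a hypothetical helper name.

```javascript
// Decode a 32-bit word into four ASCII characters.
// littleEndian = false reads the most significant byte first.
function wordToAscii(word, littleEndian) {
  const bytes = [
    (word >>> 24) & 0xff,
    (word >>> 16) & 0xff,
    (word >>> 8) & 0xff,
    word & 0xff,
  ];
  if (littleEndian) bytes.reverse();
  return String.fromCharCode(...bytes);
}

// Assumed values: built from the ASCII codes of "HNOJ" and "@?RC".
const magicWords = [0x484e4f4a, 0x403f5243];

for (const w of magicWords) {
  console.log(wordToAscii(w, false), wordToAscii(w, true));
}
// prints "HNOJ JONH" then "@?RC CR?@"
```
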
  • by amelius on 12/24/22, 7:48 PM

    Can someone explain what VM they are talking about, where that VM is running, and what is running in it?
  • by Aperocky on 12/24/22, 3:03 PM

    Isn't the same concept also used in YouTube? I believe a Python mock of the equivalent VM exists in youtube-dl.
  • by Alifatisk on 12/24/22, 4:49 PM

    I never knew that TikTok was shipped with its own virtual machine!

    But that explains the obvious subdomain vm.tiktok.com

  • by born-jre on 12/24/22, 12:21 PM

    Something hit me when reading this. You know how zk-SNARKs are touted as tech that will, in the future, allow apps to work on users' private data while preserving their privacy? Could they be used for the opposite purpose, as an obfuscation technique: you encrypt the user's data inside a zk oracle on the user's side and send the result to the server. You could reverse engineer what the inputs to the oracle are, but not what exactly it sends to the server?
  • by mhasbini on 12/24/22, 11:14 AM

  • by lazyeye on 12/24/22, 11:06 PM

    There needs to be a publicly funded charity that pays people to work full-time deobfuscating all the major apps. This should be a well-resourced ongoing operation.
  • by derefr on 12/24/22, 5:18 PM

    That HTTP request is kind of hideous. All those extra parameters that have nothing to do with what the response will end up being, and which change often. Seems like a great way to toss out all your API-response edge-cache-ability.
  • by thecleaner on 12/24/22, 11:16 PM

    Can I conclude that TikTok implemented a custom VM in JavaScript? Any idea what it's used for, how many instructions it can process, and whether there are other comparable implementations?
  • by Exuma on 12/24/22, 3:17 PM

    This article is 2 hours old and his Twitter is already changed?
  • by apienx on 12/24/22, 11:19 AM

    Solid case! Thanks for taking the time to write it up.

    Those who care and have to use TikTok can probably add their own virtualization layer (and tolerate the hit in cost/performance).

  • by frozencell on 12/24/22, 9:04 AM

    The hunt begins.
  • by draw_down on 12/24/22, 10:39 AM

    > void 0 (a fancy obfuscated way of saying undefined)

    Kind of. But it was possible at one point, maybe still is, to rebind `undefined` to some other value, causing trouble. `void` is an operator, a language keyword; it’s guaranteed to give you the true undefined value. (In other words, the value whose type is `undefined`.)

    If you’re coding against an environment as adversarial as these people clearly believe they are, you’d go with `void` as well.
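
    A minimal sketch of the difference, assuming a modern engine: the global `undefined` has been read-only since ES5, but the name is still an ordinary identifier that can be shadowed in a local scope, while `void` is an operator and cannot be rebound. The function and property names here are made up for the example.

```javascript
// `undefined` is an identifier, so an inner scope may declare its own binding.
function shadowed() {
  const undefined = "not really undefined"; // legal: shadows the global
  return {
    viaIdentifier: typeof undefined, // sees the local string
    viaVoid: typeof void 0,          // operator result, unaffected by shadowing
  };
}

const shadowResult = shadowed();
console.log(shadowResult.viaIdentifier); // "string"
console.log(shadowResult.viaVoid);       // "undefined"
```
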

  • by Kukumber on 12/24/22, 1:55 PM

    Nice use of low altitude satellites to track individuals and sniff telecoms all over the world

    This decompiled object class also spies on the grid network; that's quite interesting and very clever

    I never knew we could also lobby governments to push for some office and cloud software full of spyware, even France had to ban them! [1]

    This TikTok app is very dangerous!

    Of course /s

    [1] - https://news.ycombinator.com/item?id=33686599