from Hacker News

Breaking the 4Chan CAPTCHA

by hazebooth on 11/29/24, 8:32 PM with 349 comments

  • by cherryteastain on 11/29/24, 10:02 PM

    The part about bad Keras<->TensorFlow.js interop is classic TensorFlow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.

    Actually, I'll extend that to saying every open source Google library/tool feels like that.

  • by Dachande663 on 11/30/24, 8:00 AM

    Semi-related, but I needed a CAPTCHA on my site[0], mainly to block comment form spam, and settled on repurposing a fun method I’d seen before. It's definitely not foolproof (or hard at all), but I really liked making it.

    [0] https://www.hybridlogic.co.uk/contact

  • by bawolff on 11/29/24, 11:54 PM

    There is a reason why people moved away from distorted-text captchas. We are basically at the point where computers are better at them than humans.

    https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject that I think is really interesting.

    However, a surprising number of text-based captchas can be solved with a few-line shell script: use ImageMagick to convert to greyscale, dilate and undilate, then pass the result to Tesseract.
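
    As a rough illustration of the cleanup steps described above (greyscale, dilate, then undo the dilation), here is a pure-Python sketch on a plain list-of-lists image. It is not the ImageMagick or Tesseract API: the function names (`to_binary`, `dilate`, `erode`, `clean`), the 3x3 neighbourhood and the threshold of 128 are all assumptions made for illustration, and the OCR step itself is left out.

```python
def to_binary(gray, threshold=128):
    """Threshold a greyscale image (0-255) to 1 = light background, 0 = dark ink."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


def _morph(img, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighbourhood, clipped at the edges."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(pick(neighbours))
        out.append(row)
    return out


def dilate(img):
    """Grow the light regions: thin dark specks and hairlines get swallowed."""
    return _morph(img, max)


def erode(img):
    """Shrink the light regions again so surviving dark strokes regain their width."""
    return _morph(img, min)


def clean(gray, threshold=128):
    """Greyscale -> binary -> dilate -> erode (the 'dilate and undilate' step)."""
    return erode(dilate(to_binary(gray, threshold)))
```

    On a 7x7 white image containing a 3x3 dark block and one stray dark pixel, `clean` keeps the block and drops the stray pixel. Real captcha noise is usually nastier than that, so treat this as a picture of the idea rather than a working solver.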

    There are also sites like https://2captcha.net , so really captchas only impose a small minimum amount of effort.

  • by mieko on 11/30/24, 12:23 AM

    If you're into this, here's my 2014 breakdown of the Silk Road CAPTCHA: https://github.com/mieko/sr-captcha
  • by antirez on 11/29/24, 9:43 PM

    The appropriate response by 4Chan to this: simplify the human work, given that it's simple to solve via NNs anyway. We are at a point where designing very hard captchas is highly likely to increase human annoyance without decreasing machine solvability.
  • by somat on 11/30/24, 3:55 AM

    I wonder if it would be better to pretend to have a captcha but really you are analysing the user timing and actions. Honestly I half suspect this is already going on.
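
    A minimal sketch of what such a timing check could look like, assuming the page records a timestamp (in milliseconds) for each keystroke or click. The function name `looks_scripted`, the minimum event count and the 0.1 cutoff are made up for illustration; this is not how 4chan or any real service is known to do it.

```python
from statistics import mean, pstdev


def looks_scripted(timestamps_ms, min_events=5, cv_threshold=0.1):
    """Flag input whose gaps between events are suspiciously regular.

    Uses the coefficient of variation (std dev / mean) of the gaps.
    Human timing tends to be irregular; a naive script often is not.
    All thresholds here are illustrative assumptions.
    """
    if len(timestamps_ms) < min_events:
        return False  # too little data to say anything
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return True  # events not moving forward in time: treat as suspicious
    return pstdev(gaps) / average_gap < cv_threshold
```

    Perfectly even 100 ms gaps get flagged, while uneven human-looking gaps do not. A bot that adds random jitter would pass this trivially, which is presumably why real systems look at far more than one signal.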

    If you wanted to go full meta ("never go full meta"), you would train an AI to figure out if the agent on the other side was human or not. That is, invent the reverse Turing test: it's a human if the AI is unable to differentiate its responses from normal humans' responses, as opposed to marketing human responses.

    Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.

  • by benreesman on 11/30/24, 8:42 AM

    In my opinion the granddaddy of all 4chan CAPTCHA busts is still Yannic Kilcher’s GPT-J tune on the “Raiders of the Lost Kek” set, and it might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw
  • by Pikamander2 on 11/30/24, 9:15 AM

    > The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.

    > TensorFlow.js doesn't support Keras 3.

    I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.

  • by ChrisMarshallNY on 11/29/24, 10:00 PM

    That’s like spending a few hours, learning to take the lid off your septic tank.
  • by morkalork on 11/29/24, 10:01 PM

    Following the links to the captcha solving service, you can read profiles of the humans doing the work, where it's pitched as more ethical than them working in hazardous factories!
  • by tumsfestival on 11/29/24, 9:30 PM

    I can only imagine how much worse they'll make the captcha after stuff like this picks up speed with the users all the while being ineffective against the bots.
  • by makifoxgirl on 11/29/24, 10:33 PM

    This project also solves the 4chan captcha https://github.com/moffatman/chan
  • by Alifatisk on 11/30/24, 1:53 PM

    If there is one blog I've fallen in love with, it's nullpt.rs. Still waiting for part 2 of Reverse Engineering TikTok's VM Obfuscation.
  • by ranger_danger on 11/30/24, 12:41 AM

  • by Yeul on 11/30/24, 11:25 AM

    I understand why Cloudflare has to exist. But it's beyond annoying that it forces you into using an unmodified Chrome sans VPN.
  • by hobom on 11/29/24, 11:47 PM

    Does 4Chan also have bot BEHAVIOR detection (e.g. unnatural mouse movements) like Google's captcha has?
  • by chad1n on 11/29/24, 10:41 PM

    I've built 3 iterations of captcha solvers for that crappy website based on https://github.com/drunohazarb/4chan-captcha-solver/issues/1 . The only thing I've learned along the way is that it's mostly pointless outside of a "learning" exercise, since they'll change the captcha (in terms of letter count or the entropy of the background). Initially, it was 4 characters with a pretty obvious background, then it turned to 5, then it was both 4 and 5, and the current iteration is also either 4 or 5, but with a lot of entropy surrounding the characters.
  • by kattagarian on 11/30/24, 3:16 AM

    I remember trying to use 4chan once and I couldn't even get past the captcha.
  • by smithcoin on 11/30/24, 8:42 AM

    I’ll never forget spending the evening of the 2016 election on /pol/
  • by m3kw9 on 11/30/24, 8:17 PM

    Very tasteful title animation I must say. It’s fast enough, you feel it, and not distracting, gives a vibe even from glancing
  • by asynchronous on 11/30/24, 1:48 AM

    [meta] what blog site is this? Is it a joint among authors? I can’t find more information on their GitHub. Looks neat.
  • by 2Gkashmiri on 11/30/24, 2:01 PM

    Hey dude. Any idea if 1000 labelled images are good enough for training, and how much time it would take to train on an NVIDIA A40 like on https://www.runpod.io/pricing ?
  • by unit149 on 11/30/24, 5:30 AM

    Parsing the visualization data, within a JSON script tasked with parsing it is a complex endeavor when the site requires verifying email.

    If the JSON file is corrupt, it shows the following if tt1 and cd do not align.

    > "error": "You have to wait a while before doing this again"

  • by lofenfew on 11/29/24, 9:42 PM

    It might be worth noting that this, including the harder version the OP encountered, is not the hardest captcha that 4chan can serve. There is a still harder version which is sent to less trustworthy IPs. I imagine it would still be tractable with computer vision. This in part misses the point though, since 4chan has been continuously altering their captcha since it was released, making it difficult to create a permanent solution that won't be broken down the road.
  • by cchance on 11/29/24, 10:02 PM

    Jesus looking at both example captchas... as a human... i have no fucking clue the answer lol
  • by axpy906 on 11/30/24, 2:24 PM

    It’s nice to see this posted, and interesting that it’s in TensorFlow. I wonder for how many years the captcha was already broken, just not posted about publicly.
  • by b8 on 11/30/24, 5:04 PM

    Glad to see Blackjack and Jordin. We used to hack on Minecraft together. nullpt.rs and secret.club are full of former video game hackers :)
  • by thrance on 11/30/24, 12:03 PM

    4Chan is probably one of the only social platforms where genuine users and Russian bots share the same views, why even bother with CAPTCHAs?
  • by mgaunard on 11/30/24, 10:09 PM

    I remember when they introduced their new captcha; it was so tedious to solve it I stopped interacting there entirely.
  • by chistev on 11/30/24, 9:31 AM

    Man, is there anything computers won't be able to break!

    crazy

  • by cubefox on 11/30/24, 12:33 PM

    Not a word on how describing and releasing this code is obviously unethical!? Captchas have a legitimate use to keep bots out.
  • by matrix87 on 11/30/24, 2:09 AM

    the blacked out minimalist aesthetic on this site looks really cool
  • by nfRfqX5n on 11/30/24, 11:06 AM

    Hi veritas
  • by dmitrygr on 11/29/24, 9:55 PM

      > The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented, and the error messages thrown when you try to use it on Python 3.12 are non-obvious. I tried an older version of Python (3.10) on a hunch, using PyEnv, and it worked like a charm.
    
    Amazing. And then people wonder why "just use python 2" is still a thing.
  • by tomxor on 11/30/24, 2:46 AM

    Bet it can't break reCAPTCHA on a VPN.

    [edit]

    More specifically, I mean when they insidiously give you infinite tests even though it's impossible to pass because the IP has been blacklisted... There's a special place in hell for the anti-humans that made that decision, and yes it involves captcha.

  • by fresh_broccoli on 11/29/24, 11:26 PM

    I wasn't a very active 4chan poster to begin with, but when they introduced this awful CAPTCHA, and later the 300s countdown before making the first post, I completely lost interest in using the website.

    Anonymous boards were supposed to be low-friction, but now 4chan is one of the most user-hostile social media platforms around. It takes a special kind of dedication to post there, which I seriously doubt helps the quality of the site.

  • by anigbrowl on 11/29/24, 9:03 PM

    Congratulations, now it will get upgraded and become more work for humans to solve, increasing the burden on every non-malicious user.
  • by tomcam on 11/30/24, 1:14 AM

    If there's one place on the web I would apply anonymity with great diligence, it would be posting any article that might put me at odds with the good people of 4Chan.

    mostly kidding! mostly

  • by NoMoreNicksLeft on 11/30/24, 5:25 AM

    I suspect really strongly that the available characters in the 4chan captcha were chosen to be able to spell out the most racist/nazi/extreme slurs and slogans imaginable. For instance, not all numerals are ever used, but 1, 4, and 8 are. K is often there, and whatever the algo is, pseudorandom or not, it often doubles/triples characters. I've personally seen "kkk" twice over the years. Mind you, it does seem random. But even randomly, these must happen often enough to set that crowd off; they make a game of posting a screenshot of the "good ones".