by proszkinasenne2 on 10/31/21, 8:48 PM with 298 comments
by bsamuels on 10/31/21, 9:15 PM
If this guy got to experience how systemically bad the credential stuffing problem is, he'd probably take down the whole repository.
None of these anti-bot providers give a shit about invading your privacy, tracking your every movement, or whatever other power fantasy can be imagined. Nobody pays those vendors $10m/year to frustrate web-crawler enthusiasts; they do it to stop credential stuffing.
by ChuckMcM on 10/31/21, 11:35 PM
It sounds great but it is a completely ignorant thing to say.
by marginalia_nu on 10/31/21, 11:25 PM
Also, in my experience, most websites that block your bot do so because your bot is too aggressive, or because you are fetching some expensive resource that bots in general refuse to lay off. Bots with seconds between requests rarely get blocked, even by CDNs.
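For illustration, a minimal sketch of that kind of pacing in Python (the delay value and the polite_fetch helper are my own invention, not anything from the thread):

    import time
    import requests

    DELAY_SECONDS = 2.0  # a couple of seconds between requests; tune per site

    def polite_fetch(urls):
        """Fetch each URL with a fixed pause so the crawl stays slow and boring."""
        session = requests.Session()
        pages = {}
        for url in urls:
            resp = session.get(url, timeout=30)
            pages[url] = resp.text
            time.sleep(DELAY_SECONDS)  # the pacing that keeps most CDNs happy
        return pages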
by al2o3cr on 10/31/21, 10:24 PM
> You use this software at your own risk. Some of them contain malware, just FYI

LOL, why post LINKS to them then? Flat-out irresponsible...

> you build a tool to automate social media accounts to manage ads more efficiently

If by "manage" you mean "commit click fraud".
by curun1r on 10/31/21, 10:38 PM
This kind of indirect scraping can be useful for getting almost all the information you want from sites like LinkedIn that do aggressive scraping detection.
by welanes on 11/1/21, 2:49 AM
I run a no-code web scraper (https://simplescraper.io) and we test against these.
Having scraped millions of webpages, I find dynamic CSS selectors a bigger time sink than most anti-scraping tech encountered so far (if your goal is to extract structured data).
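One common workaround, sketched here in Python with BeautifulSoup (the HTML and the auto-generated class names are made up to show the pattern): instead of binding to generated class names, anchor on stable visible text or attributes and walk the tree from there.

    from bs4 import BeautifulSoup

    html = """
    <div class="css-1x8fk2p">
      <span class="css-9qz7a4">Price</span>
      <span class="css-k3j1bb">$42.00</span>
    </div>
    """
    soup = BeautifulSoup(html, "html.parser")

    # Fragile: the generated class name changes on every deploy.
    # price = soup.select_one(".css-k3j1bb").get_text()

    # Sturdier: anchor on the visible label, then walk to its sibling.
    label = soup.find("span", string="Price")
    price = label.find_next_sibling("span").get_text(strip=True) if label else None
    print(price)  # $42.00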
by peterburkimsher on 10/31/21, 11:38 PM
CouchSurfing blocked me after I manually searched for the number of active hosts in each country (191 searches) and posted the results on Facebook. Basically, I questioned their claim of having 15 million users: that may be their total number of registered accounts, but the real number of users is about 350k. They didn't like that I said that (on Facebook), so they banned my CouchSurfing account. They refused to give a reason, but it came a month after I gathered the data, so I know it was retaliation for the publication.
LinkedIn blocked me 10 days ago, and I'm still trying to appeal to get my account back.
A colleague was leaving, and his manager asked me to ask people around the company to sign his leaving card. Rather than go to all 197 people directly, I intentionally targeted those who could also help with the software language translation project (my actual work). So I read the list of names, cut it down to 70 "international" people, and started searching for their names on Google. Then I clicked on the first result, usually LinkedIn or Facebook.
The data was useful, and I was able to find willing volunteers for Malay, Russian, and Brazilian Portuguese!
After I'd found the languages of 55 colleagues over 2 hours, LinkedIn asked for identity verification: upload a photo of my passport. No problem, I uploaded it. I also sent them a full explanation of what I was doing, why, how it was useful, and proof from my Google search history.
But rather than reactivate my account, LinkedIn permanently banned me, and will not explain why.
"We appreciate the time and effort behind your response to us. However, LinkedIn has reviewed your request to appeal the restriction placed on your account and will be maintaining our original decision. This means that access to the account will remain restricted.
We are not at liberty to share any details around investigations, or interpret the terms of service for you."
So when the CAPTCHA says "Are you a robot?", I'm really not sure. Like Pinocchio, "I'm a real boy!"
by nocturnial on 10/31/21, 11:30 PM
Why is it so difficult to just respect robots.txt? Maybe there's an idea for a browser plugin that determines whether a site lets you scrape its data easily; if not, the site gets blocked and its traffic drops. I know this is a naive idea...
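For what it's worth, checking robots.txt takes only a few lines with Python's standard library, so a plugin or scraper could plausibly do something like this (example.com and the MyBot name are placeholders):

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file

    # Ask before fetching a specific path with a specific user-agent.
    if rp.can_fetch("MyBot/1.0", "https://example.com/some/page"):
        print("allowed, go ahead")
    else:
        print("disallowed, skip it")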
by adinosaur123 on 10/31/21, 10:25 PM
I'm currently looking for ways to get real estate listings in a particular area, and apparently the only real solution is to scrape the few big online listing sites.
by dpryden on 11/1/21, 12:37 AM
This whole field of scraping and anti-bot technology is an arms race: one side gets better at something, the other side gets better at countering it. An arms race benefits no one but the arms dealers.
If we translate this behavior into the real world, it ends up looking like https://xkcd.com/1499
by connectsnk on 11/1/21, 12:30 AM
I am curious what the author means by automating social media accounts to manage ads more efficiently.
by kseifried on 11/1/21, 5:03 PM
I think a better solution is to implement 2FA/MFA or SSO. Even weak 2FA/MFA like SMS or email will block the mass attacks; people worried about targeted attacks can use a hardware token or a software token app. SSO (e.g. sign in with Google/Microsoft/Facebook/LinkedIn/Twitter, who can generally do a better job of securing accounts than some random website) is also a lot less hassle in the long term than 2FA/MFA for most users. (Major caveat: public-use computers, but that's a tough problem to solve securely no matter what.)
Better account security is, well, better, regardless of the bot/credential stuffing/etc problem.
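As a sketch of how little code the TOTP flavour of 2FA takes, here is an enrolment-plus-verify flow using the pyotp library (the account handling and names around it are placeholders, not a complete implementation):

    import pyotp

    # Enrolment: generate and store a per-user secret (storage not shown).
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)
    print(totp.provisioning_uri(name="alice@example.com", issuer_name="ExampleApp"))

    # Login: after the password check, require a valid current code.
    code = input("6-digit code: ")
    if totp.verify(code, valid_window=1):  # tolerate one step of clock drift
        print("second factor OK")
    else:
        print("rejected")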
by kinderjaje on 11/6/21, 11:47 AM
But for some websites, even residential IPs don't get you through.
I've noticed there seems to be a premium reCAPTCHA tier that works differently from the standard one and won't let you pass. It mostly shows up behind a Cloudflare anti-bot page.
by egberts1 on 11/1/21, 9:26 AM
1. plenty of VPSes with many IP addresses (easier with an IPv6 subnet)
2. HTTP header rearranging
3. Fuzzing user-agent
4. Pseudo-PKBOE algorithm
5. office hours, break-time, lunch-time activity emulation (sketched after this list)
6. ????
7. profit
I am looking at you, SSH port bashers.
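A rough Python sketch of what item 5 might look like (the hours, probabilities, and the fetch_next_page helper are all invented for illustration):

    import random
    import time
    from datetime import datetime

    WORK_START, WORK_END = 9, 17     # "office hours" window, local time
    LUNCH_START, LUNCH_END = 12, 13  # stay quiet over lunch

    def in_active_window(now: datetime) -> bool:
        """True only during simulated working hours, excluding lunch and weekends."""
        if now.weekday() >= 5:                   # Saturday/Sunday off
            return False
        if LUNCH_START <= now.hour < LUNCH_END:  # lunch break
            return False
        return WORK_START <= now.hour < WORK_END

    def human_pause():
        """Irregular gaps, with the occasional longer 'coffee break'."""
        if random.random() < 0.05:
            time.sleep(random.uniform(120, 600))  # coffee break
        else:
            time.sleep(random.uniform(3, 20))     # ordinary think time

    while True:
        if in_active_window(datetime.now()):
            # fetch_next_page() would go here -- hypothetical helper
            human_pause()
        else:
            time.sleep(300)  # off the clock; check again in five minutes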
by completelylegit on 11/1/21, 12:09 AM
* Change your user-agent to a real browser's user-agent string, and cycle it frequently.
* Done.
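In Python that's about three lines. A sketch with requests (the user-agent strings below are real browser strings of the era, but any pool you use should be kept current):

    import random
    import requests

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/15.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0",
    ]

    def fetch(url):
        headers = {"User-Agent": random.choice(USER_AGENTS)}  # new UA per request
        return requests.get(url, headers=headers, timeout=30)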
by billpg on 10/31/21, 11:50 PM
Put your email address in your User-Agent string so they can get in touch if needed.
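Something like this (the crawler name, URL, and contact address are placeholders):

    import requests

    headers = {
        # Identify the bot and give the operator a way to reach you.
        "User-Agent": "ExampleCrawler/1.0 (+https://example.com/bot; admin@example.com)",
    }
    requests.get("https://example.com/page", headers=headers, timeout=30)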