by goferito on 4/21/17, 3:35 PM with 48 comments
by CobrastanJorji on 4/21/17, 5:01 PM
Say your bot misbehaves and effectively starts DOSing a site with a whole lot of pages, like a small Reddit clone or something. And say that site has no other way to distinguish your bot from the real Googlebot. You have now put the site in a position where it has to either block the Googlebot (and possibly lose a huge pile of money in the process) or else buy a lot more hardware and bandwidth to serve your crawler as well. That's not cool, to put it bluntly.
by awinter-py on 4/21/17, 4:11 PM
Most important argument: the Chrome user-agent contains the word 'Mozilla'. Obviously (we argue) Google doesn't intend these strings to be accurate; they are instead some kind of compatibility mark.
Are you committing a trademark violation? Given the nature of trademarks, it's not clear that you are.
Are you misrepresenting yourself to the site in a way that violates the CFAA? This is probably your biggest area of risk. But you can argue the site is giving away information to Google, a company whose slogan until recently was 'free the world's information'. Therefore the site wasn't taking plausible steps to secure the information you've scraped.
by mootothemax on 4/21/17, 4:13 PM
https://support.google.com/webmasters/answer/80553?hl=en
I know of a few sites that use this as the first step (of many!) to add bots to their "naughty" list.
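For illustration, the check described on that Google support page (a reverse DNS lookup on the requesting IP, a check that the hostname belongs to googlebot.com or google.com, then a forward lookup to confirm the hostname maps back to the same IP) could look roughly like the sketch below. The function name and the injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical, added here so the logic can be exercised without touching the network; the default resolvers are IPv4-only.

```python
import socket

# Domains Google documents for its crawler hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `reverse_lookup` and `forward_lookup` are hypothetical hooks for
    testing; by default the standard socket resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS, IP -> hostname.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must sit under a Google crawler domain.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward DNS, hostname -> IPs, must include the original IP.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request whose User-Agent claims to be Googlebot but whose IP fails this check is the kind of thing that would land on such a "naughty" list.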
by cube00 on 4/21/17, 3:57 PM
However, consider what your ultimate end game is. If it's a website you expect visitors to find through Google or the Play Store, good luck once webmasters start reporting your misbehaving "Googlebot" crawler.
by beejiu on 4/21/17, 4:10 PM
by matt4077 on 4/21/17, 4:03 PM
In my home country, it's actually quite interesting: fraud usually requires (a) a lie (conveying wrong information with intent), (b) a financial cost to the other party, and (c) a financial gain for you.
It's debatable at that level, already, because their loss is rather hard to quantify, and probably small. Plus, I believe your financial gain must be directly related to their cost.
And, finally, you actually have to lie to a human being. Lying to a machine doesn't qualify. There was a guy who earned a five-digit euro amount by producing fake bottles and feeding them into deposit machines. No crime!
by d2p on 4/21/17, 5:13 PM
by riceo100 on 4/21/17, 3:57 PM
by taftster on 4/21/17, 4:09 PM
As for the site owner, it's on them to decide what to do with your traffic. HTTP is an open, extensible protocol, and you could send almost anything in your request that the protocol allows. The site owner has chosen to expose their service over HTTP, so how to handle what arrives is their call.
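To make the "you could send almost anything" point concrete: the User-Agent is just a free-form header that the client fills in. A minimal sketch using Python's standard library follows; the crawler name and info URL are made up for the example, and nothing is actually sent over the network.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Build (but do not send) a request with a caller-chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical crawler identity; the protocol does not validate this string.
req = build_request(
    "https://example.com/",
    "ExampleCrawler/1.0 (+https://example.com/bot-info)",
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Since the server only ever sees the string the client chose, any policy based on it alone is the site owner's decision to trust that string.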
by terminalcommand on 4/21/17, 4:04 PM
If the sites in question only add an exception for Googlebot and not for other crawlers (e.g. Yahoo, Bing, etc.), I would say that posing as Googlebot goes against the site owner's consent.
However, if the site owner also adds this exception for other crawlers, you could argue that the site owner's intent of only allowing certain crawlers has not been made explicit. In that case you'd have a chance against the claims from the site's owner.
On the other hand Google could possibly sue you for using the user-agent "Googlebot".
The important question here is: would they? If you stay under the radar, no one, not even the courts, would bother.
PS: I am only a law student, and I am not familiar with any laws/regulations/precedents governing this specific issue. I think from the site owner's perspective it's a grey area. From Google's perspective, brands and IP are established concepts in law. This is a student's very personal opinion at first sight, so take it with a grain of salt :).
by dbg31415 on 4/21/17, 5:28 PM
"I left my door unlocked and told my friends they could use my living room, but then they put their feet up on my coffee table... Not cool, man!" Pretty much the equivalent situation.
by syrrim on 4/21/17, 4:50 PM
by Edmond on 4/21/17, 4:04 PM
by mightytightywty on 4/21/17, 5:15 PM
by alxmdev on 4/21/17, 5:31 PM
by fbomb on 4/21/17, 4:18 PM
by jasonkostempski on 4/21/17, 4:50 PM
by mdekkers on 4/21/17, 4:36 PM
by oliv__ on 4/21/17, 5:31 PM