by titaniumtown on 7/7/24, 6:36 AM with 51 comments
by CobrastanJorji on 7/7/24, 7:17 AM
But at the same time, blocking search engines from indexing your social media site is a dangerous game. Any search engine that respects this is gonna effectively de-list Reddit. That's no good for views, and views are what make Reddit money. Presumably they have negotiated private deals with Google and probably Microsoft for this and are trying to sell their data to ML companies, because otherwise this would seem suicidal.
Kind of a shame. The information is still going to get shared around to all the giant corporations, but Reddit will presumably make it harder to access for all the little guys. And the more they tie the content to dollars, the more managers on the inside will start doing stupid things to try and generate more of whatever the most valuable kinds of content are.
by sunaookami on 7/7/24, 7:39 AM
Side note: They seem to serve other robots.txt for different User-Agents & IPs: https://merj.com/blog/investigating-reddits-robots-txt-cloak...
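One rough way to check this kind of cloaking yourself is to request the same robots.txt under several User-Agent strings and group the agents by the response they get. The sketch below uses only the Python standard library; the function names `fetch_robots` and `group_by_content` are made up for illustration, and it only varies the User-Agent header, so it cannot reproduce any IP-based differences.

```python
import hashlib
import urllib.request
from collections import defaultdict


def fetch_robots(url, user_agent, timeout=10):
    """Fetch a robots.txt URL while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def group_by_content(url, user_agents, fetch=fetch_robots):
    """Group user agents by the SHA-256 digest of the body they receive.

    `fetch` is injectable so the grouping logic can be exercised offline.
    """
    groups = defaultdict(list)
    for ua in user_agents:
        body = fetch(url, ua)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(ua)
    return dict(groups)
```

Calling `group_by_content("https://www.reddit.com/robots.txt", ["Googlebot/2.1", "curl/8.0"])` would return more than one group if the server hands out different files per agent. A site may also reject or rate-limit such requests, so treat any result as a hint rather than proof.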
by benreesman on 7/7/24, 7:17 AM
OpenAI is openly collaborating with the NSA, Google is manipulating the definition of a web crawl, Anthropic has installed a bunch of humanitarians from Jump Trading as the leading mech interp group that makes strident claims about how all this stuff works based on weights you do not and never will have access to.
They’re telling you: “And you will do nothing, because you can do nothing.”
I invite you to join me in proving that we can in fact do something.
by dageshi on 7/7/24, 7:13 AM
AI is the death knell of the web as we've known it for the past three decades. Information that was once freely available will retreat behind login walls, and sites will charge bigcos for access to train their models.
I wonder if some standardised data API will be settled upon. Perhaps one already exists?
by rany_ on 7/7/24, 9:44 AM
# Welcome to Reddit's robots.txt
# Reddit believes in an open internet, but not the misuse of public content.
# See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
# See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
# policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy
User-agent: *
Disallow: /
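To see what those two rule lines mean for a crawler that honors robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are copied from the file quoted above; the helper name `can_fetch` and the example URLs are illustrative assumptions, and real crawlers may interpret the file differently or ignore it entirely.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted above (most comment lines omitted).
ROBOTS_TXT = """\
# Welcome to Reddit's robots.txt
User-agent: *
Disallow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The wildcard entry applies to every agent without its own section,
# and "Disallow: /" matches every path on the site.
print(can_fetch(ROBOTS_TXT, "Googlebot", "https://www.reddit.com/r/programming/"))  # False
```

With this file, any compliant crawler that is not given a separate, more permissive file is told to stay away from the whole site.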
by Seattle3503 on 7/7/24, 7:23 AM
by hamilyon2 on 7/7/24, 8:07 AM
by eps on 7/7/24, 8:13 AM
80% of my Google searches for other people's opinions now end with "site:reddit.com", and there are surprisingly quite a few of them. The alternative is Reddit's own search, and it tends to produce less relevant results.
by skilled on 7/7/24, 6:55 AM
Google has not commented on whether they plan to respect it. Rich Results[0] say they're using a version from June 25. The new version was last modified July 1.
by jkhanlar on 7/7/24, 11:54 AM
by nubinetwork on 7/7/24, 8:28 AM
by jkhanlar on 7/7/24, 10:08 AM
by Lorin on 7/7/24, 7:36 AM
What are they thinking?