by Risse on 8/27/21, 7:10 AM with 3 comments
by LinuxBender on 8/27/21, 12:58 PM
I should also add that these bots are in no way associated with robots.txt and site indexing. The crawlers hitting URLs in emails are not indexing a site; they are looking for malicious content. The contents of robots.txt have zero impact on this behavior, and the file is not queried at all.
I suspect the reason people see conflicting behavior is that Google has a mechanism in place that maps domains to levels of trust and connectivity. If a site is known to scan itself for malware, there would be no need for Google to duplicate the effort.
by sillycross on 8/27/21, 7:41 AM
It seems like processing user email text through a program that actively "works" on the text (vs. simply encrypting/decrypting/transmitting the text) is generally not considered a privacy concern.
I feel like this is a bit tricky given the evolution in ML/AI. By making "feeding user email into any program" acceptable, the chance of rule-breaking incidents (e.g., using email text to train models) is unnecessarily increased imo.
by QuackyTheDuck on 8/27/21, 8:17 AM