from Hacker News

Is Gmail scanning your emails? Let's find out

by Risse on 8/27/21, 7:10 AM with 3 comments

by LinuxBender on 8/27/21, 12:58 PM
Google scans emails. It isn't real-time. Any links I send to a person with a Gmail address will eventually be hit by their malware crawlers. This includes links no one could guess in the lifetime of the universe. I used to only put HTTP basic auth in place, but now I also null route their IP ranges. I'm sure they will eventually flag my domains as malicious for doing so. Same goes for Slack/Discord/Steam bots.
I should also add that these bots are in no way associated with robots.txt and site indexing. The crawlers hitting URLs in emails are not indexing a site; they are looking for malicious content. The contents of robots.txt have zero impact on this behavior, and the file is not queried at all.
I suspect the reason people see conflicting behavior is that Google likely has a mechanism in place that maps domains to levels of trust and connectivity. If a site is known to scan itself for malware, there would be no need for Google to duplicate the effort.
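As a rough illustration of the two measures mentioned in this comment (unguessable links and blocking by IP range), here is a minimal Python sketch. The function names, the example.com base URL, and the CIDR blocks are all hypothetical; the blocks are RFC 5737 documentation ranges, not any crawler's real addresses. Real null routing happens at the network layer, so the range check below only shows the matching logic.

```python
import ipaddress
import secrets

# Placeholder CIDR blocks (RFC 5737 documentation ranges).
# These are NOT real crawler ranges; substitute your own list.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def make_unguessable_url(base: str = "https://example.com/c/") -> str:
    """Return a URL ending in a random token with 256 bits of entropy."""
    return base + secrets.token_urlsafe(32)


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any placeholder blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    # A hit on this URL can only come from someone (or something) that read the email.
    print(make_unguessable_url())
    print(is_blocked("192.0.2.55"))
    print(is_blocked("203.0.113.9"))
```

If a link like this shows up in the server logs without the recipient having clicked it, the request most likely came from an automated scanner that read the message.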
by sillycross on 8/27/21, 7:41 AM
Unless there is end-to-end encryption (which Gmail does not support, partially because the US government requires Google to monitor accounts supposedly belonging to terrorists, etc.), the server always "knows" your email in plain text.
It seems like processing user email text through a program that actively "works" on the text (vs. simply encrypting/decrypting/transmitting the text) is generally not considered a privacy concern.
I feel like this is a bit tricky given the evolution in ML/AI. By making "feeding user email into any program" acceptable, the chance of rule-breaking incidents (e.g., using email text to train models) is unnecessarily increased, imo.
by QuackyTheDuck on 8/27/21, 8:17 AM
I just skimmed through the video, and I think it is plain wrong to conclude that "Google is not scanning your mails" just because the hyperlinks in his mails weren't visited by Google …