from Hacker News

Is Bard trained on Gmail? Depends what the meaning of the word “is” is

by purplesnowflake on 5/30/23, 5:28 PM with 3 comments

by smoldesu on 5/30/23, 5:52 PM
> AI researcher Kate Crawford was quick to ask Bard itself where its dataset came from. The answer caught her attention: Bard said one of its data sources was Gmail.
Did they find anything? There's a lot of hand-wringing at the start, then a big focus on how Google can't deny that emails are in their training data. Then they finish by interviewing Bard. Google's response makes sense, given that they're working with multi-terabyte language files. It has probably seen Gmail contents in the form of naturally published emails that just get picked up with other data. Claiming otherwise would be confidently wrong.
It would be interesting if they had a "Q_rsqrt in Copilot" moment here, but they don't. There seems to be no evidence that Google uses private data in Bard.
> Society should be having a robust discussion on these questions, but this is not possible if such discussion is inhibited by key players like Google.
How is Google inhibiting this discussion?
by version_five on 5/30/23, 5:55 PM
The whole "asking Bard" thing towards the end is completely meaningless and, I'd argue, irresponsible. They even say:
> But of course, the observation that Bard consistently makes these claims can’t be seen as evidence one way or the other
and then go on to quote a bunch of stuff Bard said.
If I had to speculate, it sounds like it could have used anonymized Gmail data (they could have some kind of PII removal tool that they run first; that's common, though I wouldn't trust it too much), or something is being pretrained on Gmail and fine-tuned on something else (hard to see a reason for that). Anyway, Google is acting suspicious, but pretending the chatbot's "opinion" has any bearing is disingenuous.