by pauljarvis on 7/22/19, 2:29 PM with 120 comments
by pauljarvis on 7/22/19, 2:30 PM
by unilynx on 7/22/19, 9:08 PM
"For tracking unique page views"
  // sessionStorage is scoped to the tab, so each URL is reported at most once per session.
  if (!sessionStorage.getItem(location.href)) {
    sessionStorage.setItem(location.href, "1");
    navigator.sendBeacon("/unique-pagehit?" + encodeURIComponent(location.href));
  }
"For tracking unique site views"
  // A single fixed key: only the first page of the session reports a site hit.
  if (!sessionStorage.getItem("Hi!")) {
    sessionStorage.setItem("Hi!", "1");
    navigator.sendBeacon("/unique-sitehit");
  }
"For tracking previous requests"
I'm not sure I fully understand what is being measured (is it session-only?). To measure how long someone viewed a page, you can call sendBeacon in onbeforeunload. To detect a bounce, store a Math.random() value in a session variable and send it when the page loads; every later page load in that session sends the previously stored value again. Then, on the server, count the random keys you received only once: those are the bounces.
I know that in practice you'll need to trim sessionStorage, sanitize URLs, use something less collision-prone than Math.random, deal with new tabs, add polyfills and other robustness measures, etc., but I don't yet see why the tracking mentioned needs any user IDs or hashing at all.
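The server-side half of that bounce count could look roughly like this. It is a minimal sketch, assuming the server simply collects every session key it receives into an array; the `countBounces` name and its input/output shapes are hypothetical. A key that arrives exactly once belongs to a single-page session.

```javascript
// Hypothetical helper: count bounces from the session keys the server received.
// Every page load sends its session's random key, so a key seen exactly once
// means that session viewed only one page (a bounce).
function countBounces(receivedKeys) {
  const hits = new Map();
  for (const key of receivedKeys) {
    hits.set(key, (hits.get(key) || 0) + 1);
  }
  let bounces = 0;
  for (const count of hits.values()) {
    if (count === 1) bounces += 1;
  }
  return { sessions: hits.size, bounces };
}

// Session "a" viewed three pages; "b" and "c" viewed one page each.
const stats = countBounces(["a", "b", "a", "c", "a"]);
console.log(stats); // { sessions: 3, bounces: 2 }
```

On the client, `crypto.randomUUID()` is one less collision-prone alternative to `Math.random()` for generating the key.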
by moose333 on 7/22/19, 3:51 PM
by ares2012 on 7/22/19, 6:15 PM
However, since such businesses already need to collect personal info as part of account creation, it shouldn't be hard to build analytics on top of that existing PII. If they are already collecting PII, it doesn't seem to save much to have their analytics tool avoid it?
by AndrewStephens on 7/22/19, 3:26 PM
What the article is discussing looks, at first blush, like a sensible way of aggregating users up front, before the data hits the database rather than afterwards. So no personal data is stored.
Does this meet the requirements for a site to avoid notifying users under the GDPR? I have no idea.
Even with the best of intentions, if you use a service like this, you are relying on them to (a) do what they claim and (b) not screw up (by leaving logs around, etc.).
If I use this service and data from my users gets leaked by Fathom, who gets blamed? The users were on my site, so I guess I'm the one who gets fined. Maybe the risk is worth it, maybe it isn't.
by labawi on 7/25/19, 1:43 PM
I would have more faith in the privacy claims if you didn't store the salt in the DB or any other permanent storage. If you manage to statically load-balance the users (e.g. by hashing site, IP and user agent; don't forget the site), the hash could be in-memory only. Sessions would break on server restart, but that's more of a feature than a bug.
To take things further, you might not even need to store the hashes in the DB. Keep them in server memory only and update the aggregate data in the DB in real time.
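A sketch of that in-memory variant for a single Node.js process, using the built-in `crypto` module. The class and method names are hypothetical; static load balancing and periodic clearing of the visitor set (e.g. daily) are assumed to happen elsewhere and are not shown. Only the per-site aggregates would be written to the database.

```javascript
const { createHash, randomBytes } = require("crypto");

// Hypothetical in-memory counter: the salt and the visitor hashes never leave
// process memory; only the per-site aggregates would be persisted.
class InMemoryUniqueCounter {
  constructor() {
    this.salt = randomBytes(32); // regenerated on every restart, never written to the DB
    this.seen = new Set();       // visitor hashes, memory only
    this.aggregates = new Map(); // site -> { pageviews, uniques }
  }

  hashVisitor(site, ip, userAgent) {
    return createHash("sha256")
      .update(this.salt)
      .update([site, ip, userAgent].join("\n"))
      .digest("hex");
  }

  // Record one page view and return the updated aggregate for its site.
  record(site, ip, userAgent) {
    const visitor = this.hashVisitor(site, ip, userAgent);
    const agg = this.aggregates.get(site) || { pageviews: 0, uniques: 0 };
    agg.pageviews += 1;
    if (!this.seen.has(visitor)) {
      this.seen.add(visitor);
      agg.uniques += 1;
    }
    this.aggregates.set(site, agg);
    return agg;
  }
}

const counter = new InMemoryUniqueCounter();
counter.record("example.com", "203.0.113.7", "Firefox/68");
counter.record("example.com", "203.0.113.7", "Firefox/68");
console.log(counter.record("example.com", "198.51.100.2", "Safari/12"));
// { pageviews: 3, uniques: 2 }
```

Because the salt is regenerated on restart, hashes from before a restart can no longer be linked to new ones, which is the "feature" the comment mentions.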
by i_anon on 7/23/19, 8:54 AM
I wondered whether you could explain what makes your hashing different from the hashing used by Facebook for their Custom Audiences tool, which was deemed unsuitable for anonymisation as per https://www.spiritlegal.com/en/news/details/e-commerce-retai...
by mrweasel on 7/22/19, 4:18 PM
by SCLeo on 7/22/19, 7:20 PM
(I mean, I don't have a point here but I find it pretty interesting. xD)
by billabul on 7/22/19, 2:51 PM
by vmlpvf on 7/22/19, 9:00 PM
Nevertheless, the chances of identifying someone are probably pretty low, and it's a good effort to make analytics more privacy-friendly.
by tomp on 7/22/19, 11:02 PM
Regarding (1),
> Brute forcing a 256 bit hash would cost 10^44 times the Gross World Product (GWP). [...]
> We have rendered the data anonymous to the point where we could not identify a natural person from the hash.
> It's possible that GDPR does not apply to Fathom since data is made completely anonymous. Even if GDPR did still apply, we reiterate the stance that there is legitimate business interest to understand how your website is performing.
This seems to reflect a profound confusion about the difference between hashing and anonymity. Just because something is hashed doesn't mean it's anonymous! You don't need to "brute-force" the hash; you just need to find the user whose data matches it, which means searching among roughly 7 billion people (or so), a far more tractable problem. This is also the principle behind, e.g., MD5 rainbow tables...
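A sketch of that point, assuming the stored value is a SHA-256 over a salt, an IP address and a user agent. The exact input format and the `visitorHash`/`reidentify` names are hypothetical, not the service's actual scheme. Whoever knows the salt can match a stored hash by hashing candidates; SHA-256 itself is never attacked, and the cost depends on the number of plausible (IP, user agent) pairs rather than on the 256-bit output size.

```javascript
const { createHash } = require("crypto");

// Hypothetical scheme for illustration: sha256 over salt, IP and user agent.
function visitorHash(salt, ip, userAgent) {
  return createHash("sha256").update(`${salt}|${ip}|${userAgent}`).digest("hex");
}

// Re-identify a stored hash by hashing plausible candidates until one matches.
// Cost grows with the number of candidates, not with the 2^256 output space.
function reidentify(targetHash, salt, candidateIps, candidateAgents) {
  for (const ip of candidateIps) {
    for (const userAgent of candidateAgents) {
      if (visitorHash(salt, ip, userAgent) === targetHash) {
        return { ip, userAgent };
      }
    }
  }
  return null;
}

// Example: search one /24 block of IPs against two user agents (512 hashes).
const salt = "daily-salt";
const stored = visitorHash(salt, "198.51.100.23", "Mozilla/5.0 (X11; Linux x86_64)");
const ips = Array.from({ length: 256 }, (_, i) => `198.51.100.${i}`);
const agents = ["Mozilla/5.0 (Windows NT 10.0)", "Mozilla/5.0 (X11; Linux x86_64)"];
const match = reidentify(stored, salt, ips, agents);
console.log(match.ip); // 198.51.100.23
```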
They claim to change the hash every 24 hours, so it's equivalent to a session cookie with a 24-hour expiration (session cookies are "anonymous" by their definition: they don't contain any user information and they're impossible to "brute force"; they "just" enable tracking). I have no idea whether 24-hour session cookies are GDPR-compliant...
Regarding (2), given that this seems (again, I might be misunderstanding) equivalent to a 24-hour session cookie, why not just do that? However, then you're ... drumroll ... giving control to the user. Why not just give control to the user, period?! For example, by storing the list of pages visited in Local Storage, and only pinging the server once for each page(view) every 24 hours?
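The Local Storage alternative might look like this: a minimal sketch in which the client decides whether a page view gets reported, so no identifier ever leaves the browser. `shouldPing`, the `pv:` key prefix and the `/pageview` endpoint are hypothetical; `MemoryStorage` is only a stand-in so the sketch runs outside a browser.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true if this page should be reported now, and records the time.
// `storage` is anything with getItem/setItem (window.localStorage in a browser).
function shouldPing(storage, page, now = Date.now()) {
  const key = "pv:" + page;
  const last = Number(storage.getItem(key)); // null -> 0 when never reported
  if (last && now - last < DAY_MS) return false;
  storage.setItem(key, String(now));
  return true;
}

// In-memory stand-in for localStorage so the sketch also runs outside a browser.
class MemoryStorage {
  constructor() { this.data = new Map(); }
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; }
  setItem(key, value) { this.data.set(key, String(value)); }
}

// In a browser (hypothetical /pageview endpoint):
// if (shouldPing(localStorage, location.pathname)) {
//   navigator.sendBeacon("/pageview?" + encodeURIComponent(location.pathname));
// }

const demo = new MemoryStorage();
console.log(shouldPing(demo, "/pricing", 1563800000000)); // true
console.log(shouldPing(demo, "/pricing", 1563803600000)); // false (one hour later)
```

The user can inspect or clear this state at any time, which is the "control" the comment argues for; the server only ever receives anonymous counts.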
by saagarjha on 7/22/19, 6:14 PM
What's the difference between a page view and a visit?
by jacquesm on 7/22/19, 6:04 PM
Thanks for building this, I will promote it.
by st3ve445678 on 7/22/19, 3:03 PM
by CHsurfer on 7/22/19, 3:10 PM
I'm not sure this is a good idea.
by felixfbecker on 7/22/19, 4:34 PM
by EGreg on 7/22/19, 5:27 PM
My guess is that they use localStorage and send the hash to their servers with each request.
So we are talking about a mechanism that’s just like a cookie.
As long as they don’t have any PII and can’t figure out who the user was, then I think the GDPR gives them an exception.
But the “without cookies” claim is dubious!
by itronitron on 7/22/19, 5:55 PM
by gcbw2 on 7/22/19, 10:00 PM
Every piece of information you use in the hash would already be in the search query itself: IP, user agent, path, date, etc. So there is no need to reverse the hash. You just hash your search query and compare in O(1) time.
The only piece of information that realistically makes the hash slightly difficult to get is the random number refreshed every day. But if you store it (and I have no reason to believe you do not), the brute-force effort becomes trivial, as I only need to generate the hash with that value.
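A sketch of the lookup described above, assuming the stored value is a SHA-256 over the daily salt plus IP, user agent, path and date. The input format and the `pageviewHash`/`didVisit` names are hypothetical. Once the salt is known, checking a specific person against the stored data takes one hash plus one `Set` lookup (average O(1)), with no reversal involved.

```javascript
const { createHash } = require("crypto");

// Hypothetical hash input: the fields the comment lists plus the daily salt.
function pageviewHash(salt, ip, userAgent, path, date) {
  return createHash("sha256")
    .update([salt, ip, userAgent, path, date].join("|"))
    .digest("hex");
}

// "Did this person view this page on this date?" Hash the query and test
// membership in the stored hashes: one hash plus one Set lookup.
function didVisit(storedHashes, salt, ip, userAgent, path, date) {
  return storedHashes.has(pageviewHash(salt, ip, userAgent, path, date));
}

// Example with a known daily salt and one stored page view.
const dailySalt = "salt-2019-07-22";
const storedViews = new Set([
  pageviewHash(dailySalt, "203.0.113.9", "Firefox/68", "/pricing", "2019-07-22"),
]);
console.log(didVisit(storedViews, dailySalt, "203.0.113.9", "Firefox/68", "/pricing", "2019-07-22")); // true
console.log(didVisit(storedViews, dailySalt, "203.0.113.9", "Firefox/68", "/about", "2019-07-22")); // false
```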
by kitchenkarma on 7/22/19, 5:56 PM