GDPR

Comment on the CNIL statement for the use of Google Analytics

After long discussions, the rules under which Google Analytics can be used legally in the EU have finally been clarified. As we have already announced, the solution is to use an independent system (a proxy) that ensures no personal data reaches Google Analytics.

Author: Jan Hornych

Published: Jun 8th 2022 | 5 min read

The French Data Protection Authority (CNIL) has issued a statement in its FAQ on how to use Google Analytics in compliance with the General Data Protection Regulation (GDPR, Regulation (EU) 2016/679).

The original reason why authorities in Austria, France and Liechtenstein banned the use of Google Analytics (even in anonymous mode) was the unauthorized transfer of personal data outside the EU. Google responded to this shortcoming with a solution in which the collection servers are located close to the location of the measured IP address*. According to the CNIL, this is not sufficient, and it has therefore issued a list of rules under which Google Analytics can be used in the EU.

For example, when I tested this on June 10th 2022 and accessed a web page from an EU location (Prague), the request went to a server at region1.google-analytics.com with the IP address 216.239.32.36, which, according to an IP address lookup, is located in a Google data center in California. A week earlier, however, the requests went to servers in the EU. Source: whatismyipaddress.com

According to the French authority's statement, even this procedure (if it worked, which it does not seem to, see image above) is insufficient. The authority has therefore created a set of explicit rules that must be met so that the use of Google Analytics** complies with the GDPR and no personal data is sent outside the EU.

The authority's recommendation is therefore not to send data to Google Analytics directly, but to use an intermediary system, a so-called proxy server, to cleanse the data before sending it to GA. The rules required by the authority are listed below, together with my personal comments.

To comply with the GDPR, this proxy must provide the following functionality:

IP addresses must not be sent to servers belonging to the measurement tool.

Here I would rather accept Google's claim that once it has cut off the last byte of an IP address, it never links that data again and the last byte is forgotten forever. But so be it: if I send the IP address already cut off, that is better. In fact, Google itself suggested this in its presentation, recommending server-side Google Tag Manager (sGTM) in a Docker container running outside of Google Cloud.
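As a rough illustration of what "cutting off the last byte" could look like on the proxy, here is a minimal Python sketch. The function names, the default of forwarding no IP address at all, and the /48 prefix for IPv6 are my own assumptions for illustration; they are not taken from the CNIL text or from Google's documentation.

```python
import ipaddress
from typing import Optional

# Assumed prefix lengths: /24 zeroes the last byte of an IPv4 address.
# The /48 used for IPv6 is an illustrative choice only.
IPV4_PREFIX = 24
IPV6_PREFIX = 48


def truncate_ip(raw_ip: str) -> str:
    """Return the address with its host bits set to zero."""
    address = ipaddress.ip_address(raw_ip)
    prefix = IPV4_PREFIX if address.version == 4 else IPV6_PREFIX
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


def outgoing_ip(raw_ip: str, send_truncated: bool = False) -> Optional[str]:
    """Decide what, if anything, the proxy forwards as the visitor IP.

    By default nothing is forwarded, which matches the strict reading of
    the rule; the truncated form is only sent when explicitly enabled.
    """
    return truncate_ip(raw_ip) if send_truncated else None
```

The stricter default reflects the rule as written (no IP address reaches the measurement tool), while the optional truncation corresponds to the weaker variant discussed in my comment.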

The device identifier (visitorId, in GA's case _ga cookies) and any user identifier must be replaced.

This is logical. The text also mentions that pseudonymization is acceptable, but only if the algorithm does not run on the measurement platform's servers and the platform cannot access it.
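One possible way to replace the identifier is a keyed hash computed on the proxy, so that the same visitor always receives the same replacement value but the platform cannot reverse it without the key. The sketch below is only an assumption about how this might be done; the secret, the function name and the 32-character output length are hypothetical, and the CNIL text does not prescribe any particular algorithm.

```python
import hashlib
import hmac

# Hypothetical secret: in practice it would be loaded from configuration on
# the proxy and never shared with the measurement platform.
PROXY_SECRET = b"replace-with-a-secret-kept-on-the-proxy"


def pseudonymize_id(original_id: str, secret: bytes = PROXY_SECRET) -> str:
    """Derive a stable replacement identifier with HMAC-SHA256."""
    digest = hmac.new(secret, original_id.encode("utf-8"), hashlib.sha256)
    # Shorten the hex digest to 32 characters; the length is arbitrary.
    return digest.hexdigest()[:32]
```

Because the key never leaves the proxy, the replacement value stays consistent across visits while the original _ga value is never forwarded.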

Information about what page the user came to the site from must be deleted.

This refers to the document.referrer parameter, and the rule is probably too strict. Except in some extremely unlikely scenarios, I cannot think of a case where this parameter, if reduced to just a domain, could help identify the user.
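To make the difference between the two options concrete, here is a small Python sketch. The function name and the keep_domain switch are hypothetical; the default follows the rule as stated (the referrer is deleted), while the optional mode keeps only the domain, which is the variant I consider harmless.

```python
from urllib.parse import urlsplit


def sanitize_referrer(referrer: str, keep_domain: bool = False) -> str:
    """Drop the referrer, or reduce it to its host name when allowed."""
    if not keep_domain:
        return ""
    # hostname is None for empty or malformed values, so fall back to "".
    return urlsplit(referrer).hostname or ""
```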

All parameters in the URL must be deleted when the page is submitted.

As in the previous case, if a parameter is aggregated, I would not consider it data that can help identify the subject. Some pages differ semantically depending on their parameters, for example ?filter=newproducts; I guess this can be solved by a subsequent mapping to virtual pages. The CNIL also says not to send utm parameters along with the page. Again, if a parameter contains an aggregated identifier, such as a campaign ID, I find it pretty harmless. On the other hand, why should one send such a parameter in the page path rather than in a custom dimension?
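The mapping to virtual pages and the move of campaign identifiers into custom dimensions could be sketched roughly as follows. Everything named here (the two mapping tables, the clean_page function, the campaign_id dimension) is a hypothetical illustration of the idea, not something described by the CNIL or by Google.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from (parameter, value) to a virtual page suffix.
VIRTUAL_PAGES = {("filter", "newproducts"): "/new-products"}

# Hypothetical mapping from a utm parameter to a custom dimension name.
CAMPAIGN_PARAMS = {"utm_campaign": "campaign_id"}


def clean_page(url: str) -> Tuple[str, Dict[str, str]]:
    """Return a parameter-free page path plus allow-listed dimensions."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    path = parts.path or "/"

    # Keep the semantic difference between pages as a virtual path.
    for (name, value), suffix in VIRTUAL_PAGES.items():
        if value in params.get(name, []):
            path = path.rstrip("/") + suffix
            break

    # Only allow-listed parameters survive, and only as custom dimensions.
    dimensions = {
        dimension: params[name][0]
        for name, dimension in CAMPAIGN_PARAMS.items()
        if name in params
    }
    return path, dimensions
```

Any parameter that is not explicitly listed in one of the two tables is simply dropped, so an unexpected value such as an e-mail address in the query string never reaches the measurement platform.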

Additional techniques that will lead to enrichment of the data collected must not be used. For example, fingerprinting, user agent detection, etc.

This is logical, so no comment.

No other cross site identifiers should be sent.

I don't know exactly what they mean by this. The only danger I can think of is linking visitor behavior across different sites. Probably something like third-party cookie parameters?

No other data that will lead to the identification of the subject.

This just repeats what the regulation itself already states.

Other comments

Finally, the authority requires that the proxy server run in an environment that ensures the collected, unredacted data is out of reach of the measurement platform, and that the proxy server itself does not run outside the EU. This is perfectly logical, perhaps to prevent someone from running the proxy server in AWS, Azure, Heroku or any other cloud environment in a data center located in the US. Google Cloud itself, regardless of location, is out of the question because it is not technically possible to guarantee that Google will not be able to link the data. The proxy must therefore run in an EU-located data center or on internal servers.

My knowledge of French is such that this layman's translation should be considered only my personal attempt to bring the rules to a non-French-speaking audience. For those interested in reading it in French, here is the original.

I have tried to be as objective as possible, but since our product portfolio includes mHub Cloud, a product that is such a proxy, it is possible that my view is influenced, albeit unintentionally.

*Google Analytics - Regional data collection
**CNIL refers to Google Analytics, but of course this also applies to other web analytics platforms where there is a risk of data transfer outside the EU.