Deleting your analytics history

GDPR: Setting your website’s data retention settings in Google Analytics for professional services firms

Which analytics data is being deleted by Google Analytics from May 25th?

Data about Users and Events that is more than 26 months old (26 months being the new default in GA), unless you change the settings.
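To make the 26-month window concrete, here is a minimal sketch of the date arithmetic only. It assumes the May 25th in question is 25 May 2018 (the date GDPR came into force) and that "26 months old" means 26 calendar months back from a given day. The function name `retention_cutoff` is ours, and this is not a description of Google's actual deletion schedule.

```python
import calendar
from datetime import date


def retention_cutoff(today: date, months: int = 26) -> date:
    """Return the date `months` calendar months before `today`."""
    # Work in "months since year 0" so the subtraction can cross year boundaries.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day when the target month is shorter (e.g. 30 April -> 29 February).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


# Assumption: the retention change took effect on 25 May 2018.
print(retention_cutoff(date(2018, 5, 25)))  # 2016-03-25
```

Under those assumptions, User and Event data from before late March 2016 would fall outside the default window on the day the setting took effect.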

How will deletion of this data affect you?

As things stand this may not affect many law firms or accountancy firms, but one example of how it might is a filtered report that looks at the behaviour of users from, say, a particular geographic region over the last 5 years. For example, “How have Users coming in from my tagged newsletter programs (Source = QuarterlyNewsletter, Medium = Email) from the US grown over the last 5 years?”

Google says that aggregated numbers (for example, how many visits you got per time period, or pageviews per page) will not be affected. But given that Google Analytics is a rapidly evolving platform and firms are gradually getting more sophisticated in how they use it, losing options for examining older user data is still an issue in our view.

In addition, Event data is important because that is usually where the firms we work with implement tracking of high-value activity, so losing historical perspective on it could also be problematic.

How should you set the Google Analytics data retention settings?

For most professional services firms, we believe you should choose ‘Do Not Automatically Expire’ rather than the default of ‘expire at 26 months’ that Google imposed on May 25th.

[Screenshot: data retention settings in Google Analytics]

If you like, you can of course review this ‘Do Not Automatically Expire’ decision after the change is made, once it becomes clearer how User/Event deletions will affect organisations (it’s not really clear how the deletion will affect reporting, as it hasn’t happened yet).

However, professional services firms that gather information in Google Analytics which, when combined with other data, might allow a user to be personally identified may want to shorten this data retention period.

Isn’t that last sentence a little vague?

Yes it is – so we’ve given some examples at the end of this article.

Why is Google deleting historical user level analytics data then?

Out of an abundance of caution (risk mitigation). Imagine Google asking themselves the question: given all the analytics setups out there, could a user be personally identified from several years of anonymous data? And keep in mind that, as Google, you’re already fighting several regulatory battles in Europe.

Google’s Terms and Conditions for Google Analytics already prohibit making users personally identifiable in GA, but Google is also responding to the European General Data Protection Regulation (GDPR) coming into force.

Specifically, GDPR Recital 30 states:

“Natural persons may be associated with online identifiers […] such as internet protocol addresses, cookie identifiers or other identifiers […]. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.”

In summary:

“The GDPR explicitly states that online identifiers, even if they are pseudonymous, even if they do not directly identify an individual, will be personal data if there is potential for an individual to be identified or singled out.” [quote from Cookie Law but our emphasis]

Personal data comes with a raft of GDPR obligations such as opt-in requirements, reproduction rights, privacy, breach reporting, security requirements, a data protection officer, transfer limitations and, in the context of Google Analytics and data retention, a requirement that you automatically delete data your business no longer has any use for.

Google Analytics relies on cookies – does that mean my data is personal?

Probably not. In our view, it’s likely your firm’s Google Analytics data is not personal, because users coming to lawyer and accountant websites are normally not traced back to an individual person (a User is actually an anonymous numeric id in a cookie on a device).

And from a practical point of view insisting that the full gamut of GDPR obligations is applied to anyone running GA would be challenged not only by Google but by pretty much everyone else using the most widely used analytics system in the world.

We don’t think cookie opt-ins are necessary for firms using standard GA, even though some professional services firms have them (most don’t). And you can tie yourself in knots wondering whether it makes sense to set a cookie whose job is to tell the website not to track you with cookies on your next visit – and, if it doesn’t make sense, how users will respond to being asked to opt out every time they visit…

Going further, if you implement an explicit cookie opt-in, as many firms have done, but don’t actually require it, you incur a significant cost in terms of your ability to understand what’s working (or not) on your site and in the market. That’s because this opt-in code essentially says “if you don’t actively accept cookies by clicking an ‘I accept’ button, then don’t execute the Google Analytics tracking script at all” (you will never even know someone visited your site, let alone what they did).
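To illustrate that cost, here is a small sketch that models the opt-in rule described above. Real opt-in code runs in the visitor's browser, so this is only a model of the decision. The function names and the sample list of visits are made up for illustration: `True` means the visitor clicked ‘I accept’, `False` means they declined, and `None` means they ignored the banner.

```python
from typing import Iterable, Optional


def should_run_tracking(consent: Optional[bool]) -> bool:
    """Explicit opt-in: run the tracking script only after an active 'I accept'."""
    # Ignoring the banner (None) is treated exactly like declining (False).
    return consent is True


def recorded_visits(consents: Iterable[Optional[bool]]) -> int:
    """Count how many visits would ever reach the analytics reports."""
    return sum(1 for consent in consents if should_run_tracking(consent))


# Hypothetical sample: six visits, only two of which clicked 'I accept'.
visits = [True, None, None, False, True, None]
print(recorded_visits(visits), "of", len(visits), "visits recorded")
```

In this made-up sample, four of the six visits leave no trace at all, which is the gap in understanding we are describing.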

We believe that a significant number of firms that have implemented explicit opt-ins (perhaps on the advice of their web developer quoting for the functionality?) saw opt-ins as relatively costless (‘better safe than sorry’) and neither fully understood what they were giving up nor what the alternative was (making sure no personally identifiable user information gets written to your analytics).

Lastly, as you might expect, we don’t think it is necessary or sensible to anonymise IP addresses for firms using standard GA, although in some places you’ll read that maybe you should. There are, however, other steps we think you should take to make sure your GA data is clean.

But what if you have ‘modified GA’ data that actually identifies people?

An example might be where a page effectively embeds the name of a user in a URL. Say you’ve sent out campaign-tagged URLs in which you logged customer id details (stored in your own database, but corresponding to a named individual) in the URL parameters. Those details would then be written to GA’s campaign data and stored in GA: [an-individual’s-account-number]&utm_campaign=MergerArticle

By doing this you now have a problem with GA’s Ts & Cs, and in addition you had probably better start looking at GDPR requirements. You might also want to consider deleting user-level data sooner than the default of 26 months, depending on how you actually use this data.
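One way to catch this before a campaign goes out is to scan your tagged URLs for parameters that look like they carry an identifier. The sketch below is a rough heuristic, not a complete check. The parameter-name patterns, the "six or more digits" rule, the function name `risky_params` and the example.com URLs are all our own assumptions for illustration, so adjust them to the identifiers your firm actually uses.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Assumed heuristics: parameter names that suggest an identifier, and
# values that look like an email address or a long account number.
ID_LIKE_NAME = re.compile(r"cust|client|account|acct|user|member|email", re.I)
EMAIL_LIKE_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_NUMBER_VALUE = re.compile(r"\d{6,}")


def risky_params(url: str) -> List[str]:
    """Return the names of query parameters that may identify an individual."""
    flagged = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if (
            ID_LIKE_NAME.search(name)
            or EMAIL_LIKE_VALUE.fullmatch(value)
            or LONG_NUMBER_VALUE.fullmatch(value)
        ):
            flagged.append(name)
    return flagged


# Hypothetical campaign URL with a customer id alongside the usual utm tags.
url = (
    "https://www.example.com/articles/merger"
    "?customer_id=00123456&utm_source=QuarterlyNewsletter"
    "&utm_medium=Email&utm_campaign=MergerArticle"
)
print(risky_params(url))  # ['customer_id']
```

A URL that carries only ordinary campaign tags, such as Source = QuarterlyNewsletter and Medium = Email, comes back with nothing flagged.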

And finally…

We’re not lawyers, we’re analytics people. However, if you’re a lawyer with a sub-speciality in data privacy, let us know if you have a different interpretation of any of the above and we’ll happily quote you on it right here.