Keeping your Google Analytics data clean

Keep it clean! How well is Google Analytics measuring your professional service firm’s web traffic?


When you report website metrics from Google Analytics to a marketing committee or general manager at your firm, how much confidence do you have in the data? Did you really experience a big lift in Users last month, or was that based on bad data?

Here are eight things we do to keep your law firm or accountancy firm’s website Google Analytics 4 data clean, in order of importance.

1. Exclude internal staff from your website analytics

You’ve only got a few internal staff versus tens of thousands of visitors so that shouldn’t make much difference, right?

Internal staff can actually have a big impact on certain kinds of high value activity on your website that you are particularly interested in, like someone contacting a lawyer (your staff may use your website just like they use the internal phone directory).

In addition, we’ve come across firms where IT has configured all web browsers to open on the firm’s home page by default. Imagine what that does to the figures for your site: staff who land on one page and leave, every single day.

Because of the above we’ve found firms where internal traffic is 10% or more of their visits.

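At the very least, set up a filter to exclude internal IP addresses (or if you’re a client we’ll do it), and remember to update the filter when you add a new office or change your network configuration or ISP. In Google Analytics 4 go to Data Streams -> Configure tag settings (Show all) -> Define internal traffic. Defining internal traffic only labels those visits; to actually exclude them, check that the Internal Traffic filter under Data Settings -> Data Filters is set to Active (it starts out in Testing).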
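Before entering office IP ranges into an internal traffic rule, it can help to check which addresses each range actually covers. The Python sketch below is illustrative only: the ranges are hypothetical documentation addresses rather than any real firm’s network, and `is_internal` is our own helper name, not part of Google Analytics.

```python
import ipaddress

# Hypothetical office ranges in CIDR notation (reserved documentation
# addresses). Replace with the ranges your IT team or ISP gives you.
INTERNAL_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. head office
    ipaddress.ip_network("198.51.100.16/28"),  # e.g. branch office
]


def is_internal(ip: str) -> bool:
    """Return True if the address falls inside any listed office range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_RANGES)


if __name__ == "__main__":
    for candidate in ["203.0.113.45", "198.51.100.20", "198.51.100.40"]:
        print(candidate, is_internal(candidate))
```

Running a handful of known staff and non-staff addresses through a check like this is a quick way to catch a range that is too narrow (a new office is missed) or too wide (real visitors get excluded).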

[Screenshot: Filter out internal IP addresses in Google Analytics 4]

2. Check you’re correctly recording search keyphrases used to find pages on your site with your internal website search

If you review keyphrase information provided in Google Analytics in order to see what searches are leading people to your site you’ll likely be disappointed as most will be labelled “not provided”.

Since Google locked keyphrase data down in 2013, only very large sites get much in the way of keyphrase information, and even then it is still less than 5%. (Pre-2013 data shows much more, which is one of the reasons we suggested not creating new Google Analytics accounts when you do a site revamp.)

You can use Google Search Console to see some information but equally importantly you can configure Google Analytics to store 100% of search phrases that are used in searching on your own website. 

The way you do this is by telling GA what the URL format for search results looks like. For example, on Magnifirm it is https://www.magnifirm.com/?s=google+analytics if you are searching for the phrase “google analytics”. The /?s= portion (the “query string”) will always be followed by the term being searched for.

This URL and query string format is likely what you will see on lawyer or accountant websites running on WordPress, but on some sites it may be something like lawfirm.com/search/google+analytics (do a search within your own site to find out what the URL looks like).
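To make the two URL formats concrete, here is a minimal Python sketch that pulls the search phrase out of either style. It is an illustration of the idea, not how Google Analytics works internally; the function name `search_term` and its default `param` and `path_prefix` values are our own assumptions, so swap in whatever your site uses.

```python
from urllib.parse import parse_qs, unquote_plus, urlsplit


def search_term(url, param="s", path_prefix="/search/"):
    """Return the phrase searched for, or None if the URL is not a search.

    Handles query-string searches (/?s=term) and path-based searches
    (/search/term). Both defaults are assumptions for illustration.
    """
    parts = urlsplit(url)

    # Style 1: the term follows a query parameter, e.g. /?s=google+analytics
    values = parse_qs(parts.query).get(param)
    if values:
        return values[0]

    # Style 2: the term is part of the path, e.g. /search/google+analytics
    if parts.path.startswith(path_prefix):
        return unquote_plus(parts.path[len(path_prefix):]) or None

    return None


if __name__ == "__main__":
    print(search_term("https://www.magnifirm.com/?s=google+analytics"))
    print(search_term("https://lawfirm.com/search/google+analytics"))
```

The first style is the one a query parameter setting can pick up directly. The second has no query parameter at all, which is why path-based search URLs usually need extra configuration before the terms show up in reporting.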

Google Analytics 4 tracks internal searches, assuming ‘Enhanced measurement’ is enabled and the query string is one of the more common ones (by default q, s, search, query and keyword).

[Screenshot: Standard query strings for site search in Google Analytics 4]

If you’ve got unusual search query strings (i.e. they’re not in the standard set) go to Data Streams -> Enhanced measurement -> Site search -> Show advanced settings and add yours.

For most WordPress sites using ‘s’ as their search parameter the standard settings will work.

3. Set all URLs to lowercase

Google Analytics 4, like legacy Universal Analytics, is case-sensitive. A request for mylawfirm.com/profile/partner/BillBloggs is recorded separately from a request for mylawfirm.com/profile/partner/billbloggs, and by default the two are reported as separate pages, leaving you with erroneous (and untidy) reporting unless you consolidate them manually.

Sometimes the different case variations will be caused by your web developers coding a particular kind of link with a different case, sometimes it will be a website user, and sometimes it will be your marketing team using Campaign Tagging.

All these situations create problems.

Unlike legacy Universal Analytics, Google Analytics 4 does not have filters to lowercase all URLs (yet).

However, assuming you’re using Google Tag Manager, you can lowercase URLs in GTM before they are sent to Google Analytics. Follow the process documented here.
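If you already have mixed-case URLs in an exported report, the manual consolidation amounts to lowercasing each path and adding up the rows that collide. A minimal Python sketch follows; the function name and the page view numbers are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def consolidate_by_lowercase(page_views):
    """Merge rows whose paths differ only by letter case.

    page_views maps a page path to its view count, as you might export
    it from a pages report.
    """
    totals = Counter()
    for path, views in page_views.items():
        totals[path.lower()] += views
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical export rows, not real data.
    exported = {
        "/profile/partner/BillBloggs": 12,
        "/profile/partner/billbloggs": 30,
        "/contact": 8,
    }
    print(consolidate_by_lowercase(exported))
```

This only tidies up a report after the fact. Lowercasing at collection time is still the better fix, because it keeps every report consistent without a clean-up step.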

4. The GDPR bogeyman, personal data, and cookies

One way to have really clean Google Analytics data is to stop gathering it, because, for example, you think cookies are personal data, and have been scared witless by European data privacy law.

So occasionally we see firms who disable Google Analytics cookies for website visitors by default (sometimes advised by over-zealous website developers) which cripples your firm’s single largest source of market intelligence.

We check both the data retention period (see point 5 below; for professional service firms it can normally be lengthened from the default) and, if you have cookie warning banners, how they are implemented. Implementing cookie warnings inappropriately can result in losing information about where visitors came from on the first page they visit.

Read about why Google Analytics data is not personal in the GDPR sense of the term, and read this to understand the cookie policies of the global law firms we analysed.

In Google Analytics 4, how you configure personal data related settings depends on the size of your firm because of ‘data thresholding’. This is where Google hides data from you in reporting when the set of users is small enough that Google thinks individual users might be identifiable. The data is still there but you can’t see it, and yes, this is problematic.

We’ve found that data thresholding is an issue for small and medium firms (even those with, say, under 250,000 users a year), as so much data simply can’t be seen once you start slicing and dicing it.

For smaller to medium law and accountancy firms (roughly <250k annual visitors) we therefore recommend that, to minimize data thresholding, you use the ‘Device-based’ reporting identity, which you set under Data Settings -> Data Collection and then under Reporting Identity.

[Screenshot: Avoiding data thresholding in smaller to medium sized professional services firms]

For larger firms (>250k visitors a year) you can potentially enable Google signals data collection under Data Settings -> Data Collection and then, under Reporting Identity, select the option that uses User-ID, Google signals, then Device. This also helps you futureproof for a world in which cookies may not be available.

[Screenshot: Blended reporting identity in Google Analytics 4 for identifying users]

However, even in large professional services firms, if your reports are missing data that you think should be there, consider moving back to the ‘Device-based’ reporting identity (it depends a lot on the kind of reporting you do).

5. Change the default data retention period

By default GA4 expires data about users after 2 months which impacts reports you build in GA4 Explore (vs standard ‘canned’ GA4 Reports).

Go into Data Settings -> Data Retention and reset ‘Event data retention’ to 14 months, and select ‘Reset user data on new activity’, which causes the expiry date to be reset forward to 14 months from the most recent visit for the user.

[Screenshot: Data retention in Google Analytics 4 - default settings to change]

6. Not all Direct or Referral traffic to your website is what it says on the tin

When reporting particular kinds of traffic to your law firm marketing committee be aware of the definitions of traffic.

A common problem we see is law firms sending out newsletters without setting them up to be tracked in Google Analytics. Yes, you may be using MailChimp (or the equivalent) and be able to track clicks in that system but you also want to be able to track clicks in Google Analytics because it has a more all-encompassing view of the world than MailChimp. For example a good GA reporting suite will help you identify not just the first click from an email but also where the reader has forwarded that web page of yours (originally sent out via your newsletter) to someone else.

Poor newsletter setup by contrast will see all your newsletter traffic show up as unattributed Direct traffic in Google Analytics.

And poor Google Analytics setup will also see your blog subsite or client extranet showing up as external Referral traffic when you may not wish it to for reporting purposes.

It’s beyond the scope of this article to cover the changes to your internal marketing process needed to correctly label and distinguish newsletters using campaign tagging, or to merge subsites into your GA results, but it should be done. See our separate campaign tagging guide for professional services firms here.
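As a taste of what campaign tagging involves, the Python sketch below adds the standard utm_source, utm_medium and utm_campaign parameters to a newsletter link. The helper name `tag_url`, the example values and the choice to force everything to lowercase (to avoid the case problems described in point 3) are our own assumptions; your firm’s naming scheme may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, source, medium, campaign):
    """Append lowercase UTM campaign parameters to a link.

    Existing query parameters on the link are preserved.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source.lower()),
        ("utm_medium", medium.lower()),
        ("utm_campaign", campaign.lower()),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical newsletter link and campaign name.
    print(tag_url(
        "https://www.mylawfirm.com/insights/article",
        source="Newsletter",
        medium="Email",
        campaign="2024-03-update",
    ))
```

Every link in the newsletter gets the same three parameters, so a click arrives in Google Analytics labelled as email traffic from that campaign rather than as unattributed Direct traffic.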

7. Segments in standard Google Analytics 4 Reports

You can’t use segments in standard GA4 Reports but you can use Audiences (segments are only available in Explore custom built reports). However Audiences are not retrospective – you need to create them before the date from which you wish to analyse data. For example in law firms or accountancy firms you might wish to set up Audiences that you can use in GA4’s standard ‘Reports’ for people who

A quick tip is that you can convert segments you create into Audiences – so if you develop a useful segment consider converting it.

8. Default session (visit) timeout periods are worth changing

Most people don’t decide to use a professional services firm the same way they decide to buy bananas when they’re doing online grocery shopping. They’ll take longer to consider your firm, and the higher the stakes for them personally or for their business, the longer they are likely to take.

GA4’s defaults for sessions in a website visit are quite short. We recommend changing the default session timeout (how long an individual’s visit to your website persists until GA times it out) from 30 minutes to say 7 hours (the course of a business day) as people may open browser tabs and then come back to them. In addition we recommend changing the length of time for an ‘engaged session’ from the default of 10 seconds to 60 seconds.
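To see why the timeout matters, consider one visitor who keeps a tab open and returns to it through the day. The Python sketch below is a simplified model of inactivity-based sessions, not GA4’s actual implementation; the function name and the visit times are hypothetical.

```python
from datetime import datetime, timedelta


def count_sessions(hit_times, timeout):
    """Count sessions for one visitor.

    A new session starts whenever the gap since the previous hit is
    longer than the inactivity timeout.
    """
    sessions = 0
    previous = None
    for current in sorted(hit_times):
        if previous is None or current - previous > timeout:
            sessions += 1
        previous = current
    return sessions


if __name__ == "__main__":
    # Hypothetical page views by one visitor over a business day.
    day = datetime(2024, 3, 4)
    hits = [
        day.replace(hour=9, minute=0),
        day.replace(hour=9, minute=10),
        day.replace(hour=11, minute=30),
        day.replace(hour=16, minute=0),
    ]
    print(count_sessions(hits, timedelta(minutes=30)))
    print(count_sessions(hits, timedelta(hours=7)))
```

With the 30 minute default this single visitor is counted as three sessions; with a 7 hour timeout the same activity is counted as one, which is closer to how someone weighing up a firm over a working day actually behaves.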

Photo by go_greener_oz