Keeping your Google Analytics data clean

Keep it clean! How well is Google Analytics measuring your professional service firm’s web traffic?


When you report website metrics out of Google Analytics to a marketing committee or general manager at your firm, how much confidence do you have in the data? Did you really experience a big lift in Users last month, or was this based on bad data?

Here are seven things we do to keep your law firm or accountancy firm’s website analytics data clean, in order of importance.

1. Exclude internal staff from your website analytics

You’ve only got a few internal staff versus tens of thousands of visitors so that shouldn’t make much difference, right?

Internal staff can actually have a big impact on certain kinds of high value activity on your website that you are particularly interested in, like someone contacting a lawyer (your staff may use your website just like they use the internal phone directory).

In addition we’ve come across firms where IT has configured all web browsers to open by default on the firm’s home page – imagine what that does to figures like Bounce Rate (people who visit one page and leave).

Because of the above we’ve found firms where internal traffic is 10% or more of their visits.

At least set up a filter to exclude internal IP addresses (or if you’re a client we’ll do it) and remember to update the filter when you add a new office or change your network configuration or ISP.
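To make the idea concrete, here is a minimal sketch of the check an internal-IP filter performs. The network ranges are hypothetical placeholders (reserved documentation addresses), and the `is_internal` helper is our own illustration, not part of Google Analytics; the real filter is configured in GA's Admin screens.

```python
import ipaddress

# Hypothetical office ranges (reserved documentation addresses).
# Replace these with the ranges your IT team or ISP gives you.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. head office
    ipaddress.ip_network("198.51.100.16/28"),  # e.g. branch office
]


def is_internal(ip: str) -> bool:
    """Return True if the address falls inside any listed office network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


print(is_internal("203.0.113.45"))  # inside the head office range
print(is_internal("8.8.8.8"))       # an outside address
```

Keeping the ranges in one list mirrors the maintenance point above: a new office or a change of ISP means one more entry to add.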

2. Turn on GA’s new(ish) built-in robot filtering option

Unless you’ve been living under a rock you’ll be unsurprised to hear that not everything on the internet is what it seems to be.

Google Analytics, because it is JavaScript-based, is designed to execute only for real visitors: people who are using a browser to look at web pages.

But even so there are automated bots out there that are designed to crawl sites and may imitate real people.

Google now includes a setting to filter out known spiders. It is based on the IAB/ABC spiders and robots service (updated monthly) and automatically filters out thousands of known IP addresses and browser types belonging to spiders that may impact your Google Analytics data.

Just tick the “Exclude all hits from known bots and spiders” checkbox in your View settings.

In the first half of 2019 we reviewed data from two global law firms where literally 40% of their traffic was one particular very badly behaved bot.

3. Reduce (ghost) referral spam

Looked at your referring sites report lately? Many people do, which is why some enterprising spammers decided that if they inserted their website in your referral list you might visit it to see what the site was.

Even better for the spammers, Google Analytics accepts data through an API, so they can generate random Universal Analytics ID numbers and record their own website as a referral to yours without ever visiting your site.

Some of the more common ones that we end up filtering out in client reports include sites like “event-tracking”, “burn-fat”, “sellingcrossing”, “monetizer”, “button” sites of various kinds, etc.

These fake referral visits have certain characteristics. They will often visit the home page, and pages/session is usually just one page. More usefully for our purposes, they will not come in with your hostname (domain name), as they rely on a randomly generated GA ID number.

So by creating a filter that requires the incoming browser to be requesting your hostname (part of your domain name) you can exclude a lot of these.

Be a little careful with this option however.

You may have subdomains or payment steps or translation tools or even legitimate 3rd party sites (like Mondaq) which have different hostnames. So check existing hostnames that appear in your GA reporting first (or talk to us about implementing this).
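As a rough sketch of the logic behind a valid-hostname filter, the snippet below builds one anchored pattern from a list of hostnames and keeps only hits that match it. The hostnames and the `keep_hit` helper are hypothetical; build your real list from the hostnames that already appear in your GA reporting.

```python
import re

# Hypothetical hostnames: your main site plus any legitimate subdomains
# or third-party hosts you have confirmed in your existing reports.
VALID_HOSTNAMES = ["mylawfirm.com", "www.mylawfirm.com", "blog.mylawfirm.com"]

# One anchored pattern; re.escape makes each dot match a literal dot.
HOSTNAME_PATTERN = re.compile(
    "^(" + "|".join(re.escape(h) for h in VALID_HOSTNAMES) + ")$",
    re.IGNORECASE,
)


def keep_hit(hostname: str) -> bool:
    """Return True if the hit arrived with one of the listed hostnames."""
    return HOSTNAME_PATTERN.match(hostname) is not None


print(keep_hit("www.mylawfirm.com"))  # a real visit to your site
print(keep_hit("(not set)"))          # typical of ghost referral spam
```

Escaping the dots and anchoring both ends matters: an unanchored or unescaped pattern can let lookalike hostnames through.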

It is always a good idea to have Views in Google Analytics where you can test filter implementations – data once filtered out (incorrectly) is gone forever.

4. Recover some of the search keyphrases used to find pages on your site

If you review keyphrase information provided in Google Analytics in order to see what searches are leading people to your site you’ll likely be disappointed as most will be labelled “not provided”.

Since Google locked this down in 2013, only very large sites get much in the way of keyphrase information, and even then it is still less than 5% (in pre-2013 data you will see much more, which is one of the reasons we suggest not creating new Google Analytics accounts when you do a site revamp).

You can use Google Search Console to see some information but equally importantly you can configure Google Analytics to store 100% of search phrases that are used in searching on your own website. 

The way you do this is by telling GA what the URL format for search results looks like. For example, on Magnifirm it is https://www.magnifirm.com/?s=google+analytics if you are doing a search for the phrase “google analytics”. The /?s= portion (the “query string”) will always be followed by the term being searched for.

This URL and query string format is likely what it will be for lawyer or accountant websites running on WordPress, but on some sites it may be something like lawfirm.com/search/google+analytics (do a search within your own site to find out what the URL looks like).

Then go into the View settings in Admin in Google Analytics and enter the query parameter in the Site Search Tracking portion. For example, for most WordPress sites, entering s (the letter between the ? and the = in the URL above) will work.

Voila. 100% of your keyphrases for internal searches are now being tracked.
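If you want to check which query parameter your own site uses, a small sketch like the one below pulls the search term out of a results URL. The `search_term` helper is our own illustration, and it assumes the query-string format described above; path-based formats such as lawfirm.com/search/google+analytics would need a different approach.

```python
from urllib.parse import urlparse, parse_qs


def search_term(url: str, query_parameter: str = "s"):
    """Return the searched phrase from a results URL, or None if absent."""
    query = urlparse(url).query
    values = parse_qs(query).get(query_parameter)
    return values[0] if values else None


# The "+" in the URL is decoded back to a space.
print(search_term("https://www.magnifirm.com/?s=google+analytics"))
```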

5. Set all URLs to lowercase

Google Analytics is case-sensitive. So a person requesting the web page mylawfirm.com/profile/partner/BillBloggs will be recorded separately from the person requesting mylawfirm.com/profile/partner/billbloggs. By default the two will be reported separately, causing you to produce erroneous (and untidy) reporting unless you manually consolidate.

Sometimes the different case variations will be caused by your web developers coding a particular kind of link with a different case, sometimes it will be a website user, and sometimes it will be your marketing team using Campaign Tagging.

All these situations create problems.

Fortunately you can fix this by setting all URLs to lowercase in Admin in Google Analytics. Go to the Filters section and add a Lowercase filter on the Request URI field (if you’re a client of ours for, say, our GA+ reporting platform, we do it for you as part of our clean data initiative).
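For data recorded before the filter was in place, you still have to consolidate by hand. A minimal sketch of that consolidation, using made-up pageview counts and a hypothetical `consolidate` helper, might look like this:

```python
from collections import Counter


def consolidate(pageviews):
    """Merge (path, count) rows whose paths differ only by letter case."""
    totals = Counter()
    for path, count in pageviews:
        totals[path.lower()] += count
    return dict(totals)


# Made-up figures for illustration only.
rows = [
    ("/profile/partner/BillBloggs", 12),
    ("/profile/partner/billbloggs", 30),
]
print(consolidate(rows))
```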

6. The GDPR bogeyman and cookies

One way to have really clean Google Analytics data is to stop gathering it, because, for example, you think cookies are personal data, and have been scared witless by European data privacy law.

So very occasionally we see firms who disable Google Analytics cookies for website visitors by default (sometimes advised by over-zealous website developers) which cripples your firm’s single largest source of market intelligence.

We check both the Data Retention period (which for professional service firms can normally be lengthened from the default) and also how cookie warnings are implemented, if you have them (implementing cookie warnings inappropriately can result in losing information about where visitors come from).

Read about why Google Analytics data is not personal in the GDPR sense of the term, and see our analysis of cookie policies across global law firms.

7. Not all Direct or Referral traffic is what it says on the tin

When reporting particular kinds of traffic to your law firm marketing committee be aware of the definitions of traffic.

A common problem we see is law firms sending out newsletters without setting them up to be tracked in Google Analytics. Yes, you may be using MailChimp (or the equivalent) and be able to track clicks in that system, but you also want to be able to track clicks in Google Analytics because it has a more all-encompassing view of the world than MailChimp. For example, a good GA reporting suite will help you identify not just the first click from an email but also where the reader has forwarded that web page of yours (originally sent out via your newsletter) to someone else.

Poor newsletter setup by contrast will see all your newsletter traffic show up as unattributed Direct traffic in Google Analytics.
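Tracking a newsletter in Google Analytics comes down to Campaign Tagging: adding utm_source, utm_medium and utm_campaign parameters to each link. As a sketch only, the hypothetical `tag_link` helper below appends them to a URL; the values shown are illustrative, and your own naming scheme is a decision for your marketing team.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def tag_link(url: str, source: str, medium: str, campaign: str) -> str:
    """Append campaign parameters, keeping any existing query string."""
    parts = urlparse(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(params)))


# Illustrative values; keep them lowercase, since GA is case-sensitive.
print(tag_link(
    "https://www.mylawfirm.com/insights/article",
    source="newsletter",
    medium="email",
    campaign="2019-06-update",
))
```

Many email tools can add these parameters for you; the point of the sketch is simply to show what a correctly tagged link contains.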

And poor Google Analytics setup will also see your blog subsite or client extranet showing up as external Referral traffic when you may not wish it to for reporting purposes.

It’s beyond the scope of this article to cover the changes to your internal marketing process needed to correctly label and distinguish newsletters or merge subsites into your GA results, but it should be done.

Photo by go_greener_oz