Ask most data scientists if data needs to be 100% accurate and they’ll tell you no, you simply need a representative sample.
Marketers feel differently: if they can’t trust the data they can’t take action on that data.
Since the perceived reliability of data is what dictates whether action will be taken (or not!), today we’re digging into six Google Analytics settings that will have a big impact on the accuracy of your web analytics data:
1. Default pages settings
If your visitor goes to website.com and website.com/index.html, will your website work? (Try it on your own website if you’re not sure and then come back. I’ll wait.)
So, did it work? Yes? That means that people are landing on two versions of your homepage and, as a result, Google Analytics will have two entries for it.
Can you guess what that does to your data? That’s right, it splits it in two and makes it less reliable.
Fear not though, all is not lost. The solution to this is to pick a default version and have Google Analytics merge the data into one entry. Here’s how:
all the default pages that your website works with (e.g. index.php, index.html, default.php, etc)
But that is not enough. Even with these settings on, look what happens when we visit these four pages:
- /my-test-page/ (visited twice)
- /my-test-page/index.html
- /my-test-page-two
- /my-test-page-two/index.html
We found the following entries in Google Analytics:
We expected only 2 entries: /my-test-page/index.html and /my-test-page-two/index.html
Why does this happen? Because Google Analytics will only merge data for URLs that have a forward slash or an extension (e.g. .html, .php) at the end.
And yes, in terms of best practice, a page should always have that forward slash or extension in place but that’s not always the reality.
Solution: The easiest way to stop your data from running amok is to ask your development or systems admin team to make sure that every URL that doesn’t end with an extension automatically gets a forward slash added to the end.
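To make the rule concrete, here is a minimal sketch of the normalization your team might apply, for example in a redirect layer. The function name `normalizePath` and the extension check are assumptions made for this illustration; they are not part of Google Analytics, and query strings are not handled.

```javascript
// Hypothetical helper: add a trailing slash to any path whose last
// segment has no file extension. Not a Google Analytics API.
function normalizePath(path) {
  // Already ends with a slash: nothing to do.
  if (path.endsWith('/')) return path;
  // Last segment, e.g. "index.html" or "my-test-page-two".
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  // Treat "name.ext" as a file and leave it alone.
  const hasExtension = /\.[a-z0-9]+$/i.test(lastSegment);
  return hasExtension ? path : path + '/';
}

console.log(normalizePath('/my-test-page-two'));            // "/my-test-page-two/"
console.log(normalizePath('/my-test-page/'));               // "/my-test-page/"
console.log(normalizePath('/my-test-page-two/index.html')); // "/my-test-page-two/index.html"
```

In practice the server would answer a request for the first form with a redirect to the normalized one, so that only one version of each URL ever reaches your reports.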
2. Subdomain tracking
Subdomains are being used more and more nowadays, especially for landing pages, blogs or micro-sites.
By default, under Content reports, Google Analytics only tracks the page path, without the domain.
Often we track subdomains using the same tracking code as on our main domain, and that in turn makes our home page data less accurate.
So, if we have three hostnames, e.g. mydomain.com, blog.mydomain.com and landingpage.mydomain.com, the traffic to all three home pages will be merged together like this:
If we add a secondary dimension you can see how that changes:
Solution: To fix this, add a filter to your Google Analytics view to change the default behavior and have it report the subdomains under content reports. Here is how the filter should look:
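The effect of such a filter can be modeled as prepending the hostname to the page path. The sketch below is only an illustration of that idea: the real filter is configured in the Google Analytics admin interface, not in code, and the function name `pageWithHostname` and the sample hits are made up for this example.

```javascript
// Models the effect of a view filter that prepends the hostname to the
// page path. Illustration only; the real filter lives in the GA admin UI.
function pageWithHostname(hostname, requestUri) {
  return hostname + requestUri;
}

// Hypothetical hits to the home page of each hostname.
const sampleHits = [
  { hostname: 'mydomain.com', uri: '/' },
  { hostname: 'blog.mydomain.com', uri: '/' },
  { hostname: 'landingpage.mydomain.com', uri: '/' },
];

// Without the filter, all three home pages collapse into "/".
const withoutFilter = new Set(sampleHits.map((h) => h.uri));
// With the filter, each hostname keeps its own entry.
const withFilter = new Set(
  sampleHits.map((h) => pageWithHostname(h.hostname, h.uri))
);

console.log(withoutFilter.size); // 1
console.log(withFilter.size);    // 3
```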
3. Too many events
There is a limit of 500 requests logged by Google Analytics per session with a maximum of 2 per second. That includes pages and events.
Chances are that people will not visit 500 pages on your website in one session, but events are easily abused, especially by web apps that try to track every click or mouse movement.
There are even official tutorials online that teach you how to send an event at a fixed interval of a few seconds in a bid to get more accurate bounce rate reporting.
Sending an event every few seconds, on every scroll or link click, or every time an image is loaded means that the limit is reached within the first few minutes on a website.
After that limit is hit, Google Analytics discards the rest of the data from that session, so the user might end up converting and you’ll never know it.
Solution: There is no way to increase that limit so the solution is to make sure you don’t abuse event tracking. Only track what is going to make a difference to your business.
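One way to keep event tracking in check is to put a budget in front of it. The following is a minimal sketch under stated assumptions: `createEventBudget` is a hypothetical helper, the budget value is arbitrary, and the counter lives in memory, so it resets on every page load rather than per Google Analytics session.

```javascript
// Hypothetical wrapper that stops sending events once a budget is used
// up, leaving headroom below the 500-hit limit.
function createEventBudget(sendFn, maxEvents) {
  let sent = 0;
  return function sendEvent(category, action) {
    if (sent >= maxEvents) return false; // budget exhausted, drop the event
    sent += 1;
    sendFn('send', 'event', category, action);
    return true;
  };
}

// Usage with a stand-in for ga(); on a page you would pass window.ga.
const budgetCalls = [];
const fakeGa = (...args) => budgetCalls.push(args);
const sendEvent = createEventBudget(fakeGa, 2);

sendEvent('dashboard', 'drag');   // sent
sendEvent('dashboard', 'drop');   // sent
sendEvent('dashboard', 'scroll'); // dropped, budget of 2 is used up
console.log(budgetCalls.length);  // 2
```

A budget like this is a safety net, not a substitute for deciding which events actually matter to your business.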
4. Global custom dimensions and metrics
Custom dimensions and metrics are a great feature because they allow for some customization freedom. However, it can be very easy to get inaccurate data from faulty implementations.
Here is how Google Analytics tells us we should set a custom metric:
ga('set', 'metric1', 1);
This code stores the property in memory, and the property is attached to every request made to Google Analytics from then on. In other words, all requests sent after the property is set and before the page is reloaded will carry it.
Consider this scenario:
We set a new custom metric: Dashboard View with the value of 1. We want to know how many times the dashboard is loaded. A user lands on the dashboard, which triggers a pageview with the custom metric attached. Google Analytics will report one dashboard view.
But then the same user drags and drops some elements on the dashboard which trigger an event request to Google Analytics as well.
The custom metric is again attached, which means Google Analytics will report two dashboard views, even if the user has loaded it just the one time.
Solution: Always attach the custom metrics or dimensions as properties of individual requests, not as global properties. Here is how it should be done for the above example:
ga('send', 'pageview', {
  'metric1': 1
});
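The difference between the two approaches can be seen in a toy model. This is not the real analytics.js, only a simplified stand-in written for this example: `createToyTracker` is a hypothetical name, and the model assumes nothing more than what is described above, namely that globally set fields are attached to every later request.

```javascript
// Toy model of a tracker, not the real analytics.js: fields stored with
// set() are merged into every later hit; fields passed to send() are not.
function createToyTracker() {
  const globals = {};
  const hits = [];
  return {
    set(name, value) { globals[name] = value; },
    send(hitType, fields = {}) {
      hits.push({ hitType, ...globals, ...fields });
    },
    // Sum a metric over all recorded hits, as a report would.
    total(name) {
      return hits.reduce((sum, hit) => sum + (hit[name] || 0), 0);
    },
  };
}

// Global property: the pageview and the later event both carry metric1.
const globalTracker = createToyTracker();
globalTracker.set('metric1', 1);
globalTracker.send('pageview');
globalTracker.send('event');
console.log(globalTracker.total('metric1')); // 2

// Per-request property: only the pageview carries metric1.
const perHitTracker = createToyTracker();
perHitTracker.send('pageview', { metric1: 1 });
perHitTracker.send('event');
console.log(perHitTracker.total('metric1')); // 1
```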
5. Tracking transactions
If you decide to use Google Analytics to track transactions for your web application, you’ll have a hard time getting accurate data in your reports, because most subscription renewals happen automatically and are not triggered by a user action. That means they won’t be recorded by Google Analytics’ default browser-based tracking.
If it is very important for you to have every single transaction recorded in Google Analytics, you should consider server-side tracking.
It works like this:
Step 1.
In your database, save the ID that Google Analytics has assigned to the user making the purchase (in Universal Analytics, the client ID stored in the cookie).
Step 2.
When the purchase is completed, send a server-side request using the Measurement Protocol and the ID you just saved. Do the same for future subscription renewals, upgrades and downgrades.
Solution: Because the request runs on your server and not in the user’s browser, it can’t be blocked by plugins, proxies and so on. It should give you close to 100% accuracy, and when a request fails you’ll have your own logs to find out what went wrong.
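The two steps above can be sketched as follows. This is a minimal example, assuming the Universal Analytics Measurement Protocol; the function name `buildTransactionHit` and every value passed to it (tracking ID, client ID, transaction ID, amount) are placeholders. The sketch only builds the request body and leaves the actual HTTP call to your server code.

```javascript
// Builds the body of a Universal Analytics Measurement Protocol
// transaction hit. All values passed in are placeholders.
function buildTransactionHit({ trackingId, clientId, transactionId, revenue, currency }) {
  const params = new URLSearchParams({
    v: '1',              // protocol version
    tid: trackingId,     // property ID, e.g. UA-XXXXX-Y
    cid: clientId,       // the ID saved in step 1
    t: 'transaction',    // hit type
    ti: transactionId,   // transaction ID
    tr: String(revenue), // transaction revenue
    cu: currency,        // currency code
  });
  return params.toString();
}

const renewalBody = buildTransactionHit({
  trackingId: 'UA-XXXXX-Y',
  clientId: '35009a79-1a05-49d7-b876-2b884d0f825b',
  transactionId: 'renewal-1001',
  revenue: 49.99,
  currency: 'USD',
});

// Your server would POST this body to
// https://www.google-analytics.com/collect and log the outcome.
console.log(renewalBody);
```

Logging each body together with the response status on your side is what gives you the audit trail mentioned above when a transaction goes missing from the reports.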
6. Non-cookie based tracking
Cookie-based tracking is one of the weak links in data accuracy. If you have logged-in users, it is better to ditch cookies entirely and use your own user IDs for tracking.
This fix will definitely have an impact on the number of unique users, new users and returning users reports.
The Google Analytics cookie is only used to store a user id. (This is valid only for Universal Analytics.) If a user changes browser or device, or simply deletes their cookies, Google Analytics will see that as a new user entering the website or app.
Solution: To fix this, Google Analytics lets you set your own user ID. That means the user no longer depends on a cookie to be recognized: if they change device or browser, they are identified as the same user as long as they are logged in.
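With analytics.js, the user ID is set through the `userId` field. Here is a minimal sketch, assuming your application already knows the ID of the logged-in user; `identifyUser` and `appUserId` are hypothetical names, and `ga` is passed in as a parameter so the example can run outside a browser.

```javascript
// Sets the logged-in user's ID on the tracker. `ga` is passed in so the
// sketch can run outside a browser; `appUserId` is your own identifier.
function identifyUser(ga, appUserId) {
  if (!appUserId) return false; // not logged in: leave the tracker as is
  ga('set', 'userId', String(appUserId));
  return true;
}

// Usage with a stand-in for ga(); on a page you would pass window.ga.
const recordedCalls = [];
identifyUser((...args) => recordedCalls.push(args), 42);
console.log(recordedCalls[0].join(' ')); // "set userId 42"
```

The call needs to happen before the first pageview is sent, otherwise the earliest hits of the visit will not carry the ID.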
Accuracy vs Accountability
While data scientists can work around accuracy, for everybody else it’s a matter of trust.
But trust can also be achieved with accountability. What users want is to understand how data is tracked and why the numbers are as they are. When that happens it’s much easier for them to navigate through data and decisions.
The easiest way to trust the data is to have access to raw web analytics data. This allows for doing some forensic research to get to the bottom of the missing data.
Services like Keen.io are designed to store raw data, while services like our own, InnerTrends, also offer an interface to navigate it.