How to improve the accuracy of your Google Analytics 4 Data – while also working around common Sampling and Threshold issues:
Reading time: 10 min
First, let’s outline why accurate data is so important
- It allows you to identify patterns in consumer behavior and how people choose to interact with your business.
- It is the foundation for advanced product analytics.
- Accurate data is essential for A/B testing. It ensures reliable results, prevents wasted resources, and maintains confidence in data-driven decisions.
More importantly, the bigger your business grows, the bigger these opportunities become. Conversely, the larger your store, the more likely you are to suffer significant financial losses because of inaccurate reporting.
So, how can we avoid these issues and start collecting more accurate data, so that we can make more precise (and more financially astute) data-informed decisions?
The first option is to pay Google: for the small expense of $150,000/yr (Google Analytics 360), you can remove many sampling and threshold limitations from your Google Analytics account and access your ‘true data’ whenever and however you want.
But let’s say you’d rather keep that $150,000/yr in your pocket. Are there any cheaper workarounds?


Let’s start by talking about sampling:
Sampling in Google Analytics 4 (GA4) reports is a process in which Google Analytics uses only a portion of the data to generate a report, instead of analyzing all available data. This reduces the volume of data that needs to be processed, which lightens the load on Google’s servers and allows reports to be generated faster.
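To make the idea concrete, here is a toy Python simulation of sampling. It is a conceptual sketch only; Google does not publish GA4's sampling algorithm at this level of detail. The function keeps a random fraction of some hypothetical events, counts one event type in that sample, and then scales the count back up. The function name, the event names, and the 10% sample rate are all illustrative assumptions.

```python
import random


def sampled_estimate(events, target, sample_rate, seed=0):
    """Estimate how often `target` occurs in `events` from a random sample.

    Conceptual model only (not GA4's real algorithm): each event is kept
    with probability `sample_rate`, and the count found in the sample is
    scaled up by 1 / sample_rate.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    # Keep roughly `sample_rate` of the events, chosen at random.
    sample = [event for event in events if rng.random() < sample_rate]
    hits_in_sample = sum(1 for event in sample if event == target)
    # Extrapolate from the sample back to the full data set.
    return round(hits_in_sample / sample_rate)


if __name__ == "__main__":
    # Hypothetical data: 100,000 events, 30,000 of which are purchases.
    events = ["purchase"] * 30_000 + ["page_view"] * 70_000
    print("exact:", events.count("purchase"))
    print("estimate from a 10% sample:", sampled_estimate(events, "purchase", 0.10, seed=42))
```

The estimate lands close to the true 30,000, but it will rarely match it exactly. That gap is the kind of inaccuracy a sampled report can carry.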
If you’ve ever seen a red or orange alert at the top of a GA4 report warning that it is based on only part of your data, then you have a sampling issue.
Which, honestly, isn’t the worst thing in the world. It means your business is so big and successful that Google is extracting data from only a sample of users, rather than from your entire user stream. Generally, sampling is triggered by a large volume of data or a highly complex query that can’t be processed quickly enough in real time.
Some effective workarounds to sampling are:
- Reduce the reporting period: Sampling often occurs when you view a long time range. If possible, shorten the reporting period.
- Limit the number of dimensions and metrics: Including many dimensions or metrics in a query makes sampling more likely. Use only the ones that are truly necessary for your analysis.
- Divide the query into smaller ones: If the report contains a lot of data, split it into several smaller queries (by time period, metrics, or dimensions) to reduce the likelihood of sampling.
- Connect GA4 to BigQuery: Finally, you can take a more extensive approach and connect Google Analytics 4 to BigQuery, where you can retrieve all of your raw, unsampled data. Update: this is no longer a manual process. There’s now a direct integration between Google Analytics 4 and BigQuery. (Hooray!)
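If you split a query by time, a small helper can generate the sub-ranges for you. The Python sketch below breaks one long reporting period into consecutive, non-overlapping date ranges that you can run one at a time in an exploration or report. The function name and the weekly chunk size are illustrative assumptions and are not part of GA4.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk):
    """Split the inclusive range [start, end] into consecutive sub-ranges of
    at most `days_per_chunk` days, returned as (start, end) ISO date strings."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is cut short so it never runs past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start.isoformat(), chunk_end.isoformat()))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


if __name__ == "__main__":
    # Hypothetical example: one quarter, queried one week at a time.
    for chunk in split_date_range(date(2024, 1, 1), date(2024, 3, 31), 7):
        print(chunk)
```

Be careful when you recombine the results. Totals such as event counts can be summed across chunks, but unique-user metrics cannot: someone who was active in two different weeks is counted once in each.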
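Once the BigQuery export is running, you can query the raw event tables with SQL instead of relying on GA4's reports. The helper below is a minimal sketch that only builds the query text. It assumes the standard GA4 export layout: a dataset named `analytics_<property ID>` that contains daily `events_YYYYMMDD` tables. The project ID and property ID in the usage line are placeholders for your own values.

```python
from datetime import date


def build_event_count_sql(project_id, property_id, start, end):
    """Return BigQuery SQL that counts events and distinct users per event
    name over [start, end], assuming the standard GA4 export layout."""
    # The export writes one table per day (events_YYYYMMDD) into a dataset
    # named analytics_<property ID>; the wildcard matches all of them, and
    # the _TABLE_SUFFIX filter keeps only the daily tables in the range.
    table = f"`{project_id}.analytics_{property_id}.events_*`"
    return f"""
SELECT
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM {table}
WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
GROUP BY event_name
ORDER BY event_count DESC
""".strip()


if __name__ == "__main__":
    # "my-project" and "123456789" are placeholders for your own IDs.
    print(build_event_count_sql("my-project", "123456789", date(2024, 1, 1), date(2024, 1, 31)))
```

You could then run the query with the `google-cloud-bigquery` client library, for example with `bigquery.Client().query(sql).result()`. BigQuery reads every exported event, so the counts here are exact rather than sampled estimates. Because `user_pseudo_id` is the cookie-based ID, users are effectively counted per device.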
What about Thresholding?
Thresholding often appears even when sampling has been avoided: a report can be based on all of your data and still have a threshold applied to it.
But why does this occur?
Firstly, thresholding only appears when you have Google Signals enabled on your Google Analytics 4 account.
You can find this setting in your GA4 property’s Admin area, under the data collection settings. The description next to it explains why it is useful.
Google Signals gives your Google Analytics account deeper insight into who your users truly are: their habits, history, and interests all get tracked and squished down into data you can use. Creepy? Yes. But very effective.
Additionally, Google Signals allows you to track your users across different devices and platforms. Essentially, whenever someone is signed into their Google account (with ads personalization turned on), their data is collected and put to use, allowing deeper analysis of demographic information, interests, and other characteristics of your audience.
Ultimately, keeping Google Signals on will allow your GA4 account to operate more effectively and access more features, because it lets you:
- Populate demographic data into GA4.
- Reuse Google Analytics Audiences as retargeting audiences for Google Ads (thus allowing you to create a more intricate and precise funnel).
How does this cause thresholding issues?
Well, it’s pretty simple. As creepy and invasive as Google can be, it won’t let you get too specific. Google applies thresholds to the data so that we (GA users) can’t identify individual users from the information Google Signals contributes to our reports (e.g., age, gender, interests).
Is it a necessary failsafe for protecting individual privacy? Debatable. In my opinion, it’s a slightly excessive limitation, but Google seems to think otherwise, so thresholding is likely here to stay.
Now, why is that a problem for us?
- It means we’ll be hard pressed to generate a report covering fewer than 50 users. If your parameters are specific enough to narrow a report down to roughly 50 people, you’ll run into thresholding issues. (50 is not an exact number, but it’s the figure we’ve repeatedly seen crop up as ‘the threshold’.)
- It means there’s a hard limit to how ‘deep’ you can dig. Sometimes a deep understanding of a select few avid, ‘perfect fit’ customers is worth more than a general understanding of thousands of other customers, but GA limits how focused you can get.
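The effect can be modeled with a few lines of Python. This is a deliberately simplified sketch. Google does not publish its actual thresholding rules, so the fixed 50-user cutoff below only mirrors the approximate figure mentioned in this article. The row format and segment names are hypothetical.

```python
def apply_threshold(rows, min_users=50):
    """Split report rows into (shown, withheld) by a minimum user count.

    Simplified model: Google's real thresholds are undisclosed, and
    min_users=50 only mirrors the approximate figure seen in practice.
    """
    shown = [row for row in rows if row["users"] >= min_users]
    withheld = [row for row in rows if row["users"] < min_users]
    return shown, withheld


if __name__ == "__main__":
    # Hypothetical rows for increasingly narrow segments.
    rows = [
        {"segment": "All visitors", "users": 12_000},
        {"segment": "Returning visitors aged 25-34", "users": 840},
        {"segment": "Returning visitors aged 25-34 on tablets in Oslo", "users": 31},
    ]
    shown, withheld = apply_threshold(rows)
    print("shown:", [row["segment"] for row in shown])
    print("withheld:", [row["segment"] for row in withheld])
```

In GA4 itself, thresholds are applied by Google's systems and cannot be configured. The point of the sketch is simply that the narrower your segment, the more likely its rows are to disappear from the report.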
How can we avoid thresholding issues in GA4?
Well, the answer most people will tell you is that you can just disable Google Signals.
However, this would prevent you from feeding Google Ads with more detailed data about your visitors. Additionally, if you enabled Google Signals in the past, disabling it now won’t fix your historical reports: it only affects data collected from that moment on, not data collected while it was enabled.
So, what’s the real solution?


You can use Reporting Identity to bypass Thresholding:
You see, in GA4 there’s a section called Reporting Identity. It’s a neat little feature that influences how GA counts the users on your website or app.
You can change it by going to Admin > Reporting Identity. You might think there are two options here, but pay attention, as there are actually three.
Here's what each option means:
- Blended: This is the most advanced option. It uses all the identity methods of the other two options and adds machine learning to fill in gaps and model data. However, you need to implement Google consent mode to unlock this feature.
- Observed: This option sits in the middle. It uses cookie data, Google Signals data (if enabled), and User-ID (if you are tracking that too). Information like User-ID or Google Signals data helps GA deduplicate users and recognize that a person using multiple devices is still the same individual.
- Device-based: This is the most basic option. It uses only the device ID (a.k.a. the first-party cookie). If the same user accesses your website from multiple browsers or devices, GA will treat each one as a separate user.
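To see why this choice changes your user counts, here is a simplified Python model of the Device-based and Observed methods. It is an illustration under stated assumptions, not GA4's real identity resolution. It ignores Google Signals and Blended mode's modeling entirely, and it treats a device as belonging to a User-ID if that device was ever seen with one. The `Hit` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """A hypothetical, minimal record of one visit."""
    device_id: str                 # first-party cookie / device identifier
    user_id: Optional[str] = None  # present only when you send a User-ID


def count_users(hits, method):
    """Count users under a simplified model of two reporting identities."""
    if method == "device":
        # Device-based: every distinct device ID is a separate user.
        return len({hit.device_id for hit in hits})
    if method == "observed":
        # Observed (simplified): a device that was ever seen with a User-ID
        # is attributed to that User-ID; other devices count on their own.
        device_to_user = {hit.device_id: hit.user_id for hit in hits if hit.user_id}
        identities = set()
        for hit in hits:
            if hit.device_id in device_to_user:
                identities.add(("user", device_to_user[hit.device_id]))
            else:
                identities.add(("device", hit.device_id))
        return len(identities)
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    hits = [
        Hit("laptop-cookie", user_id="u1"),  # signed in on a laptop
        Hit("phone-cookie", user_id="u1"),   # same person, signed in on a phone
        Hit("tablet-cookie", user_id="u1"),  # same person, on a tablet
        Hit("phone-cookie"),                 # later visit from the phone, signed out
        Hit("anon-cookie"),                  # a different, anonymous visitor
    ]
    print(count_users(hits, "device"))    # 4
    print(count_users(hits, "observed"))  # 2
```

Both numbers come from the same hits. Only the counting rule differs, and the collected data stays the same.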
If you switch to Device-based, Google Signals will not be used for counting users, and the thresholding issue will disappear.
However, keep in mind that there are a few caveats:
- When you select Device-based, certain user identifiers won't be considered when calculating your reports, resulting in less accurate user counts. This is because in this mode, a user accessing your website from three devices will be seen as three different users.
- Additionally, sometimes even after switching to this mode, the warning may still appear in your reports. In that case, simply refresh the page (CTRL + F5 in Windows), and the problem should vanish.
- You’re going to want to return to Blended mode after reviewing the reports. This is simply good practice, as other users may not be aware of the changes you’ve made, and without context, navigating the adjusted data reports can be difficult.
Conclusion:
To avoid thresholding issues in the future, do not enable Google Signals unless you plan to use GA’s remarketing or demographic reporting features. Additionally, there are alternative ways to pass audience data through Google Tag Manager. If you have already enabled Google Signals, you can switch the reporting identity to Device-based at any time, and you can freely switch between the options.
This setting does not affect the data you have collected; it only changes how users are counted in your reports.