When building audiences and comparing datapoints, it’s important to keep an eye on the sample size. This is because results based on larger samples are more robust than those based on smaller ones.
What’s the minimum sample size I can use?
There are no “golden” rules here because it depends on the data being considered and the size of the real-world population being represented. It’s also important to differentiate between the size of a sample that saw a question and the number of people within that sample who selected a particular option within it.
For example, if 1,000 respondents answer a question and only 50 select one of the options within it, it doesn’t mean that the result for this option isn’t robust. Rather, it suggests that it’s a low-incidence behavior that isn’t particularly popular or established. The result itself is robust because it is based on all 1,000 respondents, but it might not be advisable to undertake detailed profiling of the 50 respondents who selected this option.
When analyzing behaviors at the country level, most statisticians would likely agree that you want something approaching 1,000 as an overall sample size for robust results. When looking at audiences within a country, some would say 300 is a sensible minimum, with 100 to 300 being acceptable for certain sizing needs. We say 100 is a suitable minimum because many of our users like to build niche audiences. We would advocate using 300+ wherever possible, but anywhere between 100 and 1,000 can produce results that are meaningful; the higher you go within this range, the more robust your results will be.
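To give a rough sense of why these thresholds matter, the sketch below estimates the margin of error for a percentage at different sample sizes. It is an illustration only: it uses the standard textbook formula for a proportion at a 95% confidence level and assumes a simple random sample, so it ignores weighting and other survey design effects. The function name `margin_of_error` is our own and is not part of any platform feature.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p in a sample of n.

    Assumes a simple random sample (an assumption made for illustration);
    weighted survey data will typically have somewhat wider margins.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) at the sample sizes discussed above.
for n in (100, 300, 1000):
    print(f"n={n}: +/-{margin_of_error(n) * 100:.1f} points")
# n=100: +/-9.8 points
# n=300: +/-5.7 points
# n=1000: +/-3.1 points

# The earlier example: 50 of 1,000 respondents select an option (5%).
# The 5% figure rests on all 1,000 respondents, so its margin is small.
print(f"{margin_of_error(1000, p=0.05) * 100:.1f}")  # 1.4

# Profiling those 50 respondents means working with n = 50.
print(f"{margin_of_error(50) * 100:.1f}")  # 13.9
```

Under these assumptions, moving from 100 to 300 respondents narrows the margin far more than moving from 300 to 1,000 does, which is consistent with treating 300+ as the preferred range and 100 as a workable floor.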
Why’s the sample size for my audience so small and how can I make it bigger?
If the sample size for your audience is too small, consider the following:
- How many waves do I have selected? Users often select the most recent wave and conduct their whole analysis using just one wave of data. However, in most cases, we’d recommend using a full year of data. For quarterly datasets like GWI Core and GWI USA, this means selecting the last four waves. When working with particularly niche audiences, using two or even three years of data is acceptable.
- Are the datapoints I’m using present across all selected waves? New questions may not yet have a full year of data behind them, making it harder to achieve a robust sample size. If this is the case, check the legacy folder to see if the question has a suitable predecessor that can be used alongside it.
- Are the datapoints I’m using present across all selected markets? Some questions aren’t asked in all markets. A simple remedy here can be excluding those markets from your analysis and selecting more waves. Alternatively, you can look for a similar relevant question that is asked in the missing markets and add it to your audience. For example, in GWI Core, although we don’t ask about automotive brands in Ghana, Kenya, Morocco and Nigeria, we do ask about vehicle fuel type. Therefore, “Tesla” could be combined with “Electric” to create an audience that can be used across all markets.
- Am I using AND and OR effectively? People don’t always need to meet both of the criteria you have joined by an AND to be relevant to your analysis; in such instances, consider using an OR instead. Alternatively, if you have multiple criteria joined by an AND, consider using the “at least” function to capture people who meet at least a set number of criteria instead.
- Have I tried using a combination of behavioral and attitudinal datapoints? Supplementing a behavioral datapoint with an attitudinal one can be an effective way of broadening your audience; for example, social media users who follow politicians OR social media users who list politics as an interest.
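The effect of AND, OR and “at least” on sample size can be sketched with a few made-up respondents. This is a toy illustration, not how the platform is implemented: the respondent records and the criterion names (`follows_politicians`, `politics_interest`, `watches_news`) are hypothetical.

```python
# Hypothetical respondents: each record shows which criteria a person meets.
respondents = [
    {"id": 1, "follows_politicians": True,  "politics_interest": True,  "watches_news": True},
    {"id": 2, "follows_politicians": True,  "politics_interest": False, "watches_news": True},
    {"id": 3, "follows_politicians": False, "politics_interest": True,  "watches_news": False},
    {"id": 4, "follows_politicians": False, "politics_interest": False, "watches_news": True},
    {"id": 5, "follows_politicians": False, "politics_interest": False, "watches_news": False},
]
criteria = ["follows_politicians", "politics_interest", "watches_news"]

def count_met(respondent):
    """Number of criteria this respondent meets."""
    return sum(respondent[c] for c in criteria)

# AND: every criterion must be met (the narrowest audience).
and_audience = [r["id"] for r in respondents if count_met(r) == len(criteria)]

# OR: any one criterion is enough (the broadest audience).
or_audience = [r["id"] for r in respondents if count_met(r) >= 1]

# "At least 2": a middle ground between AND and OR.
at_least_two = [r["id"] for r in respondents if count_met(r) >= 2]

print(and_audience)   # [1]
print(or_audience)    # [1, 2, 3, 4]
print(at_least_two)   # [1, 2]
```

In this toy data, switching from AND to “at least 2” doubles the audience, and switching to OR doubles it again, while each step loosens how closely respondents match the original definition.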