There are many reasons why our data may differ from other sources, and some of the most common are outlined below. However, the fact that different approaches yield different results shouldn’t be a cause for concern. Instead, you should always consider exactly what’s being measured, among whom, and how, when comparing different sources.
Subjective answers
Because survey data is self-reported by real-life individuals, answers can be subjective. For example, there’s a big difference between asking a factual question such as “do you own a smartphone?” and a more subjective one such as “are you a soccer fan?”. For the latter, respondent interpretation plays a big role; an individual may watch soccer games but not consider themselves a “fan”, and where this line is drawn is up to the respondent.
Question wording
Question wording can have a big impact on results. For instance, it’s common for surveys to track brand “engagement”, but there are multiple ways this can be phrased; you could ask respondents whether they buy a particular brand, or you could ask them whether they use it. The two approaches will give different results, as the purchaser and the end user often aren’t the same person.
Similarly, rather than asking about either of these things in the present tense, you could ask about them in the past tense (e.g. “have you used this brand in the last month?”). This could give different results yet again; a respondent may have used a brand in the last month, but this may have been a one-off. They may therefore not consider themselves a “user” (present tense) of that brand.
Question format
The format of a question can influence results. Typically, questions that require a response for each option (such as “grids”) produce higher results but are more burdensome for the respondent. Because an answer has to be provided for every single option, and respondents tend to err on the side of positivity, this format drives up positive responses.
On the other hand, formats that don’t require a response for each option (such as multi-select lists) tend to produce lower results but paint a clearer picture of where respondents’ true interests lie; as respondents aren’t forced to enter something for every option, they select only the things that are most important to them.
Universe
It’s important to consider exactly who is being represented by a data set before comparing it to another source. For example, GWI Core is representative of internet users aged 16-64 in each market, but some data sets will be designed to be nationally representative (that is, representative of everyone, not just internet users).
It’s particularly important to keep this in mind when looking at markets with low internet penetration rates, as the online population will typically be younger, more affluent, more educated, and more urban than the national average. This means some digital behaviors can appear more widespread than they really are, as only the most connected individuals are being represented.
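To make this concrete, here’s a minimal sketch using invented numbers (the penetration and usage rates below are purely hypothetical): the same behavior, measured among internet users versus the whole national population, produces very different headline figures.

```python
# Hypothetical figures for illustration only.
internet_penetration = 0.40  # 40% of the country is online
usage_among_online = 0.80    # 80% of internet users show the behavior

# In an internet-user universe, the reported figure is simply 80%.
online_universe_figure = usage_among_online

# In a nationally representative universe, offline individuals are
# included. Assuming (for illustration) the behavior requires internet
# access, the national figure is diluted by the offline population.
national_figure = usage_among_online * internet_penetration

print(f"Internet-user universe: {online_universe_figure:.0%}")  # 80%
print(f"National universe:      {national_figure:.0%}")         # 32%
```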
Additionally, some data sets (such as GWI USA) may not have an age cap. In such data sets, the percentage of people engaging with certain digital behaviors will often be lower, as older respondents tend to be less digitally engaged than younger ones.
India and China
As fast-growth markets with rapidly growing internet penetration rates, India and China account for a progressively bigger share of the global online population each year. Because of this, a change in behavior from just a small percentage of people in either of these markets can have a noticeable impact on global figures.
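As a rough illustration of this effect, the sketch below uses invented population shares and incidence rates; it simply computes a population-weighted global average before and after a five-point shift in these two markets.

```python
# Hypothetical shares of the global online population (sum to 1).
population_share = {"China": 0.20, "India": 0.15, "Rest of world": 0.65}

# Hypothetical incidence of some behavior, before and after a shift
# of 5 percentage points in China and India only.
incidence_before = {"China": 0.50, "India": 0.50, "Rest of world": 0.50}
incidence_after = {"China": 0.55, "India": 0.55, "Rest of world": 0.50}

def global_figure(incidence):
    """Population-weighted average across all markets."""
    return sum(population_share[m] * incidence[m] for m in population_share)

print(f"Global figure before: {global_figure(incidence_before):.2%}")  # 50.00%
print(f"Global figure after:  {global_figure(incidence_after):.2%}")   # 51.75%
```

A five-point move confined to two markets shifts the global figure by almost two points, so whether a source includes these markets, and how it samples them, matters a great deal.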
Additionally, due to their size, unique approaches to data collection are often used in these markets; for instance, some sources only represent the largest cities. It’s therefore important to consider whether these markets are represented by a given data set, and to what extent, when making comparisons.
Time frame
When comparing different sources, it’s important to consider when the data was collected (i.e. how recent it is) and what time frame is being represented. For instance, are you comparing a source that captured monthly social media usage to one that measured weekly usage? If so, you’d expect the latter to return lower results, but keep in mind that this will also be influenced by how “usage” is captured and defined.
Active vs passive data
Active data requires input from the individual(s) being measured (e.g. survey data), while passive data doesn’t (e.g. cookie data). Both have their strengths and limitations. For example, using a survey to ask someone how many times a day they visit a website will give you their best estimate within a particular range, but not an exact figure. A cookie, on the other hand, could tell you exactly how many times a day they access that website (assuming the cookie isn’t deleted and that nobody else is using the same browser).
However, passive approaches can struggle to account for:
- The sharing of accounts/browsers/devices - e.g. multiple people can appear as a single user if they access the same social media account, or share a family computer
- Individual usage across multiple accounts/browsers/devices - e.g. a single person can appear as multiple users if they have both a personal and business social media account, or access the same website via multiple unconnected devices
- VPN usage - e.g. individuals can be attributed to the wrong country/region if they access a platform or website using a VPN, which they may do for work reasons or to bypass local restrictions
It’s therefore important to remember that GWI surveys measure people, not devices.
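To see why this distinction matters, here’s an illustrative sketch with invented data: each event records the person who generated it and the browser cookie it was logged under, and the two counts diverge in both directions at once.

```python
# Invented events: (person, cookie) pairs. One person can own several
# cookies, and one cookie can hide several people.
events = [
    ("alice", "cookie_laptop"),     # Alice on her laptop
    ("alice", "cookie_phone"),      # same person, second device
    ("alice", "cookie_work_pc"),    # same person, third device
    ("bob",   "cookie_family_pc"),  # Bob on the shared family computer
    ("carol", "cookie_family_pc"),  # Carol, hidden behind the same cookie
]

unique_cookies = {cookie for _, cookie in events}
unique_people = {person for person, _ in events}

print(f"Users a passive source would count: {len(unique_cookies)}")  # 4
print(f"People a survey would measure:      {len(unique_people)}")   # 3
```

Note that the two distortions (one person spread across several devices, several people hidden behind one device) can even partially cancel out, leaving a headline count that looks plausible while the per-person picture underneath is wrong.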
Defining “monthly active users”
It’s common for social networks to report on the number of “monthly active users”. However, the definition of this term varies; some social networks require users to follow a minimum number of accounts for them to be considered “active”, while others require them to have actively engaged with content within the specified period.
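As a hedged illustration (the activity log and both definitions below are invented, not any network’s actual rules), the same month of activity can produce different MAU counts depending on the definition applied:

```python
# Invented activity log: user_id -> actions recorded this month.
activity = {
    "u1": ["login", "post", "comment"],
    "u2": ["login"],             # opened the app, engaged with nothing
    "u3": ["login", "like"],
    "u4": [],                    # no activity at all this month
}

# Definition A: anyone who logged in at least once counts as "active".
mau_logged_in = sum(1 for actions in activity.values() if "login" in actions)

# Definition B: only users who actively engaged with content count.
ENGAGEMENT_ACTIONS = {"post", "comment", "like"}
mau_engaged = sum(
    1 for actions in activity.values() if ENGAGEMENT_ACTIONS & set(actions)
)

print(f"MAU under definition A (logged in): {mau_logged_in}")  # 3
print(f"MAU under definition B (engaged):   {mau_engaged}")    # 2
```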
Our approach to tracking social media usage is more open; because we actively ask respondents about their online behavior, we’re able to capture users who visit social networks but don’t have their own account (e.g. those who read threads but don’t participate in them), as well as those who share accounts or access platforms via VPNs.