How to Ensure Representativeness of Samples in Customer Research

A representative sample is an essential part of every customer research program. This requirement is often reduced to a focus on response rates or quota targets, but those numbers can be misleading in isolation. To understand why, we need to look more closely at what it means for a sample to be representative.

Total Survey Error

A helpful way to frame the idea of representation is through the theory of Total Survey Error (see Groves et al. 2009), which conceptualizes a survey and the errors that can occur within it. According to the theory, errors introduced into a survey are caused either by variance (a random deviation from the true value) or by bias (a systematic deviation from the true value). A sample can be said to be fully representative if it is free from bias, and the more bias that exists, the less representative the sample is. Common statistical approaches can accurately quantify variance in the form of confidence intervals, but bias is far less predictable, and we can only attempt to identify and mitigate each potential source of bias on a case-by-case basis.
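To make the distinction concrete, here is a minimal Python sketch (all figures invented) contrasting the two error types: a simple random sample deviates from the true value only randomly, and the confidence interval quantifies that deviation, while a biased sample is shifted systematically by an amount the interval cannot see.

```python
import random
import statistics

random.seed(42)

# Hypothetical universe: satisfaction scores for 10,000 customers.
universe = [random.gauss(7.0, 1.5) for _ in range(10_000)]
true_mean = statistics.mean(universe)

def mean_with_ci(sample, z=1.96):
    """Return the sample mean and the half-width of a 95% confidence interval."""
    m = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return m, half_width

# Variance: a simple random sample misses the true value only randomly,
# and the confidence interval captures that uncertainty.
m, hw = mean_with_ci(random.sample(universe, 400))
print(f"true mean {true_mean:.2f} | random sample {m:.2f} +/- {hw:.2f}")

# Bias: recruiting only from the most satisfied half of customers shifts
# the estimate systematically -- the interval stays narrow, but it now
# sits around the wrong value.
top_half = sorted(universe)[len(universe) // 2:]
m, hw = mean_with_ci(random.sample(top_half, 400))
print(f"true mean {true_mean:.2f} | biased sample {m:.2f} +/- {hw:.2f}")
```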

So what does this mean in the context of customer research?


Further Reading
AI in Market Research: The Challenges and Limitations of Synthetic Data


Sample Error

Calculations used to estimate variance assume that the sample is selected at random, that is, without any systematic deviations. Sometimes, however, systematic deviations do arise in the sample, and these are called biases. Biases can emerge from a number of sources, but in customer research the two principal concerns are under/over coverage and non-response bias.

Under/Over Coverage

One important source of bias is the sample frame: the list from which the sample respondents are recruited. In customer research, the universe (the total population you are trying to learn about) is usually all of the customers that have used or bought your services/products within a certain time period. In a perfect setting the sample frame would cover exactly this universe. However, there might be people missing from the list (this is called under coverage) or people on the list who don't belong there (this is called over coverage).

Common examples include contacts who have switched companies, leaving the email address you hold for that company invalid, or very new customers who have not yet made it into your system. As a result, repeat customers or those with greater direct sales contact are more likely to have correctly listed contact details in the sample frame, which introduces a bias towards these types of respondent in our sample. This becomes a real problem when such customers also tend to be more satisfied, as it skews our results towards their more positive experience and can hide underlying issues experienced by the under-covered group.
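As an illustration, a basic coverage audit can be a simple comparison of customer identifiers across systems. The sketch below is hypothetical (the IDs and variable names are invented) and assumes the universe can be reconstructed from transaction or billing records:

```python
# Hypothetical data: customers who bought in the period (the universe)
# versus the contact list used for recruiting (the sample frame).
active_customers = {"C001", "C002", "C003", "C004"}
sample_frame = {"C002", "C003", "C005"}

under_coverage = active_customers - sample_frame  # in the universe, missing from the frame
over_coverage = sample_frame - active_customers   # in the frame, outside the universe

print(f"Under-coverage: {sorted(under_coverage)}")
print(f"Over-coverage:  {sorted(over_coverage)}")
covered = len(active_customers & sample_frame) / len(active_customers)
print(f"Frame covers {covered:.0%} of the universe")
```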

How can you tackle this bias? Firstly, you should ensure you have the most accurate customer lists possible, which will benefit not only customer research but also all other marketing activity and internal company processes. This means maintaining the customer lists and keeping all relevant information up-to-date, including removing or replacing clients/email addresses that have become irrelevant. The second step is the selection of customers that participate in the research. It might be tempting to survey only a subset of customers who are expected to be satisfied; however, this will not provide an accurate and representative picture of the customer base and will eventually backfire, as it hides the dissatisfied. An easy answer for who to select for a customer research study is everyone. If this is not possible or reasonable, a random sample should be drawn to ensure the subset of customers is representative of all clients, as sketched below.
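For illustration, a simple random draw from a cleaned frame might look like the following sketch (the frame size, sample size, and ID format are invented):

```python
import random

# Hypothetical frame of customer IDs after de-duplication and cleaning.
frame = [f"C{i:05d}" for i in range(1, 12_001)]

# A census (inviting everyone) is ideal; failing that, a simple random
# sample gives every customer the same chance of selection.
random.seed(7)  # fixed seed only so the draw is reproducible and auditable
invitees = random.sample(frame, k=1_000)
print(invitees[:5])
```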


Further Reading
Going for Gold: A Lesson in Poor Survey Design


Non-Response Bias

Another important source of error, and one we have less control over, is non-response bias. This refers to the bias caused by people who made it into the random sample but did not respond to the survey. If non-response were purely random, a large enough sample would absorb it, but under some circumstances certain types of customer are more likely to respond to the survey invitation than others. This introduces a systematic deviation, which we refer to as non-response bias.

Reasons for non-response in customer research are various. Some of these we have control over, such as keeping the survey design interesting, allowing enough time to finish the survey, and ensuring privacy. Others we have less control over, such as survey fatigue, time constraints for busy respondents, or respondents who spend less time at a desk and are therefore less likely to respond. It is important to try to prevent avoidable non-response by informing customers prior to the survey, giving multiple chances to reply, ensuring an engaging survey design, keeping the survey as short as possible, and making it as easy as possible for clients to respond.

But what about the reasons that we do not have control over? As a certain share of non-response is inevitable, a prominent measure has been established to gauge the resulting risk of non-response bias: the response rate.

High response rates are perceived as positive, while low response rates are seen as negative. Although there is some connection, research shows that high response rates do not necessarily mean more accurate data, only a lower risk of non-response bias (see Groves and Peytcheva 2008). This is mainly because the response rate tells us nothing about why people failed to respond.

Research suggests various approaches for handling this situation (see Wagner 2012), but the one most applicable to customer experience studies is cross-checking against frame data. Although this is not a panacea, it brings some additional information on non-respondents and can increase confidence in the representativeness of the sample.

In this context, frame data is information that is already available to you about your customers; commonly this might include job role, country, and gender, but in principle it can be any attribute you hold. Once the survey is conducted, we can compare the true values (the distributions of those variables across the entire customer list) with the distributions observed in the survey.
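As an illustration, such a cross-check can be as simple as comparing group shares between the frame and the respondents. The sketch below uses an invented country variable and invented counts:

```python
from collections import Counter

# Hypothetical frame variable: country for every customer on the list,
# and the same variable for those who actually responded.
frame_countries = ["DE"] * 500 + ["UK"] * 300 + ["US"] * 200
respondent_countries = ["DE"] * 90 + ["UK"] * 40 + ["US"] * 20

def shares(values):
    """Return the share of each category, sorted by category."""
    counts = Counter(values)
    return {k: counts[k] / len(values) for k in sorted(counts)}

frame_shares = shares(frame_countries)
sample_shares = shares(respondent_countries)

for country in frame_shares:
    print(f"{country}: frame {frame_shares[country]:.0%} "
          f"vs respondents {sample_shares.get(country, 0):.0%}")
```

Here US customers make up 20% of the frame but only around 13% of respondents, a mismatch that would prompt further investigation.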

If we find a close match between the customer research sample and the entire customer list, the risk of non-response bias decreases. If we find differences, we can take steps to mitigate this (seeking additional sample, weighting responses, etc.). The limitation is that this approach needs reliable information from the customer list; this kind of information is not always fully available or accurate, and there may only be a weak connection between the frame variables and the key metrics in the survey. Despite this, it provides an additional safeguard against bias and so should be used wherever possible.
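Where differences are found, one common mitigation is post-stratification weighting. The following hypothetical sketch (shares carried over from the example above) weights each respondent by the ratio of frame share to sample share, so that the weighted sample matches the frame distribution of the chosen variable:

```python
# Hypothetical shares of a frame variable (country) in the full customer
# list and among survey respondents.
frame_shares = {"DE": 0.50, "UK": 0.30, "US": 0.20}
sample_shares = {"DE": 0.60, "UK": 0.27, "US": 0.13}

# Weight = frame share / sample share: over-represented groups are
# down-weighted (<1.0), under-represented groups are up-weighted (>1.0).
weights = {g: frame_shares[g] / sample_shares[g] for g in frame_shares}
print(weights)

# Weighted mean of an invented per-respondent satisfaction score.
respondents = [("DE", 8.1), ("US", 6.2), ("UK", 7.4), ("DE", 7.9)]
weighted_mean = sum(weights[g] * s for g, s in respondents) / sum(
    weights[g] for g, _ in respondents
)
print(f"weighted mean satisfaction: {weighted_mean:.2f}")
```

The same caveat applies: weights are only as good as the frame variables behind them, and weighting cannot correct for differences the frame does not record.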


Further Reading
The Case for Custom Research: The Pitfalls of Standardization and the Quest for Uniqueness


Takeaways

Representativeness is a more complex challenge than most people realize. While it may be impossible to guarantee a fully representative sample, we should make every effort to identify potential sources of bias and correct for them where possible. The first step is a comprehensive, accurate, and up-to-date sample frame: the frame data it contains adds context to our results and is a more informative guide than the response rate alone when judging the risk of non-response bias.


References

Groves, Robert M. and Emilia Peytcheva (2008). The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis. Public Opinion Quarterly, 72(2), 167–189. https://doi.org/10.1093/poq/nfn011.

Groves, Robert M., Floyd J. Fowler, Mick P. Couper, James M. Lepkowski, Eleanor Singer and Roger Tourangeau (2009). Survey Methodology. 2nd Edition. Hoboken, New Jersey: Wiley.

Wagner, James (2012). A Comparison of Alternative Indicators for the Risk of Nonresponse Bias. Public Opinion Quarterly, 76(3), 555–575. https://doi.org/10.1093/poq/nfs032.

To discuss how our tailored insights programs can help solve your specific business challenges, get in touch and one of the team will be happy to help.
