Market research is often used to gain an understanding of the underlying drivers of a decision or outcome, and of the characteristics that define a customer profile or segment (group). Regression and discriminant analysis are statistical techniques used to predict which audience a customer falls into, to identify which characteristics have the strongest impact on allocating a customer to a particular segment, and to predict behaviours and preferences (such as which product an individual would be likely to buy).
John, a marketing manager at a large multinational company, is interested in finding out what would most influence target customers to purchase his company’s products.
Paul, a sales director at a global manufacturing company, has conducted an evaluation of his market and has identified that there are four types of customer in terms of who they are, how they behave and what they need. Now he needs to translate this information into something actionable. How can his salesforce identify which segment each current and prospective customer falls into?
Regression and discriminant analysis will solve John and Paul’s problems. Both techniques predict outcomes and classify people or companies into different segments or categories, but are based on different statistical principles and assumptions.
Regression versus discriminant analysis
Linear regression – examines the relationship between an independent variable (e.g. length of relationship) and a dependent variable at the core of the research objective (e.g. overall satisfaction). Multiple linear regression is used when several independent variables (e.g. length of relationship + product purchased + sales territory) are used to explain the same dependent variable. Linear regression is useful for identifying the actions required to improve the outcome measured by the dependent variable (in this case, overall satisfaction).
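As an illustration of the single-variable case, the sketch below fits a least-squares line of overall satisfaction against length of relationship. The data points and the function name fit_simple_regression are hypothetical and chosen only for the example; a real study would normally rely on a statistics package and would test the fit.

```python
# Hypothetical data: length of relationship (years) and overall satisfaction (1-10).
years = [1, 2, 3, 4, 5]
satisfaction = [5.4, 6.1, 6.5, 6.9, 7.6]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_regression(years, satisfaction)
print(f"satisfaction = {intercept:.2f} + {slope:.2f} * years")
```

The slope is the regression coefficient: the expected change in satisfaction for each additional year of relationship in this made-up data set.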
Logistic regression – estimates the probability of an outcome (the dependent variable) from various factors (independent variables). Unlike linear regression, where the dependent variable is typically measured on a continuum (e.g. a customer satisfaction scale, a price range, etc.), the dependent variable in logistic regression is categorical (i.e. it takes one of a limited number of discrete values). Logistic regression is based on:
- A dichotomous response i.e. a binary dependent variable with only two possible outcomes, e.g. Yes/No on usage (this is known as binary logistic regression); or
- A response with more than two possible outcomes, for example which segment a customer falls into or which product they intend to purchase (this is known as multinomial logistic regression).
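The binary case can be sketched in a few lines. The example below fits a one-variable logistic model by plain gradient descent and returns the estimated probability of usage for a given satisfaction score. The scores, the Yes/No outcomes, the learning rate, the number of iterations and the function names are all assumptions made for illustration; statistical packages use more robust fitting routines.

```python
import math

# Hypothetical data: satisfaction score (1-10) and whether the customer uses the product.
scores = [2, 3, 4, 5, 6, 7, 8, 9]
uses_product = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = Yes, 0 = No

mean_score = sum(scores) / len(scores)  # used to centre the predictor


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate intercept b0 and coefficient b1 by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


centred = [s - mean_score for s in scores]
b0, b1 = fit_logistic(centred, uses_product)


def usage_probability(score):
    """Estimated probability that a customer with this score uses the product."""
    return sigmoid(b0 + b1 * (score - mean_score))


for s in (2, 9):
    print(f"score {s}: estimated probability of usage = {usage_probability(s):.2f}")
```

A positive coefficient b1 means that higher satisfaction scores are associated with a higher probability of usage in this invented sample.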
Discriminant analysis – determines the relationship between different independent variables and the dependent variable to predict an outcome. The dependent variable is categorical in nature, such as a segment, as opposed to a continuous variable as with linear regression. Analysis of the independent variables leads to the computation of coefficients (weighting factors) which are used to develop a decision rule at the heart of the model. Examples include an allocation algorithm for determining which segment an individual or company falls into, a tool for determining which product a company would purchase, etc.
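A minimal sketch of such a decision rule is shown below, assuming a single independent variable (annual spend) and three segments that have already been defined. It uses the standard linear discriminant score with a pooled variance; the segment names, the spend figures and the function names are hypothetical, and a real allocation tool would normally draw on several variables.

```python
import math

# Hypothetical annual spend (in thousands) for customers in three known segments.
segments = {
    "Basic": [10, 12, 11, 13],
    "Standard": [20, 22, 21, 19],
    "Premium": [30, 31, 29, 32],
}


def fit_discriminant(groups):
    """Return segment means, pooled within-segment variance and prior probabilities."""
    n_total = sum(len(values) for values in groups.values())
    means = {name: sum(values) / len(values) for name, values in groups.items()}
    within_ss = sum(
        (x - means[name]) ** 2 for name, values in groups.items() for x in values
    )
    pooled_var = within_ss / (n_total - len(groups))
    priors = {name: len(values) / n_total for name, values in groups.items()}
    return means, pooled_var, priors


def allocate(x, means, pooled_var, priors):
    """Assign x to the segment with the highest linear discriminant score."""
    def score(name):
        mu = means[name]
        return x * mu / pooled_var - mu ** 2 / (2 * pooled_var) + math.log(priors[name])
    return max(means, key=score)


means, pooled_var, priors = fit_discriminant(segments)
for spend in (12, 24, 28):
    print(f"spend {spend}: allocated to {allocate(spend, means, pooled_var, priors)}")
```

The weights applied to x in the score (the segment mean divided by the pooled variance) play the role of the coefficients described above.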
Note that the segmentation itself is arrived at through other statistical techniques, such as cluster analysis. Logistic regression and discriminant analysis are thus used for predicting segment allocation only when the segmentation has been identified a priori.
In the case of a segmentation, the allocation algorithm consists of simple “killer” questions that can be applied passively (to a customer database) or actively (put directly to someone) to allocate the individual and/or company to a specific group or segment. Similar tools can be created for purposes beyond segmentation, such as determining which new products customers can be expected to purchase.
Outputs from regression and discriminant analysis
- Which factors differentiate one group from another, e.g. how John can determine which companies are most likely to purchase his products.
- How to classify respondents into different groups, e.g. how Paul can segment his customer base into different buyer and user types through an actionable tool.
The example below shows the impact of various factors on intent to purchase products sold by John’s company. Regression analysis was carried out for this purpose and the figures shown are regression coefficients (weighting factors). The higher the regression coefficient, the stronger the effect of that factor on intent to purchase. For example, if satisfaction with being an innovative company increased by 1 point (e.g. from 7 out of 10 to 8 out of 10), then likelihood to purchase would increase by 0.84 points, all other factors being equal. John and his company should therefore focus on innovation and complaint resolution, as these will have the biggest impact on purchase intent.
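The arithmetic behind reading a coefficient can be made explicit. The short sketch below uses the 0.84 coefficient quoted above for innovation; the function name expected_change_in_intent is hypothetical, and the calculation assumes a linear model in which all other factors are held constant.

```python
# Coefficient for "being an innovative company", as quoted in the text.
innovation_coefficient = 0.84


def expected_change_in_intent(coefficient, old_score, new_score):
    """Change in predicted purchase intent, holding all other factors constant."""
    return coefficient * (new_score - old_score)


# Satisfaction with innovation improves from 7 out of 10 to 8 out of 10.
change = expected_change_in_intent(innovation_coefficient, old_score=7, new_score=8)
print(f"Expected change in purchase intent: {change:+.2f}")
```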