Sometimes you may wish to create segments of your respondents and then use these segments to classify respondents in a different survey or a later wave of your tracker. In essence, you can reuse your original segmentation model to assign respondents in new data to those segments. There are two ways to approach this:
- Assign respondents to segments in the new data file using the same variables as used when forming the segments, or,
- Predict segment membership based on a different set of variables.
Requirements
- A document with one of the following types of segmentation models:
  - Latent Class Analysis
  - Trees
  - k-means
  - Mixture Models for Regression
  - Most machine learning models, such as Random Forest
- A new data set with variables that correspond (whether or not they have the same names) to the original variables used in the model that will be applied to the new data.
- Each variable in the new data must have the same code frame as in the original data. That is, all the same categories must be present, and their underlying coded values must be the same (i.e., Category 1 is given a value of 1 in both the original data and the new data).
- If using a non-regression type of algorithm (trees, k-means, etc.), there are additional requirements:
  - Because there is always a random element in the algorithm, the order of the categories of the variables used must be exactly the same. That is, when you make a summary table for each variable, the placement of the rows needs to be exactly the same as it was when the segmentation was created.
  - There must be at least 1 respondent in each category that was included before. That is, if you use a variable set with a "Don't know" category, for example, where there was 1 respondent who selected it in the original data set, you also need to have at least 1 respondent for that category in the new data set.
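The code-frame requirements above can be checked before updating the data file. The following is a minimal sketch in base R, assuming each categorical variable is stored as a factor; `check_code_frame` and the example variables are hypothetical names used only for illustration, not part of any built-in tool. Because a factor's underlying codes follow the order of its levels, identical levels in identical order imply identical coded values.

```r
# Hypothetical helper: compare the code frame of one categorical variable
# between the original data and the new data.
check_code_frame <- function(original, new) {
  stopifnot(is.factor(original), is.factor(new))
  list(
    # Same set of categories, regardless of order
    same_categories = setequal(levels(original), levels(new)),
    # Same categories in the same order (hence the same underlying codes)
    same_order      = identical(levels(original), levels(new)),
    # Categories selected in the original data but by nobody in the new data
    empty_in_new    = setdiff(levels(droplevels(original)),
                              levels(droplevels(new)))
  )
}

# Made-up example data: nobody chose "Don't know" in the new file
original_q1 <- factor(c("Yes", "No", "Don't know", "Yes"),
                      levels = c("Yes", "No", "Don't know"))
new_q1      <- factor(c("No", "Yes", "Yes"),
                      levels = c("Yes", "No", "Don't know"))

result <- check_code_frame(original_q1, new_q1)
print(result)
```

Any category listed in `empty_in_new` would violate the "at least 1 respondent" requirement for non-regression algorithms.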
Method 1 - Using the same variables
If the variables in your new data file have exactly the same names, you can use one of the methods below. If your variables have different names, skip to Method 2. Note the other important requirements listed above.
Segments formed using latent class analysis or built-in segmentation modeling
A three-segment latent class solution, based on a sample size of 400, is shown below. To allocate people in a new data file using these segments:
- Update your document with the new data file.
- Go to the latent class output in your document, which will have an error:
  To keep the same segmentation that you initially created, do not regrow the tree. The segmentation variable in your data file will now apply the previous segmentation to the additional data.
- The variable in the project that shows segment membership has now automatically updated, allocating people in the new data file to the segments.
Segments formed using k-means or another R-based segmentation modeling tool
A three-cluster k-means solution is shown below. To allocate people in a new data file using these segments:
- Click on the k-means solution and make sure that Calculate automatically is not checked (this option is at the top of the object inspector).
- Take a copy of the line of code that looks similar to this (with different variable names). To view the code of the output, go to Data > Show Advanced Options > R Code > Edit R Code.
kmeans = KMeans(data.frame(understand, shop, key, value, interested),
- Select the data set in the Data Sources tree.
- Press Update in the Object Inspector and select the new data file.
- From the Data Sources tree, hover and select + > Custom Code > R - Numeric.
- In the R Code box at the top of the window, paste in the copied code, and modify it so that it looks like this (the key bits to retain from your pasted code are kmeans or whatever it has been changed to and the variable names):
predict(kmeans, newdata = data.frame(understand, shop, key, value, interested))
- Give the variable an appropriate Name and Label.
- Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Data > Properties).
- Press Labels (in the Properties section of the object inspector) and enter any labels you desire and press OK.
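To illustrate what allocating new respondents to existing k-means segments means, here is a minimal sketch in base R. It is not the code behind the `KMeans` output used in the steps above: base R's `kmeans()` has no `predict()` method, so the sketch swaps in a hypothetical helper, `assign_to_clusters`, that assigns each new row to the nearest fitted cluster centre. The data values are made up.

```r
# Hypothetical helper: assign each row of newdata to the nearest centre
# of a fitted stats::kmeans object (squared Euclidean distance).
assign_to_clusters <- function(fit, newdata) {
  x <- as.matrix(newdata[, colnames(fit$centers), drop = FALSE])
  d <- matrix(NA_real_, nrow(x), nrow(fit$centers))
  for (k in seq_len(nrow(fit$centers))) {
    # Squared distance from every row to centre k
    d[, k] <- rowSums(sweep(x, 2, fit$centers[k, ])^2)
  }
  # Index of the smallest distance in each row
  max.col(-d, ties.method = "first")
}

# Made-up original data with two well-separated groups
set.seed(1)
original <- data.frame(understand = c(1, 2, 1, 9, 8, 9),
                       shop       = c(1, 1, 2, 9, 9, 8))
fit <- kmeans(original, centers = 2, nstart = 10)

# New respondents measured on the same variables, in the same units
new_data <- data.frame(understand = c(1.5, 8.5),
                       shop       = c(1.5, 8.5))
new_segments <- assign_to_clusters(fit, new_data)
print(new_segments)
```

The cluster numbers themselves are arbitrary labels, which is one reason the fitted model must be kept fixed (not recalculated) when the new data is added.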
Method 2 - Using different variables
Often, the survey you want to segment will not contain all of the questions used to create the original segments. In this case, you can use a predictive model to predict segment membership (after the segments have been created using one of the original segmentation models). Instead of including all of the original variables used to create the segmentation as predictors, you can include either:
- A completely different set of variables (e.g., demographics, or some other data available in a customer database).
- A subset of the variables used to create the segments. (Tip: if you are building a predictive model based on exactly the same variables as used to create segments, you are making a mistake, and should instead use the approach described in the previous section).
The output above is from a multinomial logit (MNL) model (+ > Advanced Analysis > Regression > Multinomial Logit) that predicts segment membership based on firmographics. The goal is now to predict segment membership in a new data file that contains the same predictor variables.
- Click on the model output and make sure that Calculate automatically is not checked (this option is at the top of the object inspector).
- Take a copy of the line of code that looks similar to this (with different variable names). To view the code of the output, go to Data > Show Advanced Options > R Code > Edit R Code.
glm = Regression(segmentsGXVYS ~ q1 + q2 + q3 + q4 + q5,
- Click on the data set in the Data Sources tree.
- Press Update in the Object Inspector and select the new data file.
- From the Data Sources tree, hover and select + > Custom Code > R - Numeric.
- In the R Code box at the top of the window, paste in the copied code, and modify it so that it looks like this (the key bits to retain from your pasted code are glm or whatever it has been changed to and the variable names):
predict(glm, newdata = data.frame(q1, q2, q3, q4, q5))
- Give the variable an appropriate Name and Label.
- Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Data > Properties).
- Press Labels (in the Properties section of the object inspector), enter any labels you desire, and press OK.
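The same idea can be sketched outside the menus. The example below is an illustration only: it uses `multinom()` from the `nnet` package (a recommended package distributed with R) rather than the `Regression` function shown in the steps above, and the segment labels, predictor names, and data are all simulated assumptions. It fits a multinomial logit on data where segment membership is known, then predicts membership for new rows that contain only the predictors.

```r
library(nnet)  # assumption: nnet is installed (it ships with standard R)

# Simulated original data: known segment plus two predictor variables
set.seed(1)
original <- data.frame(
  segment = factor(rep(c("A", "B", "C"), each = 20)),
  q1 = c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 4)),
  q2 = c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 0))
)

# Multinomial logit predicting segment membership from the predictors
fit <- multinom(segment ~ q1 + q2, data = original, trace = FALSE)

# New data file: same predictor variables, no segment variable
new_data <- data.frame(q1 = c(0, 2, 4), q2 = c(0, 2, 0))

# Most likely segment for each new respondent (returned as a factor)
predicted_segment <- predict(fit, newdata = new_data, type = "class")
# Estimated probability of belonging to each segment
segment_probs <- predict(fit, newdata = new_data, type = "probs")

print(predicted_segment)
print(round(segment_probs, 3))
```

Inspecting the probabilities as well as the predicted class shows how confidently each new respondent is allocated, which is useful when the predictors differ from the variables originally used to form the segments.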