How to Do Latent Class Analysis

Latent Class Analysis (LCA) is a statistical technique for grouping similar observations together (i.e., creating segments). This article describes how to use the Latent Class Analysis feature in Displayr to create a segmentation from a series of variable sets. The analysis produces a tree output and a "Latent Class Analysis" segment variable. Technical details on interpreting the output are at the bottom of this article. If you want to perform Latent Class Analysis on MaxDiff data, see Latent Class Analysis on MaxDiff data in our technical resources instead, as that is a separate feature.

Requirements

A data set containing the variables that you want to use as inputs to the latent class segmentation.

Method

1. From the toolbar menu, select Anything > Advanced Analysis > Cluster > Latent Class Analysis, or in the Report tree, click + > Advanced Analysis > Cluster > Latent Class Analysis.

2. On the next screen, select the variables that you want to include as inputs to the Latent Class Analysis from the Available data list. The selected variables will be displayed on the Data to display list. In this example, I've used a data set of statements about attitudes toward mobile technology, each rated on a 5-point agree/disagree scale.

3. Select the number of segments you want to create:

Select Work out number of groups automatically if you want Displayr to choose the number of groups using the Bayesian Information Criterion (BIC), which balances how well the model fits the data against how many parameters it uses, or
Select Specify the number of groups and enter a value for the number of segments you want to create.

For this example, I've selected the latter option and entered a value of 4.
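To show what the automatic option is comparing, here is a minimal sketch of BIC-based selection. It is a generic illustration, not Displayr's implementation: it computes BIC = -2 × log-likelihood + (number of parameters) × ln(number of respondents) for each candidate number of classes and keeps the lowest. It assumes a standard latent class model with categorical inputs, where the parameter count is (C - 1) class sizes plus, for each of the C classes, K - 1 response probabilities per variable with K categories. The function names are hypothetical, and the log-likelihood values in the usage lines are made up for illustration.

```python
import math

# Sketch only: a generic BIC comparison, not Displayr's own implementation.
# All function names here are hypothetical.
def lca_parameter_count(n_classes, categories_per_variable):
    class_size_params = n_classes - 1  # class sizes must sum to 1
    response_params = n_classes * sum(k - 1 for k in categories_per_variable)
    return class_size_params + response_params

def bic(log_likelihood, n_params, n_obs):
    # Lower is better; the penalty grows with model size and sample size.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def choose_n_classes(log_likelihoods, categories_per_variable, n_obs):
    """log_likelihoods maps a number of classes to its maximized log-likelihood."""
    scores = {
        c: bic(ll, lca_parameter_count(c, categories_per_variable), n_obs)
        for c, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores

# Made-up log-likelihoods for 500 respondents answering six 5-point items.
made_up = {1: -4200.0, 2: -3950.0, 3: -3850.0, 4: -3820.0}
best, scores = choose_n_classes(made_up, [5] * 6, n_obs=500)
print(best)  # 3 for these made-up values
```

Adding classes always improves the log-likelihood, so without the penalty term the largest model would always win; the ln(n) penalty is what lets BIC stop at a smaller number of groups.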

4. OPTIONAL: Apply a filter if you want to create a segmentation for a specific subgroup.

5. OPTIONAL: Select a weight if you want the analysis to be weighted.

6. Click the Create Latent Class Analysis button.

The Latent Class tree output and "Latent Class Analysis" variable will then be generated. The first column in the tree shows the distribution of responses for the sample entered into the analysis. Each additional column shows the response distributions for each segment.
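To make the tree's columns more concrete, here is a minimal sketch of how a latent class model for categorical variables can be estimated with the expectation-maximization (EM) algorithm. It is a generic textbook version, not Displayr's implementation: it assumes every input is categorical and coded 0 to K-1, with no missing data and no weights, and `fit_latent_class` is a hypothetical name. The estimated class sizes play the role of the segment sizes, and the per-class response probabilities play the role of the response distributions shown in each segment's column.

```python
import math
import random

# Generic textbook EM sketch for a latent class model; not Displayr's implementation.
# fit_latent_class is a hypothetical helper name.
def fit_latent_class(data, n_classes, n_categories, n_iter=200, seed=0):
    """Fit class sizes and per-class response probabilities by EM.

    data: one list of category codes (0..K_j-1) per respondent, no missing values.
    n_categories: K_j for each input variable.
    Returns (class sizes, response probabilities, posterior class
    probabilities per respondent, log-likelihood from the final E-step).
    """
    rng = random.Random(seed)
    n_vars = len(n_categories)
    # Start from equal class sizes and random response probabilities.
    sizes = [1.0 / n_classes] * n_classes
    probs = []
    for _ in range(n_classes):
        per_class = []
        for k in n_categories:
            raw = [rng.random() + 0.5 for _ in range(k)]
            total = sum(raw)
            per_class.append([r / total for r in raw])
        probs.append(per_class)

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent.
        posteriors, log_lik = [], 0.0
        for row in data:
            logs = [
                math.log(sizes[c])
                + sum(math.log(probs[c][j][row[j]]) for j in range(n_vars))
                for c in range(n_classes)
            ]
            top = max(logs)  # subtract the max to avoid underflow in exp
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            log_lik += top + math.log(total)
            posteriors.append([x / total for x in w])
        # M-step: re-estimate class sizes and per-class response distributions.
        for c in range(n_classes):
            sizes[c] = sum(p[c] for p in posteriors) / len(data)
            for j, k in enumerate(n_categories):
                counts = [1e-6] * k  # tiny floor keeps every probability above zero
                for row, p in zip(data, posteriors):
                    counts[row[j]] += p[c]
                total = sum(counts)
                probs[c][j] = [x / total for x in counts]
    return sizes, probs, posteriors, log_lik

# Toy data: 50 respondents answer 0 on four 3-category items, 50 answer 2.
data = [[0, 0, 0, 0]] * 50 + [[2, 2, 2, 2]] * 50
sizes, probs, posteriors, log_lik = fit_latent_class(data, 2, [3, 3, 3, 3])
print([round(s, 3) for s in sizes])  # expect two classes of about 0.5 each
```

Because EM can stop at a local optimum, implementations usually run it from several random starting values and keep the solution with the highest log-likelihood; this sketch uses a single start for brevity.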

The "Latent Class Analysis" variable assigns each respondent to one of the classes (segments). It is a new single-response variable, added at the bottom of the data set, with a date/time stamp in its variable label.

You can create a table using this variable to see the distribution of segment assignments. Note that these counts may differ slightly from the segment sizes shown in the tree, which are estimates (see Technical Details below).
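The gap between the tree's estimates and the table counts can be illustrated with a small sketch. It assumes that each respondent has a posterior probability of belonging to each class, that the tree's sizes are the sums of those probabilities, and that the segment variable puts each respondent in their single most probable class. The function names and the probabilities are hypothetical.

```python
# Assumes the segment variable uses each respondent's most probable class.
# Function names are hypothetical.
def expected_sizes(posteriors):
    """Tree-style estimate: add up each class's membership probabilities."""
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) for c in range(n_classes)]

def modal_counts(posteriors):
    """Table-style count: put each respondent in their single most likely class."""
    n_classes = len(posteriors[0])
    counts = [0] * n_classes
    for p in posteriors:
        counts[max(range(n_classes), key=lambda c: p[c])] += 1
    return counts

# Hypothetical posterior probabilities for four respondents and three classes.
posteriors = [
    [0.33, 0.66, 0.01],
    [0.90, 0.10, 0.00],
    [0.20, 0.70, 0.10],
    [0.60, 0.30, 0.10],
]
print(expected_sizes(posteriors))  # roughly [2.03, 1.76, 0.21]
print(modal_counts(posteriors))    # [2, 2, 0]
```

Both approaches account for all four respondents, but the third class gets a small share under the probabilistic estimate and none at all under hard assignment, which is the kind of small discrepancy you may see between the tree and a table.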

7. To get a diagnostic report of your latent class analysis, go to Anything > Advanced Analysis > Cluster > Diagnostic > Analysis Report in the toolbar, or in the Report tree, select + > Advanced Analysis > Cluster > Diagnostic > Analysis Report.

Technical Details

To better understand how to interpret the output of the LCA, please see our technical documentation.

The estimated size of each segment, both as a percentage and as a number of respondents, is shown at the bottom of each segment in the tree. When you create crosstabs using the segment variable, the segment sizes commonly differ from the numbers shown in these boxes, although the differences are typically small. This is because the segment sizes shown in the tree are estimates that allow for uncertainty about membership (e.g., a person may have a 33% chance of being in one segment, a 66% chance of being in another, and a 1% chance of being in a third). By contrast, the segment variable used in crosstabs assumes that each person belongs to one and only one segment, and the difference between these two assumptions causes the differences in results.

When a weight is used, the total population size (shown as the Population in the top node) is the effective sample size of the sample used for the segmentation.
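The article does not say which effective-sample-size formula Displayr uses. As an assumption for illustration only, the sketch below uses Kish's approximation, a common definition: the squared sum of the weights divided by the sum of the squared weights. It shows why a weighted sample's effective size falls below its raw count whenever the weights vary. The function name is hypothetical.

```python
# Assumption: Kish's formula; the article does not confirm that Displayr uses it.
# effective_sample_size is a hypothetical helper name.
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

print(effective_sample_size([1, 1, 1, 1]))  # 4.0: equal weights lose nothing
print(effective_sample_size([1, 1, 2, 2]))  # 3.6: unequal weights reduce it
```

Under this formula, rescaling all weights by the same constant leaves the result unchanged; only the variation among the weights reduces the effective sample size.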

Grid questions need some preparation before you can use them in a Latent Class Analysis (LCA): either stack the data file, or change the Question Type to Binary - Multi or Numeric - Multi.

Also, because of the nature of the algorithms used for latent class analysis and differences between software packages, you will generally not be able to exactly replicate a latent class analysis from another program in Displayr. See What to Do When Displayr's Results Are Different Than Another Program's Results for more details.