Hierarchical cluster analysis is an algorithm that groups similar objects into clusters. The result is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to each other. This article describes how to conduct a hierarchical cluster analysis in Displayr.
Please note that Displayr's hierarchical cluster analysis tool treats the variables as the cases, so it does not produce segments in the traditional sense (e.g., it creates segments of brands rather than segments of people). If you want to group similar respondents together, consider an alternative method such as Latent Class Analysis or k-means cluster analysis.
- Hierarchical clustering can be performed with either raw data or a distance matrix. When raw data is used, the distance matrix is automatically computed in the background.
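To illustrate what "computing the distance matrix in the background" means, here is a minimal Python sketch. It is not Displayr's implementation; the data values and the names `variables`, `rows`, and `distance_matrix` are made up for illustration. It assumes Euclidean distance and, in keeping with the note above, measures distances between variables (columns) rather than between respondents (rows).

```python
import math

# Hypothetical raw data: each row is a respondent, each column is a binary
# device-ownership variable (1 = owns the device). The values are invented.
variables = ["Phone", "Tablet", "Laptop", "Console"]
rows = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
]


def distance_matrix(rows):
    """Return Euclidean distances between columns (variables), not rows."""
    columns = list(zip(*rows))  # transpose so that each variable becomes a case
    return [[math.dist(a, b) for b in columns] for a in columns]


dist = distance_matrix(rows)
for name, line in zip(variables, dist):
    print(name, [round(d, 2) for d in line])
```

A matrix like `dist` is the kind of input you would supply if you chose to start from a distance matrix instead of raw data.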
1. From the toolbar menu, select Anything > Advanced Analysis > Cluster > Hierarchical cluster analysis.
2. From the object inspector on the right, select the variables from your data set that you want to use as inputs to the cluster analysis. For this example, we've used binary variables showing device ownership from a technology survey.
3. Enter a value for the Number of clusters that you want to create.
4. OPTIONAL: Select a distance measure from the Distance input. Euclidean is selected by default. For more information, see the documentation for the R dist function, which is used to compute the distance matrix.
5. OPTIONAL: Select the algorithm used to form the clusters from the Clustering method input. The Ward2 algorithm is selected by default. For more details, see the documentation for the R hclust function.
6. Click the Calculate button to generate the custom analysis output.
The output is a dendrogram, a tree diagram that shows the distances between the variables. Each cluster is displayed in a separate color.
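The computation behind these steps can be sketched roughly as follows. This is a simplified pure-Python illustration, not Displayr's or R's code: it assumes Euclidean distance and Ward's minimum-variance criterion (applied through the Lance-Williams update on squared distances, which is our reading of what a "Ward2"-style method does), and it stops merging once the requested number of clusters remains. The `ownership` data and the function name `ward_clusters` are hypothetical. The merge heights it returns are the values a dendrogram draws on its distance axis.

```python
import math


def ward_clusters(points, k):
    """Merge clusters by Ward's criterion until k remain.

    Returns (labels, heights): a cluster label per point and the
    distance at which each merge happened. Illustrative sketch only.
    """
    n = len(points)
    members = {i: [i] for i in range(n)}  # cluster id -> indices of its points
    # Squared Euclidean distances between clusters, keyed as (low_id, high_id).
    d2 = {(i, j): math.dist(points[i], points[j]) ** 2
          for i in range(n) for j in range(i + 1, n)}
    heights = []
    next_id = n
    while len(members) > k:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])  # closest pair
        heights.append(math.sqrt(dij))
        ni, nj = len(members[i]), len(members[j])
        # Lance-Williams update: distance from every other cluster c
        # to the newly merged cluster (i + j).
        for c in members:
            if c in (i, j):
                continue
            nc = len(members[c])
            dic = d2.pop((min(i, c), max(i, c)))
            djc = d2.pop((min(j, c), max(j, c)))
            d2[(c, next_id)] = ((ni + nc) * dic + (nj + nc) * djc
                                - nc * dij) / (ni + nj + nc)
        del d2[(i, j)]
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    # Number the clusters 1..k in order of their lowest point index.
    labels = [0] * n
    for label, group in enumerate(sorted(members.values(), key=min), start=1):
        for idx in group:
            labels[idx] = label
    return labels, heights


# Hypothetical data: one tuple of 0/1 answers per variable (variables are the cases).
ownership = {
    "Phone":   (1, 1, 1, 0, 1, 1),
    "Tablet":  (1, 1, 1, 0, 1, 0),
    "Laptop":  (1, 1, 0, 0, 1, 1),
    "Console": (0, 0, 0, 1, 0, 1),
    "VR":      (0, 0, 0, 1, 0, 0),
}
labels, heights = ward_clusters(list(ownership.values()), k=2)
for name, label in zip(ownership, labels):
    print(name, "-> cluster", label)
print("merge heights:", [round(h, 3) for h in heights])
```

With this made-up data, the variables answered similarly by respondents (Phone, Tablet, Laptop) end up in one cluster, and Console and VR in the other, which mirrors how the colored branches of a dendrogram would be read.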