How to Do Hierarchical Cluster Analysis in Displayr

Hierarchical cluster analysis is an algorithm that groups similar objects into clusters. The result is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to one another. This article describes how to conduct a hierarchical cluster analysis in Displayr.

Note that Displayr's hierarchical cluster analysis tool treats the variables as the cases, so it does not produce segments in the traditional sense: it is used to create segments of variables, such as brands, rather than segments of people. If you want to group similar respondents together, consider an alternative method such as Latent Class Analysis or k-means cluster analysis.

Requirements

Hierarchical clustering can be performed with either raw data or a distance matrix. When raw data is used, the distance matrix is automatically computed in the background.
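Because the variables are treated as the cases, the raw data is effectively transposed before distances are computed. Each variable becomes one case, made up of its values across all respondents. The Python sketch below illustrates that step using Euclidean distance. The data, variable names, and helper functions are hypothetical and are not Displayr's internal code.

```python
import math

# Hypothetical respondent-level data: one row per respondent, one column per
# binary device-ownership variable (1 = owns the device, 0 = does not).
variables = ["Phone", "Tablet", "Laptop", "Watch"]
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def variables_as_cases(rows):
    """Transpose the data so each variable becomes one case (its values across respondents)."""
    return [list(col) for col in zip(*rows)]

def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(cases, metric=euclidean):
    """Full symmetric matrix of pairwise distances between cases."""
    return [[metric(a, b) for b in cases] for a in cases]

cases = variables_as_cases(responses)
d = distance_matrix(cases)
for name, row in zip(variables, d):
    print(f"{name:<7}", [round(v, 3) for v in row])
```

If you supply a distance matrix directly instead of raw data, this step is skipped.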

Method

From the toolbar, select Anything > Advanced Analysis > Cluster > Hierarchical Cluster Analysis, or in the Report tree select + > Advanced Analysis > Cluster > Hierarchical Cluster Analysis.
In the object inspector, select the variables from your data set that you want to use as inputs to the cluster analysis. For this example, we've used binary variables showing device ownership from a technology survey.
In Number of clusters, enter the number of clusters you want to create.
OPTIONAL: Select a distance measure from the Distance input. This is the formula used to compute the distance between each pair of variables before clustering. For more information, see the documentation for the R dist function (in the stats package), which performs the distance matrix computation.
- Euclidean (default)
- Maximum
- Manhattan
- Canberra
- Binary
OPTIONAL: Select the algorithm used to form the clusters from the Clustering method input. For more details, see the documentation for the R hclust function.
- Ward1 (ward.D)
- Ward2 (ward.D2) (default) - Commonly known as Ward's method
- Single
- Complete
- Average
- McQuitty
- Median
- Centroid
OPTIONAL: Tick Variable names to display variable names in the output.
OPTIONAL: Tick Categorical as binary to represent unordered categorical variables as binary variables. Otherwise, they are represented as sequential integers (i.e., 1 for the first category, 2 for the second, and so on). Numeric - Multi variables are treated according to their numeric values and are not converted to binary.
OPTIONAL: Set Label margin, the width of the right-hand margin, to make room for long labels.
Click the Calculate button to generate the custom analysis output.
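The five Distance options correspond to methods of R's dist function. As a rough illustration, here is a Python sketch of each formula as described in the R documentation. The function names are hypothetical. The edge-case handling, such as skipping 0/0 terms in Canberra, is our reading of that documentation, not Displayr's code.

```python
import math

# Sketches of the Distance options, following the formulas in the R
# documentation for stats::dist. `a` and `b` are equal-length numeric lists.

def euclidean(a, b):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def maximum(a, b):
    # Largest absolute difference between any pair of components.
    return max(abs(x - y) for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Sum of |x - y| / |x + y|. Terms where both parts are zero (0/0) are
    # skipped, and the sum is scaled up by (number of terms) / (terms used).
    total, used = 0.0, 0
    for x, y in zip(a, b):
        num, den = abs(x - y), abs(x + y)
        if num == 0 and den == 0:
            continue
        total += num / den if den else math.inf
        used += 1
    # No usable terms: treated as undefined here (NaN).
    return total * len(a) / used if used else math.nan

def binary(a, b):
    # Non-zero values are "on". Distance = share of positions where exactly one
    # value is on, among positions where at least one is on
    # (0 if neither vector has any non-zero value).
    either = [(x != 0) != (y != 0) for x, y in zip(a, b) if x != 0 or y != 0]
    return sum(either) / len(either) if either else 0.0

phone, laptop = [1, 1, 1, 0], [1, 1, 0, 1]  # hypothetical ownership vectors
for f in (euclidean, maximum, manhattan, canberra, binary):
    print(f.__name__, round(f(phone, laptop), 3))
```

The Binary option ignores positions where both variables are zero. This is often a sensible choice for yes/no data such as device ownership, where shared non-ownership says little about similarity.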

The output is a dendrogram, a tree diagram that shows the order in which the variables are merged into clusters and the distance at which each merge occurs. Each cluster is displayed in a separate color.
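To show what the Clustering method and Number of clusters settings do, here is a naive Python sketch of agglomerative clustering using Lance-Williams update formulas. It is followed by a cut of the tree into k clusters. It covers only the Single, Complete, Average, McQuitty, Ward1 (ward.D) and Ward2 (ward.D2) methods; Median and Centroid are omitted. The functions agglomerate and cut_into_k and the example distances are hypothetical. This is an illustration, not the implementation used by R's hclust or by Displayr.

```python
import math

METHODS = ("single", "complete", "average", "mcquitty", "ward.D", "ward.D2")

def agglomerate(d, method="ward.D2"):
    """Naive agglomerative clustering on a full symmetric distance matrix `d`.

    Returns a list of merges (members_a, members_b, height) in merge order,
    where members are frozensets of original case indices.
    """
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    n = len(d)
    # ward.D2 squares the distances before updating and reports the square
    # root of each merge height (as described in the R hclust documentation).
    square = method == "ward.D2"
    members = {i: frozenset([i]) for i in range(n)}  # active clusters by id
    dist = {(i, j): d[i][j] ** 2 if square else d[i][j]
            for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return dist[(a, b) if a < b else (b, a)]

    merges, next_id = [], n
    while len(members) > 1:
        # Closest pair of active clusters (the first one found wins ties).
        a, b = min(((x, y) for x in members for y in members if x < y),
                   key=lambda p: get(*p))
        dab, na, nb = get(a, b), len(members[a]), len(members[b])
        new, next_id = next_id, next_id + 1
        # Lance-Williams update: distance from every other cluster k to a+b.
        for k, mk in members.items():
            if k in (a, b):
                continue
            dka, dkb, nk = get(k, a), get(k, b), len(mk)
            if method == "single":
                v = min(dka, dkb)
            elif method == "complete":
                v = max(dka, dkb)
            elif method == "average":
                v = (na * dka + nb * dkb) / (na + nb)
            elif method == "mcquitty":
                v = (dka + dkb) / 2
            else:  # ward.D and ward.D2 share the same update formula
                v = ((na + nk) * dka + (nb + nk) * dkb - nk * dab) / (na + nb + nk)
            dist[(k, new)] = v  # k < new, since new is the largest id so far
        ma, mb = members.pop(a), members.pop(b)
        members[new] = ma | mb
        merges.append((ma, mb, math.sqrt(dab) if square else dab))
    return merges

def cut_into_k(merges, n, k):
    """Label each case 1..k by keeping only the first n - k merges."""
    groups = [frozenset([i]) for i in range(n)]
    for ma, mb, _ in merges[: n - k]:
        groups = [g for g in groups if g != ma and g != mb] + [ma | mb]
    labels = [0] * n
    # Number clusters by their lowest-indexed member (order of first appearance).
    for label, g in enumerate(sorted(groups, key=min), start=1):
        for i in g:
            labels[i] = label
    return labels

# Hypothetical distances between five variables placed on a line.
positions = [0, 1, 5, 6, 12]
d = [[abs(p - q) for q in positions] for p in positions]

for method in ("single", "ward.D2"):
    merges = agglomerate(d, method)
    print(method, [round(h, 3) for _, _, h in merges], cut_into_k(merges, 5, 3))
```

The merge heights correspond to the levels at which branches join in a dendrogram. Asking for k clusters amounts to undoing the last k - 1 merges. This naive search for the closest pair is cubic in the number of cases. That is acceptable here because the cases are variables, and a data set typically has far fewer variables than respondents.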

Acknowledgements

The R package networkD3 is used to create the dendrogram, while hierarchical clustering is performed by the hclust function in the stats R package.

Please see What is Hierarchical Clustering?, What is Dendrogram?, and What are the Strengths and Weaknesses of Hierarchical Clustering? for more information on hierarchical clustering and dendrograms.
