Hierarchical cluster analysis is an algorithm that groups similar objects into clusters. The endpoint is a set of clusters, where each cluster is distinct from the others and the objects within each cluster are broadly similar to each other. This article describes how to conduct a hierarchical cluster analysis in Displayr.
Please note, Displayr's hierarchical cluster analysis tool treats the variables as the cases, so it does not produce segments in the traditional sense (e.g., it is used for creating segments of brands, rather than segments of people). If you want to group similar respondents together, consider an alternative method such as Latent Class Analysis or k-means cluster analysis.
Requirements
- Hierarchical clustering can be performed with either raw data or a distance matrix. When raw data is used, the distance matrix is automatically computed in the background.
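As a rough illustration of what computing the distance matrix from raw data involves, the sketch below builds Euclidean distances between variables rather than between respondents. It is not Displayr's implementation. The function name `variable_distance_matrix` and the device data are hypothetical.

```python
import math

def variable_distance_matrix(rows, names):
    """Euclidean distances between the variables (columns) of a table
    that has one row per respondent."""
    # Transpose so that each variable becomes one "case":
    # a vector holding every respondent's value for that variable.
    columns = list(zip(*rows))
    matrix = {}
    for i, a in enumerate(columns):
        for j, b in enumerate(columns):
            matrix[(names[i], names[j])] = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(a, b))
            )
    return matrix

# Hypothetical ownership data: one row per respondent, 1 = owns the device.
names = ["Phone", "Tablet", "Laptop"]
rows = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
distances = variable_distance_matrix(rows, names)
print(distances[("Phone", "Tablet")])  # 1.0
```

Because the table is transposed first, the result has one row and one column per variable, which is why the clusters describe groups of variables (for example, brands or devices) rather than groups of people.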
Please note these steps require a Displayr license.
Method
- From the toolbar, select Anything > Advanced Analysis > Cluster > Hierarchical Cluster Analysis.
- From the object inspector, select the variables from your data set that you want to use as inputs to the cluster analysis. For this example, we've used binary variables showing device ownership from a technology survey.
- Enter a value for the Number of clusters that you want to create.
- OPTIONAL: Select a distance measure from the Distance input. This is the formula used to compute the distance between points, prior to clustering. For more information, see the documentation for the dist function in the stats R package, which is used to compute the distance matrix.
  - Euclidean (default)
  - Maximum
  - Manhattan
  - Canberra
  - Binary
- OPTIONAL: Select the algorithm used to form the clusters from the Clustering method input. For more details, see the documentation for the hclust function in the stats R package.
  - Ward1 (ward.D)
  - Ward2 (ward.D2) (default) - Commonly known as Ward's method
  - Single
  - Complete
  - Average
  - McQuitty
  - Median
  - Centroid
- OPTIONAL: Tick Variable names. This displays variable names in the output.
- OPTIONAL: Tick Categorical as binary. This represents unordered categorical variables as binary variables. Otherwise, they are represented as sequential integers (i.e., 1 for the first category, 2 for the second, etc.). Numeric - Multi variables are treated according to their numeric values and not converted to binary.
- OPTIONAL: Set the Label margin, which is the width of the right-hand margin to accommodate long labels.
- Click the Calculate button to generate the custom analysis output.
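To give a feel for how the Distance options in the steps above differ, here is a minimal Python sketch of the five measures as they are commonly defined. Displayr relies on R's dist function, whose handling of edge cases (missing values, terms where both values are zero) may differ from this simplified version. The vectors `phone` and `laptop` are hypothetical ownership data.

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def maximum(x, y):
    # Largest absolute difference on any single coordinate.
    return max(abs(a - b) for a, b in zip(x, y))

def manhattan(x, y):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # Sum of |a - b| / (|a| + |b|). Assumption: this sketch simply skips
    # coordinates where both values are zero (a 0/0 term).
    return sum(
        abs(a - b) / (abs(a) + abs(b))
        for a, b in zip(x, y)
        if a != 0 or b != 0
    )

def binary(x, y):
    # Non-zero values count as "on". The distance is the share of
    # coordinates where exactly one vector is on, among coordinates
    # where at least one is on.
    either = sum(1 for a, b in zip(x, y) if a != 0 or b != 0)
    only_one = sum(1 for a, b in zip(x, y) if (a != 0) != (b != 0))
    # Assumption: two all-zero vectors are given a distance of 0.
    return only_one / either if either else 0.0

DISTANCES = {
    "Euclidean": euclidean,
    "Maximum": maximum,
    "Manhattan": manhattan,
    "Canberra": canberra,
    "Binary": binary,
}

# Hypothetical ownership vectors (one value per respondent).
phone = [1, 1, 0, 1]
laptop = [0, 1, 1, 1]
for name, fn in DISTANCES.items():
    print(name, fn(phone, laptop))
```

With binary inputs like these, Manhattan counts the respondents who own exactly one of the two devices, while Binary expresses that count as a proportion of respondents who own at least one.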
The output is a dendrogram, which shows the distance between the variables. Each cluster is displayed in a separate color.
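The merging process that a dendrogram depicts can be sketched in a few lines of Python. This is a naive illustration only: it supports single, complete and average linkage, not the default Ward method, and it is not how Displayr or R's hclust is implemented. The function name `agglomerate` and the distances are hypothetical.

```python
def agglomerate(names, dist, n_clusters, linkage="average"):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    dist maps (name_a, name_b) pairs to distances, in both directions.
    Returns the final clusters and the list of merges with their heights.
    """
    combine = {
        "single": min,                            # closest pair of members
        "complete": max,                          # farthest pair of members
        "average": lambda v: sum(v) / len(v),     # mean over all pairs
    }[linkage]
    clusters = [[n] for n in names]  # start with one cluster per variable
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_distances = [
                    dist[(a, b)] for a in clusters[i] for b in clusters[j]
                ]
                height = combine(pair_distances)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical distances between four device-ownership variables.
pairs = {
    ("Phone", "Tablet"): 1.0,
    ("Phone", "Laptop"): 3.0,
    ("Phone", "Desktop"): 4.0,
    ("Tablet", "Laptop"): 3.5,
    ("Tablet", "Desktop"): 4.5,
    ("Laptop", "Desktop"): 1.5,
}
dist = {}
for (a, b), d in pairs.items():
    dist[(a, b)] = d
    dist[(b, a)] = d

names = ["Phone", "Tablet", "Laptop", "Desktop"]
clusters, merges = agglomerate(names, dist, n_clusters=2)
print(clusters)  # [['Phone', 'Tablet'], ['Laptop', 'Desktop']]
```

Each entry in `merges` corresponds to one join in a dendrogram, with the merge height giving the distance at which the two branches meet. Stopping when `n_clusters` remain plays the same role as the Number of clusters input.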
Acknowledgements
The R package networkD3 is used to create the dendrogram, while hierarchical clustering is performed by the hclust function in the stats R package.
Please see What is Hierarchical Clustering?, What is a Dendrogram? and What are the Strengths and Weaknesses of Hierarchical Clustering? for more information on hierarchical clustering and dendrograms.
Next
How to Analyze Data by Groups/Segments
How to Do Latent Class Analysis
How to Create a Segmentation Comparison Table
How to Do Mixed Mode Cluster Analysis in Displayr