Hierarchical cluster analysis is an algorithm that groups similar objects into clusters. The result is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to each other. This article describes how to conduct a hierarchical cluster analysis in Displayr.
- Hierarchical clustering can be performed with either raw data or a distance matrix. When raw data is used, a distance matrix is automatically computed in the background.
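Displayr computes the distance matrix for you, so no code is required. Purely as an illustration of what that background step involves, the following minimal Python sketch builds a Euclidean distance matrix from raw data. The data values and the names `raw_data` and `euclidean_distance_matrix` are hypothetical and are not part of Displayr.

```python
import math

# Hypothetical raw data: each row is one object to be clustered, described
# by binary device-ownership indicators (1 = owns the device, 0 = does not).
raw_data = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]


def euclidean_distance_matrix(rows):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


distance_matrix = euclidean_distance_matrix(raw_data)

# Print one row of the matrix per object, rounded for readability.
for row in distance_matrix:
    print([round(value, 3) for value in row])
```

Objects with identical values have a distance of zero, and the matrix is symmetric because the distance from A to B equals the distance from B to A.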
1. From the toolbar menu, select Anything > Advanced Analysis > Cluster > Hierarchical cluster analysis.
2. From the object inspector on the right, select the variables from your data set that you want to use as inputs to the cluster analysis. For this example, we've used binary variables showing device ownership from a technology survey.
3. Enter a value for the Number of clusters that you want to create.
4. OPTIONAL: Select a distance measure from the Distance input. Euclidean is selected by default. For more information, see the documentation for the R dist function, which is used to compute the distance matrix.
5. OPTIONAL: Select the algorithm used to form the clusters from the Clustering method input. The Ward2 algorithm is selected by default. For more details, see the documentation for the R hclust function.
6. Click the Calculate button to generate the custom analysis output.
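To give a sense of what happens when you click Calculate, here is a minimal Python sketch of agglomerative clustering with Ward's criterion, stopped once the requested number of clusters remains. It is an illustration under stated assumptions, not Displayr's implementation: the data, the function name `ward_clusters`, and the choice to update squared Euclidean distances with the Lance-Williams formula are all assumptions made for this example.

```python
import math
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Merge clusters by Ward's criterion until n_clusters remain.

    Returns one 0-based cluster label per input point.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")

    # Every point starts in its own cluster: cluster id -> member indices.
    members = {i: [i] for i in range(len(points))}
    # Squared Euclidean distance between each pair of active clusters.
    d2 = {
        (i, j): math.dist(points[i], points[j]) ** 2
        for i, j in combinations(range(len(points)), 2)
    }
    next_id = len(points)

    while len(members) > n_clusters:
        # Pick the pair of clusters that is cheapest to merge.
        a, b = min(d2, key=d2.get)
        na, nb, dab = len(members[a]), len(members[b]), d2[(a, b)]

        # Lance-Williams update for Ward's method on squared distances.
        updated = {}
        for k in members:
            if k in (a, b):
                continue
            nk = len(members[k])
            dak = d2[tuple(sorted((a, k)))]
            dbk = d2[tuple(sorted((b, k)))]
            updated[(k, next_id)] = (
                (na + nk) * dak + (nb + nk) * dbk - nk * dab
            ) / (na + nb + nk)

        # Drop distances involving the merged clusters, then add the new ones.
        d2 = {pair: v for pair, v in d2.items() if a not in pair and b not in pair}
        d2.update(updated)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1

    labels = [0] * len(points)
    for label, indices in enumerate(members.values()):
        for i in indices:
            labels[i] = label
    return labels


# Hypothetical binary device-ownership data, one row per object.
data = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
labels = ward_clusters(data, n_clusters=2)
print(labels)
```

With this toy data, the first two rows end up in one cluster and the last two in the other, because each pair differs on only one device.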
The output is a dendrogram, which shows the distance between the variables. Each cluster is displayed in a separate color.