This article describes how to do a k-means cluster analysis in Displayr. The k-means cluster analysis algorithm is a method for grouping similar cases into groups, or clusters. The final clusters will be different from each other, while the cases within a cluster are broadly similar to each other.

## Requirements

- A data set containing the variables that you want to use as inputs to the cluster analysis segmentation
- Familiarity with the *Structure* and *Value Attributes* of Variable Sets

## Method

1. From the **toolbar**, select **Anything > Advanced Analysis > Cluster > K-Means Cluster Analysis**. A cluster analysis object will be added to the current page.

2. From the **object inspector**, select the inputs (clustering variables) from the **Variables** dropdown in the **Inputs** section. For this example, we've selected 11 behavioral/attitudinal statements on mobile technology. Each statement was asked on a 5-point agree/disagree scale. We'll use the top 2 box responses to each of the statements as the inputs to our k-means cluster analysis.

You can also use any other numeric variables as clustering variables, provided they can differentiate between respondents and therefore help define the clusters.

*Note that if the variables are grouped in a Variable Set, then the Variable Set may be selected instead, which is more convenient than selecting multiple variables.*

3. Select the number of clusters that you want to create in the **Number of clusters** field. We've selected 3 clusters for this example, but you can choose any value you want here.

4. Optional: Modify any of the other input settings as desired. For this example, we'll leave the default values selected. Options include:

- **Missing data** (see Missing Data Options):
  - **Error if missing data**
  - **Exclude cases with missing data**
  - **Use partial data** - This is the default
  - **Imputation (replace missing values with estimates)**

- **Algorithm**:
  - **Batch** - This is the default and is the only algorithm that can accommodate weights or missing values. Refer to the Technical Details section below.
  - **Hartigan-Wong** - Refer to `kmeans` for more information on this and the algorithms below
  - **Forgy**
  - **Lloyd**
  - **MacQueen**

- **Output**:
  - **Means**
  - **Means table** - Show the cluster means. Best if you want to export to another program.
  - **Segment profiling table** - Show the composition of the *Profiling variables* within predicted clusters. More options to control the appearance are described here.

- **Cluster labels** - An optional comma-separated list used to name the clusters predicted by the k-means model
- **Profiling variables** - Select other variables or variable sets to crosstab with the segment cluster

5. Click the **Calculate** button (or tick the **Automatic** checkbox so that the analysis will re-run automatically if any changes are made).
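To make the inputs in step 2 and the clustering itself more concrete, here is a minimal Python sketch. It recodes 5-point ratings into top 2 box indicators and then runs a plain Lloyd-style k-means on them. It is an illustration only, not Displayr's implementation: the function names, the example ratings, and the random choice of starting centers are all assumptions, and it does not handle weights or missing values the way the default **Batch** algorithm does.

```python
import random


def top2box(rating, scale_max=5):
    """Return 1 if the rating falls in the top two boxes of the scale, else 0."""
    return 1 if rating >= scale_max - 1 else 0


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_lloyd(rows, k, iter_max=100, seed=0):
    """Plain Lloyd k-means on complete, unweighted data (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen cases.
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iter_max):
        # Assign each case to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Move each center to the mean of the cases assigned to it.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 5-point ratings: 6 respondents by 3 statements.
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
    [2, 1, 4],
    [3, 3, 3],
    [5, 5, 5],
]
inputs = [[top2box(v) for v in row] for row in ratings]
labels, centers = kmeans_lloyd(inputs, k=3)
```

Because the top 2 box inputs are 0/1 indicators, each cluster center is the proportion of that cluster agreeing with a statement, which is what the means output reports as a percentage.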

## Interpreting the Results

The standard table of means output shown above lists each of the clustering variables in the rows and shows the mean Top 2 Box percentage for each of the clusters.

- The size of each cluster (n) is shown in the column header.
- The red and blue highlights indicate whether the Top 2 Box score is higher (blue) or lower (red) than the overall mean. The colors are also scaled to provide additional differentiation: darker shades of red or blue are farther from the mean.
- Means in bold font are significantly higher/lower than the mean score.
- The R-Squared value shows the proportion of variance in the cluster assignment that is explained by each of the clustering variables. In the example above, we can see that there are 4 statements that have a greater impact on the segment/cluster predictions than do the remaining variables.
- The p-value shows which statement variables are significant in the model.
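As a rough illustration of how a per-variable R-Squared can relate a clustering variable to the cluster solution, the sketch below divides the between-cluster sum of squares by the total sum of squares for one variable. This is a common one-way ANOVA style calculation; whether Displayr computes its R-Squared in exactly this way is an assumption, and the data and function name are hypothetical.

```python
def r_squared(values, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant variable cannot separate clusters
    # Collect the values of this variable within each cluster.
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


# Hypothetical cluster memberships and two top 2 box variables.
membership = [0, 0, 1, 1]
statement_a = [1, 1, 0, 0]  # lines up perfectly with the clusters
statement_b = [1, 0, 1, 0]  # unrelated to the clusters

r2_a = r_squared(statement_a, membership)
r2_b = r_squared(statement_b, membership)
```

Here `statement_a` separates the two clusters completely and `statement_b` not at all, so the first gets the maximum value and the second gets zero.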

## Saving Cluster Membership

Individual respondents can be assigned to clusters in Displayr by first selecting the K-Means Cluster Analysis output and then selecting **Inputs > Save Variable(s) > Cluster Membership**. A new categorical variable called "Segment/Cluster memberships from r.output" is added to the top of the data set. Locate the new variable in the **Data Sets** tree and hover over it to preview the respondent-level membership data, or drag the variable onto the page to create a table.

This segment/cluster variable can be used for profiling against your demographic variables. Once you've identified the key differences between your clusters, try to come up with names that describe each cluster. You can then add these names to the cluster variable: select the variable in the **Data Sets** tree, click the **Labels** button under **Properties** on the right, and enter the cluster names in the **Label** column. Click **OK** to save the cluster names.
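Outside Displayr, the same labeling and profiling idea can be sketched in a few lines of Python. The cluster names, the membership values, and the age bands below are all hypothetical. The function simply reports what percentage of each cluster falls into each category of a profiling variable.

```python
from collections import Counter


def crosstab_percent(labels, profile):
    """Percentage of each cluster that falls in each profiling category."""
    counts = Counter(zip(labels, profile))
    sizes = Counter(labels)
    return {
        (lab, cat): 100.0 * n / sizes[lab] for (lab, cat), n in counts.items()
    }


# Hypothetical names chosen after reviewing the cluster means.
cluster_names = {1: "Tech enthusiasts", 2: "Pragmatists", 3: "Reluctant users"}

# Hypothetical saved cluster membership and a demographic variable.
membership = [1, 1, 2, 3, 2, 1]
age = ["18-34", "18-34", "35-54", "55+", "35-54", "35-54"]

named = [cluster_names[m] for m in membership]
table = crosstab_percent(named, age)
```

Each key of `table` is a (cluster name, category) pair, so `table[("Pragmatists", "35-54")]` gives the share of that cluster in the 35-54 band.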

## Technical Details

The **Batch** algorithm works as follows:

- The **Hartigan-Wong** k-means algorithm is used to find clusters with missing data set to **Exclude cases with missing data**.
- Cases are assigned to the most similar cluster. Where **Missing data** is set to **Use partial data** (the default), this means that cases that were ignored by `Hartigan-Wong` are now included in the analysis.
- The cluster centers are updated. Where weights have been applied, this means that the cluster centers now reflect weights (they were ignored by **Hartigan-Wong**).
- The previous two steps are repeated until either the maximum number of iterations, `iter.max` (which defaults to 100), has been exceeded, or the *Omega-Squared* does not increase.
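The assign-and-update loop described above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not Displayr's code: the starting centers are taken as given (standing in for the Hartigan-Wong result on complete cases), similarity for a case with missing values is the mean squared difference over the variables it did answer, and `fit_quality` is a simple stand-in for *Omega-Squared*, whose formula is not given here. All function names and data are hypothetical.

```python
def partial_sq_dist(row, center):
    """Mean squared difference over the variables a case actually answered."""
    pairs = [(x, c) for x, c in zip(row, center) if x is not None]
    if not pairs:
        return 0.0  # a case with no answers carries no information
    return sum((x - c) ** 2 for x, c in pairs) / len(pairs)


def assign(rows, centers):
    """Assign every case, including partial ones, to its most similar cluster."""
    return [
        min(range(len(centers)), key=lambda c: partial_sq_dist(r, centers[c]))
        for r in rows
    ]


def update_centers(rows, weights, labels, centers):
    """Weighted mean of the non-missing values per cluster and variable."""
    new = [list(c) for c in centers]
    for c in range(len(centers)):
        for j in range(len(centers[c])):
            num = den = 0.0
            for r, w, lab in zip(rows, weights, labels):
                if lab == c and r[j] is not None:
                    num += w * r[j]
                    den += w
            if den > 0:
                new[c][j] = num / den
    return new


def fit_quality(rows, weights, labels, centers):
    """Stand-in for Omega-Squared: negative weighted error (higher is better)."""
    return -sum(
        w * partial_sq_dist(r, centers[lab])
        for r, w, lab in zip(rows, weights, labels)
    )


def batch_refine(rows, weights, start_centers, iter_max=100):
    """Repeat assign/update until iter_max is reached or the fit stops improving."""
    centers = [list(c) for c in start_centers]
    labels, best = None, float("-inf")
    for _ in range(iter_max):
        new_labels = assign(rows, centers)
        new_centers = update_centers(rows, weights, new_labels, centers)
        quality = fit_quality(rows, weights, new_labels, new_centers)
        if quality <= best:
            break
        labels, centers, best = new_labels, new_centers, quality
    return labels, centers


# Hypothetical data: None marks a missing value.
rows = [[1.0, 1.0], [0.9, None], [0.0, 0.1], [None, 0.0], [1.0, 0.8]]
weights = [1.0, 2.0, 1.0, 1.0, 0.5]
# Assumed output of the first step, fitted on the complete cases only.
start_centers = [[1.0, 0.9], [0.0, 0.1]]

labels, centers = batch_refine(rows, weights, start_centers)
```

In this example the two cases with missing values are pulled into the analysis during assignment, and the updated centers are weighted means, so the second case (weight 2.0) has the largest influence on the first cluster's center.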

## Next

How to Analyze Data by Groups/Segments

How to Do Latent Class Analysis

How to Create a Segmentation Comparison Table

How to Do Mixed Mode Cluster Analysis in Displayr

How to Save K-Means Cluster Membership

How to Do Hierarchical Cluster Analysis in Displayr
