A dimension reduction scatterplot visualizes how similar the observations in your data are to one another across many variables by projecting them onto two dimensions. You can choose among several algorithms to compute the projection, or supply a distance matrix directly. This article describes how to produce a 2-dimensional scatterplot to visualize either:

- High-dimensional data (many variables and/or values)
- A distance matrix

## Requirements

- Either:
  - High-dimensional variables in your data set (many variables)
  - A distance matrix. See How to Create a Distance Matrix, or paste one in as a table.
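
For reference, a distance matrix is a square, symmetric table with zeros on the diagonal, where the entry in row i and column j is the distance between observations i and j. The sketch below is plain Python rather than Displayr code. It builds such a matrix from a few made-up points using Euclidean distance; the function name and the sample data are hypothetical.

```python
import math

def euclidean_distance_matrix(points):
    """Return a square, symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = d  # fill both halves so the matrix is symmetric
            matrix[j][i] = d
    return matrix

# Hypothetical observations measured on two variables
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dm = euclidean_distance_matrix(points)
for row in dm:
    print(row)
```

A table laid out like `dm`, with one row and one column per observation, is the general shape a pasted distance matrix would take.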

## Method 1: Using Variables

If the input type is Variables, the probability that each point has the same class as its nearest neighbor is calculated. A further variable may be specified to classify the output cases into groups using the Group variable field.
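
This article does not spell out how the nearest-neighbor figure is computed, so the following is only one plausible reading: the share of points in the 2-dimensional output whose closest other point belongs to the same group. The function name, coordinates, and group labels are hypothetical, and this is not Displayr's implementation.

```python
import math

def nearest_neighbor_agreement(coords, groups):
    """Fraction of points whose nearest other point has the same group label."""
    matches = 0
    for i, p in enumerate(coords):
        # Index of the closest point other than the point itself
        nearest = min(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        if groups[nearest] == groups[i]:
            matches += 1
    return matches / len(coords)

# Hypothetical 2-D coordinates from a scatterplot, with a group per point
coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
groups = ["A", "A", "B", "B", "B"]
score = nearest_neighbor_agreement(coords, groups)
print(score)
```

Under this reading, a value close to 1 suggests that the groups form well-separated clusters in the plot, while a lower value suggests the groups overlap.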

1. From the toolbar menu, select **Anything > Advanced Analysis > Dimension Reduction > Dimension Reduction Scatterplot**.

2. Select one of the available dimension reduction techniques from the **Algorithm** input:

- PCA (Principal Component Analysis)
- t-SNE
- MDS (Multidimensional Scaling) - Metric
- MDS - Non-metric

3. Select your input variables from the **Variables** drop-down list.

4. [Optional]: Tick the **Normalize variables** checkbox to normalize the data:

- For *t-SNE* and *MDS*, each variable is standardized to the range [0, 1]
- For *PCA*, the correlation matrix is used rather than the covariance matrix

5. [Optional]: When **Create binary variable from categories** is checked, unordered categorical variables with N categories are converted into N-1 binary indicator variables. Otherwise such variables are each converted to a single numeric variable with integers representing categories (as happens for ordered categories).

6. [Optional]: Enter a value for **Perplexity** which is a parameter used by the *t-SNE* algorithm and related to the number of nearest neighbors considered when placing each data point. The typical useful range is from 5 to 50 and the default value is 10.

- Low values emphasize the immediate local structure.
- High values increase the influence of more distant neighbors and global structure.

7. [Optional]: Select a **Group variable** to categorize the output. If it is numeric, data points are shaded from light (lowest values) to dark (highest values). If it is categorical, data points are colored according to their category.

8. Click the **Calculate** button to generate the scatterplot.
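
To make the two preprocessing options above concrete, here is a minimal sketch in plain Python of rescaling a variable to the range [0, 1] and of turning an unordered categorical variable with N categories into N-1 binary indicators. The function names and sample data are hypothetical. Two details are assumptions: the first category in sorted order is dropped as the reference level, and a constant variable is mapped to zeros. The article does not say how Displayr handles either case.

```python
def scale_to_unit_range(values):
    """Min-max scale a numeric variable so its values lie in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information, so use 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def indicator_columns(values):
    """Encode N categories as N-1 binary indicator variables."""
    categories = sorted(set(values))
    # Assumption: the first category (sorted order) is the dropped reference
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories[1:]
    }

ages = [20, 30, 40, 60]
regions = ["north", "south", "east", "north"]

scaled_ages = scale_to_unit_range(ages)
region_indicators = indicator_columns(regions)
print(scaled_ages)        # [0.0, 0.25, 0.5, 1.0]
print(region_indicators)  # {'north': [1, 0, 0, 1], 'south': [0, 1, 0, 0]}
```

Rescaling matters for distance-based methods because a variable measured on a large scale would otherwise dominate the distances between observations.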

## Method 2: Using a Distance Matrix

1. From the toolbar menu, select **Anything > Advanced Analysis > Dimension Reduction > Dimension Reduction Scatterplot**.

2. Select a distance matrix input, either:

- An output created in your document, chosen from the **Distance matrix** drop-down box
- Or click **Paste or type distance matrix** to manually input the distance matrix

3. [Optional]: Enter a value for **Perplexity**, which is a parameter used by the *t-SNE* algorithm and related to the number of nearest neighbors considered when placing each data point. The typical useful range is from 5 to 50 and the default value is 10.

4. Click the **Calculate** button to generate the scatterplot.
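
As background on the **Perplexity** setting: in the t-SNE literature, perplexity is defined as 2 raised to the Shannon entropy (in bits) of a point's probability distribution over its neighbors, and it can be read as an effective number of neighbors. The sketch below illustrates that definition on hand-made distributions. The function name and the example probabilities are hypothetical, and the code does not reproduce how Displayr or any t-SNE library tunes the distribution to hit a target perplexity.

```python
import math

def perplexity(probabilities):
    """Return 2 ** H, where H is the Shannon entropy in bits."""
    entropy_bits = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy_bits

# Ten equally likely neighbors: the effective number of neighbors is about 10
uniform = [0.1] * 10
# Almost all weight on one neighbor: the effective number is close to 1
peaked = [0.97, 0.01, 0.01, 0.01]

print(perplexity(uniform))
print(perplexity(peaked))
```

This is why a low perplexity focuses the layout on each point's closest neighbors, while a higher value spreads the weight across more distant points.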

## Next

Dimension Reduction - Plot - Goodness of Fit can be used to assess the accuracy of the fit.

How to Do Principal Component Analysis in Displayr

How to Do Multidimensional Scaling