How to Create a Dimension Reduction Scatterplot

A dimension reduction scatterplot visualizes the similarity between observations in data that has many variables. t-SNE is one of several algorithms that can be used to produce it. You can choose which algorithm is applied to the variables, or provide a distance matrix instead. This article describes how to produce a 2-dimensional scatterplot to visualize either:

High-dimensional data (many variables and/or values):

or a distance matrix:

Requirements

Either:
- High-dimensional variables in your data set (many variables)
- A distance matrix (see How to Create a Distance Matrix)

Method 1: Using Variables

If the input type is Variables, a further variable can be specified in the Group variable field to classify the output cases into groups. The probability that each point has the same class as its nearest neighbor is also calculated.
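To make the nearest-neighbor statistic concrete, here is a minimal Python sketch. It assumes the statistic is the share of points whose single nearest neighbor (by Euclidean distance) belongs to the same group; the article does not state the exact calculation, so treat the function name `nearest_neighbor_agreement` and the method as illustrative only.

```python
import math


def nearest_neighbor_agreement(points, labels):
    """Share of points whose nearest other point has the same label.

    Hypothetical helper: an assumed reading of "the probability that
    each point has the same class as its nearest neighbor".
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if len(points) < 2:
        raise ValueError("at least two points are required")

    matches = 0
    for i, point in enumerate(points):
        # Index of the closest point, excluding the point itself.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )
        if labels[nearest] == labels[i]:
            matches += 1
    return matches / len(points)


# Two well-separated groups: every point's nearest neighbor shares its label.
coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
groups = ["a", "a", "b", "b"]
agreement = nearest_neighbor_agreement(coords, groups)
```

A value close to 1 suggests that the groups are well separated in the plot, while a value close to the share of the largest group suggests little separation.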

1. From the toolbar, select Visualization > Dimension Reduction > Dimension Reduction Scatterplot.
2. In the object inspector, select one of the available dimension reduction techniques from the Algorithm input:

- PCA (Principal Component Analysis)
- t-SNE
- MDS (Multidimensional Scaling) - Metric
- MDS - Non-metric

3. Select your input variables from the Variables drop-down list.

4. [Optional]: Tick the Normalize variables checkbox to normalize the data:

- For t-SNE and MDS, each variable is standardized to the range [0, 1].
- For PCA, the correlation matrix is used rather than the covariance matrix.

5. [Optional]: When Create binary variable from categories is checked, unordered categorical variables with N categories are converted into N-1 binary indicator variables. Otherwise, such variables are each converted to a single numeric variable with integers representing categories (as happens for ordered categories).

6. [Optional]: Enter a value for Perplexity, which is a parameter used by the t-SNE algorithm and related to the number of nearest neighbors considered when placing each data point. The typical useful range is from 5 to 50, and the default value is 10.

- Low values imply that the immediately local structure is most important.
- High values increase the impact of more distant neighbors and global structure.

7. [Optional]: Select a Group variable to categorize the output. If the variable is numeric, data points are shaded from light (lowest values) to dark (highest values). If it is categorical, data points are colored according to their category.

8. Click the Calculate button to generate the scatterplot.
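The preprocessing options in steps 4 and 5 can be illustrated with a short Python sketch. It shows scaling a variable to the range [0, 1], converting an unordered categorical variable with N categories into N-1 binary indicators, and the alternative of coding categories as integers. The function names are hypothetical, and which category is dropped as the reference level is an assumption; the application may choose differently.

```python
def min_max_scale(column):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def indicator_columns(values):
    """Convert N categories into N-1 binary indicator columns.

    Assumption: the first category in sorted order is the dropped
    reference level.
    """
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories[1:]
    }


def integer_codes(values):
    """Code each category as an integer (1, 2, 3, ...) in sorted order."""
    index = {category: i + 1 for i, category in enumerate(sorted(set(values)))}
    return [index[value] for value in values]


scaled = min_max_scale([10, 20, 30])
colors = ["red", "blue", "green", "blue"]
indicators = indicator_columns(colors)
codes = integer_codes(colors)
```

Integer coding implies an ordering and equal spacing between categories, which is why indicator variables are usually the safer choice for unordered categories.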

Method 2: Using a Distance Matrix

1. From the toolbar, select Visualization > Dimension Reduction > Dimension Reduction Scatterplot.

2. In the object inspector, select a distance matrix input, either:

- Select an output created in your document from the Distance matrix drop-down list
- Or click Paste or type distance matrix to enter the distance matrix manually

3. [Optional]: Enter a value for Perplexity, which is a parameter used by the t-SNE algorithm and related to the number of nearest neighbors considered when placing each data point. The typical useful range is from 5 to 50, and the default value is 10.

4. Click the Calculate button to generate the scatterplot.
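To show how a 2-dimensional layout can be derived from a distance matrix alone, here is a minimal sketch of classical (Torgerson) metric MDS using NumPy. It is a stand-in for illustration and is not necessarily the algorithm the application runs; the function name `classical_mds` and the example matrix are assumptions.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from pairwise distances."""
    d = np.asarray(distances, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("distance matrix must be square")
    if not np.allclose(d, d.T) or not np.allclose(np.diag(d), 0.0):
        raise ValueError("distance matrix must be symmetric with a zero diagonal")

    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering

    # Keep the largest eigenvalues and their eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)


# Distances between the corners of a 3-4-5 right triangle.
example = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
coords = classical_mds(example)

# Pairwise distances recomputed from the 2-dimensional coordinates.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because the example distances come from points that already lie in a plane, the recomputed distances match the input. With real data the match is only approximate, which is what a goodness of fit plot helps to assess.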

To assess the accuracy of the fit, see How to Create a Goodness of Fit Plot from a Dimension Reduction Output.

How to Do Principal Component Analysis in Displayr

How to Perform t-SNE in Displayr

How to Do Multidimensional Scaling