A dimension reduction scatterplot visualizes how similar the observations in your data are to one another, based on a large number of variables. You can choose between several algorithms for measuring that similarity, or provide a distance matrix directly. This article describes how to produce a 2-dimensional scatterplot to visualize either:
- high-dimensional data (many variables and/or values)
- a distance matrix.
Requirements
- Either:
- High-dimensional variables in your data set (many variables), or
- A distance matrix. See How to Create a Distance Matrix, or paste one in as a table.
Please note these steps require a Displayr license.
Method 1: Using Variables
If the input type is Variables, the probability that each point has the same class as its nearest neighbor is calculated. A further variable may be specified in the Group variable field to classify the output cases into groups.
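To make the nearest-neighbor statistic concrete, the sketch below computes, for a set of plotted points, the share whose closest neighbor belongs to the same group. It is an illustration in Python, not Displayr's own calculation: the use of Euclidean distance, the function name nearest_neighbor_agreement, and the sample coordinates and groups are all assumptions made for this example.

```python
import math


def nearest_neighbor_agreement(points, labels):
    """Fraction of points whose nearest neighbor (Euclidean) has the same label."""
    if len(points) != len(labels) or len(points) < 2:
        raise ValueError("need at least two points and one label per point")
    matches = 0
    for i, p in enumerate(points):
        # Index of the closest *other* point (ties go to the lowest index).
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        matches += labels[i] == labels[nearest]
    return matches / len(points)


# Hypothetical 2-D coordinates and groups, for illustration only.
coords = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.4, 0.1)]
groups = ["A", "A", "B", "B", "B"]

print(nearest_neighbor_agreement(coords, groups))  # 0.8
```

Here four of the five points sit next to a point from their own group. The last point is labelled B but lies closest to an A point, so it counts against the score. A value near 1 suggests the groups are well separated in the plot.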
1. From the toolbar menu, select Anything > Advanced Analysis > Dimension Reduction > t-SNE.
2. Select one of the available dimension reduction techniques from the Algorithm input:
- PCA (Principal Component Analysis)
- t-SNE
- MDS (Multidimensional Scaling) - Metric
- MDS - Non-metric
3. Select your input variables from the Variables drop-down list.
4. [Optional]: Tick the Normalize variables checkbox to normalize the data:
- For t-SNE and MDS, each variable is standardized to the range [0, 1].
- For PCA, the correlation matrix is used rather than the covariance matrix.
5. [Optional]: When Create binary variable from categories is checked, unordered categorical variables with N categories are converted into N-1 binary indicator variables. Otherwise such variables are each converted to a single numeric variable with integers representing categories (as happens for ordered categories).
6. [Optional]: Enter a value for Perplexity, a parameter used by the t-SNE algorithm that relates to the number of nearest neighbors considered when placing each data point. Useful values typically range from 5 to 50, and the default is 10.
- Low values imply that the immediate local structure is most important.
- High values increase the impact of more distant neighbors and global structure.
7. [Optional]: Select a Group variable to categorize the output. If the variable is numeric, data points are shaded from light (lowest values) to dark (highest values). If it is categorical, data points are colored according to their category.
8. Click the Calculate button to generate the scatterplot.
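The Normalize variables and Create binary variable from categories options in steps 4 and 5 can be sketched outside Displayr as follows. This is a minimal Python illustration of the transformations described, not Displayr's implementation: the function names, the sample data, the handling of a constant variable, and the choice of which category acts as the baseline are assumptions.

```python
def min_max_scale(values):
    """Rescale a numeric variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable is mapped to 0.0 everywhere.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def indicator_columns(categories):
    """Encode N unordered categories as N-1 binary indicator columns.

    Assumption: the first category in sorted order is the baseline
    and gets no column of its own.
    """
    levels = sorted(set(categories))
    return {level: [int(c == level) for c in categories] for level in levels[1:]}


def integer_codes(categories):
    """Encode categories as a single numeric variable of integer codes."""
    levels = sorted(set(categories))
    code = {level: i + 1 for i, level in enumerate(levels)}
    return [code[c] for c in categories]


# Hypothetical data, for illustration only.
age = [20, 30, 40, 60]
region = ["north", "south", "west", "south"]

print(min_max_scale(age))         # [0.0, 0.25, 0.5, 1.0]
print(indicator_columns(region))  # {'south': [0, 1, 0, 1], 'west': [0, 0, 1, 0]}
print(integer_codes(region))      # [1, 2, 3, 2]
```

With three regions, the indicator encoding yields two columns, while the integer encoding yields one column that implies an ordering (north < south < west). That implied ordering is why the indicator form is usually the better fit for unordered categories.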
Method 2: Using a Distance Matrix
1. From the toolbar menu, select Anything > Advanced Analysis > Dimension Reduction > t-SNE.
2. Provide the distance matrix in one of two ways:
- Select an output created in your document from the Distance matrix drop-down box
- Click Paste or type distance matrix to enter the distance matrix manually
3. [Optional]: Enter a value for Perplexity, a parameter used by the t-SNE algorithm that relates to the number of nearest neighbors considered when placing each data point. Useful values typically range from 5 to 50, and the default is 10.
4. Click the Calculate button to generate the scatterplot.
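If you build the distance matrix yourself before pasting it in, it can help to check that it has the general properties of a distance matrix: square, zero diagonal, non-negative, and symmetric. The Python sketch below shows one way to build a Euclidean distance matrix and run those checks. The function names and sample data are hypothetical, and the checks reflect generic properties, not any validation rules documented for Displayr.

```python
import math


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between observations (one tuple per row)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def check_distance_matrix(matrix, tol=1e-9):
    """Return a list of problems found; an empty list means none were detected."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return ["matrix is not square"]
    problems = []
    for i in range(n):
        if abs(matrix[i][i]) > tol:
            problems.append(f"diagonal entry [{i}][{i}] is not zero")
        for j in range(i + 1, n):
            if matrix[i][j] < 0 or matrix[j][i] < 0:
                problems.append(f"negative distance at [{i}][{j}]")
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                problems.append(f"entries [{i}][{j}] and [{j}][{i}] differ")
    return problems


# Hypothetical observations, for illustration only.
rows = [(0, 0), (3, 4), (6, 8)]
distances = euclidean_distance_matrix(rows)

print(distances)                                # [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
print(check_distance_matrix(distances))         # []
print(check_distance_matrix([[0, 1], [2, 0]]))  # ['entries [0][1] and [1][0] differ']
```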
Next
Dimension Reduction - Plot - Goodness of Fit can be used to assess the accuracy of the fit.
How to Do Principal Component Analysis in Displayr
How to Do Multidimensional Scaling