Principal Component Analysis (PCA) is a technique for taking many variables and creating a new, smaller set of variables that captures as much of the variation in the data as possible. This article describes how to run a Principal Component Analysis in Displayr.
- Familiarity with the Structure and Value Attributes of Variable Sets.
- A data set containing several Numeric, Numeric - Multi, or Binary - Multi variables that you want to combine and reduce down to a smaller number of variables or components. For this example, we'll use a series of attitudinal questions about mobile device attributes on a 1-5 scale where 1="Strongly agree" and 5="Strongly disagree".
1. From the toolbar menu, select Anything > Advanced Analysis > Dimension Reduction > Principal Component Analysis.
2. From the object inspector on the right, select your input variables and/or variable sets from the Variables drop-down.
3. Click the Calculate button to run the PCA.
Normalize variables - if selected, the correlation matrix will be used instead of the covariance matrix. This is checked by default.
Create binary variables from categories - if selected, unordered categorical variables will be represented as binary variables. Otherwise, they are treated according to the numeric values set in their Value Attributes and are not converted to binary. This is unchecked by default.
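To see why the choice between the correlation and covariance matrix matters, the following minimal sketch compares the two on made-up data. The function names and the response values are hypothetical and are not part of Displayr. It illustrates that rescaling a variable changes its covariance with another variable but leaves the correlation unchanged, which is why normalizing stops variables measured on larger scales from dominating the components.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: the covariance rescaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical responses to two attitudinal questions on a 1-5 scale.
rating_a = [1, 2, 2, 4, 5]
rating_b = [2, 1, 3, 4, 5]

# The second question again, as if it had been recorded on a 20-100 scale.
rating_b_rescaled = [value * 20 for value in rating_b]

cov_original = covariance(rating_a, rating_b)
cov_rescaled = covariance(rating_a, rating_b_rescaled)
cor_original = correlation(rating_a, rating_b)
cor_rescaled = correlation(rating_a, rating_b_rescaled)

print("Covariance:", cov_original, "vs rescaled:", cov_rescaled)
print("Correlation:", cor_original, "vs rescaled:", cor_rescaled)
```

If all of your input variables share the same scale, as in the 1-5 example used in this article, the two settings will usually give similar results.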
Rule for selecting components - the following options are available:
- Kaiser rule - keeps components with eigenvalues greater than 1. If the unscaled covariance matrix is used instead of the correlation matrix, components with eigenvalues greater than the mean eigenvalue are kept. This is selected by default.
- Eigenvalue over - keeps components with eigenvalues greater than a user-specified number. If the unscaled covariance matrix is used instead of the correlation matrix, components with eigenvalues greater than a multiple of the eigenvalue mean are kept.
- Number of components - manually select the number of components to keep.
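The three rules above can be sketched as a small function. This is an illustration of the logic as described in this article, not Displayr's implementation. The function name, argument names, and eigenvalues are hypothetical.

```python
def select_components(eigenvalues, rule="kaiser", threshold=1.0,
                      n_components=None, normalized=True):
    """Return the eigenvalues of the components that a rule would keep.

    rule: "kaiser", "eigenvalue_over", or "number" (hypothetical labels).
    normalized: True if the correlation matrix was used, False for covariance.
    """
    ordered = sorted(eigenvalues, reverse=True)

    if rule == "number":
        if n_components is None:
            raise ValueError("n_components is required for the 'number' rule")
        return ordered[:n_components]

    if rule == "kaiser":
        threshold = 1.0  # the Kaiser rule is the special case of a cutoff of 1
    elif rule != "eigenvalue_over":
        raise ValueError(f"Unknown rule: {rule}")

    # With the covariance matrix, the cutoff is a multiple of the mean eigenvalue.
    mean_eigenvalue = sum(ordered) / len(ordered)
    cutoff = threshold if normalized else threshold * mean_eigenvalue
    return [value for value in ordered if value > cutoff]


# Hypothetical eigenvalues from a PCA of five normalized variables.
example = [2.5, 1.2, 0.8, 0.3, 0.2]

kept_kaiser = select_components(example)
kept_over = select_components(example, rule="eigenvalue_over", threshold=0.5)
kept_number = select_components(example, rule="number", n_components=4)

# Hypothetical eigenvalues from a covariance matrix (mean eigenvalue is 4.0).
kept_covariance = select_components([10.0, 4.0, 1.0, 1.0], normalized=False)
```

In this sketch the comparison is strict, so a component whose eigenvalue exactly equals the cutoff is dropped.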
Rotation method - rotations of the principal components are used to produce solutions where the loadings tend to be closer to 0, 1, or -1, making interpretation of the solution easier.
The Varimax, Quartimax, and Equamax rotations are orthogonal, which means that the components produced are always uncorrelated with one another.
The Promax and Oblimin rotations are oblique, meaning that the components can be correlated with one another.
After rotation, components with large negative loadings will have signs flipped, so that the largest loadings are positive, to make interpretation easier.
Missing data - determines how to handle missing data. See Missing Data Options for more detail.
Output - the following PCA output options are available:
- Loadings Table - displays a table of the component loadings, which is sometimes referred to as a Pattern matrix. This is selected by default.
- Structure Matrix - displays the structure matrix, which is the loadings matrix multiplied by the correlations between the components.
- Variance Explained - displays the eigenvalues of the original, unrotated components, along with the variance explained, and cumulative variance explained.
An eigenvalue is a number that comes out of the maths used to determine the new principal components. It represents the amount of variance in the original data that is captured by that component. The percentage figures in the top row represent the percentage of variance captured by each component. They are calculated by dividing each eigenvalue by the total of the eigenvalues of all of the components (before the smallest ones are discarded). PCA is a process of finding a new, smaller set of variables which captures as much variance as possible, so if you want 5 new components, you are picking the 5 new variables which have the largest eigenvalues.
- Component Plot - displays a scatterplot of the loadings of the first two principal components.
- Scree Plot - displays a chart of the eigenvalues of the correlation or covariance matrix.
- Detailed Output - shows more details on the results, including the loadings, structure matrix, variable communalities, sum of squared loadings, and score weights.
- 2D Scatterplot - shows the data charted with axes of the first 2 components and labelled according to Grouping Variable.
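The arithmetic behind the Variance Explained output can be sketched as follows: each eigenvalue is divided by the total of all eigenvalues, and the percentages are then accumulated. The function name and the eigenvalues are hypothetical and chosen only for illustration.

```python
def variance_explained(eigenvalues):
    """Return (eigenvalue, percent, cumulative percent) for each component.

    Percentages are relative to the total of ALL eigenvalues, including
    those of components that will later be discarded.
    """
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for value in sorted(eigenvalues, reverse=True):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((value, percent, cumulative))
    return rows


# Hypothetical eigenvalues from a PCA of five normalized variables.
table = variance_explained([2.5, 1.2, 0.8, 0.3, 0.2])

for number, (value, percent, cumulative) in enumerate(table, start=1):
    print(f"Component {number}: eigenvalue {value:.2f}, "
          f"{percent:.1f}% of variance, {cumulative:.1f}% cumulative")
```

With these made-up values, the first two components together account for roughly three quarters of the total variance.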
Sort coefficients by size - when displaying loadings or the structure matrix, sort the components according to their size.
Suppress small coefficients - when displaying loadings or the structure matrix, replace small values with blank spaces to facilitate interpretation.
Absolute value below - in tables, cells which have absolute values smaller than the entered value will be replaced with blank spaces.
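As a rough illustration of how suppressing small coefficients affects a table, the sketch below blanks out any loading whose absolute value is below a cutoff. The function name, the loadings, and the cutoff of 0.4 are assumptions made for this example, not Displayr defaults.

```python
def suppress_small(loadings, cutoff=0.4):
    """Format loadings to 2 decimals, blanking cells with |value| < cutoff."""
    return [
        ["" if abs(value) < cutoff else f"{value:.2f}" for value in row]
        for row in loadings
    ]


# Hypothetical loadings: rows are variables, columns are components.
loadings = [
    [0.82, 0.10],
    [-0.05, 0.77],
    [0.45, -0.38],
]

display = suppress_small(loadings, cutoff=0.4)

for row in display:
    print(" | ".join(cell.rjust(5) for cell in row))
```

Blanking the small values leaves only the loadings that link each variable strongly to a component, which makes the pattern easier to read.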
Variable names - if checked, displays variable names in the output instead of variable labels.