Multiple Correspondence Analysis analyzes categorical variables to detect underlying structure in the data set. This blog post contains an explanation of multiple correspondence analysis and its relationship to correspondence analysis. Whereas traditional correspondence analysis analyzes a table, multiple correspondence analysis analyzes the variables themselves; for example, a multi-response question with 11 categories is analyzed as 11 categorical variables. It is essentially a form of factor analysis for categorical data. You should use it when you want a general understanding of how categorical variables are related. This article describes how to run a Multiple Correspondence Analysis in Displayr.
Requirements
- Multiple categorical variables to use as inputs to the Multiple Correspondence Analysis. As an example, we'll use 5 different variables from a political survey: voting in the 2008 and 2012 US elections, approval of President Trump, age, and gender.
Please note these steps require a Displayr license.
Method
- From the toolbar, select Visualization > Dimension Reduction > Multiple Correspondence Analysis.
- Select the categorical variable inputs from the Input Variables dropdown in the object inspector.
- Click the Calculate button to generate the output.
Additional Options
- Output - How the analysis results should be displayed. The choices are:
  - Scatterplot - A labeled scatterplot showing associations between variables
  - Text - A text representation of the analysis
- Maximum number of labels to plot - Limits the number of labels shown in the scatterplot. The remaining points are shown without labels. This can be useful with large data sets to avoid overlapping labels.
- Chart title - Title of the scatterplot
- Color palette - Controls the colors of the points in the Scatterplot output
- Missing data - Method for dealing with missing data. See Missing Data Options.
- Variable names - Displays Variable Names in the output instead of labels.
Interpretation
The interpretation of multiple correspondence analysis is the same as for Correspondence Analysis. Often the most useful way of analyzing multiple variables is to use traditional correspondence analysis and not multiple correspondence analysis. This is best understood with an example of analyzing a Pick Any question which contains eleven variables, each measuring whether or not respondents liked particular phone companies.
The first thing to note about the output below is that the first dimension explains 93.3% of the inertia and the second dimension explains 0.0% (this analysis was conducted using the data set found here, analyzing the question I like them). Multiple correspondence analysis is essentially a form of principal components analysis for categorical data, focused on identifying latent variables that explain the data. Here, it has identified a single latent factor whereby respondents differ in terms of the number of brands that they like.
Total sample
Unweighted base n = 651
Analysis based on 651 observations (weighted n = 651)

Multiple Correspondence Analysis

Inertia(s):
              Canonical Correlation   Inertia   Proportion
Dimension 1                    .459      .211         .933
Dimension 2                    .008      .000         .000

Standard coordinates:
                                         Dimension 1   Dimension 2
I like them: AAPT/Cellular One - No             -.52          -.01
I like them: AAPT/Cellular One - Yes            3.09           .06
I like them: New Tel - No                       -.49          -.01
I like them: New Tel - Yes                      3.56           .07
I like them: One-tel - No                       -.50           .00
I like them: One-tel - Yes                      3.30           .01
I like them: Optus - No                         -.53          1.89
I like them: Optus - Yes                         .54         -1.92
I like them: Orange (Hutchison) - No            -.57           .13
I like them: Orange (Hutchison) - Yes           1.62          -.36
I like them: Telstra (Mobile Net) - No          -.53           .08
I like them: Telstra (Mobile Net) - Yes          .85          -.12
I like them: Virgin Mobile - No                 -.56          -.07
I like them: Virgin Mobile - Yes                1.90           .24
I like them: Vodafone - No                      -.53         -2.11
I like them: Vodafone - Yes                      .52          2.04

Principal coordinates:
                                         Dimension 1   Dimension 2
I like them: AAPT/Cellular One - No             -.24           .00
I like them: AAPT/Cellular One - Yes            1.42           .00
I like them: New Tel - No                       -.23           .00
I like them: New Tel - Yes                      1.63           .00
I like them: One-tel - No                       -.23           .00
I like them: One-tel - Yes                      1.51           .00
I like them: Optus - No                         -.24           .01
I like them: Optus - Yes                         .25          -.02
I like them: Orange (Hutchison) - No            -.26           .00
I like them: Orange (Hutchison) - Yes            .74           .00
I like them: Telstra (Mobile Net) - No          -.24           .00
I like them: Telstra (Mobile Net) - Yes          .39           .00
I like them: Virgin Mobile - No                 -.26           .00
I like them: Virgin Mobile - Yes                 .87           .00
I like them: Vodafone - No                      -.24          -.02
I like them: Vodafone - Yes                      .24           .02

Cronbach's Alpha
Dimension 1   .87
Dimension 2   .06
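The two coordinate tables in this output differ only by a per-dimension rescaling: in correspondence analysis, principal coordinates are the standard coordinates multiplied by the dimension's singular value, which this output labels Canonical Correlation. The following is a minimal NumPy check on a few rows copied from the output. The array names are illustrative, and the tolerance allows for the rounding of the printed values.

```python
import numpy as np

# Singular values ("Canonical Correlation") for dimensions 1 and 2, as printed.
singular_values = np.array([0.459, 0.008])

# Standard coordinates for a few categories, copied from the output.
standard = np.array([
    [3.09, 0.06],    # AAPT/Cellular One - Yes
    [-0.53, 1.89],   # Optus - No
    [1.62, -0.36],   # Orange (Hutchison) - Yes
    [0.52, 2.04],    # Vodafone - Yes
])

# Principal coordinates reported for the same categories.
reported_principal = np.array([
    [1.42, 0.00],
    [-0.24, 0.01],
    [0.74, 0.00],
    [0.24, 0.02],
])

# Scale each dimension (column) by its singular value.
principal = standard * singular_values

# Agreement is only expected up to the rounding of the printed inputs.
matches = np.allclose(principal, reported_principal, rtol=0.0, atol=0.01)
print(np.round(principal, 2), matches)
```

This is why the second dimension, with a singular value of about .008, almost vanishes once the points are expressed in principal coordinates.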
The chart below shows the principal coordinates; note that the aspect ratio of this chart greatly exaggerates the y-axis – if drawn so that the two axes were on the same scale, the y-axis would essentially disappear.
An alternative approach to analyzing this data is to select RAW DATA in the Columns drop-down and run a traditional Correspondence Analysis.
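To see how a single "number of brands liked" factor can emerge from Yes/No variables, the sketch below runs a plain correspondence analysis on an indicator (one-hot) matrix using NumPy. It is an illustration only, not Displayr's implementation: the data are synthetic, the function name `indicator_mca` is made up for this example, and the inertias are left unadjusted.

```python
import numpy as np

def indicator_mca(indicator):
    """Correspondence analysis of an indicator (one-hot) matrix via SVD."""
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row (respondent) masses
    c = P.sum(axis=0)                # column (category) masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    standard = Vt.T / np.sqrt(c)[:, None]   # standard category coordinates
    principal = standard * sv               # principal category coordinates
    return sv, standard, principal

# Synthetic data (hypothetical): each respondent has one underlying
# propensity to like brands, which drives all five Yes/No answers.
rng = np.random.default_rng(0)
n_respondents, n_brands = 400, 5
propensity = rng.uniform(0.05, 0.95, size=n_respondents)
likes = (rng.random((n_respondents, n_brands)) < propensity[:, None]).astype(int)

# Indicator matrix: a "No" column for each brand, then a "Yes" column for each.
Z = np.column_stack([1 - likes, likes])

sv, standard, principal = indicator_mca(Z)
inertias = sv ** 2                   # unadjusted principal inertias

print("share of inertia:", np.round(inertias / inertias.sum(), 3))
print("dimension 1, No categories: ", np.round(standard[:n_brands, 0], 2))
print("dimension 1, Yes categories:", np.round(standard[n_brands:, 0], 2))
```

With data generated this way, the first dimension should place every Yes category on one side and every No category on the other, which is the same pattern as in the phone company output. The unadjusted shares of inertia from an indicator matrix tend to understate how dominant the first dimension is.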
Technical details
The R package ca is used to compute the correspondence analysis. When performing multiple correspondence analysis, Displayr performs an adjustment of the inertias, as described in Nenadic, O. and M. J. Greenacre (2005). "Computation of Multiple Correspondence Analysis, with Code in R." This correction does not guarantee that the resulting percentages add up to 100% as it is only an approximation.
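As a rough illustration of such an adjustment, the sketch below implements the adjusted-inertia formulas usually attributed to Greenacre, on the assumption that these are the ones the cited paper describes. With Q variables, J categories in total, and Burt-matrix principal inertias λ_k, each dimension with sqrt(λ_k) > 1/Q gets the adjusted inertia (Q/(Q-1))² (sqrt(λ_k) - 1/Q)², and percentages are taken relative to an adjusted total of Q/(Q-1) × (Σλ_k - (J-Q)/Q²). The function name and the input values are made up for the example and are not taken from Displayr or the ca package.

```python
import numpy as np

def adjusted_inertias(burt_inertias, n_vars, n_categories):
    """Adjusted principal inertias and their shares of the adjusted total."""
    lam = np.asarray(burt_inertias, dtype=float)
    q = float(n_vars)
    root = np.sqrt(lam)              # principal inertias of the indicator matrix
    keep = root > 1.0 / q            # only these dimensions are retained
    adjusted = (q / (q - 1.0)) ** 2 * (root[keep] - 1.0 / q) ** 2
    # Adjusted total inertia (average off-diagonal inertia of the Burt matrix).
    total = q / (q - 1.0) * (lam.sum() - (n_categories - q) / q ** 2)
    return adjusted, adjusted / total

# Hypothetical example: 4 variables with 12 categories in total.
# Indicator-matrix inertias must sum to (J - Q) / Q = 2; Burt inertias are their squares.
indicator_inertias = np.array([0.5, 0.4, 0.3, 0.25, 0.2, 0.15, 0.12, 0.08])
burt_inertias = indicator_inertias ** 2

adjusted, shares = adjusted_inertias(burt_inertias, n_vars=4, n_categories=12)
print(np.round(adjusted, 4), np.round(shares, 3), round(float(shares.sum()), 3))
```

The shares of the retained dimensions need not sum to 100%, which is the behavior noted above.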
Next
How to Do Traditional Correspondence Analysis
How to Add Images to a Correspondence Analysis Map
How to Do Correspondence Analysis of a Square Table
How to Create a Quality Table From a Correspondence Analysis