How to Do Multiple Correspondence Analysis in Displayr

Multiple Correspondence Analysis analyzes categorical variables to detect underlying structure in the data set. This blog post contains an explanation of multiple correspondence analysis and its relationship to correspondence analysis. Whereas traditional correspondence analysis analyzes a table, multiple correspondence analysis analyzes the variables themselves; for example, a multi-response question with 11 categories is analyzed as 11 categorical variables. It is essentially a form of factor analysis for categorical data. You should use it when you want a general understanding of how categorical variables are related. This article describes how to run a Multiple Correspondence Analysis in Displayr.
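
To make "analyzing the variables themselves" concrete, the sketch below shows one common way the inputs to a multiple correspondence analysis are laid out: each option of a multi-response question becomes its own Yes/No categorical variable, and each category of each variable becomes one indicator column. The data and the function names (to_categorical_variables, to_indicator_matrix) are hypothetical illustrations, not part of Displayr, and Displayr's internal representation may differ.

```python
# Hypothetical multi-response data: the set of brands each respondent likes.
responses = [
    {"Optus", "Vodafone"},
    {"Telstra"},
    set(),
    {"Optus", "Telstra", "Vodafone"},
]
brands = ["Optus", "Telstra", "Vodafone"]


def to_categorical_variables(responses, brands):
    """Turn one multi-response question into one Yes/No variable per brand."""
    return {
        brand: ["Yes" if brand in liked else "No" for liked in responses]
        for brand in brands
    }


def to_indicator_matrix(variables):
    """One-hot code the data: one 0/1 column per category of each variable."""
    columns = [
        (name, level)
        for name, values in variables.items()
        for level in sorted(set(values))
    ]
    n_rows = len(next(iter(variables.values())))
    rows = [
        [1 if variables[name][i] == level else 0 for name, level in columns]
        for i in range(n_rows)
    ]
    return columns, rows


variables = to_categorical_variables(responses, brands)
columns, indicator = to_indicator_matrix(variables)

for (name, level), column in zip(columns, zip(*indicator)):
    print(f"{name} - {level}: {list(column)}")
```

Every row of the indicator matrix sums to the number of variables, because each respondent falls into exactly one category of each variable.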

Requirements

Multiple categorical variables to use as inputs to the Multiple Correspondence Analysis. As an example, we'll use 5 different variables from a political survey: voting in the 2008 and 2012 US elections, approval of President Trump, age, and gender.

Method

From the toolbar, select Visualization > Dimension Reduction > Multiple Correspondence Analysis.
Select the categorical variable inputs from the Input Variables dropdown in the object inspector.
Click the Calculate button to generate the output.

Additional Options

Output - How the analysis results should be displayed. The choices are:
- Scatterplot - A labeled scatterplot showing associations between variables
- Text - A text representation of the analysis
Maximum number of labels to plot - Limits the number of labels shown in the scatterplot. The remaining points are shown without labels. This can be useful with large data sets to avoid overlapping labels.
Chart title - Title of the scatterplot
Color palette - Controls the colors of the points in the Scatterplot output
Missing data - Method for dealing with missing data. See Missing Data Options.
Variable names - Displays Variable Names in the output instead of labels.

Interpretation

The interpretation of multiple correspondence analysis is the same as for Correspondence Analysis. Often the most useful way of analyzing multiple variables is to use traditional correspondence analysis and not multiple correspondence analysis. This is best understood with an example of analyzing a Pick Any question which contains eleven variables, each measuring whether or not respondents liked particular phone companies.

The first thing to note about the output below is that the first dimension explains 93.3% of the inertia and the second dimension explains 0% (this analysis was conducted using the data set found here, analyzing the question I like them). Multiple correspondence analysis is essentially a form of principal components analysis for categorical data, so it focuses on identifying latent variables that explain the data. Here, it has identified a single latent factor: respondents differ in terms of the number of brands that they like.

Total sample
Unweighted
base n = 651

Analysis based on 651 observations (weighted n = 651)
Multiple Correspondence Analysis
 
Inertia(s):
              Canonical Correlation  Inertia  Proportion
Dimension 1                   .459     .211        .933
Dimension 2                   .008     .000        .000

Standard coordinates:
                                          Dimension 1  Dimension 2
I like them: AAPT/Cellular One - No             -.52         -.01
I like them: AAPT/Cellular One - Yes            3.09          .06
I like them: New Tel - No                       -.49         -.01
I like them: New Tel - Yes                      3.56          .07
I like them: One-tel - No                       -.50          .00
I like them: One-tel - Yes                      3.30          .01
I like them: Optus - No                         -.53         1.89
I like them: Optus - Yes                         .54        -1.92
I like them: Orange (Hutchison) - No            -.57          .13
I like them: Orange (Hutchison) - Yes           1.62         -.36
I like them: Telstra (Mobile Net) - No          -.53          .08
I like them: Telstra (Mobile Net) - Yes          .85         -.12
I like them: Virgin Mobile - No                 -.56         -.07
I like them: Virgin Mobile - Yes                1.90          .24
I like them: Vodafone - No                      -.53        -2.11
I like them: Vodafone - Yes                      .52         2.04

Principal coordinates:
                                          Dimension 1  Dimension 2
I like them: AAPT/Cellular One - No             -.24          .00
I like them: AAPT/Cellular One - Yes            1.42          .00
I like them: New Tel - No                       -.23          .00
I like them: New Tel - Yes                      1.63          .00
I like them: One-tel - No                       -.23          .00
I like them: One-tel - Yes                      1.51          .00
I like them: Optus - No                         -.24          .01
I like them: Optus - Yes                         .25         -.02
I like them: Orange (Hutchison) - No            -.26          .00
I like them: Orange (Hutchison) - Yes            .74          .00
I like them: Telstra (Mobile Net) - No          -.24          .00
I like them: Telstra (Mobile Net) - Yes          .39          .00
I like them: Virgin Mobile - No                 -.26          .00
I like them: Virgin Mobile - Yes                 .87          .00
I like them: Vodafone - No                      -.24         -.02
I like them: Vodafone - Yes                      .24          .02

             Cronbach's Alpha
Dimension 1               .87
Dimension 2               .06
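
The figures in the output above appear to follow the usual correspondence analysis relationships: the inertia of a dimension is the square of its canonical correlation, and a principal coordinate is the standard coordinate multiplied by the canonical correlation. The short check below uses a few dimension 1 values copied from the output. That these relationships are how the output is computed is an assumption based on the numbers, not something the output itself states.

```python
# Dimension 1 values copied from the output above.
canonical_correlation = 0.459
standard_coordinates = {
    "AAPT/Cellular One - Yes": 3.09,
    "New Tel - Yes": 3.56,
    "One-tel - Yes": 3.30,
}

# Assumed relationship: inertia = (canonical correlation) squared.
inertia = canonical_correlation ** 2

# Assumed relationship: principal = standard * canonical correlation.
principal_coordinates = {
    label: coordinate * canonical_correlation
    for label, coordinate in standard_coordinates.items()
}

print(f"Inertia: {inertia:.3f}")
for label, coordinate in principal_coordinates.items():
    print(f"{label}: {coordinate:.2f}")
```

The rounded results match the Inertia column (.211) and the dimension 1 principal coordinates (1.42, 1.63 and 1.51) shown in the output. Because the canonical correlation for dimension 2 is close to zero, the same scaling shrinks its principal coordinates to almost nothing.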

The chart below shows the principal coordinates; note that the aspect ratio of this chart greatly exaggerates the y-axis – if drawn so that the two axes were on the same scale, the y-axis would essentially disappear.

An alternative approach to analyzing this data is to select Raw Data in the Columns drop-down and run a traditional Correspondence Analysis.

Technical details

The R package ca is used to compute the correspondence analysis. When performing multiple correspondence analysis, Displayr adjusts the inertias, as described in Nenadic, O. and M. J. Greenacre (2005). "Computation of Multiple Correspondence Analysis, with Code in R." This correction is only an approximation, so the resulting percentages are not guaranteed to add up to 100%.
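
As a rough illustration of why the adjusted percentages need not total 100%, the sketch below implements the adjustment of inertias as it is commonly stated for multiple correspondence analysis. It is not Displayr's or the ca package's code. The function name adjusted_inertias and the input values are hypothetical, and the formula assumes the inputs are the principal inertias of the indicator matrix for Q variables with J categories in total.

```python
def adjusted_inertias(indicator_inertias, n_variables, n_categories):
    """Adjust principal inertias (sketch; assumes indicator-matrix inputs).

    Only dimensions whose inertia exceeds 1/Q are kept and rescaled.
    Proportions are taken relative to the average off-diagonal inertia
    of the Burt matrix, so they can sum to less than 1.
    """
    q, j = n_variables, n_categories
    # Burt-matrix inertias are the squares of the indicator-matrix inertias.
    burt_total = sum(value ** 2 for value in indicator_inertias)
    # Average inertia of the off-diagonal blocks of the Burt matrix.
    total = q / (q - 1) * (burt_total - (j - q) / q ** 2)
    adjusted = [
        (q / (q - 1)) ** 2 * (value - 1 / q) ** 2
        for value in indicator_inertias
        if value > 1 / q
    ]
    proportions = [value / total for value in adjusted]
    return adjusted, proportions


# Hypothetical inertias for three two-category variables (Q = 3, J = 6).
adjusted, proportions = adjusted_inertias([0.6, 0.3, 0.1], 3, 6)
print([round(value, 3) for value in adjusted])
print([round(value, 3) for value in proportions])
```

In this made-up example only the first dimension exceeds 1/Q, and its adjusted inertia accounts for roughly 84% of the total, leaving the remainder unassigned to any dimension.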

See Also

How to Do Traditional Correspondence Analysis

How to Add Images to a Correspondence Analysis Map

How to Do Correspondence Analysis of a Square Table

How to Create a Quality Table From a Correspondence Analysis

How to Do 3D Correspondence Analysis