The Multinomial Logit is a form of regression analysis that models a discrete and nominal dependent variable with more than two outcomes (Yes/No/Maybe, Red/Green/Blue, Brand A/Brand B/Brand C, etc.). It is also known as a multinomial logistic regression and multinomial logistic discriminant analysis.
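As a point of reference, one common way to write the multinomial logit model is sketched below. The symbols are assumptions introduced for illustration: $K$ is the number of outcome categories, $x$ is the vector of predictors (including an intercept), $\beta_k$ is the coefficient vector for category $k$, and category 1 is treated as the reference category. The parameterization used internally by this feature is not stated in this article and may differ.

$$
P(Y = k \mid x) = \frac{\exp(x^\top \beta_k)}{\sum_{j=1}^{K} \exp(x^\top \beta_j)}, \qquad \beta_1 = 0,
$$

$$
\log \frac{P(Y = k \mid x)}{P(Y = 1 \mid x)} = x^\top \beta_k, \qquad k = 2, \dots, K.
$$

Under this form, each coefficient describes the change in the log-odds of choosing category $k$ rather than the reference category for a one-unit change in the corresponding predictor.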
This article describes how to create a Multinomial Logit regression output as shown below. The example below is a model that predicts a survey respondent’s brand choice based on characteristics like age, gender, and work status.
- Familiarity with the Structure and Value Attributes of Variable Sets, and how they are used in regression models per our Driver Analysis ebook.
- An Outcome variable with more than two outcomes to be predicted, ideally a Nominal: Mutually exclusive categories variable. When using stacked data, the Outcome variable should be a single question in a Multi type structure (eg. ).
- Predictor variables to be considered as predictors of the outcome variable. When using stacked data, the Predictor(s) need to be a single question in a Grid type structure (e.g. Binary - Grid or Numeric - Grid).
- Go to Anything > Advanced Analysis > Regression > Multinomial Logit.
- In the object inspector go to the Inputs tab.
- In the Outcome dropdown, select the variable to be predicted by the predictor variables.
- Select the predictor variable(s) from the Predictor(s) list.
- OPTIONAL: Select the desired Output type:
  - Summary: The default; as shown in the example above.
  - Detail: Typical R output. It contains some additional information compared to Summary, but without the pretty formatting.
  - ANOVA: Analysis of variance table containing the results of Chi-squared likelihood ratio tests for each predictor.
- OPTIONAL: Select the desired Missing Data treatment. (See Missing Data Options).
- OPTIONAL: Select Variable names to display variable names in the output instead of labels.
- OPTIONAL: Select Correction. Choose between None (the default), False Discovery Rate, and Bonferroni. This is the multiple comparisons correction applied when computing the p-values of the post-hoc comparisons.
- OPTIONAL: Select Stack data to stack the input data prior to analysis. Stacking can be desirable when each individual in the data set has multiple cases and an aggregate model is desired. See requirements below.
- OPTIONAL: Select Random seed to initialize the (pseudo)random number generator for the model fitting algorithm. Different seeds may lead to slightly different answers, but should normally not make a large difference.
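To give a feel for what the Correction option does to p-values, the following is a minimal sketch in base R using `p.adjust`. The p-values are made up for illustration, and it is an assumption that the False Discovery Rate option corresponds to the Benjamini-Hochberg method; the feature's own implementation may differ.

```r
# Hypothetical raw p-values from four post-hoc comparisons (illustrative only)
p_raw <- c(0.001, 0.012, 0.03, 0.2)

# Bonferroni: multiply each p-value by the number of comparisons, capped at 1
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")

# False Discovery Rate, assuming the Benjamini-Hochberg procedure
p_fdr <- p.adjust(p_raw, method = "BH")

# Compare the raw and corrected p-values side by side
print(data.frame(p_raw, p_bonferroni, p_fdr))
```

Bonferroni is the more conservative of the two, so it never yields a smaller corrected p-value than the False Discovery Rate adjustment for the same comparison.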
When using this feature you can obtain additional information that is stored by the R code which produces the output.
- To do so, select Calculation > Custom.
- In the R CODE, paste: item = YourReferenceName
- Replace YourReferenceName with the reference name of your output. Find this by selecting the output and then going to Properties > GENERAL > Name from the object inspector.
- Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.
For a more in-depth discussion on extracting information from objects in R, check out our blog post here.
Properties which may be of interest are:
- Summary outputs from the regression model:
  - item$summary$coefficients # summary regression outputs
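The sketch below shows the same extraction pattern outside the application so it can be run in any R session. Everything in it is an assumption made for illustration: the `survey` data is simulated, the model is fitted with `nnet::multinom` (which may not be what this feature uses internally), and `item` is a hand-built stand-in list that only mimics the `item$summary$coefficients` path described above.

```r
# Simulated survey data (illustrative only, not real results)
set.seed(123)
n <- 300
survey <- data.frame(
  age    = round(runif(n, 18, 70)),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  brand  = factor(sample(c("Brand A", "Brand B", "Brand C"), n, replace = TRUE))
)

# Fit a multinomial logit; nnet::multinom is an assumed stand-in for the feature
fit <- nnet::multinom(brand ~ age + gender, data = survey, trace = FALSE)

# Hypothetical stand-in for the output you would reference by name
item <- list(summary = summary(fit))

# List the top-level pieces of information that are available
str(item, max.level = 1)

# Extract the coefficient table: one row per non-reference outcome category
coefs <- item$summary$coefficients
print(coefs)
```

In the application itself, the first line of the calculation would instead be `item = YourReferenceName`, and the structure returned by `str(item)` may contain different elements from this stand-in.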
If Stack data is selected, then the Outcome needs to be a single question in a Multi type structure. Similarly, the Predictor(s) need to be a single question in a Grid type structure. In the process of stacking, the data reduction is inspected. Any constructed NETs are removed unless they are comprised of source values that are mutually exclusive to other codes, such as the result of merging two categories.
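To illustrate what stacking means conceptually, here is a minimal sketch in base R that reshapes wide data (several outcome and predictor columns per respondent) into one row per respondent and occasion. The data frame and its column names (`choice_1`, `price_1`, and so on) are hypothetical, and this is not the feature's own stacking code.

```r
# Hypothetical wide data: each respondent has two choices and two matching predictor values
wide <- data.frame(
  id       = 1:3,
  choice_1 = c("A", "B", "C"),
  choice_2 = c("B", "B", "A"),
  price_1  = c(1.0, 1.5, 2.0),
  price_2  = c(1.2, 1.4, 2.1)
)

# Stack the paired columns so that each row is one respondent-occasion case
stacked <- reshape(
  wide,
  direction = "long",
  varying   = list(c("choice_1", "choice_2"), c("price_1", "price_2")),
  v.names   = c("choice", "price"),
  timevar   = "occasion",
  times     = 1:2,
  idvar     = "id"
)

# Three respondents with two occasions each give six stacked cases
print(stacked)
```

An aggregate model fitted to the stacked rows treats each respondent-occasion pair as a separate case, which is why the outcome and predictor columns need to line up one-to-one before stacking.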