Poisson Regression is used to model count data with the assumption that the dependent variable has a Poisson distribution. It is also known as the log-linear model.
The example below models a survey respondent’s number of fast-food occasions based on characteristics like age, gender, and work status.
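Under the hood, this is a generalized linear model with a log link. As a rough sketch only, the equivalent model in base R would look something like the snippet below, where the data frame survey and the columns occasions, age, gender, and work_status are hypothetical names used purely for illustration:
fit <- glm(occasions ~ age + gender + work_status, family = poisson, data = survey)
summary(fit) # coefficients are on the log scale, hence the name "log-linear model"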
Requirements
- Familiarity with the Structure and Value Attributes of Variable Sets.
- An Outcome variable with at least three outcomes to be predicted, ideally a numeric variable. A count variable must contain only non-negative integers. When using stacked data, the Outcome variable should be a single question in a Multi type structure (e.g. Numeric - Multi).
- One or more Predictor variables, which may be continuous, categorical, or binary. When using stacked data, the Predictor(s) need to be a single question in a Grid type structure (e.g. Binary - Grid).
Please note these steps require a Displayr license.
Method
- Go to Anything > Advanced Analysis > Regression > Poisson Regression.
- In the object inspector, go to the Data tab.
- In the Outcome dropdown, select the numeric variable to be predicted by the predictor variables.
- Select the predictor variable(s) from the Predictor(s) dropdown.
- OPTIONAL: Select the fitting Algorithm. The default is Regression but it may be changed to other machine learning methods.
- OPTIONAL: Select the desired Output type (see the sketch after this list for rough base-R equivalents):
- Summary: The default; as shown in the example above.
- Detail: Typical R output; this includes some additional information compared to Summary, but without the pretty formatting.
- ANOVA: Analysis of variance table containing the results of Chi-squared likelihood ratio tests for each predictor.
- Relative Importance Analysis: The results of a relative importance analysis.
- Effects Plot: Plots the relationship between each of the Predictors and the Outcome.
- OPTIONAL: Select the desired Missing Data treatment. (See Missing Data Options).
- OPTIONAL: Select Variable names to display variable names in the output instead of labels.
- OPTIONAL: Select Correction. This is the multiple comparisons correction applied when computing the p-values of the post-hoc comparisons. Choose between None (the default), False Discovery Rate, and Bonferroni.
- OPTIONAL: Select Crosstab Interaction. Optional variable to test for interaction with other variables in the model. The interaction variable is treated as a categorical variable.
- OPTIONAL: Specify the Automated outlier removal percentage (between 0 and 50, including 0 but not 50) to remove possible outliers. See below for details about fitted models and residuals.
- OPTIONAL: Select Stack data to stack the input data prior to analysis. Stacking can be desirable when each individual in the data set has multiple cases and an aggregate model is desired. See requirements below.
- OPTIONAL: Update Random seed. This is used to initialize the (pseudo)random number generator for the model fitting algorithm. Different seeds may lead to slightly different answers, but should normally not make a large difference.
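As referenced above, the Output types correspond roughly to standard summaries of a fitted Poisson model in base R. The snippet below is a rough sketch only, reusing the hypothetical fit object from the introduction; Displayr's own output may differ in the exact tests and formatting, and Relative Importance Analysis has no direct base-R equivalent:
summary(fit) # Summary / Detail: coefficient table, deviance and AIC
anova(fit, test = "Chisq") # ANOVA: sequential Chi-squared likelihood ratio tests for the predictors
termplot(fit, se = TRUE) # a rough analogue of the Effects Plot, one panel per predictor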
Additional Properties
When using this feature you can obtain additional information that is stored by the R code which produces the output.
1. To do so, from the toolbar, select Calculation > Custom Code.
2. Click on the page to place the output.
3. In the R Code, paste:
item = YourReferenceName
4. Replace YourReferenceName with the reference name of your output. Find this by selecting the output and then going to General > General > Name from the object inspector.
5. Below the first line of code, you can paste in the snippets below or type str(item) to see a list of available information.
For a more in-depth discussion on extracting information from objects in R, check out How to Extract Information from an Item using R.
Properties which may be of interest are:
- Summary outputs from the regression model:
item$summary$coefficients # summary regression outputs
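For example, because Poisson regression coefficients are on the log scale, exponentiating them gives incidence rate ratios. The snippet below is a sketch that assumes a hypothetical reference name of my.poisson.regression and that the coefficient table follows the usual R summary layout (estimates in the first column):
item = my.poisson.regression
coefs = item$summary$coefficients # estimates, standard errors, z statistics and p-values
exp(coefs[, 1]) # exponentiated estimates are incidence rate ratios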
Technical Details
If Crosstab Interaction is selected, coefficients in the table are computed by creating separate regressions for each level of the interaction variable. To evaluate whether a coefficient is significantly higher (blue) or lower (red), we perform a t-test comparing the coefficient against the corresponding coefficient estimated from the remaining data, as described in Driver Analysis. P-values are corrected for multiple comparisons across the whole table (excluding the NET column). The p-value in the sub-title is calculated using a likelihood ratio test between the pooled model with no interaction variable and a model where all predictors interact with the interaction variable.
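A rough base-R sketch of this kind of likelihood ratio test, reusing the hypothetical survey variables from the introduction and treating work_status as the interaction variable, might look like:
pooled <- glm(occasions ~ age + gender, family = poisson, data = survey)
interacted <- glm(occasions ~ (age + gender) * work_status, family = poisson, data = survey)
anova(pooled, interacted, test = "Chisq") # likelihood ratio test comparable to the sub-title p-value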
If a zero-value Automated outlier removal percentage is specified, no outlier removal is performed and a standard regression output for the entire (possibly filtered) dataset is produced. If a non-zero value is selected, the regression model is fitted twice. The first regression model uses the entire dataset (after filters have been applied) and identifies the observations that generate the largest residuals. The user-specified percentage of cases with the largest residuals is then removed, the regression model is refitted on this reduced dataset, and the refitted model's output is returned. The specific residual used varies by regression type: for Poisson regression, the studentized deviance residual is used in an unweighted regression and the Pearson residual in a weighted regression. The studentized deviance residual computes the contribution the fitted point makes to the likelihood, standardized (adjusted) for the influence of the point using an externally adjusted variance calculation. The Pearson residual in the weighted case computes the distance between the observed and fitted value and adjusts appropriately for the provided survey weights. See the rstudent function in R and Davison and Snell (1991) for more details of the calculations.
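A rough sketch of this two-pass procedure for an unweighted Poisson regression, again using the hypothetical survey data and a purely illustrative removal percentage of 5%:
fit.full <- glm(occasions ~ age + gender + work_status, family = poisson, data = survey)
worst <- order(abs(rstudent(fit.full)), decreasing = TRUE) # rank cases by studentized deviance residual
n.drop <- floor(0.05 * nrow(survey)) # the 5% here is illustrative only
fit.trimmed <- update(fit.full, data = survey[-worst[seq_len(n.drop)], ]) # refitted model whose output is reported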
If Stack data is selected, then the Outcome needs to be a single question in a Multi type structure (e.g. Numeric - Multi). Similarly, the Predictor(s) need to be a single question in a Grid type structure (e.g. Binary - Grid). In the process of stacking, the data reduction is inspected. Any constructed NETs are removed unless they are comprised of source values that are mutually exclusive of other codes, such as the result of merging two categories.
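As a rough illustration of stacking, suppose a hypothetical wide data set survey.wide holds one count column per fast-food brand (occasions.brandA, occasions.brandB, occasions.brandC) along with id, age, and gender. A base-R sketch of the reshaping and the aggregate model would be:
survey.long <- reshape(survey.wide, direction = "long",
    varying = c("occasions.brandA", "occasions.brandB", "occasions.brandC"),
    v.names = "occasions", times = c("brandA", "brandB", "brandC"),
    timevar = "brand", idvar = "id") # one row per respondent per brand
fit.stacked <- glm(occasions ~ age + gender, family = poisson, data = survey.long) # aggregate model on the stacked data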
References
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In: Statistical Theory and Modelling. In Honour of Sir David Cox, FRS, eds. Hinkley, D. V., Reid, N. and Snell, E. J., Chapman & Hall.
Next
How to Create Regression Multicollinearity Table (VIF)
How to Create a Prediction-Accuracy Table
How to Create a Goodness-of-Fit Plot
How to Save Predicted Values of Regression Models
How to Save Fitted Values of Regression Models
How to Save Probabilities of Each Response of Regression Models
How to Test Residual Normality (Shapiro-Wilk) of Regression Models
How to Test Residual Serial Correlation (Durbin-Watson) of Regression Models