This article describes how to perform linear regression.
- Predictor variables (aka features or independent variables) - these can be numeric or binary. To use a categorical variable in regression, you need to create a separate dummy variable for each of its categories and use those instead (e.g. if Employment Category has three categories (manager, custodial, and clerical), you can create three new variables called manager, custodial, and clerical).
- An outcome variable (aka dependent variable) - this variable must be numeric.
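The dummy-coding step above can be sketched in Python with pandas. The data frame, column names, and values here are made up for illustration; `drop_first=True` leaves out the alphabetically first category so it serves as the reference category.

```python
import pandas as pd

# Hypothetical data: a categorical Employment Category variable and a numeric outcome
df = pd.DataFrame({
    "employment": ["manager", "custodial", "clerical", "manager"],
    "salary": [90000, 40000, 50000, 85000],
})

# Create one 0/1 dummy variable per category. drop_first=True drops the
# alphabetically first category ("clerical") to act as the reference category.
dummies = pd.get_dummies(df["employment"], prefix="emp", drop_first=True, dtype=int)
df = pd.concat([df, dummies], axis=1)

print(df[["emp_custodial", "emp_manager"]])
```

If you instead create one dummy per category by hand (as described above), remember to leave one out of the model, as noted in the Predictor(s) step below.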
- From the toolbar, go to Anything > Advanced Analysis > Regression > Linear Regression.
- In the object inspector, select your numeric Outcome variable.
- In Predictor(s), select your predictor variable(s). The fastest way to do this is to select them all in the Data Sets tree and drag them into the Predictor(s) box, but if you are using dummy variables you created from a categorical variable, be sure to leave one out to serve as the reference category.
- From Algorithm, choose Regression.
- From Regression type, choose Linear.
- From Output, the default is Summary. This output gives you the Regression coefficients table, R-Squared, the AIC fit statistic, the missing value treatment, and other information. Several other options are available, but Summary is the best choice if you are primarily interested in the regression equation and the percentage of variance the model accounts for.
- Missing data gives you several options for how to treat missing values. The default is to exclude cases with missing values. This is usually the preferred choice unless the regression model contains variables with high percentages of missing values. In cases like that, you might consider excluding one or more of those variables, or choosing Multiple imputation if you want to keep those variables in your regression model, with missing values replaced by estimates.
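The default missing-data treatment (excluding cases with missing values, often called listwise deletion) can be sketched in Python with pandas. The data and column names are made up for illustration; checking the share of missing values per variable helps you judge whether exclusion discards too much data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in the predictors
df = pd.DataFrame({
    "y":  [10.0, 12.0, 9.0, 15.0, 11.0],
    "x1": [1.0, 2.0, np.nan, 4.0, 3.0],
    "x2": [5.0, np.nan, 6.0, 7.0, 5.5],
})

# Listwise deletion: drop any case with a missing value in any model variable
complete = df.dropna(subset=["y", "x1", "x2"])
print(len(complete))

# Proportion of missing values per predictor - high proportions suggest
# dropping the variable or switching to multiple imputation instead
print(df[["x1", "x2"]].isna().mean())
```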
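Putting the steps together, here is a minimal sketch of the same kind of linear regression in plain NumPy, on made-up data, showing where the main Summary quantities come from: the regression coefficients, R-Squared, and (up to an additive constant) the AIC. The variable names and true coefficient values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: numeric outcome y driven by two numeric predictors
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-Squared: the percentage of variance in y the model accounts for
resid = y - X @ coef
r_squared = 1 - resid.var() / y.var()

# Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k,
# where k is the number of estimated coefficients
rss = float(resid @ resid)
k = X.shape[1]
aic = n * np.log(rss / n) + 2 * k

print(np.round(coef, 2), round(float(r_squared), 3))
```

The estimated coefficients should land close to the true values used to generate the data, and R-Squared reflects how much of the outcome's variance the predictors explain.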