Multiple regression, a time-honored technique going back to Pearson's use of it in 1908, is employed to account for (predict) the variance in an interval dependent variable, based on linear combinations of interval, dichotomous, or dummy independent variables. It is often called OLS regression because it relies on ordinary least squares estimation. Multiple regression can establish whether a set of independent variables explains a statistically significant proportion of the variance in a dependent variable (through a significance test of R²). It can also gauge the relative predictive importance of the independent variables (by comparing beta weights). Power terms (for example, x²) can be added as independent variables to explore curvilinear effects, and cross-product terms (for example, x1·x2) can be added to explore interaction effects. One can test the significance of the difference between two R² values to determine whether adding an independent variable to the model improves it significantly. Using hierarchical regression, the researcher can see how much variance in the dependent variable is explained by one new independent variable or a set of them, over and above that explained by an earlier set. The parameter estimates (the b coefficients and the constant) can be used to construct a prediction equation and generate predicted scores for further analysis.
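The R²-change test used in hierarchical regression can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's procedure. The helper names `ols_r2` and `f_change` are hypothetical, and the data are invented. The test uses the standard nested-model statistic F = [(R²full − R²reduced) / (kfull − kreduced)] / [(1 − R²full) / (n − kfull − 1)]. The p-value step is left out because the standard library has no F distribution; with SciPy installed, `scipy.stats.f.sf(F, df1, df2)` would supply it.

```python
# Sketch: does adding a cross-product (interaction) term raise R² significantly?
# Pure Python, no third-party packages; the data below are invented.


def ols_r2(cols, y):
    """Fit y on the given predictor columns (plus a constant) by ordinary least
    squares and return R², the proportion of variance in y explained."""
    design = [[1.0] * len(y)] + cols          # the column of 1s estimates the constant
    k = len(design)
    # Normal equations (X'X) coef = X'y, stored as an augmented matrix.
    m = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(design[i], y))] for i in range(k)]
    for col in range(k):                      # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= f * m[col][j]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):            # back-substitution
        coef[r] = (m[r][k] - sum(m[r][j] * coef[j] for j in range(r + 1, k))) / m[r][r]
    pred = [sum(coef[j] * design[j][i] for j in range(k)) for i in range(len(y))]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the gain in R² when k_full - k_reduced predictors are added
    to a nested model; returns (F, numerator df, denominator df)."""
    df1, df2 = k_full - k_reduced, n - k_full - 1
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2), df1, df2


x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [4.2, 3.1, 9.8, 5.0, 14.9, 30.1, 10.2, 25.7, 24.0, 17.5]
x1x2 = [a * b for a, b in zip(x1, x2)]        # cross-product (interaction) term

r2_step1 = ols_r2([x1, x2], y)                # block 1: main effects only
r2_step2 = ols_r2([x1, x2, x1x2], y)          # block 2: add the interaction term
F, df1, df2 = f_change(r2_step1, r2_step2, len(y), 2, 3)
print(f"R2 step 1 = {r2_step1:.3f}, step 2 = {r2_step2:.3f}, F({df1}, {df2}) = {F:.2f}")
```

A power term for a curvilinear effect would be entered the same way, for example `[a * a for a in x1]` as an extra column in the second block.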
The multiple regression equation takes the form y = b₁x₁ + b₂x₂ + ... + bₙxₙ + c. The b's are the regression coefficients. Each represents the amount the dependent variable (y) changes when the corresponding independent variable changes 1 unit, with the other independent variables held constant. The c is the constant, indicating where the regression line (with several predictors, the regression plane or surface) intercepts the y axis. It represents the predicted value of the dependent variable when all the independent variables equal 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is often interpreted as the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R², the squared multiple correlation (coefficient of multiple determination). It is the proportion of variance in the dependent variable explained collectively by all of the independent variables in the model.
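The quantities defined in this paragraph (the b coefficients, the constant c, the beta weights, and R²) can all be computed from a small data set. The following is a minimal pure-Python sketch, assuming invented data and hypothetical helper names (`ols`, `predict`, `r_squared`, `beta_weights`). It solves the normal equations directly, which is adequate for a toy example; real analyses would use a statistics package. Beta weights are computed as beta_j = b_j × sd(x_j) / sd(y).

```python
# Minimal OLS sketch in pure Python (no third-party packages assumed).
# The data and helper names are illustrative, not taken from any package.
from statistics import stdev


def ols(xs, y):
    """Return (b, c) for y = b1*x1 + ... + bn*xn + c, fitted by ordinary least
    squares. xs is a list of predictor columns, each the same length as y."""
    design = [[1.0] * len(y)] + xs            # the column of 1s estimates c
    k = len(design)
    # Normal equations (X'X) coef = X'y, stored as an augmented matrix.
    m = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(design[i], y))] for i in range(k)]
    for col in range(k):                      # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= f * m[col][j]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):            # back-substitution
        coef[r] = (m[r][k] - sum(m[r][j] * coef[j] for j in range(r + 1, k))) / m[r][r]
    return coef[1:], coef[0]


def predict(b, c, xs):
    """Predicted scores from the prediction equation y-hat = sum(b_j * x_j) + c."""
    return [c + sum(bj * xj[i] for bj, xj in zip(b, xs)) for i in range(len(xs[0]))]


def r_squared(y, pred):
    """Proportion of the variance in y accounted for by the predicted scores."""
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def beta_weights(b, xs, y):
    """Standardized coefficients: beta_j = b_j * sd(x_j) / sd(y)."""
    sy = stdev(y)
    return [bj * stdev(xj) / sy for bj, xj in zip(b, xs)]


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 3.9, 6.2, 6.8, 9.1, 9.9]            # invented scores

b, c = ols([x1, x2], y)
pred = predict(b, c, [x1, x2])
print("b =", [round(v, 3) for v in b], " c =", round(c, 3))
print("beta =", [round(v, 3) for v in beta_weights(b, [x1, x2], y)])
print("R2 =", round(r_squared(y, pred), 3))
```

Because beta weights rescale each b by the spread of its predictor relative to the dependent variable, they remain comparable across predictors measured in different units, which raw b coefficients are not.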
Multiple regression shares all the assumptions of correlation: linearity of relationships; the same level of relationship throughout the range of the independent variables (homoscedasticity, which in regression means the residuals have constant variance across values of the independent variables); interval or near-interval measurement; absence of outliers; and data whose range is not truncated. In addition, the model being tested must be correctly specified. Excluding important causal variables or including extraneous ones can markedly change the beta weights and hence the interpretation of the importance of the independent variables.
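The specification point can be illustrated with a small noise-free example. This sketch assumes invented data in which y is built from two correlated predictors. Fitting the correct two-predictor model recovers the true coefficients, while omitting x2 loads part of its effect onto x1. The helper names are hypothetical. The two-predictor slopes come from the closed-form solution of the centered normal equations rather than from a general matrix routine.

```python
# Sketch: omitting a correlated cause shifts the remaining coefficient.
# Pure Python; the data are invented and noise-free so the effect is easy to see.
from statistics import stdev


def centered_cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slopes_two_predictors(x1, x2, y):
    """OLS slopes (b1, b2) for y = b1*x1 + b2*x2 + c via the closed-form solution."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2                # zero only if x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def slope_one_predictor(x, y):
    """OLS slope for y = b*x + c."""
    return centered_cross(x, y) / centered_cross(x, x)


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]                       # correlated with x1
y = [2 * a + 3 * b + 1 for a, b in zip(x1, x2)]   # the true model uses both

b1_full, b2_full = slopes_two_predictors(x1, x2, y)
b1_omitted = slope_one_predictor(x1, y)       # x2 wrongly left out of the model

scale = stdev(x1) / stdev(y)                  # converts x1's b into its beta weight
print(f"x1 with x2 in the model: b = {b1_full:.3f}, beta = {b1_full * scale:.3f}")
print(f"x1 with x2 omitted:      b = {b1_omitted:.3f}, beta = {b1_omitted * scale:.3f}")
```

Since the same scale factor converts x1's b into its beta weight in both fits, the beta weight for x1 is distorted by exactly the same proportion as its b coefficient.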
There are many alternatives to ordinary least squares (OLS) regression, including general linear models, generalized linear models, linear mixed models, logistic regression, Cox regression, and many more. These are treated in separate volumes of the Statistical Associates "blue book" series.
Table of Contents
Key Terms and Concepts
OLS
Variables
Regression equation
Dependent variable
Independent variables
Dummy variables
Interaction effects
Interactions
Significance of interaction effects
Interaction terms involving categorical dummies
Separate regressions
Predicted values
Predicted values
Adjusted predicted values
Residuals
Centering
Regression coefficients in SPSS
Example
The regression coefficient, b
Coefficients table
Zero-order, partial, and part correlations
Squared partial correlation
Semi-partial correlation
Interpreting b for dummy variables
Regression coefficients in SAS
Example
SAS syntax
SAS tables
SAS plots
Residual by regressor plots
Fit diagnostics panel
Residuals vs. predicted values
Studentized residuals vs. predicted values
Studentized residuals vs. leverage
110 more pages of regression topics

