Multiple correlation
In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.[1]
The coefficient of multiple correlation takes values between 0 and 1; a higher value indicates better predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.[2]
The coefficient of multiple correlation is computed as the square root of the coefficient of determination, under the particular assumptions that an intercept is included and that the best possible linear predictors are used. The coefficient of determination, by contrast, is defined for more general cases, including nonlinear prediction and cases in which the predicted values have not been derived from a model-fitting procedure.
Definition
The coefficient of multiple correlation, denoted R, is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
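As a minimal sketch of this definition in Python with NumPy (using hypothetical simulated data), one can fit an ordinary least-squares model with an intercept and take the Pearson correlation between the fitted and observed values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                              # two predictor variables
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=100)   # dependent variable

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# R is the Pearson correlation between predicted and actual values.
R = np.corrcoef(y_hat, y)[0, 1]
print(R)
```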
Computation
The square of the coefficient of multiple correlation can be computed using the vector $c = (r_{x_1 y}, r_{x_2 y}, \dots, r_{x_N y})^\top$ of correlations $r_{x_n y}$ between the predictor variables $x_n$ (independent variables) and the target variable $y$ (dependent variable), and the correlation matrix $R_{xx}$ of inter-correlations between predictor variables. It is given by

$$R^2 = c^\top R_{xx}^{-1} c,$$

where $c^\top$ is the transpose of $c$, and $R_{xx}^{-1}$ is the inverse of the matrix

$$R_{xx} = \begin{pmatrix} r_{x_1 x_1} & r_{x_1 x_2} & \dots & r_{x_1 x_N} \\ r_{x_2 x_1} & r_{x_2 x_2} & \dots & r_{x_2 x_N} \\ \vdots & \vdots & \ddots & \vdots \\ r_{x_N x_1} & r_{x_N x_2} & \dots & r_{x_N x_N} \end{pmatrix}.$$

If all the predictor variables are uncorrelated, the matrix $R_{xx}$ is the identity matrix and $R^2$ simply equals $c^\top c$, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix $R_{xx}^{-1}$ accounts for this.
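The following Python sketch (hypothetical data; the names c and R_xx mirror the notation above) computes $R$ from the correlation vector and matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                                # predictors x_1, x_2, x_3
y = X @ np.array([1.5, -0.5, 0.0]) + rng.normal(size=200)    # dependent variable

# Vector c of correlations between each predictor and y.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Correlation matrix R_xx of inter-correlations between the predictors.
R_xx = np.corrcoef(X, rowvar=False)

# Squared multiple correlation: R^2 = c^T R_xx^{-1} c
# (solve is used instead of forming the inverse explicitly).
R_squared = c @ np.linalg.solve(R_xx, c)
print(np.sqrt(R_squared))
```

The value agrees, up to floating-point rounding, with the Pearson correlation between $y$ and the fitted values of an intercept-including least-squares regression of $y$ on the predictors, as in the definition above.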
The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the sum of squared residuals—that is, the sum of the squares of the prediction errors—divided by the sum of the squared deviations of the values of the dependent variable from its expected value.
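In symbols, writing $y_i$ for the observed values, $\hat{y}_i$ for the predicted values, and $\bar{y}$ for the mean of the dependent variable, this is

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.$$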
Properties
With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of $y$ on $x$ and $z$ will in general have a different $R$ than will a regression of $z$ on $x$ and $y$. For example, suppose that in a particular sample the variable $z$ is uncorrelated with both $x$ and $y$, while $x$ and $y$ are linearly related to each other. Then a regression of $z$ on $y$ and $x$ will yield an $R$ of zero, while a regression of $y$ on $x$ and $z$ will yield a strictly positive $R$. This follows since the correlation of $y$ with the best predictor based on $x$ and $z$ is in all cases at least as large as the correlation of $y$ with the best predictor based on $x$ alone, and in this case with $z$ providing no explanatory power it will be exactly as large.
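This asymmetry can be checked numerically. The following Python sketch simulates hypothetical data in which $z$ is generated independently of $x$ and $y$, while $y$ is a noisy linear function of $x$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # y is linearly related to x
z = rng.normal(size=n)             # z is independent of both x and y

def multiple_R(target, predictors):
    """Multiple correlation of target with an intercept-including OLS fit on predictors."""
    A = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.corrcoef(A @ beta, target)[0, 1]

print(multiple_R(z, np.column_stack([x, y])))  # near zero (exactly zero only if the
                                               # sample correlations vanish exactly)
print(multiple_R(y, np.column_stack([x, z])))  # strictly positive, close to corr(x, y)
```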
References
- Allison, Paul D. (1998). Multiple Regression: A Primer. London: Sage Publications. ISBN 9780761985334.
- Cohen, Jacob; et al. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. ISBN 0805822232.
- Crown, William H. (1998). Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models. ISBN 0275953165.
- Edwards, Allen Louis (1985). Multiple Regression and the Analysis of Variance and Covariance. ISBN 0716710811.
- Keith, Timothy (2006). Multiple Regression and Beyond. Boston: Pearson Education.
- Kerlinger, Fred N.; Pedhazur, Elazar J. (1973). Multiple Regression in Behavioral Research. New York: Holt, Rinehart and Winston. ISBN 9780030862113.
- Stanton, Jeffrey M. (2001). "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors". Journal of Statistics Education, 9 (3).

