Mean and predicted response


In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different, because the predicted response also accounts for the random error of an individual new observation.

Background

Further information: Straight line fitting

In straight line fitting, the model is

y_i=\alpha+\beta x_i +\epsilon_i\,

where $y_i$ is the response variable, $x_i$ is the explanatory variable, $\epsilon_i$ is the random error, and $\alpha$ and $\beta$ are parameters. The random errors are assumed to be independent, with mean zero and common variance $\sigma^2$. The predicted response value for a given explanatory value, $x_d$, is given by

\hat{y}_d=\hat\alpha+\hat\beta x_d ,

while the actual response would be

y_d=\alpha+\beta x_d +\epsilon_d \,

Expressions for the values and variances of $\hat\alpha$ and $\hat\beta$ are given in the article on linear regression.
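As a minimal sketch of how the fitted value $\hat{y}_d$ is obtained, the Python below assumes the standard ordinary least squares estimates $\hat\beta = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat\alpha = \bar{y} - \hat\beta\bar{x}$. The function names `fit_line` and `predict` and the data are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (alpha_hat, beta_hat) for y = alpha + beta*x."""
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def predict(alpha_hat, beta_hat, x_d):
    """Fitted value y_hat_d = alpha_hat + beta_hat * x_d at a given point x_d."""
    return alpha_hat + beta_hat * x_d


# Hypothetical data lying exactly on y = 1 + 2x, so the fit recovers alpha = 1, beta = 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
print(a, b, predict(a, b, 10.0))  # prints 1.0 2.0 21.0
```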

Mean response

Mean response is an estimate of the mean of the $y$ population associated with $x_d$, that is, $\hat{y}_d$ estimates $E(y \mid x_d) = \alpha + \beta x_d$. The variance of the mean response is given by

\text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) = \text{Var}\left(\hat{\alpha}\right) + \text{Var}\left(\hat{\beta}\right)x_d^2 + 2 x_d\,\text{Cov}\left(\hat{\alpha},\hat{\beta}\right) .

This expression can be simplified to

\text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) =\sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right).
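The simplification can be checked numerically. The sketch below assumes $m$ data points and the standard least-squares results $\text{Var}(\hat\alpha) = \sigma^2 \sum x_i^2 / \left(m \sum (x_i - \bar{x})^2\right)$, $\text{Var}(\hat\beta) = \sigma^2 / \sum (x_i - \bar{x})^2$ and $\text{Cov}(\hat\alpha, \hat\beta) = -\sigma^2 \bar{x} / \sum (x_i - \bar{x})^2$, which hold for independent errors with common variance $\sigma^2$. The function names and data are hypothetical.

```python
def mean_response_variance_terms(x, x_d, sigma2):
    """Var(alpha_hat + beta_hat*x_d) from the three-term (variance/covariance) expression."""
    m = len(x)
    x_bar = sum(x) / m
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    var_alpha = sigma2 * sum(xi ** 2 for xi in x) / (m * s_xx)
    var_beta = sigma2 / s_xx
    cov_ab = -sigma2 * x_bar / s_xx
    return var_alpha + var_beta * x_d ** 2 + 2 * x_d * cov_ab


def mean_response_variance(x, x_d, sigma2):
    """Simplified form sigma^2 * (1/m + (x_d - x_bar)^2 / sum (x_i - x_bar)^2)."""
    m = len(x)
    x_bar = sum(x) / m
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 * (1 / m + (x_d - x_bar) ** 2 / s_xx)


x = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical design points
for x_d in (-1.0, 2.0, 7.5):
    print(x_d, mean_response_variance_terms(x, x_d, 1.0), mean_response_variance(x, x_d, 1.0))
```

Both functions agree for every $x_d$. The variance is smallest at $x_d = \bar{x}$, where it reduces to $\sigma^2/m$.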

where $m$ is the number of data points and $\bar{x}$ is the mean of the $x_i$. To demonstrate this simplification, one can make use of the identity

\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{m}\left(\sum x_i\right)^2 .
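Writing $S_{xx} = \sum (x_i - \bar{x})^2$ and assuming the standard least-squares results $\text{Var}(\hat\alpha) = \sigma^2 \sum x_i^2 / (m S_{xx})$, $\text{Var}(\hat\beta) = \sigma^2 / S_{xx}$ and $\text{Cov}(\hat\alpha, \hat\beta) = -\sigma^2 \bar{x} / S_{xx}$, the identity gives $\sum x_i^2 = S_{xx} + m\bar{x}^2$. The simplification can then be sketched as:

```latex
\begin{aligned}
\text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right)
  &= \frac{\sigma^2 \sum x_i^2}{m S_{xx}} + \frac{\sigma^2}{S_{xx}}x_d^2 - 2 x_d\frac{\sigma^2 \bar{x}}{S_{xx}} \\
  &= \frac{\sigma^2}{S_{xx}}\left(\frac{S_{xx} + m\bar{x}^2}{m} + x_d^2 - 2 x_d \bar{x}\right) \\
  &= \frac{\sigma^2}{S_{xx}}\left(\frac{S_{xx}}{m} + \left(x_d - \bar{x}\right)^2\right) \\
  &= \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{S_{xx}}\right).
\end{aligned}
```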

Predicted response

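The predicted response distribution is the predicted distribution of the residual $y_d - \hat{y}_d$ at the given point $x_d$. Because the new observation $y_d$ is independent of the data used to compute $\hat\alpha$ and $\hat\beta$, there is no covariance term, and the variance is given by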

\text{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta}x_d\right]\right) = \text{Var}\left(y_d\right) + \text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) .

The second part of this expression was already calculated for the mean response. Since $\text{Var}\left(y_d\right)=\sigma^2$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

\text{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta}x_d\right]\right) = \sigma^2 + \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) = \sigma^2\left(1+\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) .
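A small Monte Carlo simulation can illustrate both variance formulas. The sketch below works under assumed values (hypothetical $\alpha = 1$, $\beta = 2$, $\sigma = 1$, Gaussian errors and fixed $x_i$). It repeatedly draws a data set, refits the line, and draws an independent new observation $y_d$. It then compares the empirical variances of $\hat{y}_d$ and $y_d - \hat{y}_d$ with $\sigma^2\left(\frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)$ and $\sigma^2\left(1 + \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)$. The function name `simulate` is hypothetical.

```python
import random
import statistics


def simulate(x, alpha, beta, sigma, x_d, reps, seed=0):
    """Monte Carlo variances of y_hat_d and of y_d - y_hat_d under y = alpha + beta*x + eps."""
    rng = random.Random(seed)
    m = len(x)
    x_bar = sum(x) / m
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    fitted, errors = [], []
    for _ in range(reps):
        # Draw a fresh data set and refit by ordinary least squares.
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / m
        beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
        alpha_hat = y_bar - beta_hat * x_bar
        y_hat_d = alpha_hat + beta_hat * x_d
        # A new observation at x_d, independent of the data used for the fit.
        y_d = alpha + beta * x_d + rng.gauss(0.0, sigma)
        fitted.append(y_hat_d)
        errors.append(y_d - y_hat_d)
    return statistics.variance(fitted), statistics.variance(errors)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
sigma, x_d = 1.0, 6.0
x_bar = sum(x) / len(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
theory_mean = sigma ** 2 * (1 / len(x) + (x_d - x_bar) ** 2 / s_xx)  # mean response
theory_pred = sigma ** 2 + theory_mean                                # predicted response
sim_mean, sim_pred = simulate(x, 1.0, 2.0, sigma, x_d, reps=20000)
print(theory_mean, sim_mean)
print(theory_pred, sim_pred)
```

With 20000 replications the empirical values land within a few percent of the theoretical 1.8 and 2.8. The gap between them is the error variance $\sigma^2 = 1$.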

Confidence intervals

Main article: Confidence interval

Further information: Prediction interval

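The $100(1-\alpha)\%$ confidence intervals are computed as $\hat{y}_d \pm t_{\frac{\alpha}{2},m - n - 1} \sqrt{\text{Var}}$. Here $\text{Var}$ is the variance of the mean response or of the predicted response, as appropriate, with $\sigma^2$ replaced by its estimate from the residuals. In this formula $\alpha$ denotes the significance level rather than the intercept, and $m - n - 1$ is the number of residual degrees of freedom, which is $m - 2$ for straight-line fitting ($n = 1$).

Because the variance of the predicted response exceeds the variance of the mean response by $\sigma^2$, the confidence interval for predicted response is wider than the interval for mean response. This is expected intuitively. The variance of the population of $y$ values does not shrink when one samples from it, because the variance of the random error $\epsilon_i$ does not decrease. The variance of the mean of the $y$ does shrink with increased sampling, however, because the variances of $\hat \alpha$ and $\hat \beta$ decrease. As a result, the mean response (predicted response value) becomes closer to $\alpha + \beta x_d$.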
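For straight-line fitting ($n = 1$), the two intervals can be sketched as below. The sketch assumes $\sigma^2$ is estimated by the residual sum of squares divided by $m - 2$. To stay within the Python standard library, it takes the critical value $t_{\frac{\alpha}{2},m-2}$ as an argument (`t_crit`, a hypothetical parameter name) rather than computing it. The function name `intervals` and the data are hypothetical.

```python
import math


def intervals(x, y, x_d, t_crit):
    """Fitted value at x_d with the mean-response and predicted-response intervals.

    t_crit is the critical value t_{alpha/2, m-2}, supplied by the caller
    (for example from a table); it is not computed here.
    """
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    y_hat_d = alpha_hat + beta_hat * x_d
    # Estimate sigma^2 from the residual sum of squares with m - 2 degrees of freedom.
    sse = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (m - 2)
    # Bracketed factor 1/m + (x_d - x_bar)^2 / S_xx from the mean-response variance.
    factor = 1 / m + (x_d - x_bar) ** 2 / s_xx
    half_mean = t_crit * math.sqrt(s2 * factor)        # mean response
    half_pred = t_crit * math.sqrt(s2 * (1 + factor))  # predicted response
    mean_interval = (y_hat_d - half_mean, y_hat_d + half_mean)
    pred_interval = (y_hat_d - half_pred, y_hat_d + half_pred)
    return y_hat_d, mean_interval, pred_interval


# Hypothetical data: m = 10 points, so m - 2 = 8 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
y_hat_d, mean_interval, pred_interval = intervals(x, y, x_d=5.5, t_crit=2.306)
print(y_hat_d, mean_interval, pred_interval)
```

The value 2.306 is the two-sided 95% critical value of Student's t distribution with 8 degrees of freedom, which matches the $m = 10$ points above. If SciPy is available, `scipy.stats.t.ppf(0.975, 8)` gives the same value.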

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population. The variance of a population is a parameter and does not change, but the variance of the sample mean decreases as the sample size increases.
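The analogy can be illustrated with a short simulation. The sketch assumes a hypothetical normal population with standard deviation $\sigma = 2$. The population variance stays at $\sigma^2 = 4$ regardless of sample size, while the empirical variance of the sample mean tracks $\sigma^2/m$. The function name is hypothetical.

```python
import random
import statistics


def sample_mean_variance(sigma, m, reps=4000, seed=1):
    """Empirical variance of the mean of m draws from a normal population N(0, sigma^2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(m)) for _ in range(reps)]
    return statistics.variance(means)


sigma = 2.0  # hypothetical population standard deviation: the population variance is 4 for every m
for m in (5, 50):
    print(m, sigma ** 2, sample_mean_variance(sigma, m), sigma ** 2 / m)
```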

General linear regression

The general linear model can be written as

y_i=\sum_{j=1}^{n}X_{ij}\beta_j + \epsilon_i\,

Therefore, since the predicted value at a point $d$ with explanatory values $X_{dj}$ is $\hat{y}_d=\sum_{j=1}^{n} X_{dj}\hat\beta_j$, the general expression for the variance of the mean response is

\operatorname{Var}\left(\sum_{j=1}^{n} X_{dj}\hat\beta_j\right)= \sum_{i=1}^{n}\sum_{j=1}^{n}X_{di}S_{ij}X_{dj},

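where $\mathbf{S}$ is the covariance matrix of the parameter estimates. With $\mathbf{X}$ denoting the $m \times n$ matrix with elements $X_{ij}$, it is given by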

\mathbf{S}=\sigma^2\left(\mathbf{X^{\mathsf{T}}X}\right)^{-1}
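The double sum is the quadratic form $\mathbf{x}_d^{\mathsf{T}}\mathbf{S}\,\mathbf{x}_d$, where $\mathbf{x}_d$ is the row of explanatory values $X_{dj}$. The sketch below evaluates it by solving $\left(\mathbf{X^{\mathsf{T}}X}\right)\mathbf{z} = \mathbf{x}_d$ with plain Gaussian elimination instead of forming the inverse explicitly. In practice a library routine such as `numpy.linalg.solve` would normally be used instead. If straight-line fitting is treated as the special case with columns $1$ and $x_i$, the result should reproduce $\sigma^2\left(\frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)$. The helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def mean_response_variance_general(X, x_d, sigma2):
    """sum_i sum_j x_d[i] S[i][j] x_d[j] with S = sigma2 * (X^T X)^{-1}.

    Computed as sigma2 * (x_d . z), where (X^T X) z = x_d.
    """
    n = len(x_d)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    z = solve(xtx, x_d)
    return sigma2 * sum(xi * zi for xi, zi in zip(x_d, z))


# Straight-line fit as a special case: each row of X is [1, x_i], and the point is [1, x_d].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, xi] for xi in x]
# Expected to equal 1/m + (x_d - x_bar)^2 / S_xx = 0.2 + 16/10 = 1.8, up to rounding.
print(mean_response_variance_general(X, [1.0, 6.0], 1.0))
```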



This article is issued from Wikipedia - version of the Thursday, March 24, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.