Semiparametric regression

In statistics, semiparametric regression includes regression models that combine parametric and nonparametric components. Such models are often used when a fully nonparametric model may not perform well, or when the researcher wants to use a parametric model but does not know the functional form with respect to a subset of the regressors or the density of the errors. Semiparametric regression models are a particular type of semiparametric modelling. Because they contain a parametric component, they rely on parametric assumptions; like a fully parametric model, they may be misspecified, in which case the resulting estimators may be inconsistent.

Methods

Many different semiparametric regression methods have been proposed and developed. The most popular methods are the partially linear, index and varying coefficient models.

Partially linear models

A partially linear model is given by

Y_i = X'_i \beta + g\left(Z_i \right) + u_i, \, \quad i = 1,\ldots,n, \,

where $Y_{i}$ is the dependent variable, $X_{i}$ is a $p \times 1$ vector of explanatory variables, $\beta$ is a $p \times 1$ vector of unknown parameters and $Z_{i} \in \mathbb{R}^{q}$ is a second vector of explanatory variables. The parametric part of the partially linear model is given by the parameter vector $\beta$, while the nonparametric part is the unknown function $g\left(Z_{i}\right)$. The data are assumed to be i.i.d. with $E\left(u_{i}|X_{i},Z_{i}\right) = 0$, and the model allows for a conditionally heteroskedastic error process $E\left(u^{2}_{i}|X_{i} = x, Z_{i} = z\right) = \sigma^{2}\left(x,z\right)$ of unknown form. This type of model was proposed by Robinson (1988) and extended to handle categorical covariates by Racine and Liu (2007).

The model is estimated in two steps: first a $\sqrt{n}$-consistent estimator $\hat{\beta}$ of $\beta$ is obtained, and then an estimator of $g\left(Z_{i}\right)$ is derived from the nonparametric regression of $Y_{i} - X'_{i}\hat{\beta}$ on $Z_{i}$ using an appropriate nonparametric regression method.^[1]
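The two-step procedure can be illustrated with a minimal sketch. It assumes one particular choice for the first step, the double-residual approach associated with Robinson (1988): since $E\left(Y_{i}|Z_{i}\right) = E\left(X_{i}|Z_{i}\right)'\beta + g\left(Z_{i}\right)$, regressing $Y_{i} - E\left(Y_{i}|Z_{i}\right)$ on $X_{i} - E\left(X_{i}|Z_{i}\right)$ by least squares recovers $\beta$. The function names, the Gaussian kernel, the fixed bandwidth `h`, the scalar $Z_{i}$ and the simulated data are all assumptions made for illustration, not part of the cited papers.

```python
import numpy as np

def nw_smooth(z, v, h):
    """Nadaraya-Watson estimate of E[v | z] at every sample point.

    Gaussian kernel, scalar z; v may be a vector (n,) or a matrix (n, p).
    """
    d = (z[:, None] - z[None, :]) / h
    w = np.exp(-0.5 * d**2)
    w /= w.sum(axis=1, keepdims=True)  # each row of weights sums to one
    return w @ v

def partially_linear_fit(y, X, z, h):
    """Double-residual estimate of beta, then kernel estimate of g."""
    y_res = y - nw_smooth(z, y, h)     # Y - E[Y | Z]
    X_res = X - nw_smooth(z, X, h)     # X - E[X | Z]
    beta_hat = np.linalg.lstsq(X_res, y_res, rcond=None)[0]
    # Second step: nonparametric regression of Y - X'beta_hat on Z.
    g_hat = nw_smooth(z, y - X @ beta_hat, h)
    return beta_hat, g_hat

# Simulated data (hypothetical): g(z) = sin(2*pi*z), beta = (1.5, -0.5).
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(0.0, 1.0, n)
X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.5, -0.5])
y = X @ beta_true + np.sin(2 * np.pi * z) + 0.3 * rng.normal(size=n)

beta_hat, g_hat = partially_linear_fit(y, X, z, h=0.05)
print("beta_hat:", beta_hat)
```

In practice the bandwidth would be chosen by a data-driven rule such as cross-validation rather than fixed in advance as it is here.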

Index models

A single index model takes the form

Y = g\left(X'\beta_{0}\right) + u, \,

where $Y$ is the dependent variable, $X$ is a vector of explanatory variables, $\beta_{0}$ is a vector of unknown parameters and the error term $u$ satisfies $E\left(u|X\right) = 0$. The single index model takes its name from the parametric part of the model, $X'\beta_{0}$, which is a scalar: a single index. The nonparametric part is the unknown function $g\left(\cdot\right)$.

Ichimura's method

The single index model method developed by Ichimura (1993) is as follows. Consider the situation in which $Y$ is continuous. If the functional form of $g\left(\cdot\right)$ were known, $\beta_{0}$ could be estimated by nonlinear least squares, choosing $\beta$ to minimize the function

\sum_{i=1}^{n} \left(Y_i - g\left(X'_i \beta\right)\right)^2.

Since the functional form of $g\left(\cdot\right)$ is not known, it must be estimated. For a given value of $\beta$, the function

G\left(X'_i \beta \right) = E\left(Y_i |X'_i \beta\right) = E\left[g\left(X'_i\beta_0 \right)|X'_i \beta\right]

can be estimated using a kernel method. Ichimura (1993) proposes estimating $g\left(X'_{i}\beta\right)$ with

\hat{G}_{-i}\left(X'_i \beta\right),\,

the leave-one-out nonparametric kernel estimator of $G\left(X'_{i}\beta\right)$ .
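A minimal sketch of this idea follows: the unknown $g$ in the least squares criterion is replaced by a leave-one-out kernel estimate, and the criterion is minimized over $\beta$. Several simplifications are assumptions made here, not taken from Ichimura (1993): there are two regressors, the first coefficient is normalized to 1 (the scale of $\beta$ cannot be separated from $g$), the kernel is Gaussian with a fixed bandwidth, the minimization is a plain grid search, and no trimming is applied beyond a small guard on the denominator.

```python
import numpy as np

def loo_kernel_mean(v, y, h):
    """Leave-one-out Nadaraya-Watson estimate of E[y | v] at each v_i."""
    d = (v[:, None] - v[None, :]) / h
    w = np.exp(-0.5 * d**2)
    np.fill_diagonal(w, 0.0)                  # drop observation i from its own fit
    denom = np.maximum(w.sum(axis=1), 1e-12)  # guard against empty neighbourhoods
    return (w @ y) / denom

def sls_objective(theta, y, X, h):
    """Sum of squared residuals with g replaced by its leave-one-out estimate."""
    beta = np.array([1.0, theta])             # first coefficient normalized to 1
    g_loo = loo_kernel_mean(X @ beta, y, h)
    return np.sum((y - g_loo) ** 2)

# Simulated data (hypothetical): g = sin, beta_0 = (1, 0.5).
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
theta_true = 0.5
y = np.sin(X @ np.array([1.0, theta_true])) + 0.1 * rng.normal(size=n)

# Grid search over the free coefficient.
grid = np.linspace(-2.0, 2.0, 81)
values = np.array([sls_objective(t, y, X, h=0.2) for t in grid])
theta_hat = grid[np.argmin(values)]
print("theta_hat:", theta_hat)
```

Leaving observation $i$ out matters: if $Y_{i}$ were included in its own fitted value, a very small bandwidth would reproduce the data exactly and the criterion would no longer discriminate between values of $\beta$.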

Klein and Spady's estimator

If the dependent variable $Y_{i}$ is binary and $X_{i}$ and $u_{i}$ are assumed to be independent, Klein and Spady (1993) propose a technique for estimating $\beta$ using maximum likelihood methods. The log-likelihood function is given by

L\left(\beta\right) = \sum_i \left(1-Y_i\right)\ln\left(1-\hat{g}_{-i}\left(X'_i\beta\right)\right) + \sum_{i}Y_i\ln\left(\hat{g}_{-i}\left(X'_i \beta\right)\right),

where $\hat{g}_{-i}\left(X'_{i}\beta\right)$ is the leave-one-out estimator.
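The log-likelihood above can be evaluated and maximized along the following lines. This is a sketch under stated assumptions rather than the estimator exactly as published: $\hat{g}_{-i}$ is taken to be a leave-one-out Gaussian kernel regression of $Y$ on the index with a fixed bandwidth, the estimated probabilities are clipped away from 0 and 1 (a simple stand-in for trimming), the first coefficient is normalized to 1, and the maximization is a grid search. All names and the simulated data are hypothetical.

```python
import numpy as np

def loo_kernel_mean(v, y, h):
    """Leave-one-out Nadaraya-Watson estimate of E[y | v] at each v_i."""
    d = (v[:, None] - v[None, :]) / h
    w = np.exp(-0.5 * d**2)
    np.fill_diagonal(w, 0.0)                  # drop observation i from its own fit
    denom = np.maximum(w.sum(axis=1), 1e-12)
    return (w @ y) / denom

def klein_spady_loglik(theta, y, X, h, eps=1e-6):
    """Semiparametric log-likelihood for binary y at beta = (1, theta)."""
    beta = np.array([1.0, theta])
    p = loo_kernel_mean(X @ beta, y, h)       # estimate of P(Y = 1 | index)
    p = np.clip(p, eps, 1.0 - eps)            # keep the logarithms finite
    return np.sum((1.0 - y) * np.log(1.0 - p) + y * np.log(p))

# Simulated binary data (hypothetical): Y = 1{X'beta_0 + u > 0}, u independent of X.
rng = np.random.default_rng(2)
n = 800
X = rng.normal(size=(n, 2))
theta_true = -1.0
u = 0.5 * rng.normal(size=n)
y = (X @ np.array([1.0, theta_true]) + u > 0).astype(float)

grid = np.linspace(-3.0, 1.0, 81)
loglik = np.array([klein_spady_loglik(t, y, X, h=0.3) for t in grid])
theta_hat = grid[np.argmax(loglik)]
print("theta_hat:", theta_hat)
```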

Smooth coefficient/varying coefficient models

Hastie and Tibshirani (1993) propose a smooth coefficient model given by

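Y_i = \alpha\left(Z_i\right) + X'_i\beta\left(Z_i\right) + u_i = \left(1, X'_i\right)\left(\begin{array}{c} \alpha\left(Z_i\right) \\ \beta\left(Z_i\right) \end{array}\right) + u_i = W'_i\gamma\left(Z_i\right) + u_i,

where $X_{i}$ is a $k \times 1$ vector, $\alpha\left(z\right)$ is an unspecified smooth function of $z$ and $\beta\left(z\right)$ is a $k \times 1$ vector of unspecified smooth functions of $z$. Here $W_{i} = \left(1, X'_{i}\right)'$ and $\gamma\left(z\right) = \left(\alpha\left(z\right), \beta\left(z\right)'\right)'$ are $(k+1) \times 1$ vectors.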

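If $E\left(u_{i}|X_{i},Z_{i}\right) = 0$, premultiplying the model by $W_{i}$ and taking expectations conditional on $Z_{i}$ shows that $\gamma\left(\cdot\right)$ may be expressed as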

\gamma\left(Z_i\right) = \left(E\left[W_i W'_i|Z_i \right]\right)^{-1}E\left[W_i Y_i|Z_i\right].
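This expression suggests a sample analogue in which the two conditional expectations are replaced by kernel-weighted averages, which amounts to a kernel-weighted least squares regression of $Y_{i}$ on $W_{i}$ around each point $z$. The sketch below assumes a scalar $Z_{i}$, a Gaussian kernel and a fixed bandwidth; the function name and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def smooth_coef(z0, y, X, z, h):
    """Kernel-weighted least squares estimate of gamma(z0) = (alpha(z0), beta(z0)')'."""
    W = np.column_stack([np.ones(len(y)), X])   # W_i = (1, X_i')'
    k = np.exp(-0.5 * ((z - z0) / h) ** 2)      # kernel weights centred at z0
    A = (W * k[:, None]).T @ W                  # sum_i k_i W_i W_i'
    b = (W * k[:, None]).T @ y                  # sum_i k_i W_i Y_i
    return np.linalg.solve(A, b)

# Simulated data (hypothetical): alpha(z) = 1 + z, beta(z) = sin(2*pi*z).
rng = np.random.default_rng(3)
n = 1000
z = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=n)
y = (1.0 + z) + x * np.sin(2 * np.pi * z) + 0.2 * rng.normal(size=n)

# Estimate the coefficient functions at z = 0.25, where alpha = 1.25 and beta = 1.
gamma_hat = smooth_coef(0.25, y, x[:, None], z, h=0.05)
print("gamma_hat:", gamma_hat)
```

Evaluating `smooth_coef` over a grid of values of `z0` traces out estimates of the coefficient functions $\alpha\left(\cdot\right)$ and $\beta\left(\cdot\right)$.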

Notes

1. See Li and Racine (2007) for an in-depth look at nonparametric regression methods.

References

Robinson, P.M. (1988). "Root-n Consistent Semiparametric Regression". Econometrica (The Econometric Society) 56 (4): 931–954. doi:10.2307/1912705. JSTOR 1912705.
Li, Qi; Racine, Jeffrey S. (2007). Nonparametric Econometrics: Theory and Practice. Princeton University Press. ISBN 0-691-12161-3.
Racine, J.S.; Liu, L. (2007). "A Partially Linear Kernel Estimator for Categorical Data". Unpublished Manuscript, McMaster University.
Ichimura, H. (1993). "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single Index Models". Journal of Econometrics 58: 71–120. doi:10.1016/0304-4076(93)90114-K.
Klein, R. W.; R. H. Spady (1993). "An Efficient Semiparametric Estimator for Binary Response Models". Econometrica (The Econometric Society) 61 (2): 387–421. doi:10.2307/2951556. JSTOR 2951556.
Hastie, T.; R. Tibshirani (1993). "Varying-Coefficient Models". Journal of the Royal Statistical Society, Series B 55: 757–796.

This article is issued from Wikipedia - version of the Thursday, October 15, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.