Elastic net regularization

In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.

Specification

The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method, which uses a penalty function based on

\|\beta\|_1 = \textstyle \sum_{j=1}^p |\beta_j|.

Use of this penalty function has several limitations.[1] For example, in the "large p, small n" case (high-dimensional data with few examples), the LASSO selects at most n variables before it saturates. Also, if there is a group of highly correlated variables, the LASSO tends to select one variable from the group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part to the penalty (\|\beta\|^2), which when used alone is ridge regression (known also as Tikhonov regularization). The estimates from the elastic net method are defined by
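As a small numeric illustration (a sketch, not from the source), the two penalty terms can be computed directly; the combined elastic net penalty is simply their weighted sum:

```python
# Illustrative only: the elastic net penalty combines the lasso (L1)
# and ridge (L2) penalties with weights lam1 and lam2.
def l1_penalty(beta):
    # sum_j |beta_j|
    return sum(abs(b) for b in beta)

def l2_penalty(beta):
    # sum_j beta_j^2
    return sum(b * b for b in beta)

def elastic_net_penalty(beta, lam1, lam2):
    return lam1 * l1_penalty(beta) + lam2 * l2_penalty(beta)

beta = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(beta))                     # 4.0
print(l2_penalty(beta))                     # 6.5
print(elastic_net_penalty(beta, 1.0, 0.5))  # 4.0 + 0.5 * 6.5 = 7.25
```

Note that the L1 term is what induces sparsity (exact zeros in \beta), while the L2 term spreads weight across correlated variables.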

 \hat{\beta} = \underset{\beta}{\operatorname{argmin}} (\| y-X \beta \|^2 + \lambda_2 \|\beta\|^2 + \lambda_1 \|\beta\|_1) .

The quadratic penalty term makes the loss function strictly convex, and it therefore has a unique minimum. The elastic net method includes the LASSO and ridge regression as special cases: setting \lambda_1 = \lambda, \lambda_2 = 0 recovers the LASSO, while \lambda_1 = 0, \lambda_2 = \lambda recovers ridge regression. Meanwhile, the naive version of the elastic net method finds an estimator in a two-stage procedure: first, for each fixed \lambda_2, it finds the ridge regression coefficients, and then performs a LASSO-type shrinkage. This kind of estimation incurs a double amount of shrinkage, which leads to increased bias and poor predictions. To improve prediction performance, the authors rescale the coefficients of the naive elastic net by multiplying the estimated coefficients by (1 + \lambda_2).[1]
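The naive estimator above can be sketched with a short coordinate-descent routine, the strategy used by solvers such as glmnet;[8] this pure-Python version is an illustration under the objective \| y-X \beta \|^2 + \lambda_2 \|\beta\|^2 + \lambda_1 \|\beta\|_1, not the authors' implementation:

```python
# A minimal coordinate-descent sketch of the naive elastic net estimator
# minimizing ||y - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1 (pure Python,
# illustrative assumptions; no convergence checks or optimizations).
def soft_threshold(z, lam):
    # the shrinkage operator induced by the L1 penalty
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def elastic_net(X, y, lam1, lam2, n_iter=500):
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with the contribution of feature j removed
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            xj_r = sum(X[i][j] * r[i] for i in range(n))
            xj_xj = sum(X[i][j] ** 2 for i in range(n))
            # exact minimizer of the objective in coordinate j
            b[j] = soft_threshold(2.0 * xj_r, lam1) / (2.0 * (xj_xj + lam2))
    return b

# With lam1 = lam2 = 0 this reduces to ordinary least squares.
X = [[1, 0], [0, 1], [1, 1]]
y = [1, 2, 3]
b = elastic_net(X, y, lam1=0.0, lam2=0.0)   # approximately [1.0, 2.0]
rescaled = [(1 + 0.0) * bj for bj in b]     # the (1 + lam2) rescaling
```

Multiplying each coefficient by (1 + \lambda_2) in the last line undoes the extra shrinkage introduced by the ridge stage, as in the rescaled elastic net.[1]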

Examples of where the elastic net method has been applied are:

  support vector machines[2]
  metric learning[3]
  portfolio optimization[4]
  cancer prognosis[10]

Reduction to Support Vector Machine

In late 2014, it was proven that the elastic net can be reduced to the linear support vector machine.[5] A similar reduction was previously proven for the LASSO, also in 2014.[6] The authors showed that for every instance of the elastic net, an artificial binary classification problem can be constructed such that the hyperplane solution of a linear support vector machine (SVM) is identical to the solution \beta (after re-scaling). The reduction immediately enables the use of highly optimized SVM solvers for elastic net problems. It also enables the use of GPU acceleration, which is often already used for large-scale SVM solvers.[7] The reduction is a simple transformation of the original data and regularization constants

 X\in{\mathbb R}^{n\times p},y\in {\mathbb R}^n,\lambda_1\geq 0,\lambda_2\geq 0

into the artificial data instances and labels of a binary classification problem, together with the corresponding SVM regularization constant

 X_2\in{\mathbb R}^{2p\times n},y_2\in\{-1,1\}^{2p}, C\geq 0.

Here, y_2 consists of binary labels {-1,1}. When 2p>n it is typically faster to solve the linear SVM in the primal, whereas otherwise the dual formulation is faster. The authors refer to the transformation as Support Vector Elastic Net (SVEN), and provide the following Matlab pseudo-code:

function β = SVEN(X, y, t, λ2)
    [n, p] = size(X);
    % build the 2p artificial instances (X2 is 2p × n): each is a column
    % of X shifted by -y/t (label +1) or by +y/t (label -1)
    X2 = [bsxfun(@minus, X, y./t), bsxfun(@plus, X, y./t)]';
    Y2 = [ones(p, 1); -ones(p, 1)];
    C = 1/(2*λ2);
    if 2*p > n
        w = SVMPrimal(X2, Y2, C);
        α = C * max(1 - Y2.*(X2*w), 0);
    else
        α = SVMDual(X2, Y2, C);
    end
    β = t * (α(1:p) - α(p+1:2*p)) / sum(α);
end
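The data transformation at the heart of SVEN can also be sketched in plain Python. SVMPrimal and SVMDual in the pseudo-code are abstract solver calls, so only the construction of the artificial instances is shown here, following the 2p × n shape stated above (an illustrative sketch, not the authors' code):

```python
# Sketch of the SVEN data transformation (no SVM solver attached).
def sven_transform(X, y, t):
    """Map a regression problem (X: n x p rows of samples, y: length n)
    to the 2p artificial binary-classification instances used by SVEN.
    Instance j is column j of X shifted by -y/t (label +1) for the first
    p instances, or by +y/t (label -1) for the last p."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # columns of X
    X2 = [[c[i] - y[i] / t for i in range(n)] for c in cols] + \
         [[c[i] + y[i] / t for i in range(n)] for c in cols]
    Y2 = [1] * p + [-1] * p
    return X2, Y2   # X2 is 2p x n, Y2 holds the 2p binary labels

X2, Y2 = sven_transform([[1, 2], [3, 4], [5, 6]], [1, 2, 3], 1.0)
print(len(X2), len(X2[0]))  # 4 3  (2p instances in R^n)
print(Y2)                   # [1, 1, -1, -1]
```

Given the SVM dual variables α for this problem, the elastic net solution would be recovered as β_j = t·(α_j − α_{p+j}) / Σ α, as in the last line of the pseudo-code.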

Software

  glmnet: an R package that fits lasso and elastic-net regularized generalized linear models via coordinate descent.[8][9]
  pensim: an R package for simulation of high-dimensional data and parallelized repeated penalized regression.[10][11]
  SVEN: a Matlab implementation of Support Vector Elastic Net.[12]
  SpaSM: a Matlab toolbox for sparse statistical modeling, including elastic net regularized regression.[13]

References

  1. Zou, Hui; Hastie, Trevor (2005). "Regularization and Variable Selection via the Elastic Net". Journal of the Royal Statistical Society, Series B: 301–320.
  2. Wang, Li; Zhu, Ji; Zou, Hui. "The doubly regularized support vector machine" (PDF). Statistica Sinica 16: 589–615.
  3. Liu, Meizhu; Vemuri, Baba. "A robust and efficient doubly regularized metric learning approach". Proceedings of the 12th European conference on Computer Vision. Part IV: 646–659.
  4. Shen, Weiwei; Wang, Jun; Ma, Shiqian. "Doubly Regularized Portfolio with Risk Minimization". Twenty-Eighth AAAI Conference on Artificial Intelligence.
  5. Zhou, Quan; Chen, Wenlin; Song, Shiji; Gardner, Jacob; Weinberger, Kilian; Chen, Yixin. A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing. Association for the Advancement of Artificial Intelligence.
  6. Jaggi, Martin (2014). Suykens, Johan; Signoretto, Marco; Argyriou, Andreas, eds. An Equivalence between the Lasso and Support Vector Machines. Chapman and Hall/CRC.
  7. "GTSVM". uchicago.edu.
  8. Friedman, Jerome; Trevor Hastie; Rob Tibshirani (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent". Journal of Statistical Software: 1–22.
  9. "CRAN - Package glmnet". r-project.org.
  10. Waldron, L.; Pintilie, M.; Tsao, M. -S.; Shepherd, F. A.; Huttenhower, C.; Jurisica, I. (2011). "Optimized application of penalized regression methods to diverse genomic data". Bioinformatics 27 (24): 3399–3406. doi:10.1093/bioinformatics/btr591. PMC 3232376. PMID 22156367.
  11. "CRAN - Package pensim". r-project.org.
  12. "mlcircus / SVEN — Bitbucket". bitbucket.org.
  13. Sjöstrand, Karl; Clemmensen, Line; Einarsson, Gudmundur; Larsen, Rasmus; Ersbøll, Bjarne (2 February 2016). "SpaSM: A Matlab Toolbox for Sparse Statistical Modeling" (PDF). Journal of Statistical Software.

This article is issued from Wikipedia (version of Tuesday, March 8, 2016). The text is available under the Creative Commons Attribution/Share-Alike license, but additional terms may apply for the media files.