Forward Variable Selection for Ultra-high Dimensional Quantile Regression Models PDF Download

Forward Variable Selection for Ultra-high Dimensional Quantile Regression Models

Forward Variable Selection for Ultra-high Dimensional Quantile Regression Models PDF Author: Toshio Honda
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description


Boosting Methods for Variable Selection in High Dimensional Sparse Models

Boosting Methods for Variable Selection in High Dimensional Sparse Models PDF Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Firstly, we propose new variable selection techniques for regression in high dimensional linear models based on forward selection versions of the LASSO, adaptive LASSO and elastic net, called the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST and elastic FIRST, respectively. These methods seem to work better for extremely sparse high dimensional linear regression models. We exploit the fact that the LASSO, adaptive LASSO and elastic net have closed form solutions when the predictor is one-dimensional. This explicit formula is then applied repeatedly, in an iterative fashion, until convergence. By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of the new estimators is compared with that of commonly used estimators in terms of predictive accuracy and variable selection errors; our approach shows better prediction performance for highly sparse high dimensional linear regression models.

Secondly, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared Support Vector Machines or one-norm Support Vector Machines, called the forward iterative selection and classification algorithm (FISCAL). This method seems to work better for highly sparse high dimensional binary classification models. We suggest squared support vector machines using the 1-norm and 2-norm simultaneously. The squared support vector machine loss is convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied along with the squared support vector machines until a stopping rule is satisfied. We also develop a recursive algorithm for the FISCAL to reduce the computational burden, and we apply the same process to the original one-norm Support Vector Machines. We compare the FISCAL with other widely used classification methods.
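The closed-form univariate solution the description exploits is the soft-thresholding operator. A minimal sketch of the idea in Python, with a simplified greedy selection and stopping rule standing in for the actual FIRST algorithm (the function name, tuning value and selection rule here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def univariate_lasso(x, r, lam):
    """Closed-form LASSO solution for a single predictor x against the
    current residual r: soft-thresholding of the one-dimensional OLS fit."""
    beta_ols = x @ r / (x @ x)
    return np.sign(beta_ols) * max(abs(beta_ols) - lam, 0.0)

# Toy forward pass: at each stage, apply the closed-form update to every
# predictor and greedily keep the largest surviving coefficient.  This
# selection rule and stopping rule are simplified stand-ins.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = 2.0 * X[:, 3] + rng.standard_normal(100)

r = y.copy()
selected = {}
for _ in range(5):
    betas = [univariate_lasso(X[:, j], r, 0.1) for j in range(X.shape[1])]
    j = int(np.argmax([abs(b) for b in betas]))
    if betas[j] == 0.0:
        break  # no predictor survives the threshold: stop
    selected[j] = selected.get(j, 0.0) + betas[j]
    r = r - X[:, j] * betas[j]

print(sorted(selected))  # the true predictor, index 3, should typically appear
```

Because each update is a scalar formula, the whole pass costs only one matrix-vector product per stage, which is the source of the speed the abstract claims.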

Forward Variable Selection for Sparse Ultra-high Dimensional Generalized Varying Coefficient Models

Forward Variable Selection for Sparse Ultra-high Dimensional Generalized Varying Coefficient Models PDF Author: Toshio Honda
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Ultra High Dimension Variable Selection with Threshold Partial Correlations

Ultra High Dimension Variable Selection with Threshold Partial Correlations PDF Author: Yiheng Liu
Publisher:
ISBN:
Category : Regression analysis
Languages : en
Pages : 0

Book Description
With respect to variable selection in linear regression, partial correlation for normal models (Buhlmann, Kalisch and Maathuis, 2010) is a powerful alternative to penalized least squares approaches (LASSO, SCAD, etc.). The method was improved by Li, Liu and Lou (2015) with the concept of threshold partial correlation (TPC) and an extension to elliptically contoured distributions. The TPC procedure has clear advantages over simple partial correlation in high or ultrahigh dimensional cases, where the dimension of the predictors increases at an exponential rate in the sample size. However, the convergence rate of TPC is not very satisfying, since the procedure usually takes a substantial amount of time to reach the final solution, especially in high or ultrahigh dimensional scenarios. Besides, the model assumptions behind TPC are strong, which suggests the approach may not be convenient to use in practice. To address these two issues, this dissertation puts forward an innovative model selection algorithm. It starts with an alternative definition of elliptically contoured distributions that restricts the impact of the marginal kurtosis, which imposes a relatively weaker condition for the validity of the model selection algorithm. In simulations, the new approach demonstrates not only competitive outcomes compared with established methods such as LASSO and SCAD, but also advantages in computing efficiency. The idea of the algorithm is extended to survival data and nonparametric inference by exploring various measures of correlation between the response variable and the predictors.
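The partial correlation at the heart of these procedures can be computed by residualization: regress both variables on the conditioning set and correlate the residuals. A small illustrative sketch (the standard sample partial correlation only; the TPC thresholding rule of Li, Liu and Lou is not reproduced here):

```python
import numpy as np

def partial_correlation(y, x, Z):
    """Sample partial correlation of y and x given the columns of Z,
    via correlation of least-squares residuals."""
    n = len(y)
    Z1 = np.ones((n, 1)) if Z is None else np.column_stack([np.ones(n), Z])
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    return float(ry @ rx / np.sqrt((ry @ ry) * (rx @ rx)))

# y and x share a common driver z, so they are marginally correlated,
# but conditionally on z the association disappears.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
x = z + 0.1 * rng.standard_normal(n)
y = z + 0.1 * rng.standard_normal(n)

marginal = partial_correlation(y, x, None)
given_z = partial_correlation(y, x, z.reshape(-1, 1))
print(round(marginal, 2), round(given_z, 2))
```

The contrast between the two numbers is what makes partial correlation useful for selection: a predictor correlated with the response only through already-selected variables is screened out once they enter the conditioning set.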

Prediction and Variable Selection in Sparse Ultrahigh Dimensional Additive Models

Prediction and Variable Selection in Sparse Ultrahigh Dimensional Additive Models PDF Author: Girly Manguba Ramirez
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Advances in technology have enabled many fields to collect datasets in which the number of covariates (p) tends to be much bigger than the number of observations (n), the so-called ultrahigh dimensionality. In this setting, classical regression methodologies are invalid, and there is a great need for methods that can explain the variation of the response variable using only a parsimonious set of covariates. In recent years there have been significant developments in variable selection procedures. However, the available procedures usually select too many false variables, and most are appropriate only when the response variable is linearly associated with the covariates. Motivated by these concerns, we propose another procedure for variable selection in the ultrahigh dimensional setting that is able to reduce the number of false positive variables. Moreover, the procedure can be applied whether the response variable is continuous or binary, and whether it is linearly or nonlinearly related to the covariates. Inspired by the Least Angle Regression approach, we develop two multi-step algorithms to select variables in sparse ultrahigh dimensional additive models. The variables go through a series of nonlinear dependence evaluations following a Most Significant Regression (MSR) algorithm, which is also designed to produce predictions of the response variable. The first algorithm, called MSR-continuous (MSRc), is appropriate for datasets with a continuous response variable. Simulation results demonstrate that this algorithm works well. Comparisons with other methods, such as greedy-INIS by Fan et al. (2011) and the generalized correlation procedure by Hall and Miller (2009), show that MSRc not only has a false positive rate significantly lower than both methods, but also accuracy and a true positive rate comparable with greedy-INIS.
The second algorithm, called MSR-binary (MSRb), is appropriate when the response variable is binary. Simulations demonstrate that MSRb is competitive in terms of prediction accuracy and true positive rate, and better than GLMNET in terms of false positive rate. An application of MSRb to real datasets is also presented. In general, the MSR algorithm selects fewer variables while preserving prediction accuracy.

Semiparametric Quantile Averaging in the Presence of High-Dimensional Predictors

Semiparametric Quantile Averaging in the Presence of High-Dimensional Predictors PDF Author: Jan G. De Gooijer
Publisher:
ISBN:
Category :
Languages : en
Pages : 37

Book Description
The paper proposes a method for forecasting conditional quantiles. In practice, one often does not know the "true" structure of the underlying conditional quantile function, and there may be a potentially large number of predictors. Mainly intended for such cases, we introduce a flexible and practical framework based on penalized high-dimensional quantile averaging. In addition to prediction, we show that the proposed method can also serve as a valid predictor selector. We conduct extensive simulation experiments to assess its prediction and variable selection performance for nonlinear and linear model designs. In terms of predictor selection, the approach tends to select the true set of predictors with minimal false positives. With respect to prediction accuracy, the method competes well even with benchmark/oracle methods that know one or more aspects of the underlying quantile regression model. To further illustrate the merits of the proposed method, we provide an application to out-of-sample forecasting of U.S. core inflation using a large set of monthly macroeconomic variables from the recently developed FRED-MD database. The application offers several empirical findings.

Statistical Foundations of Data Science

Statistical Foundations of Data Science PDF Author: Jianqing Fan
Publisher: CRC Press
ISBN: 0429527616
Category : Mathematics
Languages : en
Pages : 942

Book Description
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference, and it includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book further gives a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, and their applications to statistical estimation, inference, prediction and machine learning problems. Finally, it thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

A Non-iterative Method for Fitting the Single Index Quantile Regression Model with Uncensored and Censored Data

A Non-iterative Method for Fitting the Single Index Quantile Regression Model with Uncensored and Censored Data PDF Author: Eliana Christou
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. Linear and nonlinear QR models have been studied extensively, while recent research focuses on the single index quantile regression (SIQR) model. Compared to the single index mean regression (SIMR) problem, the fitting and the asymptotic theory of the SIQR model are more complicated due to the lack of closed form expressions for estimators of conditional quantiles. Consequently, existing methods are necessarily iterative. We propose a non-iterative estimation algorithm, and derive the asymptotic distribution of the proposed estimator under heteroscedasticity. For identifiability, we use a parametrization that sets the first coefficient to 1 instead of the typical condition which restricts the norm of the parametric component. This distinction is more than simply cosmetic as it affects, in a critical way, the correspondence between the estimator derived and the asymptotic theory. The ubiquity of high dimensional data has led to a number of variable selection methods for linear/nonlinear QR models and, recently, for the SIQR model. We propose a new algorithm for simultaneous variable selection and parameter estimation applicable also for heteroscedastic data. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection method. Step 2 uses the results of Step 1 to obtain better estimation of the conditional quantiles and, using them, to perform simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method of Step 1 consistently estimates the relevant variables, and that the estimated parametric component derived in Step 2 satisfies the oracle property. Furthermore, QR is particularly relevant for the analysis of censored survival data as an alternative to proportional hazards and the accelerated failure time models. 
Such data occur frequently in biostatistics, environmental sciences, social sciences and econometrics. There is a large body of work for linear/nonlinear QR models for censored data, but it is only recently that the SIQR model has received some attention. However, the only existing method for fitting the SIQR model uses an iterative algorithm and no asymptotic theory for the resulting estimator of the Euclidean parameter is given. We propose a new non-iterative estimation algorithm, and derive the asymptotic distribution of the proposed estimator under heteroscedasticity.

Computation in Quantile and Composite Quantile Regression Models with Or Without Regularization

Computation in Quantile and Composite Quantile Regression Models with Or Without Regularization PDF Author: Jueyu Gao
Publisher:
ISBN:
Category : Analytic functions
Languages : en
Pages : 55

Book Description
Quantile and composite quantile regression, with or without regularization, have been widely studied and applied in high-dimensional model estimation and variable selection. Although the theoretical aspects are well established, the lack of efficient computational methods and publicly available programs or packages hinders research in this area. Koenker established and implemented the interior point (IP) method in quantreg for quantile regression with or without regularization, but it cannot handle composite quantile regression with or without regularization. The same limitation exists in the Coordinate Descent (CD) algorithm implemented in CDLasso. The lack of readily available programs for composite quantile regression with or without regularization motivates our research here. In this work, we implement three algorithms, Majorize-Minimize (MM), Coordinate Descent (CD) and the Alternating Direction Method of Multipliers (ADMM), for quantile and composite quantile regression with or without regularization. We conduct a simulation comparing the performance of the four algorithms in time efficiency and estimation accuracy; the simulation study shows our program is time efficient when dealing with high dimensional problems. Based on this performance, we publish the R package cqrReg, which gives the user more flexibility and capability in conducting various data analyses. To optimize time efficiency, the package cqrReg is coded in C++ and linked back to R through a user-friendly interface.
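All of these algorithms minimize sums of the Koenker-Bassett check function; composite quantile regression sums it over several quantile levels with a shared slope and level-specific intercepts. A brief Python sketch of the objective (illustrative only, not the cqrReg code, which is in C++/R):

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def composite_quantile_loss(beta, b, X, y, taus):
    """Composite quantile regression objective: one shared slope vector beta,
    one intercept b[k] per quantile level, summed over the K levels."""
    return sum(np.sum(check_loss(y - b[k] - X @ beta, tau))
               for k, tau in enumerate(taus))

# Sanity check on the building block: over an intercept-only model, the
# tau = 0.5 check loss is minimized at the sample median, even with an
# outlier present.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 110.0, 2201)
losses = [np.sum(check_loss(y - b, 0.5)) for b in grid]
median_hat = grid[int(np.argmin(losses))]
print(round(median_hat, 6))  # 3.0, the sample median
```

The kink of the check function at zero is what makes the objective non-differentiable and motivates the specialized MM, CD and ADMM solvers the abstract describes.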

Two Tales of Variable Selection for High Dimensional Data

Two Tales of Variable Selection for High Dimensional Data PDF Author: Cong Liu
Publisher:
ISBN:
Category :
Languages : en
Pages : 95

Book Description
We also conduct similar studies comparing two corresponding screening and selection procedures, LASSO and correlation screening, in the classification setting, i.e., $L_{1}$ penalized logistic regression and the two-sample t-test. Initial results of an exploratory analysis are presented to provide some insight into the scenarios in which each of the two methods is preferred. We discuss possible extensions, future work, and the differences between the regression and classification settings.
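The two-sample t-test screen mentioned here ranks features marginally by class separation, in contrast to the joint fit of $L_{1}$ penalized logistic regression. A minimal sketch, with simulated data and a cutoff k chosen arbitrarily for illustration:

```python
import numpy as np

def t_screen(X, y, k):
    """Rank features by the absolute two-sample t-statistic between the
    classes and keep the top k (a marginal screen; k is illustrative)."""
    g0, g1 = X[y == 0], X[y == 1]
    se = np.sqrt(g0.var(axis=0, ddof=1) / len(g0) +
                 g1.var(axis=0, ddof=1) / len(g1))
    t = (g1.mean(axis=0) - g0.mean(axis=0)) / se
    return np.argsort(-np.abs(t))[:k]

# Simulated data: only feature 7 truly separates the two classes.
rng = np.random.default_rng(2)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = (rng.random(n) < 0.5).astype(int)
X[y == 1, 7] += 2.0

top = t_screen(X, y, 10)
print(7 in top)  # the screen should recover the informative feature
```

Because each feature is tested independently, the screen is fast even for p far larger than n, but unlike the penalized fit it cannot account for correlation among features, which is one source of the regression/classification differences discussed above.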