TWO-STAGE SCAD LASSO FOR LINEAR MIXED MODEL SELECTION

Author: Mohammed A. Yousef
Publisher:
ISBN:
Category : Linear models (Statistics)
Languages : en
Pages : 116

Book Description
The linear regression model is the classical approach to explaining the relationship between a response variable (dependent) and predictors (independent). However, as the number of predictors in the data increases, so does the likelihood of correlation among the predictors, which is problematic. To address this, the linear mixed effects model was proposed; it consists of a fixed effects term and a random effects term. The fixed effects term represents the traditional linear regression coefficients, while the random effects term represents values drawn randomly from the population. The linear mixed model thus represents both the mean and the covariance structure of the data in a single model. As the fixed and random effects terms grow in dimension, selecting an appropriate model, i.e., the optimal fit, becomes increasingly difficult. Because of this inherent complexity of the linear mixed model, in this dissertation we propose a two-stage method for selecting the fixed and random effects terms.

In the first stage, we select the most significant fixed effects based on the conditional distribution of the response variable given the random effects. This is achieved by minimizing a penalized least squares criterion with a SCAD Lasso penalty term, with parameter estimation implemented via the Newton-Raphson optimization algorithm. In this process, the coefficients of unimportant predictors shrink to exactly zero, eliminating the noise from the model. In the second stage, we choose the most important random effects by maximizing the penalized profile log-likelihood function, again using the Newton-Raphson optimization algorithm and, as in the first stage, a SCAD Lasso penalty. Unlike the fixed effects, the random effects are drawn randomly from the population and therefore need to be predicted; this prediction is done by estimating the diagonal elements (variances) of the covariance structure of the random effects. During this step, the variance components of all unimportant random effects shrink to exactly zero, just as the fixed effects parameters do in the first stage, so noise is eliminated from the model while only significant effects are retained. This completes the selection of the random effects.

In both stages of the proposed approach, it is shown that the selection of effects through elimination is achieved with probability tending to one, indicating that the proposed method asymptotically identifies all true effects, both fixed and random. The method is also shown to satisfy the oracle properties, namely asymptotic normality and sparsity. At the end of these two stages, we obtain an optimal linear mixed model that can be readily applied to correlated data.

To test the overall effectiveness of the proposed approach, four simulation studies are conducted, each with a different number of subjects, a different number of observations per subject, and a different covariance structure for generating the data. The simulation results illustrate that the proposed method can effectively select the fixed and random effects in the linear mixed model. The proposed method is also compared with other model selection methods, and the results show that it performs better at choosing the true model.
Subsequently, two applications, the Amsterdam Growth and Health Study data (Kemper, 1995) and the Messier 69 data from astronomy (Husband, 2017), are used to investigate how the proposed approach behaves on real-life data. In both applications, the proposed method is compared with other methods and proves more effective than its counterparts at identifying the appropriate mixed model.
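The first stage described in the abstract above shrinks unimportant fixed-effect coefficients to exactly zero by minimizing a SCAD-penalized least squares criterion. The sketch below illustrates that general idea with a plain coordinate-descent implementation of SCAD-penalized least squares on simulated data; it is not the dissertation's Newton-Raphson procedure, and the simulated data, the penalty level lam, the SCAD constant a = 3.7, and the function names are illustrative assumptions only.

import numpy as np

def scad_threshold(z, lam, a=3.7):
    # Univariate SCAD solution (Fan & Li, 2001) for a standardized predictor.
    az = abs(z)
    if az <= 2 * lam:                      # soft-thresholding region
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:                      # linearly relaxed shrinkage region
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z                               # large effects are left unpenalized

def scad_coordinate_descent(X, y, lam, a=3.7, n_iter=200):
    # Coordinate descent for SCAD-penalized least squares; unimportant
    # coefficients end up exactly zero, which is the selection behavior
    # described for the fixed-effects stage.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z_j = X[:, j] @ r_j / n                  # univariate least squares estimate
            beta[j] = scad_threshold(z_j, lam, a)
    return beta

# Illustration on simulated data: only the first two predictors are active.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                       # standardize columns
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)
print(np.round(scad_coordinate_descent(X, y, lam=0.2), 3))

With a suitable penalty level, only the coefficients of the truly active predictors remain nonzero, which mirrors the behavior the abstract describes for the fixed-effects stage.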

Shrinkage Parameter Selection in Generalized Linear and Mixed Models

Author: Erin K. Melcon
Publisher:
ISBN: 9781321363388
Category :
Languages : en
Pages :

Book Description
Penalized likelihood methods such as the lasso, adaptive lasso, and SCAD are widely used in linear models. Selecting the penalty parameter is an important step in modeling with penalized techniques; traditionally, information criteria or cross-validation are used for this purpose. Although such selection methods have been evaluated in linear models, generalized linear models and linear mixed models have not been explored as thoroughly. This dissertation introduces a data-driven bootstrap approach (Empirical Optimal Selection, or EOS) for selecting the penalty parameter, with a focus on model selection. We implement EOS for selecting the penalty parameter in the case of the lasso and adaptive lasso. For generalized linear models, we introduce the method, present simulations comparing EOS with information criteria and cross-validation, and give theoretical justification for the approach; we also consider a practical upper bound for the penalty parameter, with theoretical justification. For linear mixed models, we use EOS with two different objective functions: the traditional log-likelihood approach (which requires an EM algorithm) and a predictive approach. In both cases, we compare penalty parameter selection by EOS with selection by information criteria. Theoretical justification for both objective functions, and a practical upper bound for the penalty parameter in the log-likelihood case, are given. We also apply the technique to two datasets: the South African heart data (logistic regression) and the Yale infant data (a linear mixed model). For the South African data, we compare the final models obtained with EOS and with information criteria via the mean squared prediction error (MSPE). For the Yale infant data, we compare our results with those obtained by Ibrahim et al. (2011).
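To make the penalty-selection step concrete, here is a minimal, purely illustrative sketch of choosing a lasso penalty by resampling: fit the model on bootstrap samples over a grid of penalty values and keep the value with the smallest average out-of-bag prediction error. This is not the EOS procedure described above (nor its objective functions); the grid, the number of bootstrap replicates B, the simulated data, and the function name bootstrap_lambda are assumptions for illustration.

import numpy as np
from sklearn.linear_model import Lasso

def bootstrap_lambda(X, y, lambdas, B=50, seed=0):
    # Pick the penalty value minimizing the average out-of-bag squared error.
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = np.zeros(len(lambdas))
    for _ in range(B):
        idx = rng.integers(0, n, n)              # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)    # held-out (out-of-bag) rows
        if oob.size == 0:
            continue
        for k, lam in enumerate(lambdas):
            fit = Lasso(alpha=lam, max_iter=10_000).fit(X[idx], y[idx])
            errs[k] += np.mean((y[oob] - fit.predict(X[oob])) ** 2) / B
    return lambdas[int(np.argmin(errs))]

# Toy data: two active predictors among ten.
rng = np.random.default_rng(1)
X = rng.standard_normal((150, 10))
y = X[:, 0] - 0.5 * X[:, 3] + rng.standard_normal(150)
grid = np.logspace(-3, 0, 20)
print("selected penalty:", bootstrap_lambda(X, y, grid))

Selection by cross-validation or an information criterion, the traditional choices mentioned above, would replace the out-of-bag error with a different score computed over the same grid of penalty values.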

Statistical Inference from High Dimensional Data

Author: Carlos Fernandez-Lozano
Publisher: MDPI
ISBN: 3036509445
Category : Science
Languages : en
Pages : 314

Book Description
• Real-world problems can be high-dimensional, complex, and noisy.
• More data does not imply more information.
• Different approaches deal with the so-called curse of dimensionality in order to reduce irrelevant information.
• A process with multidimensional information is not necessarily easy to interpret or to process.
• In some real-world applications, the number of elements in one class is clearly lower than in the other; models tend to assume that the analysis should focus on the majority class, which is usually not the case.
• The analysis of complex diseases such as cancer is focused on more-than-one-dimensional omic data.
• The increasing amount of data, thanks to the falling cost of high-throughput experiments, opens up a new era for integrative, data-driven approaches.
• Entropy-based approaches are of interest for reducing the dimensionality of high-dimensional data.

Novel Approaches in Microbiome Analyses and Data Visualization

Author: Jessica Galloway-Peña
Publisher: Frontiers Media SA
ISBN: 2889456536
Category :
Languages : en
Pages : 186

Book Description
High-throughput sequencing technologies are widely used to study microbial ecology across species and habitats in order to understand the impacts of microbial communities on host health, metabolism, and the environment. Due to the dynamic nature of microbial communities, longitudinal microbiome analyses play an essential role in these investigations. Key questions in microbiome studies aim at identifying specific microbial taxa, enterotypes, genes, or metabolites associated with specific outcomes, as well as potential factors that influence microbial communities. However, the characteristics of microbiome data, such as sparsity and skewness, combined with the nature of data collection, often reflected as uneven sampling or missing data, make the statistical approaches commonly employed to handle repeated measures in longitudinal studies inadequate. Therefore, many researchers have begun to investigate methods that better incorporate these features when studying clinical, host, metabolic, or environmental associations with longitudinal microbiome data.

In addition to the inferential aspect, it is also becoming apparent that visualizing high-dimensional data in a way that is both intelligible and comprehensive is another difficult challenge microbiome researchers face. Visualization is crucial in both the analysis and the understanding of metagenomic data: researchers must create clear graphical representations that give biological insight without being overly complicated. Thus, this Research Topic seeks both to review and to provide novel approaches that are being developed to integrate microbiome data and complex metadata into meaningful mathematical, statistical, and computational models. We believe this topic is fundamental to understanding the importance of microbial communities and provides a useful reference for other investigators approaching the field.

Clinical Medicine for Healthcare and Sustainability

Author: Teen-Hang Meen
Publisher: MDPI
ISBN: 3039368621
Category : Science
Languages : en
Pages : 434

Book Description
When the domestic government, the private sector, and people in various professional fields discuss long-term care issues, they all focus on creating warm, home-like care institutions. However, we actively emphasize the importance of community-based long-term care. With respect to "aging in place", the development of domestic non-institutional care is still in its infancy: some long-term care needs must still be met through institutional care, and extending and reaching out with community-based care and respite service platforms for the development of community-based long-term care still relies on institutional care. The history of long-term care in Taiwan is much shorter than that of Japan, Europe, the United States, and Canada. Despite years of hard work and rapid development, the long-term care resources needed to establish a complete system in terms of universalization, fairness, accessibility, and selectivity are not yet available. In the future, building on sound institutional care, it is hoped that outreach will move toward the goals of community care and aging in place. We hope the studies in this Special Issue will help further develop clinical medicine for healthcare and sustainability.

Integrative Analysis of Genome-Wide Association Studies and Single-Cell Sequencing Studies

Author: Sheng Yang
Publisher: Frontiers Media SA
ISBN: 2889714675
Category : Science
Languages : en
Pages : 113

Book Description


Multivariate Statistical Modelling Based on Generalized Linear Models

Author: Ludwig Fahrmeir
Publisher: Springer Science & Business Media
ISBN: 1489900101
Category : Mathematics
Languages : en
Pages : 440

Book Description
Concerned with the use of generalised linear models for univariate and multivariate regression analysis, this is a detailed introductory survey of the subject, based on the analysis of real data drawn from a variety of fields such as the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account.

Statistical Learning with Sparsity

Author: Trevor Hastie
Publisher: CRC Press
ISBN: 1498712177
Category : Business & Economics
Languages : en
Pages : 354

Book Description
Discover New Methods for Dealing with High-Dimensional Data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying…
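As a small illustration of the sparsity point in this description, a lasso fit with a moderate penalty sets most coefficients exactly to zero, leaving a short, interpretable list of selected predictors. The data and the penalty value below are purely illustrative and are not taken from the book.

import numpy as np
from sklearn.linear_model import Lasso

# Toy data: 20 predictors, but only the first two carry signal.
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(100)

fit = Lasso(alpha=0.1).fit(X, y)                     # sparse (lasso) fit
print("indices of selected predictors:", np.flatnonzero(fit.coef_))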

Statistical Foundations of Data Science

Author: Jianqing Fan
Publisher: CRC Press
ISBN: 0429527616
Category : Mathematics
Languages : en
Pages : 942

Book Description
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference, and it includes ample exercises involving both theoretical study and empirical application. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book further provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, and their applications to statistical estimation, inference, prediction, and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

Gaussian Process Regression Analysis for Functional Data

Author: Jian Qing Shi
Publisher: CRC Press
ISBN: 1439837740
Category : Mathematics
Languages : en
Pages : 214

Book Description
Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering…