Cross-validation and Regression Analysis in High-dimensional Sparse Linear Models
Author: Feng Zhang
Publisher: Stanford University
ISBN:
Category :
Languages : en
Pages : 91
Book Description
Modern scientific research often involves experiments with at most hundreds of subjects but with tens of thousands of variables for every subject. The challenge of high dimensionality has reshaped statistical thinking and modeling. Variable selection plays a pivotal role in high-dimensional data analysis, and the combination of sparsity and accuracy is crucial for statistical theory and practical applications. Regularization methods are attractive for tackling these sparsity and accuracy issues. The first part of this thesis studies two regularization methods. First, we consider the orthogonal greedy algorithm (OGA) used in conjunction with a high-dimensional information criterion introduced by Ing & Lai (2011). Although it has been shown to have excellent performance for weakly sparse regression models, one does not know a priori in practice that the actual model is weakly sparse, and we address this problem by developing a new cross-validation approach. OGA can be viewed as L0 regularization for weakly sparse regression models. When such sparsity fails, as revealed by the cross-validation analysis, we propose to use a new way to combine L1 and L2 penalties, which we show to have important advantages over previous regularization methods. The second part of the thesis develops a Monte Carlo Cross-Validation (MCCV) method to estimate the distribution of out-of-sample prediction errors when a training sample is used to build a regression model for prediction. Asymptotic theory and simulation studies show that the proposed MCCV method mimics the actual (but unknown) prediction error distribution even when the number of regressors exceeds the sample size. Therefore MCCV provides a useful tool for comparing the predictive performance of different regularization methods for real (rather than simulated) data sets.
Machine Learning Techniques for Gait Biometric Recognition
Author: James Eric Mason
Publisher: Springer
ISBN: 3319290886
Category : Technology & Engineering
Languages : en
Pages : 247
Book Description
This book focuses on how machine learning techniques can be used to analyze and make use of one particular category of behavioral biometrics known as the gait biometric. A comprehensive Ground Reaction Force (GRF)-based Gait Biometrics Recognition framework is proposed and validated by experiments. In addition, an in-depth analysis of existing recognition techniques that are best suited for performing footstep GRF-based person recognition is also presented, as well as a comparison of feature extractor, normalizer, and classifier configurations that were never directly compared with one another in any previous GRF recognition research. Finally, a detailed theoretical overview of many existing machine learning techniques is presented, leading to a proposal of two novel data processing techniques developed specifically for the purpose of gait biometric recognition using GRF. This book · introduces novel machine-learning-based temporal normalization techniques · bridges research gaps concerning the effect of footwear and stepping speed on footstep GRF-based person recognition · provides detailed discussions of key research challenges and open research issues in gait biometrics recognition · compares biometrics systems trained and tested with the same footwear against those trained and tested with different footwear
Statistical Learning with Sparsity
Author: Trevor Hastie
Publisher: CRC Press
ISBN: 1498712177
Category : Business & Economics
Languages : en
Pages : 354
Book Description
Discover New Methods for Dealing with High-Dimensional Data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underl
Partially Linear Models
Author: Wolfgang Härdle
Publisher: Springer Science & Business Media
ISBN: 3642577008
Category : Mathematics
Languages : en
Pages : 210
Book Description
In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.
Doctoral Research in Construction Management
Author: Zhen Chen
Publisher: Frontiers Media SA
ISBN: 2832515029
Category : Technology & Engineering
Languages : en
Pages : 156
Book Description
Data Science for Financial Econometrics
Author: Nguyen Ngoc Thach
Publisher: Springer Nature
ISBN: 3030488535
Category : Computers
Languages : en
Pages : 633
Book Description
This book offers an overview of state-of-the-art econometric techniques, with a special emphasis on financial econometrics. There is a major need for such techniques, since the traditional way of designing mathematical models – based on researchers’ insights – can no longer keep pace with the ever-increasing data flow. To catch up, many application areas have begun relying on data science, i.e., on techniques for extracting models from data, such as data mining, machine learning, and innovative statistics. In terms of capitalizing on data science, many application areas are way ahead of economics. To close this gap, the book provides examples of how data science techniques can be used in economics. Corresponding techniques range from almost traditional statistics to promising novel ideas such as quantum econometrics. Given its scope, the book will appeal to students and researchers interested in state-of-the-art developments, and to practitioners interested in using data science techniques.
Big and Complex Data Analysis
Author: S. Ejaz Ahmed
Publisher: Springer
ISBN: 3319415735
Category : Mathematics
Languages : en
Pages : 390
Book Description
This volume conveys some of the surprises, puzzles and success stories in high-dimensional and complex data analysis and related fields. Its peer-reviewed contributions showcase recent advances in variable selection, estimation and prediction strategies for a host of useful models, as well as essential new developments in the field. The continued and rapid advancement of modern technology now allows scientists to collect data of increasingly unprecedented size and complexity. Examples include epigenomic data, genomic data, proteomic data, high-resolution image data, high-frequency financial data, functional and longitudinal data, and network data. Simultaneous variable selection and estimation is one of the key statistical problems involved in analyzing such big and complex data. The purpose of this book is to stimulate research and foster interaction between researchers in the area of high-dimensional data analysis. More concretely, its goals are to: 1) highlight and expand the breadth of existing methods in big data and high-dimensional data analysis and their potential for the advancement of both the mathematical and statistical sciences; 2) identify important directions for future research in the theory of regularization methods, in algorithmic development, and in methodologies for different application areas; and 3) facilitate collaboration between theoretical and subject-specific researchers.
Statistics for High-Dimensional Data
Author: Peter Bühlmann
Publisher: Springer Science & Business Media
ISBN: 364220192X
Category : Mathematics
Languages : en
Pages : 568
Book Description
Modern statistics deals with large and complex data sets, and consequently with models containing a large number of parameters. This book presents a detailed account of recently developed approaches, including the Lasso and versions of it for various models, boosting methods, undirected graphical modeling, and procedures controlling false positive selections. A special characteristic of the book is that it contains comprehensive mathematical theory on high-dimensional statistics combined with methodology, algorithms and illustrations with real data examples. This in-depth approach highlights the methods’ great potential and practical applicability in a variety of settings. As such, it is a valuable resource for researchers, graduate students and experts in statistics, applied mathematics and computer science.
Robust Regression and Outlier Detection
Author: Peter J. Rousseeuw
Publisher: John Wiley & Sons
ISBN: 0471725374
Category : Mathematics
Languages : en
Pages : 329
Book Description
WILEY-INTERSCIENCE PAPERBACK SERIES The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "The writing style is clear and informal, and much of the discussion is oriented to application. In short, the book is a keeper." –Mathematical Geology "I would highly recommend the addition of this book to the libraries of both students and professionals. It is a useful textbook for the graduate student, because it emphasizes both the philosophy and practice of robustness in regression settings, and it provides excellent examples of precise, logical proofs of theorems. . . . Even for those who are familiar with robustness, the book will be a good reference because it consolidates the research in high-breakdown affine equivariant estimators and includes an extensive bibliography in robust regression, outlier diagnostics, and related methods. The aim of this book, the authors tell us, is ‘to make robust regression available for everyday statistical practice.’ Rousseeuw and Leroy have included all of the necessary ingredients to make this happen." –Journal of the American Statistical Association
Introduction to Machine Learning with Python
Author: Andreas C. Müller
Publisher: "O'Reilly Media, Inc."
ISBN: 1449369901
Category : Computers
Languages : en
Pages : 400
Book Description
Many Python developers are curious about what machine learning is and how it can be concretely applied to solve issues faced in businesses handling medium to large amounts of data. Machine Learning with Python teaches you the basics of machine learning and provides a thorough hands-on understanding of the subject. You'll learn important machine learning concepts and algorithms, when to use them, and how to use them. The book will cover a machine learning workflow: data preprocessing and working with data, training algorithms, evaluating results, and implementing those algorithms into a production-level system.