Algorithms for Solving Statistical Model Selection Problems PDF Download

Are you looking to read ebooks online? Search for your book and save it on your Kindle device, PC, phone, or tablet. Download the full book Algorithms for Solving Statistical Model Selection Problems by Cristian Gatu in PDF and EPUB format.

Algorithms for Solving Statistical Model Selection Problems

Author: Cristian Gatu
Publisher:
ISBN:
Category :
Languages : en
Pages : 110

Book Description

Algorithms for Solving Statistical Model Selection Problems

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Statistical model selection problems arise in diverse areas. Some selection methods have exponential complexity and are therefore computationally demanding. The purpose of this thesis is to propose computationally efficient and numerically reliable algorithms for statistical model selection. Particular emphasis is given to computationally intensive model selection strategies that evaluate regression trees and have combinatorial solutions. The computational efficiency of the proposed algorithms is investigated through detailed complexity analysis. Parallel algorithms to compute all possible subset regression models are designed, implemented, and analyzed. A branch-and-bound strategy that computes the best-subset regression model for each number of variables is proposed. A heuristic version of this strategy is also developed; it uses a tolerance parameter when deciding whether to cut a subtree. Experimental results supporting the theoretical results for the new strategies are presented. The adaptation of the various regression tree strategies to subset Vector Autoregressive (VAR) model selection problems is pursued. Various special cases of subset selection that exploit the common columns of the data matrices and the Kronecker structure of the variance-covariance matrix are investigated. Within this context, a combinatorial algorithm is designed to efficiently compute the estimators of a seemingly unrelated regressions model with permuted exogenous data matrices. The main computational component of the algorithms developed in this thesis is the QR decomposition and its modification. Efficient strategies for solving the various matrix factorization problems that arise in the estimation procedures are designed. The non-dense structure of the matrices is exploited, Kronecker products are not explicitly computed, and the computation of matrix inverses is avoided.
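The branch-and-bound idea for finding the best subset of each size can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the names rss and best_subsets are hypothetical, each node's residual sum of squares (RSS) is recomputed from a fresh QR decomposition instead of by the QR modification (downdating) the thesis relies on, and the tolerance rule for the heuristic variant (cut when (1 + tau) * RSS(node) is no smaller than every best RSS the subtree could still improve) is an assumed form. Pruning rests on one fact: deleting variables can never decrease the RSS, so a node's RSS is a lower bound for every model in its subtree.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on X[:, cols], via QR."""
    Q, _ = np.linalg.qr(X[:, cols])  # reduced QR; assumes X[:, cols] has full column rank
    qty = Q.T @ y
    return float(y @ y - qty @ qty)

def best_subsets(X, y, tau=0.0):
    """Best (RSS, columns) for each model size 1..n by branch-and-bound.

    tau = 0 gives an exact search; tau > 0 (hypothetical tolerance rule)
    cuts more subtrees and may miss some optimal subsets.
    """
    n = X.shape[1]
    best = {s: (np.inf, None) for s in range(1, n + 1)}
    # A node is (V, k): current variable list V; its children may only delete
    # V[k:], so every subset of the full variable set is generated at most once.
    stack = [(list(range(n)), 0)]
    while stack:
        V, k = stack.pop()
        r = rss(X, y, V)
        if r < best[len(V)][0]:
            best[len(V)] = (r, tuple(V))
        # Sizes of the models below this node (V[:k] is never deleted).
        sizes = range(max(k, 1), len(V))
        # Every descendant uses a subset of V, so its RSS is at least r.
        if all((1 + tau) * r >= best[s][0] for s in sizes):
            continue  # cut; with tau = 0 no descendant can beat the best of its size
        for j in range(k, len(V)):
            stack.append((V[:j] + V[j + 1:], j))
    return best
```

With tau = 0 the result matches an exhaustive search over all 2^n - 1 non-empty subsets while usually fitting far fewer models; raising tau trades guaranteed optimality for additional pruning.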

Selecting Models from Data

Author: P. Cheeseman
Publisher: Springer Science & Business Media
ISBN: 1461226600
Category : Mathematics
Languages : en
Pages : 475

Book Description
This volume is a selection of papers presented at the Fourth International Workshop on Artificial Intelligence and Statistics, held in January 1993. These biennial workshops have succeeded in bringing together researchers from Artificial Intelligence and from Statistics to discuss problems of mutual interest. The exchange has broadened research in both fields and has strongly encouraged interdisciplinary work. The theme of the 1993 AI and Statistics workshop was "Selecting Models from Data". The papers in this volume attest to the diversity of approaches to model selection and to the ubiquity of the problem. Statistics and artificial intelligence have each independently developed approaches to model selection and the corresponding algorithms to implement them. But, as these papers make clear, there is a high degree of overlap between the different approaches. In particular, there is agreement that the fundamental problem is the avoidance of "overfitting", i.e., where a model fits the given data very closely but is a poor predictor for new data; in other words, the model has partly fitted the "noise" in the original data.
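The overfitting problem described here is easy to reproduce. The sketch below is illustrative and not taken from the volume: it fits polynomials of increasing degree to a small noisy sample (the sine signal, noise level, and sample sizes are arbitrary assumptions, and the function name is hypothetical) and compares the error on the training points with the error on new points from the same source. Training error can only fall as the degree grows, because each model contains the previous one, whereas held-out error typically rises again once the model starts fitting the noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

def train_and_holdout_mse(x_tr, y_tr, x_te, y_te, max_degree):
    """Mean squared error on training and held-out data for degrees 0..max_degree."""
    train, holdout = [], []
    for d in range(max_degree + 1):
        p = Polynomial.fit(x_tr, y_tr, d)  # least-squares polynomial fit of degree d
        train.append(float(np.mean((p(x_tr) - y_tr) ** 2)))
        holdout.append(float(np.mean((p(x_te) - y_te) ** 2)))
    return train, holdout

def signal(x):
    """Hypothetical smooth "true" relationship."""
    return np.sin(2 * np.pi * x)

# Hypothetical data: 15 noisy training points and 200 new points.
rng = np.random.default_rng(0)
x_tr = np.linspace(0.0, 1.0, 15)
y_tr = signal(x_tr) + 0.2 * rng.standard_normal(x_tr.size)
x_te = rng.uniform(0.0, 1.0, 200)
y_te = signal(x_te) + 0.2 * rng.standard_normal(x_te.size)

train_mse, holdout_mse = train_and_holdout_mse(x_tr, y_tr, x_te, y_te, max_degree=12)
best_degree = int(np.argmin(holdout_mse))  # degree that predicts the new data best
```

Selecting the degree by held-out error rather than by training error is one simple guard against the overfitting the workshop papers discuss.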

Statistical Foundations of Data Science

Author: Jianqing Fan
Publisher: CRC Press
ISBN: 1466510854
Category : Mathematics
Languages : en
Pages : 752

Book Description
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation and of learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

Computational Subset Model Selection Algorithms and Applications

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
This dissertation develops new computationally efficient algorithms for identifying the subset of variables that minimizes any desired information criterion in model selection. In recent years, the statistical literature has placed increasing emphasis on information-theoretic model selection criteria. A model selection criterion chooses a model that "closely" approximates the true underlying model. Recent years have also seen many exciting developments in model selection techniques. As data mining of massive datasets with many variables becomes more common, the need for model selection techniques grows accordingly. To this end, we introduce a new Implicit Enumeration (IE) algorithm and a hybrid of IE with the Genetic Algorithm (GA) in this dissertation. The proposed Implicit Enumeration algorithm is the first algorithm that explicitly uses an information criterion as the objective function. The algorithm works with a variety of information criteria, including some for which the existing branch-and-bound algorithms developed by Furnival and Wilson (1974) and Gatu and Kontoghiorghes (2003) are not applicable. It also finds the "best" subset model directly, without the need to find the "best" subset of each size as the branch-and-bound techniques do. The proposed methods are demonstrated on multiple regression, multivariate regression, logistic regression, and discriminant analysis problems. The implicit enumeration algorithm converged to the optimal solution on real and simulated data sets with up to 80 predictors, and thus 2^80 = 1,208,925,819,614,629,174,706,176 possible subset models in the model portfolio. To our knowledge, none of the existing exact algorithms can optimally solve problems of this size.
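To make "using an information criterion as the objective function" concrete, the sketch below scores every non-empty subset by AIC or BIC for a Gaussian linear model (no intercept, criteria written up to an additive constant) and keeps the single minimizer, without first finding the best subset of each size. It is plain exhaustive search, not the dissertation's Implicit Enumeration algorithm, whose pruning rules are not described here; the function names and the simulated data are hypothetical. Its cost of 2^p - 1 model fits is exactly what makes p = 80 out of reach without such pruning.

```python
import itertools
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on X[:, cols]."""
    A = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def best_subset_by_criterion(X, y, criterion="aic"):
    """Return ((score, cols), models_evaluated) for the subset minimizing AIC or BIC."""
    n, p = X.shape
    best, evaluated = (np.inf, ()), 0
    for k in range(1, p + 1):
        # AIC penalty 2k, BIC penalty k*log(n); constants shared by all models omitted.
        penalty = 2 * k if criterion == "aic" else k * np.log(n)
        for cols in itertools.combinations(range(p), k):
            evaluated += 1
            score = n * np.log(rss(X, y, cols) / n) + penalty
            if score < best[0]:
                best = (score, cols)
    return best, evaluated

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0, 0.5]) + rng.standard_normal(100)
(score, cols), n_models = best_subset_by_criterion(X, y, "bic")
print(cols, n_models)  # n_models is 2**6 - 1 = 63
```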

Selecting Models from Data

Author: P. Cheeseman
Publisher:
ISBN: 9781461226611
Category : Artificial intelligence
Languages : en
Pages : 504

Book Description
This volume presents a selection of papers from the Fourth International Workshop on Artificial Intelligence and Statistics. This biennial workshop brings together researchers from both fields to discuss problems of mutual interest and to compare approaches to their solution. The fourth workshop focused on the topic of selecting models from data. As the papers in this volume attest, the empirical approaches from the two separate fields have much in common yet still depart enough from one another to stimulate active interdisciplinary work. The papers cover a wide spectrum of problems in empirical modelling including model selection in general, graphical models, causal models, regression and other statistical models, and general algorithms and software tools. This timely volume will benefit all researchers with an active interest in model selection, empirical model building, or more generally the interaction between Statistics and Artificial Intelligence.

Statistical Selection Among Problem-solving Methods

Author: Carnegie Mellon University. Computer Science Department
Publisher:
ISBN:
Category : Artificial intelligence
Languages : en
Pages : 0

Book Description
Abstract: "The choice of an appropriate problem-solving method, from available methods, is a crucial skill for human experts in many areas. We describe a technique for automatic selection among methods, based on a statistical analysis of their past performances. We formalize the statistical problem involved in selecting an efficient problem-solving method, derive a solution to this problem, and describe a selection algorithm. The algorithm not only chooses among available methods, but also decides when to abandon the chosen method, if it proves to take too much time. We extend our basic statistical technique to account for problem sizes and for similarity between problems. We give empirical results of the use of this technique to select among search engines in the PRODIGY system. We also test the selection technique on artificially generated performance data, using several different probability distributions."
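The abstract does not spell out its statistical model, so the following is only a hypothetical sketch of the general idea: from each method's past running times, estimate the expected gain of running it with a time bound after which it is abandoned, and pick the method and bound with the largest estimate. The gain model (a fixed reward for solving a problem minus the time spent), the use of past running times as candidate bounds, and the function name are all assumptions, not the selection algorithm from the report.

```python
import math

def choose_method(history, reward):
    """Pick (gain, method, time bound) maximizing reward*P(solved) - E[time spent].

    history maps a method name to its past running times; math.inf marks a run
    that never finished. A run that exceeds the bound is abandoned and counts as
    a failure that cost `bound` time units.
    """
    best = None
    for method, times in history.items():
        # Candidate bounds: the finite running times observed for this method.
        for bound in sorted({t for t in times if math.isfinite(t)}):
            p_solved = sum(t <= bound for t in times) / len(times)
            expected_time = sum(min(t, bound) for t in times) / len(times)
            gain = reward * p_solved - expected_time
            if best is None or gain > best[0]:
                best = (gain, method, bound)
    return best

history = {"A": [1.0, 1.0, 1.0, 10.0], "B": [3.0, 3.0, 3.0, 3.0]}
print(choose_method(history, reward=10.0))  # (7.0, 'B', 3.0)
```

Here method A is usually fast but occasionally slow; with a reward of 10, the steady method B wins, while with a reward of 2 it pays to run A and abandon it after 1 time unit.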

Quantitative Medical Data Analysis Using Mathematical Tools And Statistical Techniques

Author: Don Hong
Publisher: World Scientific
ISBN: 9814476234
Category : Medical
Languages : en
Pages : 364

Book Description
Quantitative biomedical data analysis is a fast-growing interdisciplinary area of applied and computational mathematics, statistics, computer science, and biomedical science, leading to new fields such as bioinformatics, biomathematics, and biostatistics. In addition to traditional statistical techniques and mathematical models using differential equations, new developments with a very broad spectrum of applications, such as wavelets, spline functions, curve and surface subdivisions, sampling, and learning theory, have found their mathematical home in biomedical data analysis. This book gives a new and integrated introduction to quantitative medical data analysis from the viewpoint of biomathematicians, biostatisticians, and bioinformaticians. It offers a definitive resource to bridge the disciplines of mathematics, statistics, and biomedical sciences. Topics include mathematical models for cancer invasion and clinical sciences, data mining techniques and subset selection in data analysis, survival data analysis and survival models for cancer patients, statistical analysis and neural network techniques for genomic and proteomic data analysis, wavelet and spline applications for mass spectrometry data preprocessing and statistical computing.

Handbook of Graphs and Networks

Author: Stefan Bornholdt
Publisher: John Wiley & Sons
ISBN: 3527606335
Category : Science
Languages : en
Pages : 417

Book Description
Complex interacting networks are observed in systems from diverse areas such as physics, biology, economics, ecology, and computer science. For example, economic or social interactions often organize themselves into complex network structures. Similar phenomena are observed in traffic flow and in communication networks such as the Internet. In current problems of the biosciences, prominent examples are protein networks in the living cell and molecular networks in the genome. On larger scales, one finds networks of cells, as in neural networks, up to the scale of organisms in ecological food webs. This book defines the field of complex interacting networks, which is still in its infancy, and presents the dynamics and structure of networks as a key concept across disciplines. The contributions present common underlying principles of network dynamics and their theoretical description, and are of interest to specialists as well as to non-specialist readers looking for an introduction to this exciting new field. Theoretical concepts include modeling networks as dynamical systems with numerical methods and new graph-theoretical methods; the contributions also focus on networks that change their topology, as in morphogenesis and self-organization. The authors offer concepts to model network structures and dynamics, focusing on approaches applicable across disciplines.

Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data

Author: Syed Ejaz Ahmed
Publisher: CRC Press
ISBN: 1000876659
Category : Business & Economics
Languages : en
Pages : 409

Book Description
This book presents post-estimation and prediction strategies for a host of useful statistical models, with applications in data science. It combines statistical learning and machine learning techniques in a unique and optimal way. It is well known that machine learning methods are subject to many issues relating to bias, and consequently the mean squared error and prediction error may explode. For this reason, we suggest shrinkage strategies that control the bias by combining a submodel selected by a penalized method with a model with many features. Further, the suggested shrinkage methodology can be successfully implemented for high-dimensional data analysis. Many researchers in statistics and medical sciences work with big data and need to analyse it through statistical modelling. Estimating the model parameters accurately is an important part of the data analysis. This book may serve as a repository for developing improved estimation strategies for statisticians. It will help researchers and practitioners in their teaching and advanced research, and is an excellent textbook for advanced undergraduate and graduate courses involving shrinkage, statistical, and machine learning. The book succinctly reveals the bias inherent in machine learning methods and provides tools, tricks, and tips to deal with the bias issue. It expertly sheds light on the fundamental reasoning behind model selection and post-estimation using shrinkage and related strategies. This presentation is important because shrinkage and related methods are appropriate for model selection and estimation problems, and there is growing interest in this area to fill the gap between competitive strategies. The strategies are applied to real-life data sets from many walks of life. Analytical results are fully corroborated by numerical work, and numerous worked examples are included in each chapter, with numerous graphs for data visualization.
The presentation and style of the book make it accessible to a broad audience. It offers rich, concise expositions of each strategy and clearly describes how to use each estimation strategy for the problem at hand. This book emphasizes that statistics and statisticians can play a dominant role in solving Big Data problems, and will put them at the forefront of scientific discovery. The book contributes novel methodologies for high-dimensional data analysis (HDDA) and will open a door for continued research in this active area. The practical impact of the proposed work stems from its wide applications. The developed computational packages will aid in analyzing a broad range of applications in many walks of life.
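As an illustration of combining a submodel with a full model, the sketch below computes a positive-part Stein-type shrinkage estimator for linear regression, a form commonly used in this literature: beta_PS = beta_SM + max(0, 1 - (k - 2)/T) (beta_FM - beta_SM), where beta_FM is the full-model least-squares estimate, beta_SM the submodel estimate, k (at least 3) the number of coefficients the submodel drops, and T the Wald statistic for testing that those coefficients are zero. It is a sketch under assumptions, not the book's code: the submodel's columns are passed in directly (in practice they might come from a penalized method such as the lasso), the error variance is estimated from the full model, and the function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def positive_part_shrinkage(X, y, keep):
    """Positive-part Stein-type combination of full-model and submodel OLS fits.

    keep: column indices retained by the submodel (assumed chosen beforehand).
    Returns (shrinkage estimate, weight placed on the full-model estimate).
    """
    n, p = X.shape
    keep = np.asarray(keep)
    dropped = np.setdiff1d(np.arange(p), keep)
    k = dropped.size
    if k < 3:
        raise ValueError("Stein-type shrinkage needs at least 3 dropped coefficients")
    beta_fm = ols(X, y)
    beta_sm = np.zeros(p)
    beta_sm[keep] = ols(X[:, keep], y)  # dropped coefficients stay at zero
    rss_fm = float(np.sum((y - X @ beta_fm) ** 2))
    rss_sm = float(np.sum((y - X @ beta_sm) ** 2))
    sigma2 = rss_fm / (n - p)
    # For linear restrictions in least squares, the Wald statistic equals
    # (RSS_restricted - RSS_full) / sigma^2.
    T = (rss_sm - rss_fm) / sigma2
    weight = max(0.0, 1.0 - (k - 2) / T) if T > 0 else 0.0
    return beta_sm + weight * (beta_fm - beta_sm), weight
```

When the dropped coefficients are clearly nonzero, T is large and the weight approaches 1, so the estimate moves to the full model; when the submodel is adequate, T is small and the estimate is pulled toward, or all the way to, the submodel.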