Computational Subset Model Selection Algorithms and Applications

Computational Subset Model Selection Algorithms and Applications

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
This dissertation develops new computationally efficient algorithms for identifying the subset of variables that minimizes any desired information criterion in model selection. In recent years, the statistical literature has placed increasing emphasis on information-theoretic model selection criteria. A model selection criterion chooses a model that "closely" approximates the true underlying model. Recent years have also seen many exciting developments in model selection techniques. As data mining of massive datasets with many variables becomes more common, the need for model selection techniques grows accordingly. To this end, this dissertation introduces a new Implicit Enumeration (IE) algorithm and a hybrid of IE with the Genetic Algorithm (GA). The proposed IE algorithm is the first algorithm that explicitly uses an information criterion as the objective function. It works with a variety of information criteria, including some for which the existing branch-and-bound algorithms developed by Furnival and Wilson (1974) and Gatu and Kontoghiorghes (2003) are not applicable. It also finds the "best" subset model directly, without first finding the "best" subset of each size as the branch-and-bound techniques do. The proposed methods are demonstrated on multiple regression, multivariate regression, logistic regression, and discriminant analysis problems. The IE algorithm converged to the optimal solution on real and simulated data sets with up to 80 predictors, that is, 2^80 = 1,208,925,819,614,629,174,706,176 possible subset models in the model portfolio. To our knowledge, none of the existing exact algorithms can solve problems of this size to optimality.
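
The description does not spell out the IE algorithm, so the sketch below is not it. It only illustrates the general idea of implicit enumeration with an information criterion as the objective, under assumptions that are ours: Gaussian linear regression with an intercept, AIC written (up to a constant) as n ln(RSS/n) + 2k, and a bound that relies on RSS never increasing when predictors are added. Each search node fixes which of the first i predictors are in; its lower bound pairs the RSS of the widest subset in the node with the smallest possible parameter count, so whole branches are discarded without being enumerated. All function names are hypothetical, and a brute-force search is included only as a reference.

```python
import itertools

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of an OLS fit on an intercept plus the columns in `subset`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ beta
    return float(r @ r)


def gaussian_aic(n, rss_value, n_params):
    """AIC of a Gaussian linear model, up to an additive constant (assumed form)."""
    return n * np.log(max(rss_value, 1e-12) / n) + 2 * n_params


def best_subset_bb(X, y):
    """Branch-and-bound search for the subset of columns of X that minimizes AIC."""
    n, p = X.shape
    best = {"aic": np.inf, "subset": ()}

    def visit(i, included):
        # This node stands for every subset T with
        # included <= T <= included + {i, ..., p-1}.
        score = gaussian_aic(n, rss(X, y, included), len(included) + 1)
        if score < best["aic"]:
            best["aic"], best["subset"] = score, tuple(included)
        if i == p:
            return
        # Lower bound for the whole node: adding columns never increases RSS,
        # and no subset in the node has fewer than len(included) predictors.
        widest = included + list(range(i, p))
        bound = gaussian_aic(n, rss(X, y, widest), len(included) + 1)
        if bound >= best["aic"]:
            return  # prune: nothing in this node can beat the incumbent
        visit(i + 1, included + [i])  # branch that includes predictor i
        visit(i + 1, included)        # branch that excludes predictor i

    visit(0, [])
    return best["subset"], best["aic"]


def best_subset_brute_force(X, y):
    """Explicit enumeration of all 2^p subsets, used only as a reference."""
    n, p = X.shape
    subsets = (s for k in range(p + 1) for s in itertools.combinations(range(p), k))
    return min(
        ((s, gaussian_aic(n, rss(X, y, s), len(s) + 1)) for s in subsets),
        key=lambda pair: pair[1],
    )


# Simulated data: only predictors 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=100)
print(best_subset_bb(X, y))
```

Because the bound only needs a fit term that is monotone in RSS and a penalty that grows with the number of parameters, the same pruning rule would also hold for BIC (replace the 2 with ln n); criteria without that structure would need a different bound.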

Scalable Subset Selection with Filters and Its Applications

Author: Gregory Charles Ditzler
Publisher:
ISBN:
Category : Electrical engineering
Languages : en
Pages : 278

Book Description
Increasingly, machine learning applications encounter data sets that were almost unimaginable just a few years ago, and many current algorithms cannot handle, i.e., do not scale to, today's extremely large volumes of data. The data are made up of a large set of features describing each observation, and the complexity of predictive models tends to increase not only with the number of observations but also with the number of features. Fortunately, not all of the features carry meaningful information for making predictions, so irrelevant features should be filtered from the data before a model is built. This process of removing features to produce a subset is commonly referred to as feature subset selection. In this work, we present two new filter-based feature subset selection algorithms that scale to (i) potentially large and distributed data sets and (ii) very large feature sets. Our first proposed algorithm, Neyman-Pearson Feature Selection (NPFS), uses a statistical hypothesis test derived from the Neyman-Pearson lemma to determine whether a feature is statistically relevant. The approach can be applied as a wrapper around any feature selection algorithm, regardless of the selection criterion it uses, to determine whether a feature belongs in the relevant set. Perhaps more importantly, the procedure efficiently determines the number of relevant features given an initial starting point, and it fits into a computationally attractive MapReduce model. We also describe a sequential learning framework for feature subset selection (SLSS) that scales with both the number of features and the number of observations. SLSS uses bandit algorithms to process features and estimate a level of importance for each feature. Feature selection is performed independently of the optimization of any classifier to reduce unnecessary complexity. We demonstrate the capabilities of NPFS and SLSS on synthetic and real-world data sets. We also present a new approach to classifier-dependent feature selection: an online learning algorithm that easily handles large amounts of missing feature values in a data stream. Many real-world applications can benefit from scalable feature subset selection algorithms; one such area is the study of the microbiome (i.e., the study of micro-organisms and their influence on the environments they inhabit). Feature subset selection algorithms can be used to sift through massive amounts of data collected in the genomic sciences to help microbial ecologists understand the microbes, particularly the micro-organisms that are the best indicators of a phenotype such as healthy or unhealthy. In this work, we provide insights into data collected from the American Gut Project and deliver open-source software implementations of feature selection for biological data formats.
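
The description says NPFS wraps any feature selector with a test derived from the Neyman-Pearson lemma, but not how that test is built. The sketch below assumes one plausible construction, which is our assumption and not taken from the text: run a base selector that returns k of K features on many bootstrap samples, count how often each feature is picked, and compare each count with a Binomial(n_runs, k/K) null. For a binomial, the Neyman-Pearson most powerful test against a larger selection probability rejects for large counts, i.e. it is a one-sided upper-tail test. The correlation-ranking base selector and all function names are hypothetical, and the data are simulated. Each bootstrap run is independent, so runs could be mapped to workers and the counts summed in a reduce step.

```python
import math

import numpy as np


def top_k_by_correlation(X, y, k):
    """Hypothetical base selector: the k features with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(corr)[-k:]


def binomial_upper_tail(count, n, p):
    """P[Binomial(n, p) >= count], computed exactly from the pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(count, n + 1))


def npfs_sketch(X, y, k, n_runs=100, alpha=0.01, seed=0):
    """Keep features the base selector picks significantly more often than chance."""
    rng = np.random.default_rng(seed)
    n_obs, n_feat = X.shape
    counts = np.zeros(n_feat, dtype=int)
    for _ in range(n_runs):
        rows = rng.integers(0, n_obs, size=n_obs)  # bootstrap resample
        counts[top_k_by_correlation(X[rows], y[rows], k)] += 1
    p0 = k / n_feat  # null: each feature is picked with probability k/K
    # One-sided test: a large selection count is evidence of relevance.
    return [
        j for j in range(n_feat)
        if binomial_upper_tail(int(counts[j]), n_runs, p0) <= alpha
    ]


# Simulated data: features 0, 1 and 2 are relevant, the other 17 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = X[:, 0] + X[:, 1] + X[:, 2] + 0.5 * rng.normal(size=200)
print(npfs_sketch(X, y, k=3))
```

Here k plays the role of the "initial starting point" given to the base selector; the test then decides which of the repeatedly selected features are kept.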

Algorithms for Solving Statistical Model Selection Problems

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Statistical model selection problems arise in diverse areas. Some selection methods have exponential complexity and are therefore computationally demanding. The purpose of this thesis is to propose computationally efficient and numerically reliable algorithms for statistical model selection. Particular emphasis is given to computationally intensive model selection strategies that evaluate regression trees and have combinatorial solutions. The computational efficiency of the proposed algorithms is investigated through detailed complexity analysis. Parallel algorithms that compute all possible subset regression models are designed, implemented, and analyzed. A branch-and-bound strategy that computes the best-subset regression models for each number of variables is proposed, together with a heuristic version that uses a tolerance parameter when deciding whether to cut a subtree. Experimental results supporting the theoretical results for the new strategies are presented. The various regression tree strategies are adapted to subset Vector Autoregressive model selection problems. Special cases of subset selection that exploit the common columns of the data matrices and the Kronecker structure of the variance-covariance matrix are investigated. Within this context, a combinatorial algorithm is designed to efficiently compute the estimators of a seemingly unrelated regressions model with permuted exogenous data matrices. The main computational component of the algorithms developed in this thesis is the QR decomposition and its modification. Efficient strategies are designed for the various matrix factorization problems that arise in the estimation procedures: the non-dense structure of the matrices is exploited, Kronecker products are not computed explicitly, and matrix inverses are avoided.
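
The thesis names the QR decomposition and its modification as its main computational building block. The sketch below shows the standard textbook version of that idea, not the thesis's own algorithms: factor the augmented matrix [X | y] once, read the residual sum of squares of the fit off the last diagonal entry of R, and, when a predictor is dropped, restore triangular form with Givens rotations instead of refactoring. Deleting column j needs only one rotation per later column, each touching O(p) entries, versus O(np^2) for a fresh factorization. Function names are hypothetical and the data are simulated.

```python
import numpy as np


def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] == [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def delete_column(R, j):
    """Drop column j of an upper-triangular R and restore triangular form.

    Removing a column leaves one nonzero subdiagonal entry in each later
    column; a Givens rotation on rows (k, k+1) annihilates each of them.
    """
    R = np.delete(R, j, axis=1)  # np.delete returns a copy
    ncol = R.shape[1]
    for k in range(j, ncol):
        c, s = givens(R[k, k], R[k + 1, k])
        G = np.array([[c, s], [-s, c]])
        R[[k, k + 1], k:] = G @ R[[k, k + 1], k:]
    return R[:ncol, :]  # the last row is now zero and can be dropped


def rss_direct(D, y):
    """Reference RSS from an ordinary least-squares solve."""
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    r = y - D @ beta
    return float(r @ r)


rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])  # intercept + 4 predictors
y = X @ np.array([1.0, 2.0, 0.0, -1.0, 0.5]) + rng.normal(size=n)

# Factor [X | y] once; the RSS of regressing y on X is R[-1, -1] ** 2.
R = np.linalg.qr(np.column_stack([X, y]), mode="r")
full_rss = R[-1, -1] ** 2

# Drop predictor column 2 by updating R rather than refactoring.
R_drop = delete_column(R, 2)
drop_rss = R_drop[-1, -1] ** 2
print(full_rss, drop_rss)
```

Keeping y as the last column is what makes the RSS available for free: the last diagonal entry of R is the norm of the part of y orthogonal to the retained predictors.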

Subset Selection in Regression

Author: Alan J. Miller
Publisher: Chapman and Hall/CRC
ISBN:
Category : Computers
Languages : en
Pages : 248

Book Description
Most scientific computing packages contain facilities for stepwise regression and often for 'all subsets' and other techniques for finding 'best-fitting' subsets of regression variables. The application of standard theory can be very misleading when the model has been chosen from the data rather than a priori. There is widespread awareness that considerable over-fitting occurs and that prediction equations obtained after extensive 'data dredging' often perform poorly when applied to new data. This monograph relates almost entirely to least-squares methods of finding and fitting subsets of regression variables, though most of the concepts are presented in terms of the interpretation and statistical properties of orthogonal projections. An early chapter introduces these methods, which are still not widely known to users of least-squares methods. Existing methods are described for testing whether any useful improvement can be obtained by using any of a set of predictors. Spjøtvoll's method for comparing two arbitrary subsets of predictor variables is illustrated and described in detail. When the selected model is the 'best-fitting' in some sense, conventional fitting methods give estimates of regression coefficients that are usually biased in the direction of being too large. The extent of this bias is demonstrated for simple cases. Various ad hoc methods for correcting the bias are discussed (ridge regression, James-Stein shrinkage, jack-knifing, etc.), together with the author's maximum likelihood technique. Areas in which further research is needed are also outlined.
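
The selection bias described above can be seen in a small simulation. This is a simplified stand-in of our own, not one of the book's examples: every predictor has the same true coefficient, and "selection" simply means keeping the predictor with the largest estimate. The maximum of several noisy, unbiased estimates tends to exceed its expectation, so the selected coefficient averages above the true value, while the average over all estimates stays close to it. The function name and settings are hypothetical.

```python
import numpy as np


def selection_bias(n=100, p=10, beta=1.0, reps=2000, seed=0):
    """Compare the mean of all OLS coefficients with the mean of the largest one.

    Every predictor has the same true coefficient `beta`; in each replicate
    the predictor with the largest estimate is 'selected'.
    """
    rng = np.random.default_rng(seed)
    selected, all_estimates = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ np.full(p, beta) + rng.normal(size=n)
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        selected.append(b.max())  # coefficient of the 'best-looking' predictor
        all_estimates.extend(b)
    return float(np.mean(selected)), float(np.mean(all_estimates))


mean_selected, mean_all = selection_bias()
print(f"mean of selected coefficient: {mean_selected:.3f}")
print(f"mean of all coefficients:     {mean_all:.3f}")
```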

Computational Methods of Feature Selection

Author: Huan Liu
Publisher: CRC Press
ISBN: 1584888792
Category : Business & Economics
Languages : en
Pages : 437

Book Description
Due to increasing demands for dimensionality reduction, research on feature selection has deeply and widely expanded into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the

16th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2021)

Author: Hugo Sanjurjo González
Publisher: Springer Nature
ISBN: 3030878694
Category : Technology & Engineering
Languages : en
Pages : 840

Book Description
This book of Advances in Intelligent and Soft Computing contains accepted papers presented at the SOCO 2021 conference, held in the beautiful and historic city of Bilbao (Spain) in September 2021. Soft computing is a set of computational techniques in machine learning, computer science, and some engineering disciplines that investigate, simulate, and analyze very complex issues and phenomena. After a thorough peer-review process, the International Program Committee of the 16th SOCO 2021 selected 78 papers, which are published in these conference proceedings and represent an acceptance rate of 48%. In this edition, special emphasis is placed on the organization of special sessions. Seven special sessions are organized on the following topics: applications of machine learning in computer vision; soft computing applied to autonomous robots and renewable energy systems; optimization, modeling, and control by soft computing techniques (OMCS); challenges and new approaches toward artificial intelligence deployments in real-world scenarios; time series forecasting in industrial and environmental applications (TSF); soft computing methods in manufacturing and management systems and applied machine learning. The selection of papers was extremely rigorous in order to maintain the high quality of the conference, and we would like to thank the members of the program committees for their hard work in the reviewing process. This process is crucial to the creation of a high-standard conference, and the SOCO conference would not exist without their help.

Subset Selection Algorithms with Applications

Author: Shane Francis Cotter
Publisher:
ISBN:
Category :
Languages : en
Pages : 394

Book Description


Feature Engineering and Selection

Author: Max Kuhn
Publisher: CRC Press
ISBN: 1351609467
Category : Business & Economics
Languages : en
Pages : 266

Book Description
The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Computational Science and Its Applications – ICCSA 2022 Workshops

Author: Osvaldo Gervasi
Publisher: Springer Nature
ISBN: 3031105362
Category : Computers
Languages : en
Pages : 732

Book Description
The eight-volume set LNCS 13375–13382 constitutes the proceedings of the 22nd International Conference on Computational Science and Its Applications, ICCSA 2022, held in Malaga, Spain, during July 4–7, 2022. The first two volumes contain the proceedings of ICCSA 2022 itself: 57 full and 24 short papers, carefully reviewed and selected from 279 submissions. The other six volumes present the workshop proceedings, containing 285 papers selected from 815 submissions. These six volumes include the proceedings of the following workshops: Advances in Artificial Intelligence Learning Technologies: Blended Learning, STEM, Computational Thinking and Coding (AAILT 2022); Workshop on Advancements in Applied Machine-learning and Data Analytics (AAMDA 2022); Advances in information Systems and Technologies for Emergency management, risk assessment and mitigation based on the Resilience (ASTER 2022); Advances in Web Based Learning (AWBL 2022); Blockchain and Distributed Ledgers: Technologies and Applications (BDLTA 2022); Bio and Neuro inspired Computing and Applications (BIONCA 2022); Configurational Analysis For Cities (CA Cities 2022); Computational and Applied Mathematics (CAM 2022); Computational and Applied Statistics (CAS 2022); Computational Mathematics, Statistics and Information Management (CMSIM); Computational Optimization and Applications (COA 2022); Computational Astrochemistry (CompAstro 2022); Computational methods for porous geomaterials (CompPor 2022); Computational Approaches for Smart, Conscious Cities (CASCC 2022); Cities, Technologies and Planning (CTP 2022); Digital Sustainability and Circular Economy (DiSCE 2022); Econometrics and Multidimensional Evaluation in Urban Environment (EMEUE 2022); Ethical AI applications for a human-centered cyber society (EthicAI 2022); Future Computing System Technologies and Applications (FiSTA 2022); Geographical Computing and Remote Sensing for Archaeology (GCRSArcheo 2022); Geodesign in Decision Making: meta planning and collaborative design for sustainable and inclusive development (GDM 2022); Geomatics in Agriculture and Forestry: new advances and perspectives (GeoForAgr 2022); Geographical Analysis, Urban Modeling, Spatial Statistics (Geog-An-Mod 2022); Geomatics for Resource Monitoring and Management (GRMM 2022); International Workshop on Information and Knowledge in the Internet of Things (IKIT 2022); 13th International Symposium on Software Quality (ISSQ 2022); Land Use monitoring for Sustainability (LUMS 2022); Machine Learning for Space and Earth Observation Data (MALSEOD 2022); Building multi-dimensional models for assessing complex environmental systems (MES 2022); MOdels and indicators for assessing and measuring the urban settlement deVElopment in the view of ZERO net land take by 2050 (MOVEto0 2022); Modelling Post-Covid cities (MPCC 2022); Ecosystem Services: nature’s contribution to people in practice. Assessment frameworks, models, mapping, and implications (NC2P 2022); New Mobility Choices For Sustainable and Alternative Scenarios (NEMOB 2022); 2nd Workshop on Privacy in the Cloud/Edge/IoT World (PCEIoT 2022); Psycho-Social Analysis of Sustainable Mobility in The Pre- and Post-Pandemic Phase (PSYCHE 2022); Processes, methods and tools towards RESilient cities and cultural heritage prone to SOD and ROD disasters (RES 2022); Scientific Computing Infrastructure (SCI 2022); Socio-Economic and Environmental Models for Land Use Management (SEMLUM 2022); 14th International Symposium on Software Engineering Processes and Applications (SEPA 2022); Ports of the future - smartness and sustainability (SmartPorts 2022); Smart Tourism (SmartTourism 2022); Sustainability Performance Assessment: models, approaches and applications toward interdisciplinary and integrated solutions (SPA 2022); Specifics of smart cities development in Europe (SPEED 2022); Smart and Sustainable Island Communities (SSIC 2022); Theoretical and Computational Chemistry and its Applications (TCCMA 2022); Transport Infrastructures for Smart Cities (TISC 2022); 14th International Workshop on Tools and Techniques in Software Development Process (TTSDP 2022); International Workshop on Urban Form Studies (UForm 2022); Urban Regeneration: Innovative Tools and Evaluation Model (URITEM 2022); International Workshop on Urban Space and Mobilities (USAM 2022); Virtual and Augmented Reality and Applications (VRA 2022); Advanced and Computational Methods for Earth Science Applications (WACM4ES 2022); Advanced Mathematics and Computing Methods in Complex Computational Systems (WAMCM 2022).

Natural and Artificial Computation in Engineering and Medical Applications

Author: Jose Manuel Ferrandez Vicente
Publisher: Springer
ISBN: 3642386229
Category : Computers
Languages : en
Pages : 497

Book Description
The two-volume set LNCS 7930 and LNCS 7931 constitutes the refereed proceedings of the 5th International Work-Conference on the Interplay between Natural and Artificial Computation, IWINAC 2013, held in Mallorca, Spain, in June 2013. The 92 revised full papers presented in LNCS 7930 and LNCS 7931 were carefully reviewed and selected from numerous submissions. The first part, LNCS 7930, entitled “Natural and Artificial Models in Computation and Biology”, includes all the contributions mainly related to the methodological, conceptual, formal, and experimental developments in the fields of neurophysiology and cognitive science. The second part, LNCS 7931, entitled “Natural and Artificial Computation in Engineering and Medical Applications”, contains the papers related to bioinspired programming strategies and all the contributions related to computational solutions to engineering problems in different application domains, especially health applications, including the CYTED “Artificial and Natural Computation for Health” (CANS) research network papers. In addition, this two-volume set reflects six interesting areas: cognitive robotics; natural computing; wetware computation; quality of life technologies; biomedical and industrial perception applications; and Web intelligence and neuroscience.