Two Graph-based Tests for High-dimensional Inference

Author: Hao Chen
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Modern science places a growing emphasis on multivariate, complex data types. Some of these data are high dimensional. Others, such as survey preference, network, and tree data, cannot be characterized easily with standard models on Euclidean spaces. This dissertation investigates two classic statistical problems in this new setting: change-point detection and two-sample comparison of categorical data.
Change-point models are widely used in various fields for detecting lack of homogeneity in a sequence of observations. In many applications, the dimension of the observations in the sequence can be very high, even much larger than the length of the sequence. Testing the homogeneity of such sequences is a challenging but important problem. Existing approaches are limited in many ways. We propose a new non-parametric approach that can be applied to data in high dimension, and even to non-Euclidean object data, as long as an informative similarity measure on the sample space can be defined. The approach adapts graph-based two-sample tests to the scan-statistic setting. Graph-based two-sample tests are tests based on graphs connecting observations by similarity [Friedman and Rafsky, 1979; Rosenbaum, 2005]. We show that this new approach is powerful in high dimensions compared to parametric approaches. We also derive accurate analytic $p$-value approximations for very general situations, which lead to easy off-the-shelf homogeneity testing for large multivariate data sets. This approach has been applied to two data sets: the determination of authorship of a classic novel and the detection of change in a social network over time.
Two-sample comparison of categorical data is a classic problem in statistics. In many modern applications, the number of categories can be quite large, even comparable to the sample size, causing existing methods to have low power. When the number of categories is large, there is often underlying structure on the sample space that can be exploited. We propose a general non-parametric approach that makes use of similarity information on the space of categories in two-sample tests. Our approach addresses a shortcoming of existing graph-based two-sample tests by no longer requiring uniqueness of the underlying graph, thus allowing ties in the distance matrix defining the graph. We found two types of statistics that are both powerful and fast to compute. We show that their permutation null distributions are asymptotically normal and that their $p$-value approximations under typical settings are quite accurate, facilitating the application of this approach.
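To make the idea of a graph-based two-sample test concrete, here is a minimal sketch of an edge-count test in the spirit of Friedman and Rafsky: build a minimum spanning tree on the pooled observations and count the edges that join observations from different samples, where unusually few such edges suggest the samples differ. This is an illustration under stated assumptions, not the dissertation's method: the function names (`mst_edges`, `between_sample_edges`, `edge_count_test`) are hypothetical, Euclidean distance stands in for a general similarity measure, the $p$-value comes from a plain permutation of the sample labels rather than an analytic approximation, and ties in the distances are not treated specially (one arbitrary spanning tree is used).

```python
import math
import random


def mst_edges(points):
    """Return the edges of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known distance from the tree to each vertex
    best_from = [-1] * n         # tree vertex achieving that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest connection for the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def between_sample_edges(edges, labels):
    """Count edges whose endpoints carry different sample labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


def edge_count_test(sample_a, sample_b, n_perm=999, seed=0):
    """Permutation test: few between-sample edges is evidence against homogeneity."""
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    edges = mst_edges(points)            # the graph is fixed; only labels are permuted
    observed = between_sample_edges(edges, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if between_sample_edges(edges, labels) <= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction keeps p_value > 0
    return observed, p_value
```

Because only pairwise distances enter the computation, replacing `math.dist` with any other dissimilarity would leave the rest of the sketch unchanged, which is the property that lets graph-based tests extend to non-Euclidean data.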

Introduction to High-Dimensional Statistics

Author: Christophe Giraud
Publisher: CRC Press
ISBN: 1000408329
Category : Computers
Languages : en
Pages : 364

Book Description
Praise for the first edition: "[This book] succeeds singularly at providing a structured introduction to this active field of research. ... it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. ... recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research." (Journal of the American Statistical Association)
Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding unessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition:
• Offers revised chapters from the previous edition, with much additional material on important topics, including compressed sensing, estimation with convex constraints, the slope estimator, simultaneously low-rank and row-sparse linear regression, and aggregation of a continuous set of estimators.
• Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds.
• Provides enhanced appendices, mainly with the addition of the Davis-Kahan perturbation bound and of two simple versions of the Hanson-Wright concentration inequality.
• Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory.
• Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site.
• Illustrates concepts with simple but clear practical examples.

Sparse Graphical Modeling for High Dimensional Data

Author: Faming Liang
Publisher: CRC Press
ISBN: 0429584806
Category : Mathematics
Languages : en
Pages : 151

Book Description
• A general framework for learning sparse graphical models with conditional independence tests
• Complete treatments for different types of data: Gaussian, Poisson, multinomial, and mixed data
• Unified treatments for data integration, network comparison, and covariate adjustment
• Unified treatments for missing data and heterogeneous data
• Efficient methods for joint estimation of multiple graphical models
• Effective methods of high-dimensional variable selection
• Effective methods of high-dimensional inference

Statistical Inference from High Dimensional Data

Author: Carlos Fernandez-Lozano
Publisher: MDPI
ISBN: 3036509445
Category : Science
Languages : en
Pages : 314

Book Description
• Real-world problems can be high-dimensional, complex, and noisy
• More data does not imply more information
• Different approaches deal with the so-called curse of dimensionality to reduce irrelevant information
• A process with multidimensional information is not necessarily easy to interpret or process
• In some real-world applications, one class has clearly fewer elements than the other. Models tend to assume that the majority class is the one that matters for the analysis, which is usually not the case
• The analysis of complex diseases such as cancer is focused on multidimensional omic data
• The increasing amount of data, thanks to the falling cost of high-throughput experiments, opens up a new era for integrative data-driven approaches
• Entropy-based approaches are of interest to reduce the dimensionality of high-dimensional data

Large-scale Statistical Inference for Graph-associated Data

Author: Tien Vo
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Large-scale hypothesis testing is very important for assessing population differences from sampled data in various application domains. In many cases, high-dimensional data are naturally associated with a graphical architecture, in which measured variables reside on graph vertices and the connectivity of the graph conveys information about the underlying relational structure among the data. Essentially, each edge in the graph represents the relationship between values at its endpoints due to some conceptual dependency, e.g., temporal, spatial, functional, or anatomical. Available large-scale testing methods often consider dependencies a nuisance and, by using sufficiently simple, unit-level test statistics, aim to control the false discovery rate in a way that is robust to details of such dependence. Where some available methods do incorporate models of dependence, they are limited in scope and do not take advantage of the graphical nature of the data structure. Given the shortcomings of available methods and the importance of the large-scale testing problem, we propose a new methodology to incorporate graphical information for hypothesis testing. Our proposed method, the graph-based mixture model (GraphMM), is a semiparametric empirical Bayesian approach, motivated by a hybrid procedure that exploits grouping information of model parameters to increase testing sensitivity. We conduct experiments on a parallel computing platform and apply the model in the context of a neuroimaging task to detect subtle changes from magnetic resonance imagery.
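As a point of reference for the unit-level procedures the description contrasts itself with, the sketch below shows the standard Benjamini-Hochberg step-up rule for controlling the false discovery rate from a list of per-unit $p$-values. It is not GraphMM and uses no graph information; the function name `benjamini_hochberg` and the example $p$-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-unit p-values, e.g. one per graph vertex.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05)
```

Each unit is treated in isolation here, so the relationships encoded by graph edges play no role in the decision; that is the kind of information the description says a graph-aware method aims to exploit.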

Change-point Problems

Author: Edward G. Carlstein
Publisher: IMS
ISBN: 9780940600348
Category : Mathematics
Languages : en
Pages : 400

Book Description


Concentration of Maxima and Fundamental Limits in High-Dimensional Testing and Inference

Author: Zheng Gao
Publisher: Springer Nature
ISBN: 3030809641
Category : Mathematics
Languages : en
Pages : 147

Book Description
This book provides a unified exposition of some fundamental theoretical problems in high-dimensional statistics. It specifically considers the canonical problems of detection and support estimation for sparse signals observed with noise. Novel phase-transition results are obtained for the signal support estimation problem under a variety of statistical risks. Based on a surprising connection to a concentration of maxima probabilistic phenomenon, the authors obtain a complete characterization of the exact support recovery problem for thresholding estimators under dependent errors.

Object Oriented Data Analysis

Author: J. S. Marron
Publisher: CRC Press
ISBN: 1351189662
Category : Computers
Languages : en
Pages : 436

Book Description
Object Oriented Data Analysis is a framework that facilitates inter-disciplinary research through new terminology for discussing the many possible approaches to the analysis of complex data. Such data arise naturally in a wide variety of areas. This book aims to provide ways of thinking that enable sensible choices. The main points are illustrated with many real data examples, based on the authors' personal experiences, which have motivated the invention of a wide array of analytic methods. While the mathematics goes far beyond what is usual in statistics (including differential geometry and even topology), the book is aimed at accessibility for graduate students. There is a deliberate focus on ideas over mathematical formulas.
J. S. Marron is the Amos Hawley Distinguished Professor of Statistics, Professor of Biostatistics, Adjunct Professor of Computer Science, Faculty Member of the Bioinformatics and Computational Biology Curriculum and Research Member of the Lineberger Cancer Center and the Computational Medicine Program, at the University of North Carolina, Chapel Hill. Ian L. Dryden is a Professor in the Department of Mathematics and Statistics at Florida International University in Miami, has served as Head of School of Mathematical Sciences at the University of Nottingham, and is joint author of the acclaimed book Statistical Shape Analysis.

Sparse Graphical Modeling for High Dimensional Data

Author: Faming Liang
Publisher: CRC Press
ISBN: 0429582900
Category : Mathematics
Languages : en
Pages : 150

Book Description
This book provides a general framework for learning sparse graphical models with conditional independence tests. It includes complete treatments for Gaussian, Poisson, multinomial, and mixed data; unified treatments for covariate adjustments, data integration, and network comparison; unified treatments for missing data and heterogeneous data; efficient methods for joint estimation of multiple graphical models; effective methods of high-dimensional variable selection; and effective methods of high-dimensional inference. The methods possess an embarrassingly parallel structure in performing conditional independence tests, and the computation can be significantly accelerated by running in parallel on a multi-core computer or a parallel architecture. This book is intended to serve researchers and scientists interested in high-dimensional statistics, and graduate students in broad data science disciplines.
Key Features:
• A general framework for learning sparse graphical models with conditional independence tests
• Complete treatments for different types of data: Gaussian, Poisson, multinomial, and mixed data
• Unified treatments for data integration, network comparison, and covariate adjustment
• Unified treatments for missing data and heterogeneous data
• Efficient methods for joint estimation of multiple graphical models
• Effective methods of high-dimensional variable selection
• Effective methods of high-dimensional inference

Multiple Testing Procedures with Applications to Genomics

Author: Sandrine Dudoit
Publisher: Springer Science & Business Media
ISBN: 0387493174
Category : Science
Languages : en
Pages : 611

Book Description
This book establishes the theoretical foundations of a general methodology for multiple hypothesis testing and discusses its software implementation in R and SAS. These are applied to a range of problems in biomedical and genomic research, including identification of differentially expressed and co-expressed genes in high-throughput gene expression experiments; tests of association between gene expression measures and biological annotation metadata; sequence analysis; and genetic mapping of complex traits using single nucleotide polymorphisms. The procedures are based on a test statistics joint null distribution and provide Type I error control in testing problems involving general data generating distributions, null hypotheses, and test statistics.