A New Approach for Large Scale Multiple Testing with Application to FDR Control for Graphically Structured Hypotheses

Author: Wenge Guo
Publisher:
ISBN:
Category :
Languages : en
Pages : 37

Book Description


NEW APPROACHES TO MULTIPLE TESTING OF GROUPED HYPOTHESES

Author: Yanping Liu
Publisher:
ISBN:
Category :
Languages : en
Pages : 97

Book Description
Testing multiple hypotheses that appear in non-overlapping groups is a common statistical problem in modern scientific investigations, and in many of them the groups form naturally. The goal of this dissertation is to explore the current state of knowledge in the area of multiple testing of grouped hypotheses and to present newer and improved statistical methodologies. In the first part of this dissertation, we propose a new Bayesian two-stage multiple testing method that controls the false discovery rate (FDR) across all hypotheses. The method decomposes a posterior measure of false discoveries across all hypotheses into within- and between-group components, allowing a portion of the overall FDR level to be used to maintain control over within-group false discoveries. Such within-group FDR control effectively captures the group structure as well as the dependence, if any, within the groups. The procedure can maintain tight control over the overall FDR, as shown numerically under two different model assumptions, independent and Markov-dependent Bernoulli variables, for the hidden states of the within-group hypotheses. The proposed method in its oracle form is optimal at both the within- and between-group levels of its application. We also present a data-driven version of the proposed method, whose performance in terms of FDR control and power relative to its relevant competitors is examined through simulations. We apply this Bayesian method to real data from the Adequate Yearly Progress (AYP) study of California elementary schools (2013), which compares the academic performance of socioeconomically advantaged (SEA) and socioeconomically disadvantaged (SED) students; our method yields more meaningful discoveries than two competing methods from the literature.
The second part of the dissertation contributes to the outstanding problem of developing an FDR-controlling frequentist method for multiple testing of grouped hypotheses, one that can not only serve as an extension of the classical Benjamini-Hochberg (BH, 1995) method from single to multiple groups but also be more powerful because of the underlying group structure. We suggest a number of such methods and examine their performance in comparison with the single-group BH method, mainly through simulations. Some possible future directions of research in the proposed area are discussed at the end of this dissertation.

The Control of the False Discovery Rate Under Structured Hypotheses

Author: Gavin Lynch
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The hypotheses in many multiple testing problems have some inherent structure based on prior information, such as Gene Ontology in gene expression data. However, few false discovery rate (FDR) controlling procedures take advantage of this inherent structure. In this dissertation, we develop FDR controlling methods which exploit the structural information of the hypotheses. First, we study the fixed sequence structure, where the testing order of the hypotheses has been pre-specified. We are motivated to study this structure since it is the most basic of structures, yet it has been largely ignored in the literature on large scale multiple testing. We first develop procedures using the conventional fixed sequence method, where the procedures stop testing after the first hypothesis is accepted. Then, we extend the method and develop procedures which stop after a pre-specified number of acceptances. A simulation study and real data analysis show that these procedures can be a powerful alternative to the standard Benjamini-Hochberg and Benjamini-Yekutieli procedures. Next, we consider the testing of hierarchically ordered hypotheses, where hypotheses are arranged in a tree-like structure. First, we introduce a new multiple testing method called the generalized stepwise procedure and use it to create a general approach for testing hierarchically ordered hypotheses. Then, we develop several hierarchical testing procedures which control the FDR under various forms of dependence. Our simulation studies and real data analysis show that these proposed methods can be more powerful than alternative hierarchical testing methods, such as the method by Yekutieli (2008b). Finally, we focus on testing hypotheses along a directed acyclic graph (DAG). First, we introduce a novel approach to develop procedures for controlling error rates appropriate for large scale multiple testing. 
Then, we use this approach to develop an FDR controlling procedure which tests hypotheses along the DAG. To our knowledge, no other FDR controlling procedure exists to test hypotheses with this structure. The procedure is illustrated through a real microarray data analysis where Gene Ontology terms forming a DAG are tested for significance. In summary, this dissertation offers new FDR controlling methods which utilize the inherent structural information among the tested hypotheses.
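To make the fixed sequence idea in this description concrete, here is a minimal Python sketch of testing hypotheses in a pre-specified order and stopping after a given number of acceptances. It is not the dissertation's procedure: the description does not give its critical values, so the constant threshold `alpha` and the names `fixed_sequence_test` and `max_acceptances` are illustrative assumptions. With `max_acceptances=1` and a constant threshold, the sketch is the classical fixed sequence test, which is known to control the familywise error rate; a constant threshold should not be assumed to control any error rate when more acceptances are allowed.

```python
def fixed_sequence_test(p_values, alpha=0.05, max_acceptances=1):
    """Test hypotheses in their given (pre-specified) order.

    Rejects hypothesis i when p_values[i] <= alpha and stops as soon as
    `max_acceptances` hypotheses have been accepted (not rejected).
    The constant threshold `alpha` is a placeholder assumption.
    Returns the indices of the rejected hypotheses.
    """
    rejected = []
    acceptances = 0
    for i, p in enumerate(p_values):
        if p <= alpha:
            rejected.append(i)
        else:
            acceptances += 1
            if acceptances >= max_acceptances:
                break  # every later hypothesis is left untested
    return rejected


p = [0.01, 0.02, 0.20, 0.001]
stop_at_first = fixed_sequence_test(p, alpha=0.05)
stop_at_second = fixed_sequence_test(p, alpha=0.05, max_acceptances=2)
```

In this example the conventional rule never reaches the fourth hypothesis, although its p-value is very small, because testing stops at the third. Allowing a second acceptance lets the procedure continue past that point, which is the motivation the description gives for the extension.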

Optimization, Learning, and Control for Interdependent Complex Networks

Author: M. Hadi Amini
Publisher: Springer Nature
ISBN: 3030340945
Category : Technology & Engineering
Languages : en
Pages : 306

Book Description
This book focuses on a wide range of optimization, learning, and control algorithms for interdependent complex networks and their role in smart cities operation, smart energy systems, and intelligent transportation networks. It paves the way for researchers working on optimization, learning, and control spread over the fields of computer science, operations research, electrical engineering, civil engineering, and system engineering. This book also covers optimization algorithms for large-scale problems from theoretical foundations to real-world applications, learning-based methods to enable intelligence in smart cities, and control techniques to deal with the optimal and robust operation of complex systems. It further introduces novel algorithms for data analytics in large-scale interdependent complex networks.
• Specifies the importance of efficient theoretical optimization and learning methods in dealing with emerging problems in the context of interdependent networks
• Provides a comprehensive investigation of advanced data analytics and machine learning algorithms for large-scale complex networks
• Presents basics and mathematical foundations needed to enable efficient decision making and intelligence in interdependent complex networks
M. Hadi Amini is an Assistant Professor at the School of Computing and Information Sciences at Florida International University (FIU). He is also the founding director of the Sustainability, Optimization, and Learning for InterDependent networks laboratory (solid lab). He received his Ph.D. and M.Sc. from Carnegie Mellon University in 2019 and 2015, respectively. He also holds a doctoral degree in Computer Science and Technology. Prior to that, he received his M.Sc. from Tarbiat Modares University in 2013 and his B.Sc. from Sharif University of Technology in 2011.

Improved Tools for Large-scale Hypothesis Testing

Author: Zihao Zheng
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Large-scale hypothesis testing, one of the key statistical tools, has been widely studied and applied to high-throughput bioinformatics experiments, such as high-density peptide array studies and brain image data sets. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by this, in this dissertation, I develop several improved statistical and computational tools for large-scale hypothesis testing. The first method, MixTwice, advances an empirical-Bayesian tool that computes local false discovery rate statistics when provided with data on estimated effects and estimated standard errors. I also extend this method from two-group comparison problems to multiple-group comparison settings and develop a generalized method called MixTwice-ANOVA. The second method, GraphicalT, calculates local FDRs semiparametrically using available graph-associated information. The first method, MixTwice, introduces an empirical-Bayes approach that involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Provided with the estimated effect sizes and estimated errors, MixTwice estimates the mixing distributions and calculates the local false discovery rates via nonparametric MLE and constrained optimization with a unimodal shape constraint on the effect distribution. Numerical experiments show that MixTwice can accurately estimate generative parameters and has good testing operating characteristics. Applied to a high-density peptide array, it powerfully identifies non-null peptides to recover meaningful peptide markers when the underlying signal is weak, and has strong reproducibility properties when the underlying signal is strong. The second contribution of this dissertation generalizes MixTwice from scenarios comparing two conditions to scenarios comparing multiple groups. 
Similar to MixTwice, MixTwice-ANOVA takes the numerator and denominator statistics of the F-test to estimate two underlying mixing distributions. Compared with other large-scale testing tools for one-way ANOVA settings, MixTwice-ANOVA shows better power and FDR control in numerical experiments. Applied to the peptide array study comparing multiple Sjogren-disease (SjD) populations, the proposed approach discovers meaningful epitope structure and novel scientific findings on Sjogren disease. Numerical experiments support evaluation among testing tools. Besides the methodological contribution of MixTwice to large-scale testing, I also discuss generalized evaluation and computational aspects. For the former, I propose an evaluation metric called reproducibility, in addition to FDR control, power, etc., to provide a practical guide for choosing among testing tools. For the latter, I borrow the idea of the pool adjacent violators algorithm (PAVA) and advance a computational algorithm called EM-PAVA to solve the nonparametric MLE under an isotonic partial order constraint. This algorithm is discussed in terms of its theoretical guarantees and computational performance. The last contribution of this dissertation deals with large-scale testing problems with graph-associated data. Unlike many studies that incorporate graph-associated information through detailed modeling specifications, GraphicalT provides a semiparametric way to calculate the local false discovery rates using an available auxiliary data graph. The method shows good performance in synthetic examples and in a brain-imaging problem from the study of Alzheimer's disease.
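The local false discovery rate that this description refers to can be illustrated with the standard two-group model, in which a test statistic z comes from a null density f0 with probability pi0 and from an alternative density f1 otherwise, so that lfdr(z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z)). The Python sketch below assumes both densities are normal with known parameters. That is a textbook simplification chosen for illustration, not the MixTwice or GraphicalT estimator, and the function names and parameter values are hypothetical.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """Posterior probability that z comes from the null, N(0, 1).

    Assumes a two-group mixture with known null proportion `pi0` and a
    known normal alternative N(alt_mean, alt_sd^2); in practice these
    quantities have to be estimated from the data.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


# Statistics far from zero are less likely to be null.
lfdr_near_zero = local_fdr(0.0, pi0=0.9, alt_mean=3.0)
lfdr_far_out = local_fdr(4.0, pi0=0.9, alt_mean=3.0)
```

A common way to use such values is to reject the hypotheses whose local false discovery rate falls below a chosen cutoff; the cutoff and the estimation of `pi0` and the alternative density are where the methods in the literature differ.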

Some New Developments on Multiple Testing Procedures

Author: Lilun Du
Publisher:
ISBN:
Category :
Languages : en
Pages : 134

Book Description
In the context of large-scale multiple testing, hypotheses are often accompanied by certain prior information. In chapter 2, we present a single-index modulated multiple testing procedure, which maintains control of the false discovery rate while incorporating prior information, by assuming the availability of a bivariate p-value for each hypothesis. To find the optimal rejection region for the bivariate p-value, we propose a criterion based on the ratio of probability density functions of the bivariate p-value under the true null and non-null. This criterion in the bivariate normal setting further motivates us to project the bivariate p-value to a single-index p-value, for a wide range of directions. The true null distribution of the single-index p-value is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction, we propose a new approach based on power comparison, which is further shown to be consistent under some mild conditions. Multiple testing based on chi-squared test statistics is commonly used in many scientific fields such as genomics research and brain imaging studies. However, the challenges associated with designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. In chapter 3, we propose a Factor Connected procedure to fill this gap. We first adopt a latent factor structure to construct a testing framework for approximating the false discovery proportion (FDP) for a large number of highly correlated chi-squared test statistics with finite degrees of freedom k. 
The testing framework is then connected to simultaneously testing k linear constraints in a large-dimensional linear factor model involving some observable and unobservable common factors, resulting in a consistent estimator of FDP based on the associated unadjusted p-values.

Multiple Hypothesis Testing

Author: Houston Nash Gilbert
Publisher:
ISBN:
Category :
Languages : en
Pages : 372

Book Description


Large-scale Multiple Hypothesis Testing with Complex Data Structure

Author: Xiaoyu Dai
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 104

Book Description
In the last decade, motivated by a variety of applications in medicine, bioinformatics, genomics, brain imaging, etc., a growing amount of statistical research has been devoted to large-scale multiple testing, where thousands or even greater numbers of tests are conducted simultaneously. However, due to the complexity of real data sets, the assumptions of many existing multiple testing procedures, e.g. that tests are independent and have continuous null distributions of p-values, may not hold. This limits their performance, leading to low detection power and an inflated false discovery rate (FDR). In this dissertation, we study how to better handle multiple testing problems under complex data structures. In Chapter 2, we study multiple testing with discrete test statistics. In Chapter 3, we study discrete multiple testing with prior ordering information incorporated. In Chapter 4, we study multiple testing under a complex dependency structure. We propose novel procedures under each scenario, based on the marginal critical functions (MCFs) of randomized tests, the conditional random field (CRF), or the deep neural network (DNN). The theoretical properties of our procedures are carefully studied, and their performance is evaluated through various simulations and real applications involving the analysis of genetic data from next-generation sequencing (NGS) experiments.

Multiple Testing and False Discovery Rate Control

Author: Shiyun Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 142

Book Description
Multiple testing, a situation where multiple hypothesis tests are performed simultaneously, is a core research topic in statistics that arises in almost every scientific field. When more hypotheses are tested, more errors are bound to occur. Controlling the false discovery rate (FDR) [BH95], which is the expected proportion of falsely rejected null hypotheses among all rejections, is an important challenge for making meaningful inferences. Throughout the dissertation, we analyze the asymptotic performance of several FDR-controlling procedures under different multiple testing settings. In Chapter 1, we study the famous Benjamini-Hochberg (BH) method [BH95], which often serves as a benchmark among FDR-controlling procedures, and show that it is asymptotically optimal in a stylized setting. We then prove that a distribution-free FDR control method of Barber and Candès [FBC15], which only requires the (unknown) null distribution to be symmetric, can achieve the same asymptotic performance as the BH method and is thus also optimal. Chapter 2 proposes an interval-type procedure which identifies the longest interval with the estimated FDR under a given level and rejects the corresponding hypotheses with P-values lying inside the interval. Unlike the threshold approaches, this procedure scans over all intervals, with the left endpoint not necessarily being zero. We show that this scan procedure provides strong control of the asymptotic false discovery rate. In addition, we investigate its asymptotic false non-discovery rate (FNR), deriving conditions under which it outperforms the BH procedure. In Chapter 3, we consider an online multiple testing problem where the hypotheses arrive sequentially in a stream, and investigate two procedures proposed by Javanmard and Montanari [JM15] which control FDR in an online manner. We quantify their asymptotic performance in the same location models as in Chapter 1 and compare their power with the (static) BH method. 
In Chapter 4, we propose a new class of powerful online testing procedures which incorporate the available contextual information, and prove that any rule in this class controls the online FDR under some standard assumptions. We also derive a practical algorithm that can make more empirical discoveries in an online fashion, compared to the state-of-the-art procedures.
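Since the Benjamini-Hochberg (BH) method is the benchmark throughout this description, a short Python sketch of the standard step-up rule may help. For m p-values and level alpha, it finds the largest rank k such that the k-th smallest p-value is at most k * alpha / m and rejects the k hypotheses with the smallest p-values. The function name `benjamini_hochberg` and the example p-values are illustrative, not taken from the dissertation.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up procedure.

    Returns a list of booleans, aligned with `p_values`, that is True
    where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / m (0 if none).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p, alpha=0.05)
step_up_case = benjamini_hochberg([0.04, 0.03, 0.045, 0.049], alpha=0.05)
```

The second call shows the step-up character of the rule: the largest p-value, 0.049, sits below its own threshold of 0.05, so all four hypotheses are rejected even though the smallest p-value, 0.03, exceeds the first threshold of 0.0125.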

Generalized Error Control in Multiple Hypothesis Testing

Author: Wenge Guo
Publisher:
ISBN:
Category :
Languages : en
Pages : 143

Book Description
Multiple hypothesis testing is concerned with appropriately controlling the rate of false positives when testing a large number of hypotheses simultaneously, while maintaining the power of each test as much as possible. For testing multiple null hypotheses, the classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. However, quite often, especially when a large number of hypotheses are simultaneously tested, the notion of FWER turns out to be too stringent, allowing little chance to detect many false null hypotheses. Therefore, researchers have focused in the last decade on defining alternative, less stringent error rates and developing methods that control them. The false discovery rate (FDR), the expected proportion of falsely rejected null hypotheses among all rejections, due to Benjamini and Hochberg (1995), is the first of these alternative error rates that has received considerable attention. Recently, the ideas of controlling the probability of falsely rejecting at least k null hypotheses, which is the k-FWER, and the probability of the false discovery proportion (FDP) exceeding a certain threshold y have been introduced as alternatives to the FWER, and methods controlling these new error rates have been suggested. Very recently, following an idea similar to that of the k-FWER, Sarkar (2006) generalized the FDR to the k-FDR, the expected ratio of k or more false rejections to the total number of rejections, which is a less conservative notion of error rate than the FDR and k-FWER. In this work, we develop multiple testing theory and methods for controlling the new type I error rates. 
Specifically, the work consists of four parts: (1) We develop a new stepdown FDR controlling procedure under no assumption on dependency of the underlying p-values, which has much smaller critical constants than those of the existing Benjamini-Yekutieli stepup procedure; (2) We develop new k-FWER and FDP stepdown procedures under the assumption of independence, which are much more powerful than the existing k-FWER and FDP procedures, and show that under a certain condition, the k-FWER stepdown procedure is unimprovable; (3) We offer a unified approach for construction of k-FWER controlling procedures by generalizing the closure principle in the context of the FWER to the case of the k-FWER; (4) We develop new Benjamini-Hochberg type k-FDR stepup and stepdown procedures in different settings and apply them to a real microarray data analysis.
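The error rates named in this description can be written compactly. The block below is a sketch in standard notation rather than the author's own: V is assumed to denote the number of falsely rejected (true) null hypotheses, R the total number of rejections, and \gamma the FDP threshold. The ratio is taken to be zero when there are no rejections, which is what \max(R, 1) encodes.

```latex
\begin{align*}
\mathrm{FWER} &= \Pr(V \ge 1), &
k\text{-}\mathrm{FWER} &= \Pr(V \ge k), \\
\mathrm{FDP} &= \frac{V}{\max(R, 1)}, &
\mathrm{FDR} &= \mathbb{E}\left[\mathrm{FDP}\right], \\
\text{FDP tail control:}\quad & \Pr(\mathrm{FDP} > \gamma) \le \alpha, &
k\text{-}\mathrm{FDR} &= \mathbb{E}\left[\frac{V \,\mathbf{1}\{V \ge k\}}{\max(R, 1)}\right].
\end{align*}
```

Setting k = 1 recovers the FWER from the k-FWER and the FDR from the k-FDR, which shows in what sense the generalized rates relax the classical ones: fewer than k false rejections are not counted as an error.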