Large-scale Multiple Hypothesis Testing with Complex Data Structure

Author: Xiaoyu Dai
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 104

Book Description
In the last decade, motivated by applications in medicine, bioinformatics, genomics, brain imaging, and other fields, a growing amount of statistical research has been devoted to large-scale multiple testing, where thousands of tests or more are conducted simultaneously. However, due to the complexity of real data sets, the assumptions of many existing multiple testing procedures, e.g. that tests are independent and that p-values have continuous null distributions, may not hold. This limits their performance, leading to low detection power and an inflated false discovery rate (FDR). In this dissertation, we study how to better approach multiple testing problems under complex data structures. In Chapter 2, we study multiple testing with discrete test statistics. In Chapter 3, we study discrete multiple testing that incorporates prior ordering information. In Chapter 4, we study multiple testing under a complex dependency structure. We propose novel procedures for each scenario, based on the marginal critical functions (MCFs) of randomized tests, the conditional random field (CRF), or the deep neural network (DNN). The theoretical properties of our procedures are carefully studied, and their performance is evaluated through simulations and real applications involving the analysis of genetic data from next-generation sequencing (NGS) experiments.
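
For readers unfamiliar with FDR control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure, which is one well-known example of an existing procedure that relies on the kind of assumptions mentioned above. It is not the dissertation's MCF-, CRF-, or DNN-based method; the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Illustrative helper (hypothetical name), not code from the dissertation.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Made-up p-values for ten simultaneous tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)

# Step-up behaviour: a large rank can pass even if smaller ranks fail.
rejected_step_up = benjamini_hochberg([0.01, 0.04, 0.03, 0.045], alpha=0.05)
```

With discrete or dependent test statistics, the guarantees of such a rule can degrade or become conservative, which is the gap the description says the dissertation addresses.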

Analysis of Error Control in Large Scale Two-stage Multiple Hypothesis Testing

Author: Wenge Guo
Publisher:
ISBN:
Category :
Languages : en
Pages : 50

Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII

Author: Abdelkader Hameurlain
Publisher: Springer Nature
ISBN: 3662629194
Category : Computers
Languages : en
Pages : 247

Book Description
The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. This, the 47th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, constitutes a special issue focusing on Digital Ecosystems and Social Networks. The 9 revised selected papers cover topics that include Social Big Data, Data Analysis, Cloud-Based Feedback, Experience Ecosystems, Pervasive Environments, and Smart Systems.

Methods in Multiple Testing and Meta-analysis with Applications to the Analysis of Genomic Data

Author: Yihan Li
Publisher:
ISBN:
Category :
Languages : en
Pages : 160

Model-Based Hypothesis Testing in Biomedicine

Author: Rikard Johansson
Publisher: Linköping University Electronic Press
ISBN: 9176854574
Category :
Languages : en
Pages : 102

Book Description
The utilization of mathematical tools within biology and medicine has traditionally been less widespread compared to other hard sciences, such as physics and chemistry. However, an increased need for tools such as data processing, bioinformatics, statistics, and mathematical modeling has emerged due to advancements during the last decades. These advancements are partly due to the development of high-throughput experimental procedures and techniques, which produce ever increasing amounts of data. For all aspects of biology and medicine, these data reveal a high level of inter-connectivity between components, which operate on many levels of control, and with multiple feedbacks both between and within each level of control. However, the availability of these large-scale data is not synonymous with a detailed mechanistic understanding of the underlying system. Rather, a mechanistic understanding is gained only when we construct a hypothesis and test its predictions experimentally. Identifying interesting predictions that are quantitative in nature generally requires mathematical modeling. This, in turn, requires that the studied system can be formulated as a mathematical model, such as a series of ordinary differential equations, where different hypotheses can be expressed as precise mathematical expressions that influence the output of the model. Within specific sub-domains of biology, the utilization of mathematical models has had a long tradition, such as the modeling done on electrophysiology by Hodgkin and Huxley in the 1950s. However, it is only in recent years, with the arrival of the field known as systems biology, that mathematical modeling has become more commonplace.
The somewhat slow adoption of mathematical modeling in biology is partly due to historical differences in training and terminology, as well as to a lack of awareness of showcases illustrating how modeling can make a difference, or even be required, for a correct analysis of the experimental data. In this work, I provide such showcases by demonstrating the universality and applicability of mathematical modeling and hypothesis testing in three disparate biological systems. In Paper II, we demonstrate how mathematical modeling is necessary for the correct interpretation and analysis of dominant negative inhibition data in insulin signaling in primary human adipocytes. In Paper III, we use modeling to determine transport rates across the nuclear membrane in yeast cells, and we show how this technique is superior to traditional curve-fitting methods. We also demonstrate the issue of population heterogeneity and the need to account for individual differences between cells and the population at large. In Paper IV, we use mathematical modeling to reject three hypotheses concerning the phenomenon of facilitation in pyramidal nerve cells in rats and mice. We also show how one surviving hypothesis can explain all data and adequately describe independent validation data. Finally, in Paper I, we develop a method for model selection and discrimination using parametric bootstrapping and the combination of several different empirical distributions of traditional statistical tests. We show how the empirical log-likelihood ratio test is the best combination of two tests and how this can be used, not only for model selection, but also for model discrimination. In conclusion, mathematical modeling is a valuable tool for analyzing data and testing biological hypotheses, regardless of the underlying biological system.
Further development of modeling methods and applications is therefore important, since these will in all likelihood play a crucial role in all future aspects of biology and medicine, especially in dealing with the burden of increasing amounts of data that are made available with new experimental techniques.

LIST DATA STRUCTURE: THEORY AND APPLICATIONS WITH PYTHON AND TKINTER

Author: Vivian Siahaan
Publisher: BALIGE PUBLISHING
ISBN:
Category : Computers
Languages : en
Pages : 316

Book Description
In the rapidly evolving world of technology, understanding foundational concepts like data structures, specifically lists, and their manipulation is essential. This book aims to delve deep into the practicalities of using lists in Python, a versatile and widely used programming language known for its ease of use and powerful libraries. Coupled with this, the book explores the graphical user interface library, Tkinter, providing a comprehensive guide on how to make Python's capabilities more interactive and user-friendly. The significance of lists in programming cannot be overstated. They are among the most basic and crucial data structures in computer science, essential for storing sequences of data that are dynamically modifiable. In Python, lists are used extensively, from simple applications to high-end data processing tasks. This book will start by exploring the anatomy of lists in Python, covering their creation, manipulation, and application in various real-world scenarios. Following the understanding of lists, the discussion will transition to operations on lists. Operations like appending, slicing, sorting, and more are pivotal in handling data efficiently. Through practical examples and detailed explanations, readers will learn how these operations are implemented in Python and how they can be used to solve common programming problems. Moreover, the power of list comprehensions, a distinctive feature of Python that allows for concise and efficient manipulation of lists, will be thoroughly discussed. This feature not only simplifies code but also enhances its readability and efficiency, making Python an appealing choice for developers. However, theoretical knowledge of these operations and their syntax only scratches the surface of their potential. To bridge the gap between theory and practical application, this book incorporates interactive examples using Tkinter, Python’s standard GUI library.
Tkinter allows programmers to create graphical interfaces, making software applications accessible to a broader audience, including those who might not be comfortable with command-line interfaces. Integrating list operations into a GUI can significantly enhance the functionality and user-friendliness of applications. For instance, users can interact with the data more intuitively, perform operations in real time, and see the results immediately, which is crucial for learning and debugging. The chapters dedicated to Tkinter will guide readers through setting up their first GUI applications. Starting from basic windows and widgets, the discussion will evolve to include how list operations can be integrated into these interfaces. Whether it's displaying a list, updating it based on user input, or sorting and filtering data based on user commands, the book will cover a wide range of use cases. One of the core strengths of combining list operations with Tkinter is in educational software, where interactive tools can significantly enhance the learning experience. When students manipulate data structures in real time, they can see the immediate impact of their actions, which deepens their understanding of the subject matter. Furthermore, this approach has applications in professional software development, where developers need to build applications that are not only functional but also intuitive and responsive. The book will explore several project ideas and real-world applications, showing how the concepts discussed can be used to build meaningful and efficient software. Beyond educational and professional environments, this integration finds relevance in data analysis and visualization tasks. Analysts often need to manipulate large datasets and visualize their results effectively. Here, Python’s list operations and Tkinter’s graphical capabilities come together to offer powerful tools for data manipulation and display.
In addition to practical applications, the book also addresses best practices and common pitfalls in both list manipulation and GUI development. Understanding these will help readers avoid common errors and improve the performance of their code. As technology continues to advance, the importance of understanding foundational programming skills and integrating them into user-friendly applications cannot be overstated. This book is designed not just to teach but also to inspire its readers to explore the possibilities of Python and Tkinter, encouraging them to develop applications that are powerful, efficient, and user-centric. In conclusion, this book serves as a comprehensive guide for anyone looking to deepen their understanding of Python’s list operations and GUI development using Tkinter. By the end of this book, readers will not only be proficient in these areas but will also be equipped to apply these skills in practical, innovative, and effective ways.
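
The list operations named in the description (appending, slicing, sorting, and list comprehensions) can be sketched in a few lines of plain Python. This is an illustrative example only, not taken from the book; the variable names and values are made up, and the Tkinter side is left out so the snippet runs without a display.

```python
# Made-up data: a small list of test scores.
scores = [72, 95, 58]

# Appending adds one element to the end of the list, in place.
scores.append(88)

# Slicing returns a new list; here, the first two elements.
first_two = scores[:2]

# sorted() returns a new sorted list and leaves the original unchanged.
ranked = sorted(scores, reverse=True)

# A list comprehension filters and builds a new list in one expression.
passing = [s for s in scores if s >= 60]
```

In a GUI, lists like `ranked` or `passing` would typically be what gets shown to the user after a button click or a text entry.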

Multiple Testing Procedures with Applications to Genomics

Author: Sandrine Dudoit
Publisher: Springer Science & Business Media
ISBN: 0387493174
Category : Science
Languages : en
Pages : 611

Book Description
This book establishes the theoretical foundations of a general methodology for multiple hypothesis testing and discusses its software implementation in R and SAS. The methods are applied to a range of problems in biomedical and genomic research, including identification of differentially expressed and co-expressed genes in high-throughput gene expression experiments; tests of association between gene expression measures and biological annotation metadata; sequence analysis; and genetic mapping of complex traits using single nucleotide polymorphisms. The procedures are based on a joint null distribution of the test statistics and provide Type I error control in testing problems involving general data generating distributions, null hypotheses, and test statistics.

Modelling of Pollutants in Complex Environmental Systems

Author: Grady Hanrahan
Publisher: ILM Publications
ISBN: 1906799008
Category : Nature
Languages : en
Pages : 350

Book Description
This title showcases modern environmental modelling methods, the basic theory behind them and their incorporation into complex environmental investigations.

Resampling-Based Multiple Testing

Author: Peter H. Westfall
Publisher: John Wiley & Sons
ISBN: 9780471557616
Category : Mathematics
Languages : en
Pages : 382

Book Description
Combines recent developments in resampling technology (including the bootstrap) with new methods for multiple testing that are easy to use, convenient to report and widely applicable. Software from SAS Institute is available to execute many of the methods and programming is straightforward for other applications. Explains how to summarize results using adjusted p-values which do not necessitate cumbersome table look-ups. Demonstrates how to incorporate logical constraints among hypotheses, further improving power.
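
To make the idea of resampling-based adjusted p-values concrete, here is a minimal sketch of one such construction: permutation-based single-step maxT adjustment for a two-group comparison, enumerating every relabeling of a tiny data set. It is offered as an illustration under stated assumptions, not as the book's exact procedure or its SAS software; the function names, the absolute mean difference statistic, and the data are all hypothetical.

```python
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def abs_mean_diff(values, group_a):
    """Absolute difference in means between group A and everyone else."""
    a = [values[i] for i in group_a]
    b = [values[i] for i in range(len(values)) if i not in group_a]
    return abs(mean(a) - mean(b))


def maxt_adjusted_p(features, group_a):
    """Single-step maxT adjusted p-values via exhaustive relabeling.

    features: list of per-feature measurement lists (same sample order).
    group_a:  tuple of sample indices that form the first group.
    """
    n = len(features[0])
    observed = [abs_mean_diff(f, group_a) for f in features]
    # For each relabeling, record the largest statistic over all features.
    max_stats = [
        max(abs_mean_diff(f, relabel) for f in features)
        for relabel in combinations(range(n), len(group_a))
    ]
    total = len(max_stats)
    # Adjusted p-value: share of relabelings whose maximum statistic is at
    # least the observed one (small tolerance guards floating-point ties).
    return [sum(m >= obs - 1e-12 for m in max_stats) / total for obs in observed]


# Made-up data: six samples, the first three in group A.
features = [
    [10, 11, 12, 1, 2, 3],  # strongly separated feature
    [5, 5, 5, 5, 5, 5],     # no difference at all
]
adjusted = maxt_adjusted_p(features, group_a=(0, 1, 2))
```

Each adjusted p-value can be compared directly with the desired familywise error level, which is why no table look-up is needed. Exhaustive enumeration is only feasible for very small samples; larger studies would draw random relabelings instead.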

Assessing Rare Variation in Complex Traits

Author: Eleftheria Zeggini
Publisher: Springer
ISBN: 1493928244
Category : Medical
Languages : en
Pages : 262

Book Description
This book is unique in covering a wide range of design and analysis issues in genetic studies of rare variants, taking advantage of collaboration of the editors with many experts in the field through large-scale international consortia including the UK10K Project, GO-T2D and T2D-GENES. Chapters provide details of state-of-the-art methodology for rare variant detection and calling, imputation and analysis in samples of unrelated individuals and families. The book also covers analytical issues associated with the study of rare variants, such as the impact of fine-scale population structure, and with combining information on rare variants across studies in a meta-analysis framework. Genetic association studies have in the last few years substantially enhanced our understanding of factors underlying traits of high medical importance, such as body mass index, lipid levels, blood pressure and many others. There is growing empirical evidence that low-frequency and rare variants play an important role in complex human phenotypes. This book covers multiple aspects of study design, analysis and interpretation for complex trait studies focusing on rare sequence variation. In many areas of genomic research, including complex trait association studies, technology is in danger of outstripping our capacity to analyse and interpret the vast amounts of data generated. The field of statistical genetics in the whole-genome sequencing era is still in its infancy, but powerful methods to analyse the aggregation of low-frequency and rare variants are now starting to emerge. The chapter Functional Annotation of Rare Genetic Variants is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.