Estimation and Selection in High-Dimensional Genomic Studies

Author: Hisashi Noma
Publisher: Springer
ISBN: 9784431555667
Category: Medical
Languages: en
Pages: 90

Book Description
This book provides an overview of the statistical methods used in genome-wide screening of relevant genomic features or genes. Gene screening can deepen understanding of disease biology at the molecular level, possibly leading to the discovery of new molecular targets for developing new treatments and to diagnostic tests that predict patients’ prognosis or response to treatment. The most common approach to such gene screening studies is to apply multiple univariate analyses, using a separate statistical test for each gene to test the null hypothesis of no association with clinical variables. The book first provides an overview of the state of the art in multiple testing methodologies for gene screening, including frequentist multiple tests, empirical Bayes, and full-Bayes model-based methods for controlling the family-wise error rate or false discovery rate. Optimal discovery procedures and model-based variants are also discussed. Although considerable effort has been directed toward developing multiple testing methods, other analyses that are more relevant and effective deserve attention in gene screening, including gene ranking, estimation of effect sizes, and assessment of classification accuracy based on selected genes. The core of the book provides a framework for integrated gene screening analysis based on hierarchical mixture modeling and empirical Bayes. Within this framework, effective tools for multiple testing, ranking, estimation of effect size, and classification accuracy are derived. Methods for sample size determination for gene screening studies are also provided. With this content, the book aims to expand the existing framework of statistical analysis for gene screening from one based on multiple testing to one based on estimation and selection.
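
As a rough illustration of what controlling the false discovery rate across many per-gene tests can look like, the sketch below implements the standard Benjamini-Hochberg step-up procedure in plain Python. It is not taken from the book, which covers a much broader family of frequentist and Bayesian methods; the function name `benjamini_hochberg` and the example p-values are assumptions made up for this illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the corresponding null
    hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, one per univariate test.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, alpha=0.05))
```

Because the procedure is step-up, a gene whose p-value misses its own rank threshold can still be selected when a larger p-value further down the ordering meets its threshold.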

High-Dimensional Data Analysis in Cancer Research

Author: Xiaochun Li
Publisher: Springer Science & Business Media
ISBN: 0387697659
Category: Medical
Languages: en
Pages: 164

Book Description
Multivariate analysis is a mainstay of the statistical tools used to analyze biomedical data. It is concerned with associating data matrices of n rows by p columns, with rows representing samples (or patients) and columns representing attributes of the samples, with response variables such as patient outcomes. Classically, the sample size n is much larger than p, the number of variables, and the properties of statistical models have mostly been discussed under the assumption of fixed p and infinite n. Advances in the biological sciences and technologies have revolutionized the investigation of cancer. Biomedical data collection has become more automatic and more extensive. We are in an era in which p is a large fraction of n, or even much larger than n. Take proteomics as an example. Although proteomic techniques have been researched and developed for many decades to identify proteins or peptides uniquely associated with a given disease state, until recently this has been mostly a laborious process, carried out one protein at a time. The advent of high-throughput proteome-wide technologies such as liquid chromatography-tandem mass spectrometry makes it possible to generate proteomic signatures that facilitate rapid development of new strategies for proteomics-based detection of disease. This poses new challenges and calls for scalable solutions to the analysis of such high-dimensional data. In this volume, we present systematic and analytical approaches and strategies from both biostatistics and bioinformatics for the analysis of correlated and high-dimensional data.

High-dimensional Variable Selection for Genomics Data, from Both Frequentist and Bayesian Perspectives

Author: Jie Ren
Publisher:
ISBN:
Category:
Languages: en
Pages:

Book Description
Variable selection is one of the most popular tools for analyzing high-dimensional genomic data. It has been developed to accommodate complex data structures and to yield structured sparse identification of important genomic features. We focus on the network and interaction structures that commonly exist in genomic data, and develop novel variable selection methods from both frequentist and Bayesian perspectives. Network-based regularization has achieved success in variable selection for high-dimensional cancer genomic data because it can incorporate the correlations among genomic features. However, survival time data usually follow skewed distributions and are contaminated by outliers, so network-constrained regularization that does not account for robustness leads to false identification of network structure and biased estimation of patients' survival. In the first project, we develop a novel robust network-based variable selection method under the accelerated failure time (AFT) model. Extensive simulation studies show the advantage of the proposed method over alternative methods. Promising findings are made in two case studies of lung cancer datasets with high-dimensional gene expression measurements. Gene-environment (G×E) interactions are important for elucidating disease etiology beyond the main genetic and environmental effects. In the second project, we propose a novel and powerful semi-parametric Bayesian variable selection model to investigate linear and nonlinear G×E interactions simultaneously. It can further conduct structural identification by distinguishing nonlinear interactions from the main-effects-only case within the Bayesian framework. The proposed method conducts Bayesian variable selection more efficiently and accurately than the alternatives. Simulation shows that the proposed model outperforms competing alternatives in terms of both identification and prediction. In the case study, the proposed Bayesian method leads to the identification of effects with important implications in a high-throughput profiling study with high-dimensional SNP data. In the last project, we develop a robust Bayesian variable selection method for G×E interaction studies. The proposed robust Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structural sparsity. Spike-and-slab priors are incorporated at both the individual and group levels to identify the sparse main and interaction effects. Extensive simulation studies, together with analyses of diabetes data with SNP measurements from the Nurses' Health Study and of TCGA melanoma data with gene expression measurements, demonstrate the superior performance of the proposed method over multiple competing alternatives. To facilitate reproducible research and fast computation, we have developed open-source R packages for each project, which provide highly efficient C++ implementations of all the proposed and alternative approaches. The R packages regnet and spinBayes, associated with the first and second projects respectively, are available on CRAN. For the third project, the R package robin is available from GitHub and will be submitted to CRAN soon.

Design and Analysis of Clinical Trials for Predictive Medicine

Author: Shigeyuki Matsui
Publisher: CRC Press
ISBN: 1466558164
Category: Mathematics
Languages: en
Pages: 394

Book Description
Design and Analysis of Clinical Trials for Predictive Medicine provides statistical guidance on conducting clinical trials for predictive medicine. It covers statistical topics relevant to the main clinical research phases for developing molecular diagnostics and therapeutics, from identifying molecular biomarkers using DNA microarrays to confirming

Variable Selection and Supervised Dimension Reduction for Large-Scale Genomic Data with Censored Survival Outcomes

Author: Lauren Nicole Spirko
Publisher:
ISBN:
Category:
Languages: en
Pages: 189

Book Description
One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, providing insight into the disease process. With the rapid developments in high-throughput genomic technologies over the past two decades, the scientific community is able to monitor the expression levels of thousands of genes and proteins, resulting in enormous data sets in which the number of genomic variables (covariates) is far greater than the number of subjects. It is also typical for such data sets to have a high proportion of censored observations. Methods based on univariate Cox regression are often used to select genes related to survival outcome. However, the Cox model assumes proportional hazards (PH), which is unlikely to hold for every gene. When applied to genes exhibiting some form of non-proportional hazards (NPH), these methods could lead to under- or over-estimation of the effects. In this thesis, we develop methods that will directly address t.

16th International Conference on Information Technology-New Generations (ITNG 2019)

Author: Shahram Latifi
Publisher: Springer
ISBN: 3030140709
Category: Computers
Languages: en
Pages: 652

Book Description
The 16th International Conference on Information Technology - New Generations (ITNG) continues an annual event focusing on state-of-the-art technologies pertaining to digital information and communications. The applications of advanced information technology to domains such as astronomy, biology, education, geosciences, security, and health care are among the topics of relevance to ITNG. Visionary ideas, theoretical and experimental results, as well as prototypes, designs, and tools that help information flow readily to the user are of special interest. Machine Learning, Robotics, High Performance Computing, and Innovative Methods of Computing are examples of related topics. The conference features keynote speakers, the best student award, poster award, service award, a technical open panel, and workshops/exhibits from industry, government, and academia.

Handbook of Statistics in Clinical Oncology, Third Edition

Author: John Crowley
Publisher: CRC Press
ISBN: 1439862001
Category: Mathematics
Languages: en
Pages: 661

Book Description
Many new challenges have arisen in the area of oncology clinical trials. New cancer therapies are often based on cytostatic or targeted agents, which pose new challenges in the design and analysis of all phases of trials. The literature on adaptive trial designs and early stopping has been exploding. Inclusion of high-dimensional data and imaging techniques has become common practice, and statistical methods for analysing such data have been refined in this area. A compilation of statistical topics relevant to these new advances in cancer research, this third edition of Handbook of Statistics in Clinical Oncology focuses on the design and analysis of oncology clinical trials and translational research. Addressing the many challenges that have arisen since the publication of its predecessor, this third edition covers the newest developments in the design and analysis of cancer clinical trials, incorporating updates to all four parts. Phase I trials: Updated recommendations regarding the standard 3 + 3 and continual reassessment approaches, along with new chapters on phase 0 trials and phase I trial design for targeted agents. Phase II trials: Updates to current experience in single-arm and randomized phase II trial designs. New chapters include phase II designs with multiple strata and phase II/III designs. Phase III trials: Many new chapters include interim analyses and early stopping considerations, phase III trial designs for targeted agents and for testing the ability of markers, adaptive trial designs, cure rate survival models, statistical methods of imaging, as well as a thorough review of software for the design and analysis of clinical trials. Exploratory and high-dimensional data analyses: All chapters in this part have been thoroughly updated since the last edition. New chapters address methods for analyzing SNP data and for developing a score based on gene expression data. In addition, chapters on risk calculators and forensic bioinformatics have been added. Accessible to statisticians and oncologists interested in clinical trial methodology, the book is a single-source collection of up-to-date statistical approaches to research in clinical oncology.

Statistical Methods for High-dimensional Genomic Data

Author: Michael Chiao-An Wu
Publisher:
ISBN:
Category:
Languages: en
Pages: 200

Book Description
High-throughput genomic studies hold great promise for providing insight into key biological and medical problems, but the high dimensionality of the data from these studies constitutes a great challenge for researchers. This thesis seeks to address some of the methodological challenges posed by high-dimensional genomic data. First, the need to develop accurate classifiers based on genomic markers motivated the development of sparse linear discriminant analysis (sLDA), a regularized form of linear discriminant analysis that performs simultaneous classification and variable selection. The second and third chapters of this thesis are concerned with multifeature testing. In the gene expression setting, we apply sLDA to test for differential expression of gene pathways by using the sLDA weights to reduce each pathway to a univariate score, which may be evaluated via permutation. Then, for genome-wide association studies, we consider using the logistic kernel machine based testing framework to evaluate the significance of SNPs grouped on the basis of proximity to known genomic features. Finally, in the last chapter we study the use of sparse regularized regression for making inference in high-dimensional data. Specifically, we develop a parametric permutation test based on the LASSO estimator for testing the effect of individual markers in "omics" settings.
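
The idea of evaluating a univariate score via permutation can be sketched generically as follows. This is only an assumption-laden illustration: it uses a plain absolute difference in group means as the test statistic rather than the sLDA-weighted pathway score described in the thesis, and the function name `permutation_p_value`, the number of permutations, and the seed are hypothetical choices.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test (illustrative sketch).

    The statistic is the absolute difference in group means; group
    labels are shuffled n_perm times to approximate the null distribution.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical pathway scores for two groups of samples.
cases = [10, 11, 12, 13, 14]
controls = [0, 1, 2, 3, 4]
print(permutation_p_value(cases, controls))
```

In a pathway setting, the same loop would be run with whatever score summarizes the pathway, recomputed on each permuted labeling.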

Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics

Author: Christine Sinoquet
Publisher: OUP Oxford
ISBN: 0191019208
Category: Science
Languages: en
Pages: 415

Book Description
Nowadays bioinformaticians and geneticists are faced with myriad high-throughput data, usually characterized by uncertainty, high dimensionality, and large complexity. This wealth of so-called 'omics' data will yield insights only if it is represented by flexible and scalable models prior to any further analysis. At the interface between statistics and machine learning, probabilistic graphical models (PGMs) represent a powerful formalism for discovering complex networks of relations. These models are also amenable to incorporating a priori biological information. Network reconstruction from gene expression data is perhaps the most emblematic area of research in which PGMs have been successfully applied. However, these models have also created renewed interest in genetics in the broad sense, in particular regarding association genetics, causality discovery, prediction of outcomes, detection of copy number variations, and epigenetics. This book provides an overview of the applications of PGMs to genetics, genomics, and postgenomics to meet this increased interest. A salient feature of bioinformatics, interdisciplinarity, reaches its limit when intricate cooperation between domain specialists is required. Currently, few people are specialists in the design of advanced methods using probabilistic graphical models for postgenomics or genetics. This book deciphers such models so that their perceived difficulty no longer hinders their use, and focuses on fifteen illustrations showing the mechanisms behind the models. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics covers six main themes: (1) Gene network inference (2) Causality discovery (3) Association genetics (4) Epigenetics (5) Detection of copy number variations (6) Prediction of outcomes from high-dimensional genomic data. Written by leading international experts, this is a collection of the most advanced work at the crossroads of probabilistic graphical models and genetics, genomics, and postgenomics. The self-contained chapters provide an enlightened account of the pros and cons of applying these powerful techniques.

High-Dimensional Covariance Estimation

Author: Mohsen Pourahmadi
Publisher: John Wiley & Sons
ISBN: 1118034295
Category: Mathematics
Languages: en
Pages: 204

Book Description
Methods for estimating sparse and large covariance matrices. Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields including business and economics, health care, engineering, and environmental and physical sciences. High-Dimensional Covariance Estimation provides accessible and comprehensive coverage of the classical and modern approaches for estimating covariance matrices as well as their applications to the rapidly developing areas lying at the intersection of statistics and machine learning. Recently, the classical sample covariance methodologies have been modified and improved upon to meet the needs of statisticians and researchers dealing with large correlated datasets. High-Dimensional Covariance Estimation focuses on the methodologies based on shrinkage, thresholding, and penalized likelihood with applications to Gaussian graphical models, prediction, and mean-variance portfolio management. The book relies heavily on regression-based ideas and interpretations to connect and unify many existing methods and algorithms for the task. High-Dimensional Covariance Estimation features chapters on: Data, Sparsity, and Regularization; Regularizing the Eigenstructure; Banding, Tapering, and Thresholding; Covariance Matrices; Sparse Gaussian Graphical Models; Multivariate Regression. The book is an ideal resource for researchers in statistics, mathematics, business and economics, computer sciences, and engineering, as well as a useful text or supplement for graduate-level courses in multivariate analysis, covariance estimation, statistical learning, and high-dimensional data analysis.
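
To give a feel for one of the methodologies named above, thresholding, here is a minimal sketch in plain Python: compute the sample covariance matrix, then set small off-diagonal entries to zero. It is a simplified illustration under stated assumptions, not the book's treatment; the helper names `sample_covariance` and `hard_threshold`, the toy data, and the threshold value are all hypothetical, and in practice the threshold would be chosen by a data-driven rule such as cross-validation.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of data given as a list of rows (samples)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def hard_threshold(cov, t):
    """Zero every off-diagonal entry with absolute value below t.

    Diagonal entries (variances) are always kept.
    """
    p = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) >= t else 0.0 for j in range(p)]
        for i in range(p)
    ]


# Toy data: 4 samples (rows) by 3 variables (columns).
data = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]]
sparse_cov = hard_threshold(sample_covariance(data), t=0.5)
for row in sparse_cov:
    print(row)
```

Hard thresholding keeps the matrix symmetric but does not guarantee positive definiteness, which is one reason alternative regularizers such as shrinkage and penalized likelihood are also used.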