Multiply Robust Empirical Likelihood Inference for Missing Data and Causal Inference Problems

Multiply Robust Empirical Likelihood Inference for Missing Data and Causal Inference Problems PDF Author: Shixiao Zhang
Publisher:
ISBN:
Category : Medical statistics
Languages : en
Pages : 119

Book Description
Missing data are ubiquitous in many social and medical studies. A naive complete-case (CC) analysis that simply ignores the missing data commonly leads to invalid inferential results. This thesis aims to develop statistical methods addressing important issues concerning both missing data and causal inference problems. One of the major concepts explored in this thesis is multiple robustness, where multiple working models can be properly accommodated to improve robustness against possible model misspecification. Chapter 1 serves as a brief introduction to missing data problems and causal inference. In this chapter, we highlight two major statistical concepts we will repeatedly adopt in subsequent chapters, namely, empirical likelihood and calibration. We also describe some of the problems that will be investigated in this thesis. There exists an extensive literature on using calibration methods with empirical likelihood in missing data and causal inference. However, researchers in different areas may not realize the conceptual similarities and connections with one another. In Chapter 2, we provide a brief literature review of calibration methods, aiming to address some of the desirable properties one can obtain by using calibration methods. In Chapter 3, we consider a simple scenario of estimating the means of some response variables that are subject to missingness. A crucial first step is to determine if the data are missing completely at random (MCAR), in which case a complete-case analysis would suffice. We propose a unified approach to testing MCAR and the subsequent estimation. Upon rejecting MCAR, the same set of weights used for testing can then be used for estimation. The resulting estimators are consistent if the missingness of each response variable depends only on a set of fully observed auxiliary variables and the true outcome regression model is among the user-specified functions for deriving the weights.
The proposed testing procedure is compared with existing alternative methods, which do not provide a method for subsequent estimation once MCAR is rejected. In Chapter 4, we consider the widely adopted pretest-posttest studies in causal inference. The proposed test extends the existing methods for randomized trials to observational studies. We propose a dual method for testing and estimation of the average treatment effect (ATE). We also consider the case where the potential outcomes are subject to missingness at random (MAR). The proposed approach postulates multiple models for the propensity score of treatment assignment, the missingness probability and the outcome regression. The calibrated empirical probabilities are constructed by maximizing the empirical likelihood function subject to constraints derived from carefully chosen population moment conditions. The proposed method proceeds in two steps: the first step obtains the preliminary calibration weights that are asymptotically equivalent to the true propensity score of treatment assignment, and the second step forms a set of weights incorporating the estimated propensity score and multiple models for the missingness probability and the outcome regression. The proposed EL ratio test is valid and the resulting estimator is also consistent if one of the multiple models for the propensity score, as well as one of the multiple models for the missingness probability or the outcome regression, is correctly specified. Chapter 5 extends Chapter 4's results to testing the equality of the cumulative distribution functions of the potential outcomes between the two intervention groups. We propose an empirical likelihood based Mann-Whitney test and an empirical likelihood ratio test which are multiply robust in the same sense as the multiply robust estimator and the empirical likelihood ratio test for the average treatment effect in Chapter 4.
We conclude this thesis in Chapter 6 with some additional remarks on major results presented in the thesis along with several interesting topics worthy of further exploration in the future.
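
The calibration idea in the description above can be illustrated with a small sketch. This is not the thesis's procedure: it assumes a single scalar calibration function g(x) = x and a toy dataset, and the names `el_calibration_weights`, `x`, `y` and `r` are hypothetical. The weights on the complete cases maximize the empirical likelihood subject to the weighted mean of g matching its full-sample mean.

```python
def el_calibration_weights(g, target, n_iter=200):
    """Empirical-likelihood weights w_i > 0 with sum(w) = 1 and
    sum(w_i * g_i) = target, for scalar g_i (illustrative sketch)."""
    d = [gi - target for gi in g]
    m = len(d)
    lo_d, hi_d = min(d), max(d)
    if not (lo_d < 0.0 < hi_d):
        raise ValueError("target must lie strictly inside the range of g")
    # Positivity of 1 + lam * d_i for all i requires -1/hi_d < lam < -1/lo_d.
    lo, hi = -1.0 / hi_d, -1.0 / lo_d
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    # The score sum(d_i / (1 + lam * d_i)) is strictly decreasing in lam,
    # so its root (the Lagrange multiplier) can be found by bisection.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(di / (1.0 + mid * di) for di in d) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (m * (1.0 + lam * di)) for di in d]


# Toy data (assumed): x is fully observed, y = 2x is observed only when r = 1,
# and units with larger x respond more often, so the data are not MCAR.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * xi for xi in x]
r = [0, 0, 1, 0, 1, 1, 1, 1]

x_cc = [xi for xi, ri in zip(x, r) if ri]
y_cc = [yi for yi, ri in zip(y, r) if ri]

cc_mean = sum(y_cc) / len(y_cc)            # naive complete-case mean
w = el_calibration_weights(x_cc, sum(x) / len(x))
calibrated_mean = sum(wi * yi for wi, yi in zip(w, y_cc))
```

In this toy case the outcome is exactly linear in the calibration function, so the calibrated estimate recovers the full-data mean of y while the complete-case mean overshoots it.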

Statistical Inferences for Missing Data/causal Inferences Based on Modified Empirical Likelihood

Statistical Inferences for Missing Data/causal Inferences Based on Modified Empirical Likelihood PDF Author: Sima Sharghi
Publisher:
ISBN:
Category : Estimation theory
Languages : en
Pages : 167

Book Description
In this dissertation we first modify the profile empirical likelihood function, conditioned on complete data, to estimate the population mean in the presence of missing values in the response variable. In Chapter 3, under the counterfactual potential-outcome framework of Rubin (1974, 1976, 1977), we also propose methods to estimate the causal effect. This dissertation specifically expands upon the work of Qin and Zhang (2007), who do not address two main shortcomings of their use of empirical likelihood. The first is that the estimator may fail to exist. The second is undercoverage of the confidence region. Both flaws are exacerbated when the sample size is small. In Chapter 2, we modify the associated empirical likelihood function to obtain consistent estimators that address each of the shortcomings. Our adjusted-empirical-likelihood-based consistent estimator, using a strategy similar to that of Chen et al. (2008), adds a point to the convex hull of the data to ensure the algorithm converges. Furthermore, inspired by Jing et al. (2017), we propose a quadratic transformation of the associated empirical likelihood ratio test statistic to yield a consistent estimator with greater coverage probability. In Chapter 3, using the techniques developed in Chapter 2, we develop a consistent adjusted empirical likelihood estimator of the causal effect. In the Chapter 2 simulation study for estimating the mean response in the presence of missing values, both of our proposed estimators show competitive results compared with other historical methods. These modified estimators generally outperform historical estimators in terms of RMSE and coverage probability.
The Chapter 3 simulations show that the consistent adjusted empirical likelihood causal effect estimator is competitive with the historical methods. Along the way, we also propose a weighted adjusted empirical likelihood for estimating both the mean response and the causal effect, which is proved to be consistent in the presence of missing values in the response variable. This estimator exhibits competitive results compared with the empirical likelihood estimator proposed by Qin and Zhang (2007).
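
The "adds a point to the convex hull" device mentioned above can be sketched for the simplest case, a scalar mean with no missing data. This is an illustration under assumptions, not the dissertation's estimator: the pseudo-observation is placed at -a_n times the sample mean of the centered data with a_n = max(1, log(n)/2), a level commonly attributed to Chen et al. (2008) and assumed here. The function names are hypothetical.

```python
from math import log


def neg2_log_el_ratio(d, n_iter=200):
    """-2 log empirical likelihood ratio for H0: E[d] = 0, scalar d_i.
    Returns inf when 0 is not strictly inside the range of d."""
    lo_d, hi_d = min(d), max(d)
    if lo_d == 0.0 == hi_d:
        return 0.0
    if not (lo_d < 0.0 < hi_d):
        return float("inf")          # ordinary EL is undefined here
    lo, hi = -1.0 / hi_d, -1.0 / lo_d
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    # Bisection on the decreasing score sum(d_i / (1 + lam * d_i)).
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(di / (1.0 + mid * di) for di in d) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(log(1.0 + lam * di) for di in d)


def neg2_log_ael_ratio(x, mu):
    """Adjusted EL: append one pseudo-point so that 0 always lies
    inside the range of the centered data (assumed a_n, see lead-in)."""
    n = len(x)
    d = [xi - mu for xi in x]
    a_n = max(1.0, 0.5 * log(n))
    d_bar = sum(d) / n
    return neg2_log_el_ratio(d + [-a_n * d_bar])


x = [1.2, 0.7, 2.1, 1.5, 0.9]
plain = neg2_log_el_ratio([xi - 3.0 for xi in x])   # mu = 3 is outside the data range
adjusted = neg2_log_ael_ratio(x, 3.0)               # still finite
at_mean = neg2_log_ael_ratio(x, sum(x) / len(x))    # approximately 0 at the sample mean
```

The hypothesized mean 3.0 lies outside the observed range, so the ordinary statistic is infinite while the adjusted one stays finite. That failure is the existence problem the description refers to.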

Empirical Likelihood Methods in Missing Response Problems and Causal Interference

Empirical Likelihood Methods in Missing Response Problems and Causal Interference PDF Author: Kaili Ren
Publisher:
ISBN:
Category : Causation
Languages : en
Pages : 114

Book Description
This manuscript contains three topics in missing data problems and causal inference. First, we propose an empirical likelihood estimator as an alternative to Qin and Zhang (2007) in missing response problems under MAR assumption. A likelihood-based method is used to obtain the mean propensity score instead of a moment-based method. Our proposed estimator shares the double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and the propensity score model are both correctly specified. Our proposed estimator has better performance when the propensity score is correctly specified. In addition, we extend our proposed method to the estimation of ATE in observational causal inferences. By utilizing the proposed method on a dataset from the CORAL clinical trial, we study the causal effect of cigarette smoking on renal function in patients with ARAS. The higher cystatin C and lower CKD-EPI GFR for smokers demonstrate the negative effect of smoking on renal function in patients with ARAS. Second, we explore a more efficient approach in missing response problems under MAR assumption. Instead of using one propensity score model and one working regression model, we postulate multiple working regression and propensity score models. Moreover, rather than maximizing the conditional likelihood, we maximize the full likelihood under constraints with respect to the postulated parametric functions. Our proposed estimator is consistent if one of the propensity scores is correctly specified and it achieves the semiparametric efficiency lower bound when one of the working regression models is correctly specified as well. This estimator is more efficient than other current estimators when one of the propensity scores is correctly specified. Finally, I propose empirical likelihood confidence intervals in missing data problems, which make very weak distribution assumptions. 
We show that the -2 empirical log-likelihood ratio function follows a scaled chi-squared distribution if either the working propensity score or the working regression model is correctly specified. If the two models are both correctly specified, the -2 empirical log-likelihood ratio function follows a chi-squared distribution. Empirical likelihood confidence intervals perform better than Wald confidence intervals of the AIPW estimator, when sample size is small and distribution of the response is highly skewed. In addition, empirical likelihood confidence intervals for ATE can also be built in causal inference.
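
For reference, the AIPW estimator and its Wald interval, the comparator named above, can be sketched as follows. This is a generic textbook form, not code from the manuscript. The fitted propensity scores `pi` and regression predictions `m` are assumed to be supplied by the user, and the 1.96 multiplier assumes a nominal 95% level.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_mean(y, r, pi, m):
    """AIPW estimate of E[Y] with a Wald 95% interval.
    y[i] may be None when r[i] == 0 (response missing)."""
    # Each term is m_i + r_i * (y_i - m_i) / pi_i.
    terms = [
        mi + (ri * (yi - mi) / pii if ri else 0.0)
        for yi, ri, pii, mi in zip(y, r, pi, m)
    ]
    est = mean(terms)
    se = stdev(terms) / sqrt(len(terms))
    return est, (est - 1.96 * se, est + 1.96 * se)


# Toy illustration (assumed data): the regression predictions m are exactly
# right while the propensity scores pi are deliberately wrong.
y = [2.0, None, 6.0, None, 10.0]
r = [1, 0, 1, 0, 1]
m = [2.0, 4.0, 6.0, 8.0, 10.0]
pi = [0.9, 0.5, 0.3, 0.5, 0.7]

est, (ci_lo, ci_hi) = aipw_mean(y, r, pi, m)
```

With the outcome model correct, the residual term vanishes for every observed unit, so the estimate equals the mean of `m` regardless of `pi`. This is one half of the double-robustness property discussed above.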

Empirical Likelihood

Empirical Likelihood PDF Author: Art B. Owen
Publisher: CRC Press
ISBN: 1420036157
Category : Mathematics
Languages : en
Pages : 322

Book Description
Empirical likelihood provides inferences whose validity does not depend on specifying a parametric model for the data. Because it uses a likelihood, the method has certain inherent advantages over resampling methods: it uses the data to determine the shape of the confidence regions, and it makes it easy to combine data from multiple sources. It al

Empirical Likelihood Inference for Two-sample Problems

Empirical Likelihood Inference for Two-sample Problems PDF Author: Ying Yan
Publisher:
ISBN:
Category :
Languages : en
Pages : 40

Book Description
In this thesis, we are interested in empirical likelihood (EL) methods for two-sample problems, with focus on the difference of the two population means. A weighted empirical likelihood method (WEL) for two-sample problems is developed. We also consider a scenario where sample data on auxiliary variables are fully observed for both samples but values of the response variable are subject to missingness. We develop an adjusted empirical likelihood method for inference of the difference of the two population means for this scenario where missing values are handled by a regression imputation method. Bootstrap calibration for WEL is also developed. Simulation studies are conducted to evaluate the performance of naive EL, WEL and WEL with bootstrap calibration (BWEL) with comparison to the usual two-sample t-test in terms of power of the tests and coverage accuracies. Simulation for the adjusted EL for the linear regression model with missing data is also conducted.

Causal Inference with Covariate Balance Optimization

Causal Inference with Covariate Balance Optimization PDF Author: Yuying Xie
Publisher:
ISBN:
Category : Analysis of covariance
Languages : en
Pages :

Book Description
Causal inference is a popular problem in biostatistics, economics, and health science studies. The goal of this thesis is to develop new methods for the estimation of causal effects using propensity scores or inverse probability weights, where weights are chosen in such a way as to achieve balance in covariates across the treatment groups. In Chapter 1, we introduce the Neyman-Rubin causal framework and causal inference with propensity scores. The importance of covariate balancing in causal inference is further discussed in this chapter. In Chapter 2, we provide general definitions and notation for causal inference, along with other popular propensity score approaches and weighting techniques. In Chapter 3, we describe a new model averaging approach to propensity score estimation in which parametric and nonparametric estimates are combined to achieve covariate balance. Simulation studies are conducted across different scenarios varying in the degree of interactions and nonlinearity in the treatment model. The results show that the proposed method produces less bias and smaller standard errors than existing approaches. They also show that a model averaging approach with the objective of minimizing the average Kolmogorov-Smirnov statistic leads to the best performance. The proposed approach is applied to a real data set in evaluating the causal effect of formula or mixed feeding versus exclusive breastfeeding in the first month of life on a child's BMI Z-score at age 4. The data analysis shows that formula or mixed feeding is more likely to lead to obesity at age 4, compared to exclusive breastfeeding. In Chapter 4, we propose using kernel distance to measure balance across different treatment groups and propose a new propensity score estimator obtained by setting the kernel distance to zero.
Compared to other balance measures, such as the absolute standardized mean difference (ASMD) and the Kolmogorov-Smirnov (KS) statistic, kernel distance is one of the best bias indicators in estimating the causal effect. That is, the balance metric based on kernel distance is shown to have the strongest correlation with the absolute bias in estimating the causal effect, compared to several commonly used balance metrics. The kernel distance constraints are solved by the generalized method of moments. Simulation studies are conducted across different scenarios varying in the degree of nonlinearity in both the propensity score model and the outcome model. The proposed approach produces smaller mean squared error in estimating causal treatment effects than many existing approaches, including the well-known covariate balancing propensity score (CBPS) approach, when the propensity score model is misspecified. An application to data from the International Tobacco Control (ITC) policy evaluation project is provided. Often interest lies in the estimation of quantities other than the average causal effect, such as quantiles or the quantile treatment effect. In Chapter 5, we propose a multiply robust method for estimating marginal quantiles of potential outcomes by achieving mean balance in (1) the propensity score, and (2) the conditional distributions of potential outcomes. An empirical likelihood or entropy measure can be utilized instead of using inverse probability weighting. Simulation studies are conducted across different scenarios of correctness in both the propensity score models and outcome models. Our estimator is consistent if any of the models are correctly specified.
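
The two conventional balance metrics named above, ASMD and the KS statistic, can be sketched for a single covariate. These are common textbook forms under stated assumptions, not the thesis's implementation: the ASMD is standardized by the pooled unweighted standard deviation, weights default to 1, and the function names are hypothetical. The kernel distance proposed in the thesis is not shown.

```python
from math import sqrt


def _weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def asmd(x_t, x_c, w_t=None, w_c=None):
    """Absolute standardized mean difference of one covariate between
    treated (x_t) and control (x_c) samples, optionally weighted."""
    w_t = w_t or [1.0] * len(x_t)
    w_c = w_c or [1.0] * len(x_c)

    def var(x):
        mu = sum(x) / len(x)
        return sum((xi - mu) ** 2 for xi in x) / (len(x) - 1)

    # Assumed convention: pool the unweighted sample variances.
    pooled_sd = sqrt((var(x_t) + var(x_c)) / 2.0)
    return abs(_weighted_mean(x_t, w_t) - _weighted_mean(x_c, w_c)) / pooled_sd


def ks_stat(x_t, x_c, w_t=None, w_c=None):
    """Largest gap between the (weighted) empirical CDFs of the two groups."""
    w_t = w_t or [1.0] * len(x_t)
    w_c = w_c or [1.0] * len(x_c)

    def ecdf(x, w, t):
        return sum(wi for xi, wi in zip(x, w) if xi <= t) / sum(w)

    points = set(x_t) | set(x_c)
    return max(abs(ecdf(x_t, w_t, t) - ecdf(x_c, w_c, t)) for t in points)


x_treated = [1.0, 2.0, 3.0, 4.0]
x_control = [3.0, 4.0, 5.0, 6.0]
imbalance_asmd = asmd(x_treated, x_control)
imbalance_ks = ks_stat(x_treated, x_control)
```

Both metrics are zero when the two weighted samples coincide. A weighting method aimed at covariate balance would choose weights that drive such metrics toward zero across all covariates.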

Semiparametric and Robust Methods for Complex Parameters in Causal Inference

Semiparametric and Robust Methods for Complex Parameters in Causal Inference PDF Author: Wenjing Zheng
Publisher:
ISBN:
Category :
Languages : en
Pages : 169

Book Description
This dissertation focuses on developing robust semiparametric methods for complex parameters that emerge at the interface of causal inference and biostatistics, with applications to epidemiological and medical research. Specifically, it addresses three important topics: Part I (chapter 1) presents a framework to construct and analyze group sequential covariate-adjusted response-adaptive (CARA) randomized controlled trials (RCTs) that admits the use of data-adaptive approaches in constructing the randomization schemes and in estimating the conditional response model. This framework adds to the existing literature on CARA RCTs by allowing flexible options in both their design and analysis. Part II (chapters 2 and 3) concerns two parameters that arise in longitudinal causal effect analysis using marginal structural models (MSMs). Chapter 2 presents a targeted maximum likelihood estimator (TMLE) for the dynamic MSM for the hazard function. This estimator improves upon the existing inverse probability weighted (IPW) estimators by providing efficiency gain and robustness protection against model misspecification. Chapter 3 addresses the issue of effect modification (in an MSM) by an effect modifier that is post exposure. This parameter is particularly relevant if an effect modifier of interest is missing at random; or if one wishes to evaluate the effect modification of a second-line-treatment by a post first-line-treatment variable, where assignment of the first-line-treatment shares common determinants with the outcome of interest. We also present a TMLE for this parameter. Part III (chapters 4 and 5) addresses semiparametric inference for mediation analysis. Chapter 4 presents a TMLE estimator for the natural direct and indirect effects in a one-time point setting; it improves upon existing estimators by offering robustness, weakened sensitivity to near positivity violations, and potential applications to situations with high-dimensional mediators.
Chapter 5 studies longitudinal mediation analysis with time-varying exposure and mediators. In it, we propose a reformulation of the mediation problem in terms of stochastic interventions, establish an identification formula for the mediation functional, and present a TMLE for this parameter. This chapter contributes to existing literature by presenting a nonparametrically defined parameter of interest in longitudinal mediation and a multiply robust and efficient estimator for it. Chapter 1: An adaptive trial design allows pre-specified modifications to some aspects of the on-going trial based on analysis of the accruing data, while preserving the validity and integrity of the trial. This flexibility potentially translates into more efficient studies (e.g. shorter duration, fewer subjects) or greater chance of answering clinical questions of interest (e.g. detecting a treatment effect if one exists, broader dose-response information, etc). In an adaptive CARA RCT, the treatment randomization schemes are allowed to depend on the patient's pre-treatment covariates, and the investigators have the opportunity to adjust these schemes during the course of the trial based on accruing information, including previous responses, in order to meet some pre-specified objectives. In a group-sequential CARA RCT, such adjustments take place at interim time points given by sequential inclusion of blocks of c patients, where c ≥ 1 is a pre-specified integer. In this chapter, we present a novel group-sequential CARA RCT design and corresponding analytical procedure that admits the use of flexible approaches in constructing randomization schemes and a wide range of data-adaptive techniques in estimating the conditional response model. Under the proposed framework, the sequence of randomization schemes is group-sequentially determined, using the accruing data, by targeting a formal, user-specified optimal randomization design.
The parameter of interest is nonparametrically defined and is estimated using the paradigm of targeted minimum loss estimation. We establish that under appropriate empirical process conditions, the resulting sequence of randomization schemes converges to a fixed design, and the proposed estimator is consistent and asymptotically Gaussian, with an asymptotic variance that is estimable from data, thus giving rise to valid confidence intervals of given asymptotic levels. To illustrate the proposed framework, we consider LASSO regression in estimating the conditional outcome given treatment and baseline covariates. The asymptotic results ensue under minimal condition on the growth of the dimension of the regression coefficients and mild conditions on the complexity of the classes of randomization schemes. Chapter 2: In many applications, one is often interested in the effect of a longitudinal exposure on a time-to-event process. In particular, consider a study where subjects are followed over time; in addition to their baseline covariates, at various time points we also record their time-varying exposure of interest, time-varying covariates, and indicators for the event of interest (say death). Time varying confounding is ubiquitous in these situations: the exposure of interest depends on past covariates that confound the effect of the exposure on the outcome of interest, in turn exposure affects future confounders; right censoring may also be present in a study of this nature, often in response to past covariates and exposure. One way to assess the comparative effect of different regimens of interest is to study the hazard as a function of such regimens. The features of this hazard are often encoded in a marginal structural model.
This chapter builds upon the work of Petersen, Schwab, Gruber, Blaser, Schomaker, and van der Laan (2014) to present a targeted maximum likelihood estimator for the marginal structural model for the hazard function under longitudinal dynamic interventions. The proposed estimator is efficient and doubly robust, hence offers an improvement over the incumbent IPW estimator. Chapter 3: A crucial component of comparative effectiveness research is evaluating the modification of an exposure's effect by a given set of baseline covariates (effect modifiers). In complex longitudinal settings where time-varying confounding exists, this effect modification analysis is often performed using a marginal structural model. Generally, the conditioning effect modifiers in an MSM are cast as variables of the observed past. Yet, in some applications the effect modifiers of interest are in fact counterfactual. For instance, for a specific value of the first-line treatment, one may wish to evaluate the effect modification of a second-line-treatment by a post first-line-treatment variable, wherein the first-line-treatment assignment shares common determinants with the outcome of interest. In this case a simple stratification on the first-line treatment will only yield effect modification over a subpopulation given by said determinants. Hence, the desired parameter of interest should be formulated in terms of randomization on first-line treatment as well. In another example, the effect modifiers may be subject to missingness, which may depend on other baseline confounders; a simple complete-case analysis may introduce selection bias due to the high correlation of these confounders with the missingness of the effect modifier. In this case, one would formulate the desired parameter of interest in terms of an intervention on missingness. We call these counterfactual effect modifiers. In such situations, analysis by stratification alone may harbor selection bias.
In this chapter, we investigate MSM defined by counterfactual effect modifiers. Firstly, we determine the identification of the causal dose-response curve and MSM parameters in this setting. Secondly, we establish the semiparametric efficiency theory for these statistical parameters, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology. However, as we shall see, due to the form of the efficient influence curve, the implementation of this estimator may prove arduous in applications where the effect modifier is high dimensional. To address this problem, our third contribution is a projected influence curve (and the corresponding TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence curve becomes taxing. In addition to these two robust estimators, we also present an IPW estimator, and a non-targeted G-computation estimator. Chapter 4: In many causal inference problems, one is interested in the direct causal effect of an exposure on an outcome of interest that is not mediated by certain intermediate variables. Robins and Greenland (1992) and Pearl (2001) formalized the definition of two types of direct effects (natural and controlled) under the counterfactual framework. The efficient influence curves (under a nonparametric model) for the various natural effect parameters and their general robustness conditions, as well as an estimating equation based estimator using the efficient influence curve, are provided in Tchetgen Tchetgen and Shpitser (2011a). In this chapter, we apply the targeted maximum likelihood framework to construct a semiparametric efficient, multiply robust, substitution estimator for the natural direct effect which satisfies the efficient influence curve equation derived in Tchetgen Tchetgen and Shpitser (2011a).
We note that the robustness conditions in Tchetgen Tchetgen and Shpitser (2011a) may be weakened, thereby placing less reliance on the estimation of the mediator density.

In All Likelihood

In All Likelihood PDF Author: Yudi Pawitan
Publisher: Oxford University Press
ISBN: 9780198507659
Category : Business & Economics
Languages : en
Pages : 552

Book Description
This text concentrates on what can be achieved using the likelihood/Fisherian methods of taking into account uncertainty when studying a statistical problem. It takes the concept of the likelihood as the best method for unifying the demands of statistical modeling and theory of inference. Every likelihood concept is illustrated with realistic examples ranging from a simple comparison of two accident rates to complex studies that require generalized linear or semiparametric modeling. The emphasis is on likelihood not as just a device used to produce an estimate, but as an important tool for modeling.

Causal inference

Causal inference PDF Author: K. J. Rothman
Publisher: Kenneth Rothman
ISBN: 9780917227035
Category : Causation
Languages : en
Pages : 220

Book Description


Statistical Inference as Severe Testing

Statistical Inference as Severe Testing PDF Author: Deborah G. Mayo
Publisher: Cambridge University Press
ISBN: 1108563309
Category : Mathematics
Languages : en
Pages : 503

Book Description
Mounting failures of replication in social and biological sciences give a new urgency to critically appraising proposed reforms. This book pulls back the cover on disagreements between experts charged with restoring integrity to science. It denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in a long run. If statistical consumers are unaware of assumptions behind rival evidence reforms, they can't scrutinize the consequences that affect them (in personalized medicine, psychology, etc.). The book sets sail with a simple tool: if little has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Many methods advocated by data experts do not stand up to severe scrutiny and are in tension with successful strategies for blocking or accounting for cherry picking and selective reporting. Through a series of excursions and exhibits, the philosophy and history of inductive inference come alive. Philosophical tools are put to work to solve problems about science and pseudoscience, induction and falsification.