O'Kelly and Ratitch present this volume on interpreting clinical trial results when there are missing data, moving from less technical material intended to build common ground for clinicians and statisticians to more advanced statistical techniques. The first chapter lays the groundwork on the conceptual issues with missing data, assumptions about both observed and missing data such as randomness or monotonicity, and robustness testing. Prevention of missing data, regulatory documentation, and planning ahead are discussed. The second half offers details on mixed models for repeated measures (MMRM), multiple imputation, analyses under missing-not-at-random assumptions, and doubly robust estimation. Chapters 1, 2, 4, 6, 7, and 8 contain appendices with definitions, examples, and code, and key-points lists at the beginning of each chapter facilitate piecewise use by statisticians. Annotation ©2014 Ringgold, Inc., Portland, OR (protoview.com)
Preface xv
References xvii
Acknowledgments xix
Notation xxi
Table of SAS code fragments xxv
Contributors xxix
1 What’s the problem with missing data? 1
Michael O’Kelly and Bohdana Ratitch
1.1 What do we mean by missing data? 2
1.1.1 Monotone and non-monotone missing data 3
1.1.2 Modeling missingness, modeling the missing value and ignorability 4
1.1.3 Types of missingness (MCAR, MAR and MNAR) 4
1.1.4 Missing data and study objectives 5
1.2 An illustration 6
1.3 Why can’t I use only the available primary endpoint data? 7
1.4 What’s the problem with using last observation carried forward? 9
1.5 Can we just assume that data are missing at random? 11
1.6 What can be done if data may be missing not at random? 14
1.7 Stress-testing study results for robustness to missing data 15
1.8 How the pattern of dropouts can bias the outcome 15
1.9 How do we formulate a strategy for missing data? 16
1.10 Description of example datasets 18
1.10.1 Example dataset in Parkinson’s disease treatment 18
1.10.2 Example dataset in insomnia treatment 23
1.10.3 Example dataset in mania treatment 28
Appendix 1.A: Formal definitions of MCAR, MAR and MNAR 33
References 34
2 The prevention of missing data 36
Sara Hughes
2.1 Introduction 36
2.2 The impact of “too much” missing data 37
2.2.1 Example from human immunodeficiency virus 38
2.2.2 Example from acute coronary syndrome 38
2.2.3 Example from studies in pain 39
2.3 The role of the statistician in the prevention of missing data 39
2.3.1 Illustrative example from HIV 41
2.4 Methods for increasing subject retention 48
2.5 Improving understanding of reasons for subject withdrawal 49
Acknowledgments 49
Appendix 2.A: Example protocol text for missing data prevention 49
References 50
3 Regulatory guidance – a quick tour 53
Michael O’Kelly
3.1 International conference on harmonization guideline: Statistical principles for clinical trials: E9 54
3.2 The US and EU regulatory documents 55
3.3 Key points in the regulatory documents on missing data 55
3.4 Regulatory guidance on particular statistical approaches 57
3.4.1 Available cases 57
3.4.2 Single imputation methods 57
3.4.3 Methods that generally assume MAR 59
3.4.4 Methods that are used assuming MNAR 60
3.5 Guidance about how to plan for missing data in a study 62
3.6 Differences in emphasis between the NRC report and EU guidance documents 63
3.6.1 The term “conservative” 63
3.6.2 Last observation carried forward 63
3.6.3 Post hoc analyses 63
3.6.4 Non-monotone or intermittently missing data 63
3.6.5 Assumptions should be readily interpretable 65
3.6.6 Study report 65
3.6.7 Training 65
3.7 Other technical points from the NRC report 66
3.7.1 Time-to-event analyses 66
3.7.2 Tipping point sensitivity analyses 66
3.8 Other US/EU/international guidance documents that refer to missing data 66
3.8.1 Committee for medicinal products for human use guideline on anti-cancer products, recommendations on survival analysis 66
3.8.2 US guidance on considerations when research supported by office of human research protections is discontinued 67
3.8.3 FDA guidance on data retention 67
3.9 And in practice? 67
References 69
4 A guide to planning for missing data 71
Michael O’Kelly and Bohdana Ratitch
4.1 Introduction 72
4.1.1 Missing data may bias trial results or make them more difficult to generalize to subjects outside the trial 72
4.1.2 Credibility of trial results when there is missing data 74
4.1.3 Demand for better practice with regard to missing data 74
4.2 Planning for missing data 76
4.2.1 The case report form and non-statistical sections of the protocol 76
4.2.2 The statistical sections of the protocol and the statistical analysis plan 81
4.2.3 Using historic data to narrow the choice of primary analysis and sensitivity analyses 88
4.2.4 Key points in choosing an approach for missing data 108
4.3 Exploring and presenting missingness 113
4.4 Model checking 114
4.5 Interpreting model results when there is missing data 116
4.6 Sample size and missing data 117
Appendix 4.A: Sample protocol/SAP text for study in Parkinson’s disease 119
Appendix 4.B: A formal definition of a sensitivity parameter 125
References 126
5 Mixed models for repeated measures using categorical time effects (MMRM) 130
Sonia Davis
5.1 Introduction 131
5.2 Specifying the mixed model for repeated measures 132
5.2.1 The mixed model 132
5.2.2 Covariance structures 135
5.2.3 Mixed model for repeated measures versus generalized estimating equations 139
5.2.4 Mixed model for repeated measures versus last observation carried forward 140
5.3 Understanding the data 141
5.3.1 Parkinson’s disease example 141
5.3.2 A second example showing the usefulness of plots: The CATIE study 144
5.4 Applying the mixed model for repeated measures 145
5.4.1 Specifying the model 146
5.4.2 Interpreting and presenting results 150
5.5 Additional mixed model for repeated measures topics 162
5.5.1 Treatment by subgroup and treatment by site interactions 162
5.5.2 Calculating the effect size 164
5.5.3 Another strategy to model baseline 166
5.6 Logistic regression mixed model for repeated measures using the generalized linear mixed model 168
5.6.1 The generalized linear mixed model 168
5.6.2 Specifying the model 170
5.6.3 Interpreting and presenting results 173
5.6.4 Other modeling options 181
References 182
Table of SAS Code Fragments 183
6 Multiple imputation 185
Bohdana Ratitch
6.1 Introduction 185
6.1.1 How is multiple imputation different from single imputation? 186
6.1.2 How is multiple imputation different from maximum likelihood methods? 187
6.1.3 Multiple imputation’s assumptions about missingness mechanism 188
6.1.4 A general three-step process for multiple imputation and inference 189
6.1.5 Imputation versus analysis model 190
6.1.6 Note on notation use 192
6.2 Imputation phase 192
6.2.1 Missing patterns: Monotone and non-monotone 192
6.2.2 How do we get multiple imputations? 195
6.2.3 Imputation strategies: Sequential univariate versus joint multivariate 197
6.2.4 Overview of the imputation methods 199
6.2.5 Reusing the multiply-imputed dataset for different analyses or summary scales 212
6.3 Analysis phase: Analyzing multiple imputed datasets 213
6.4 Pooling phase: Combining results from multiple datasets 216
6.4.1 Combination rules 216
6.4.2 Pooling analyses of continuous outcomes 219
6.4.3 Pooling analyses of categorical outcomes 222
6.5 Required number of imputations 227
6.6 Some practical considerations 231
6.6.1 Choosing an imputation model 231
6.6.2 Multivariate normality 235
6.6.3 Rounding and restricting the range for the imputed values 238
6.6.4 Convergence of Markov chain Monte Carlo 240
6.7 Pre-specifying details of analysis with multiple imputation 244
Appendix 6.A: Additional methods for multiple imputation 245
References 251
Table of SAS Code Fragments 255
7 Analyses under missing-not-at-random assumptions 257
Michael O’Kelly and Bohdana Ratitch
7.1 Introduction 258
7.2 Background to sensitivity analyses and pattern-mixture models 259
7.2.1 The purpose of a sensitivity analysis 259
7.2.2 Pattern-mixture models as sensitivity analyses 261
7.3 Two methods of implementing sensitivity analyses via pattern-mixture models 264
7.3.1 A sequential method of implementing pattern-mixture models with multiple imputation 264
7.3.2 Providing stress-testing “what ifs” using pattern-mixture models 266
7.3.3 Two implementations of pattern-mixture models for sensitivity analyses 267
7.3.4 Characteristics and limitations of the sequential modeling method of implementing pattern-mixture models 268
7.3.5 Pattern-mixture models implemented using the joint modeling method 271
7.3.6 Characteristics of the joint modeling method of implementing pattern-mixture models 279
7.3.7 Summary of differences between the joint modeling and sequential modeling methods 281
7.4 A “toolkit”: Implementing sensitivity analyses via SAS 284
7.4.1 Reminder: General approach using multiple imputation with regression 284
7.4.2 Sensitivity analyses assuming withdrawals have trajectory of control arm 288
7.4.3 Sensitivity analyses assuming withdrawals have distribution of control arm 292
7.4.4 Baseline-observation-carried-forward-like and last-observation-carried-forward-like analyses 297
7.4.5 The general principle of using selected subsets of observed data as the basis to implement “what if” stress tests 306
7.4.6 Using a mixture of “what ifs,” depending on reason for discontinuation 306
7.4.7 Assuming trajectory of withdrawals is worse by some 𝛿: Delta adjustment and tipping point analysis 308
7.5 Examples of realistic strategies and results for illustrative datasets of three indications 320
7.5.1 Parkinson’s disease 320
7.5.2 Insomnia 323
7.5.3 Mania 330
Appendix 7.A: How one could implement the neighboring case missing value assumption using visit-by-visit multiple imputation 335
Appendix 7.B: SAS code to model withdrawals from the experimental arm, using observed data from the control arm 336
Appendix 7.C: SAS code to model early withdrawals from the experimental arm, using the last-observation-carried-forward-like values 342
Appendix 7.D: SAS macro to impose delta adjustment on a responder variable in the mania dataset 345
Appendix 7.E: SAS code to implement tipping point via exhaustive scenarios for withdrawals in the mania dataset 346
Appendix 7.F: SAS code to perform sensitivity analyses for the Parkinson’s disease dataset 348
Appendix 7.G: SAS code to perform sensitivity analyses for the insomnia dataset 351
Appendix 7.H: SAS code to perform sensitivity analyses for the mania dataset 356
Appendix 7.I: Selection models 358
Appendix 7.J: Shared parameter models 362
References 365
Table of SAS Code Fragments 368
8 Doubly robust estimation 369
Belinda Hernández, Ilya Lipkovich and Michael O’Kelly
8.1 Introduction 370
8.2 Inverse probability weighted estimation 370
8.2.1 Inverse probability weighting estimators for estimating equations 372
8.2.2 Summary of inverse probability weighting advantages 373
8.2.3 Inverse probability weighting disadvantages 373
8.3 Doubly robust estimation 374
8.3.1 Doubly robust methods explained 375
8.3.2 Advantages of doubly robust methods 376
8.3.3 Limitations of doubly robust methods 376
8.4 Vansteelandt et al. method for doubly robust estimation 377
8.4.1 Theoretical justification for the Vansteelandt et al. method 378
8.4.2 Implementation of the Vansteelandt et al. method for doubly robust estimation 379
8.5 Implementing the Vansteelandt et al. method via SAS 383
8.5.1 Mania dataset 383
8.5.2 Insomnia dataset 390
Appendix 8.A: How to implement Vansteelandt et al. method for mania dataset (binary response) 392
Appendix 8.B: SAS code to calculate estimates from the bootstrapped datasets 400
Appendix 8.C: How to implement Vansteelandt et al. method for insomnia dataset 401
References 408
Table of SAS Code Fragments 408
Bibliography 409
Index 423