Clinical Prediction Models Description
Preface
Acknowledgements
Chapter 1 Introduction
1.1 Diagnosis, prognosis and therapy choice in medicine
1.1.1 Predictions for personalized evidence-based medicine
1.2 Statistical modeling for prediction
1.2.1 Model assumptions
1.2.2 Reliability of predictions: aleatory and epistemic uncertainty
1.2.3 Sample size
1.3 Structure of the book
1.3.1 Part I: Prediction models in medicine
1.3.2 Part II: Developing internally valid prediction models
1.3.3 Part III: Generalizability of prediction models
1.3.4 Part IV: Applications
Part I: Prediction models in medicine
Chapter 2 Applications of prediction models
2.1 Applications: medical practice and research
2.2 Prediction models for Public Health
2.2.1 Targeting of preventive interventions
*2.2.2 Example: prediction for breast cancer
2.3 Prediction models for clinical practice
2.3.1 Decision support on test ordering
*2.3.2 Example: predicting renal artery stenosis
2.3.3 Starting treatment: the treatment threshold
*2.3.4 Example: probability of deep venous thrombosis
2.3.5 Intensity of treatment
*2.3.6 Example: defining a poor prognosis subgroup in cancer
2.3.7 Cost-effectiveness of treatment
2.3.8 Delaying treatment
*2.3.9 Example: spontaneous pregnancy chances
2.3.10 Surgical decision-making
*2.3.11 Example: replacement of risky heart valves
2.4 Prediction models for medical research
2.4.1 Inclusion and stratification in a RCT
*2.4.2 Example: selection for TBI trials
2.4.3 Covariate adjustment in a RCT
2.4.4 Gain in power by covariate adjustment
*2.4.5 Example: analysis of the GUSTO-III trial
2.4.6 Prediction models and observational studies
2.4.7 Propensity scores
*2.4.8 Example: statin treatment effects
2.4.9 Provider comparisons
*2.4.10 Example: ranking cardiac outcome
2.5 Concluding remarks
Chapter 3 Study design for prediction modeling
3.1 Studies for prognosis
3.1.1 Retrospective designs
*3.1.2 Example: predicting early mortality in esophageal cancer
3.1.3 Prospective designs
*3.1.4 Example: predicting long-term mortality in esophageal cancer
3.1.5 Registry data
*3.1.6 Example: surgical mortality in esophageal cancer
3.1.7 Nested case-control studies
*3.1.8 Example: perioperative mortality in major vascular surgery
3.2 Studies for diagnosis
3.2.1 Cross-sectional study design and multivariable modeling
*3.2.2 Example: diagnosing renal artery stenosis
3.2.3 Case-control studies
*3.2.4 Example: diagnosing acute appendicitis
3.3 Predictors and outcome
3.3.1 Strength of predictors
3.3.2 Categories of predictors
3.3.3 Costs of predictors
3.3.4 Determinants of prognosis
3.3.5 Prognosis in oncology
3.4 Reliability of predictors
3.4.1 Observer variability
*3.4.2 Example: histology in Barrett's esophagus
3.4.3 Biological variability
3.4.4 Regression dilution bias
*3.4.5 Example: simulation study on reliability of a binary predictor
3.4.6 Choice of predictors
3.5 Outcome
3.5.1 Types of outcome
3.5.2 Survival endpoints
*3.5.3 Examples: 5-year relative survival in cancer registries
3.5.4 Composite endpoints
*3.5.5 Example: composite endpoints in cardiology
3.5.6 Choice of prognostic outcome
3.5.7 Diagnostic endpoints
*3.5.8 Example: PET scans in esophageal cancer
3.6 Phases of biomarker development
3.7 Statistical power and reliable estimation
3.7.1 Sample size to identify predictor effects
3.7.2 Sample size for reliable modeling
3.7.3 Sample size for reliable validation
3.8 Concluding remarks
Chapter 4 Statistical models for prediction
4.1 Continuous outcomes
*4.1.1 Examples of linear regression
4.1.2 Economic outcomes
*4.1.3 Example: prediction of costs
4.1.4 Transforming the outcome
4.1.5 Performance: explained variation
4.1.6 More flexible approaches
4.2 Binary outcomes
4.2.1 R² in logistic regression analysis
4.2.2 Calculation of R² on the log likelihood scale
4.2.3 Models related to logistic regression
4.2.4 Bayes rule
4.2.5 Prediction with Naïve Bayes
4.2.6 Calibration and Naïve Bayes
*4.2.7 Logistic regression and Bayes
4.2.8 Machine learning: more flexible approaches
4.2.9 Classification and regression trees
*4.2.10 Example: mortality in acute MI patients
4.2.11 Advantages and disadvantages of tree models
4.2.12 Trees versus logistic regression modeling
*4.2.13 Other methods for binary outcomes
4.2.14 Summary on binary outcomes
4.3 Categorical outcomes
4.3.1 Polytomous logistic regression
4.3.2 Example: histology of residual masses
*4.3.3 Alternative models
*4.3.4 Comparison of modeling approaches
4.4 Ordinal outcomes
4.4.1 Proportional odds logistic regression
*4.4.2 Relevance of the proportional odds assumption in RCTs
4.5 Survival outcomes
4.5.1 Cox proportional hazards regression
4.5.2 Prediction with Cox models
4.5.3 Proportionality assumption
4.5.4 Kaplan-Meier analysis
*4.5.5 Example: impairment after treatment of leprosy
4.5.6 Parametric survival
*4.5.7 Example: replacement of risky heart valves
4.5.8 Summary on survival outcomes
4.6 Competing risks
4.6.1 Actuarial and actual risks
4.6.2 Absolute risk and the Fine & Gray model
4.6.3 Example: Prediction of coronary heart disease incidence
4.6.4 Multi-state modeling
4.7 Dynamic predictions
4.7.1 Multi-state models and landmarking
4.7.2 Joint models
4.8 Concluding remarks
Chapter 5 Overfitting and optimism in prediction models
5.1 Overfitting and optimism
5.1.1 Example: surgical mortality in esophagectomy
5.1.2 Variability within one center
5.1.3 Variability between centers: noise vs. true heterogeneity
5.1.4 Predicting mortality by center: shrinkage
5.2 Overfitting in regression models
5.2.1 Model uncertainty and testimation bias
5.2.2 Other modeling biases
5.2.3 Overfitting by parameter uncertainty
5.2.4 Optimism in model performance
5.2.5 Optimism-corrected performance
5.3 Bootstrap resampling
5.3.1 Applications of the bootstrap
5.3.2 Bootstrapping for regression coefficients
5.3.3 Bootstrapping for prediction: optimism correction
5.3.4 Calculation of optimism-corrected performance
*5.3.5 Example: Stepwise selection in 429 patients
5.4 Cost of data analysis
*5.4.1 Degrees of freedom of a model
5.4.2 Practical implications
5.5 Concluding remarks
Chapter 6 Choosing between alternative models
6.1 Prediction with statistical models
6.1.1 Testing of model assumptions and prediction
6.1.2 Choosing a type of model
6.2 Modeling age outcome relations
*6.2.1 Age and mortality after acute MI
*6.2.2 Age and operative mortality
*6.2.3 Age outcome relations in other diseases
6.3 Head-to-head comparisons
6.3.1 StatLog results
*6.3.2 Cardiovascular disease prediction comparisons
*6.3.3 Traumatic brain injury modeling results
6.4 Concluding remarks
Part II: Developing valid prediction models
Checklist for developing valid prediction models
Chapter 7 Missing values
7.1 Missing values and prediction research
7.1.1 Inefficiency of complete case analysis
7.1.2 Interpretation of CC Analyses
7.1.3 Missing data mechanisms
7.1.4 Missing outcome data
7.1.5 Summary points
7.2 Prediction under MCAR, MAR and MNAR mechanisms
7.2.1 Missingness patterns
7.2.2 Missingness and estimated regression coefficients
7.2.4 Missingness and estimated performance
7.3 Dealing with missing values in regression analysis
7.3.1 Imputation principle
7.3.2 Simple and more advanced single imputation methods
7.3.3 Multiple imputation
7.4 Defining the imputation model
7.4.1 Types of variables in the imputation model
*7.4.2 Transformations of variables
7.4.3 Imputation models for SI
7.4.4 Summary points
7.5 Success of imputation under MCAR, MAR and MNAR
7.5.1 Imputation in a simple model
7.5.2 Other simulation results
*7.5.3 Multiple predictors
7.6 Guidance to dealing with missing values in prediction research
7.6.1 Patterns of missingness
7.6.2 Simple approaches
7.6.3 More advanced approaches
7.6.4 Maximum fraction of missing values before omitting a predictor
7.6.5 Single or multiple imputation for predictor effects?
7.6.6 Single or multiple imputation for deriving predictions?
7.6.7 Missings and predictions for new patients
*7.6.8 Performance across multiple imputed data sets
7.6.9 Reporting of missing values in prediction research
7.7 Concluding remarks
7.7.1 Summary statements
*7.7.2 Available software and challenges
Chapter 8 Case study on dealing with missing values
8.1 Introduction
8.1.1 Aim of the IMPACT study
8.1.2 Patient selection
8.1.3 Potential predictors
8.1.4 Coding and time dependency of predictors
8.2 Missing values in the IMPACT study
8.2.1 Missing values in outcome
8.2.2 Quantification of missingness of predictors
8.2.3 Patterns of missingness
8.3 Imputation of missing predictor values
8.3.1 Correlations between predictors
8.3.2 Imputation model
8.3.3 Distributions of imputed values
*8.3.4 Multilevel imputation
8.4 Predictor effect: adjusted analyses
8.4.1 Adjusted analysis for complete predictors: age and motor score
8.4.2 Adjusted analysis for incomplete predictors: pupils
8.5 Predictions: multivariable analyses
*8.5.1 Multilevel analyses
8.6 Concluding remarks
Chapter 9 Coding of categorical and continuous predictors
9.1 Categorical predictors
9.1.1 Examples of categorical coding
9.2 Continuous predictors
*9.2.1 Examples of continuous predictors
9.2.2 Categorization of continuous predictors
9.3 Non-linear functions for continuous predictors
9.3.1 Polynomials
9.3.2 Fractional polynomials (FP)
9.3.3 Splines
*9.3.4 Example: functional forms with RCS or FP
9.3.5 Extrapolation and robustness
9.3.5 Preference for FP or RCS?
9.4 Outliers and winsorizing
9.4.1 Example: glucose values and outcome of TBI
9.5 Interpretation of effects of continuous predictors
*9.5.1 Example: predictor effects in TBI
9.6 Concluding remarks
9.6.1 Software
Chapter 10 Restrictions on candidate predictors
10.1 Selection before studying the predictor outcome relation
10.1.1 Selection based on subject knowledge
*10.1.2 Examples: too many candidate predictors
10.1.3 Meta-analysis for candidate predictors
*10.1.4 Example: predictors in testicular cancer
10.1.5 Selection based on distributions
10.2 Combining similar variables
10.2.1 Subject knowledge for grouping
10.2.2 Assessing the equal weights assumption
10.2.3 Biologically motivated weighting schemes
10.2.4 Statistical combination
10.3 Averaging effects
*10.3.1 Example: Chlamydia trachomatis infection risks
*10.3.2 Example: acute surgery risk relevant for elective patients?
*10.4 Case study: family history for prediction of a genetic mutation
10.4.1 Clinical background and patient data
10.4.2 Similarity of effects
10.4.3 CRC and adenoma in a proband
10.4.5 Full prediction model for mutations
10.5 Concluding remarks
Chapter 11 Selection of main effects
11.1 Predictor selection
11.1.1 Reduction before modeling
11.1.2 Reduction while modeling
11.1.3 Collinearity
11.1.4 Parsimony
11.1.5 Non-significant candidate predictors
11.1.6 Summary points on predictor selection
11.2 Stepwise selection
11.2.1 Stepwise selection variants
11.2.2 Stopping rules in stepwise selection
11.3 Advantages of stepwise methods
11.4 Disadvantages of stepwise methods
11.4.1 Instability of selection
11.4.2 Testimation: Biased in selected coefficients
*11.4.3 Testimation: empirical illustrations
11.4.4 Misspecification of variability and p-values
11.5 Influence of noise variables
11.6 Univariate analyses and model specification
11.6.1 Pros and cons of univariate pre-selection
*11.6.2 Testing of predictors in a domain
11.7 Modern selection methods
*11.7.1 Bootstrapping for selection
*11.7.2 Bagging and boosting
*11.7.3 Bayesian model averaging (BMA)
11.7.4 Shrinkage of regression coefficients to zero
11.8 Concluding remarks
Chapter 12 Assumptions in regression models: Additivity and linearity
12.1 Additivity and interaction terms
12.1.1 Potential interaction terms to consider
12.1.2 Interactions with treatment
12.1.3 Other potential interactions
*12.1.4 Example: time and survival after valve replacement
12.2 Selection, estimation and performance with interaction terms
12.2.1 Example: age interactions in GUSTO-I
12.2.2 Estimation of interaction terms
12.2.3 Better prediction with interaction terms?
12.2.4 Summary points
12.3 Non-linearity in multivariable analysis
12.3.1 Multivariable restricted cubic splines (rcs)
12.3.2 Multivariable fractional polynomials (FP)
12.3.3 Multivariable splines in gam
12.4 Example: non-linearity in testicular cancer case study
*12.4.1 Details of multivariable FP and gam analyses
*12.4.2 GAM in univariate and multivariable analysis
*12.4.3 Predictive performance
*12.4.4 R code for non-linear modeling in testicular cancer example
12.5 Concluding remarks
12.5.1 Recommendations
Chapter 13 Modern estimation methods
13.1 Predictions from regression and other models
*13.1.1 Estimation with other modeling approaches
13.2 Shrinkage
13.2.1 Uniform shrinkage
13.2.2 Uniform shrinkage: illustration
13.3 Penalized estimation
*13.3.1 Penalized maximum likelihood estimation
13.3.2 Penalized ML: illustration
*13.3.3 Optimal penalty by bootstrapping
13.3.4 Firth regression
*13.3.5 Firth regression: illustration
*13.4.1 Estimation of a LASSO model
13.5 Elastic net
*13.5.1 Estimation of Elastic Net model
13.6 Performance after shrinkage
13.6.1 Shrinkage, penalization, and model selection
13.7 Concluding remarks
Chapter 14 Estimation with external information
Background
14.1 Combining literature and individual patient data (IPD)
14.1.1 A global prediction model
*14.1.2 A global model for traumatic brain injury
14.1.3 Developing a local prediction model
14.1.4 Adaptation of univariate coefficients
*14.1.5 Adaptation method 1
*14.1.6 Adaptation method 2
*14.1.7 Estimation of adaptation factors
*14.1.8 Simulation results
14.1.9 Performance of the adapted model
14.2 Case study: prediction model for AAA surgical mortality
14.2.1 Meta-analysis
14.2.2 Individual patient data analysis
14.2.3 Adaptation and clinical presentation
14.3 Alternative approaches
14.3.1 Overall calibration
14.3.2 Stacked regressions
14.3.3 Bayesian methods: using data priors to regression modeling
14.3.4 Example: predicting neonatal death
*14.3.5 Example: aneurysm study
14.4 Concluding remarks
Chapter 15 Evaluation of performance
15.1 Overall performance measures
15.1.1 Explained variation: R²
15.1.2 Brier score
15.1.3 Performance of testicular cancer prediction model
15.3.4 Assessment of moderate calibration
15.3.5 Assessment of strong calibration
15.3.6 Calibration of survival predictions
15.3.7 Example: calibration in testicular cancer prediction model
*15.3.8 R code for assessing calibration
15.3.9 Calibration and discrimination
15.4 Concluding remarks
15.4.1 Bibliographic notes
Chapter 16 Evaluation of clinical usefulness
16.1 Clinical usefulness
16.1.1 Intuitive approach to the cutoff
16.1.2 Decision-analytic approach: benefit vs harm
16.1.3 Accuracy measures for clinical usefulness
16.1.4 Decision curve analysis
16.1.5 Interpreting net benefit in decision curves
16.1.6 Example: clinical usefulness of prediction in testicular cancer
16.1.7 Decision curves for testicular cancer example
16.1.8 Verification bias and clinical usefulness
*16.1.9 R code
16.2 Discrimination, calibration, and clinical usefulness
16.2.1 Discrimination, calibration, and Net Benefit in the testicular cancer case study
16.2.2 Aims of prediction models and performance measures
16.2.2 Summary points
16.3 From prediction models to decision rules
16.3.1 Performance of decision rules
16.3.2 Treatment benefit in prognostic subgroups
16.3.3 Evaluation of classification systems
16.4 Concluding remarks
Chapter 17 Validation of prediction models
17.1 Internal versus external validation, and validity
17.1.1 Assessment of internal and external validity
17.2 Internal validation techniques
17.2.1 Apparent validation
17.2.3 Cross-validation
17.2.4 Bootstrap validation
17.2.5 Internal validation combined with imputation
17.3 External validation studies
17.3.1 Temporal validation
*17.3.2 Example: validation of a model for Lynch syndrome
17.3.3 Geographic validation
17.3.4 Fully independent validation
17.3.5 Reasons for poor validation
17.4 Concluding remarks
Chapter 18 Presentation formats
18.1 Prediction models versus decision rules
18.2 Clinical prediction models
18.2.1 Regression formulas
18.2.2 Confidence intervals for predictions
18.2.3 Nomograms
18.2.4 Score chart
18.2.5 Tables with predictions
18.2.6 Specific formats
18.2.7 Black box presentations
18.3 Case study: clinical prediction model for testicular cancer
18.3.1 Regression formula from logistic model
18.3.2 Nomogram
*18.3.3 Score chart
18.3.4 Summary points
18.4 Clinical decision rules
18.4.1 Regression tree
18.4.2 Score chart rule
18.4.3 Survival groups
18.4.4 Meta-model
18.5 Concluding remarks
Part III: Generalizability of prediction models
Chapter 19 Patterns of external validity
19.1 Determinants of external validity
19.1.1 Case-mix
19.1.2 Differences in case-mix
19.1.3 Differences in regression coefficients
19.2.1 Simulation set-up
19.2.2 Performance measures
19.3 Distribution of predictors
19.3.1 More or less severe case-mix according to X
*19.3.2 Interpretation of testicular cancer validation
19.3.3 More or less heterogeneous case-mix according to X
19.3.4 More or less severe case-mix according to Z
19.3.5 More or less heterogeneous case-mix according to Z
19.4 Distribution of observed outcome y
19.5 Coefficients β
19.5.1 Coefficient of linear predictor < 1
19.5.2 Coefficients β different
19.6 Summary of patterns of invalidity
19.6.1 Other scenarios of invalidity
19.7 Reference values for performance
19.7.1 Model-based performance: performance if the model is valid
19.7.2 Performance with refitting
*19.7.3 Examples: testicular cancer and TBI
*19.7.4 R code
19.8 Limited validation sample size
19.8.1 Uncertainty in validation of performance
*19.8.2 Estimating standard errors in validation studies
19.8.3 Summary points
19.9 Design of external validation studies
19.9.1 Power of external validation studies
*19.9.2 Calculating sample sizes for validation studies
19.9.3 Rules for sample size of validation studies
19.9.4 Summary points
19.10 Concluding remarks
Chapter 20 Updating for a new setting
20.1 Updating only the intercept
20.1.1 Simple updating methods
20.2 Approaches to more extensive updating
20.2.1 Eight updating methods for predicting binary outcomes
20.3 Validation and updating in GUSTO-I
20.3.1 Validity of TIMI-II model for GUSTO-I
20.3.2 Updating the TIMI-II model for GUSTO-I
20.3.3 Performance of updated models
*20.3.4 R code for updating methods
20.4 Shrinkage and updating
20.4.1 Shrinkage towards recalibrated values in GUSTO-I
*20.4.2 R code for shrinkage and penalization in updating
20.4.4 Bayesian updating
20.5 Sample size and updating strategy
*20.5.1 Simulations of sample size, shrinkage, and updating strategy
20.5.2 A closed test for the choice of updating strategy
20.6 Validation and updating of tree models
20.7 Validation and updating of survival models
*20.7.1 Validation of a simple index for non-Hodgkin's lymphoma
20.7.2 Updating the prognostic index
20.7.3 Recalibration for groups by time points
20.7.4 Recalibration with a Cox or Weibull regression model
20.7.6 Summary points
20.8 Continuous updating
*20.8.1 Precision and updating strategy
*20.8.2 Continuous updating in GUSTO-I
*20.8.3 Other dynamic modeling approaches
20.9 Concluding remarks
*20.9.1 Further illustrations of updating
Chapter 21 Updating for multiple settings
21.1 Differences in outcome
21.1.1 Testing for calibration-in-the-large
*21.1.2 Illustration of heterogeneity in GUSTO-I
21.1.3 Updating for better calibration-in-the-large
21.1.4 Empirical Bayes estimates
*21.1.5 Illustration of updating in GUSTO-I
21.1.6 Testing and updating of predictor effects
*21.1.7 Heterogeneity of predictor effects in GUSTO-I
*21.1.8 R code for random effect analyses in GUSTO-I
21.2 Provider profiling
21.2.1 Ranking of centers: the expected rank
*21.2.2 Example: provider profiling in stroke
*21.2.4 Estimation and interpreting differences between centers
*21.2.5 Ranking of centers
*21.2.6 R code for provider profiling
21.3 Concluding remarks
*21.3.1 Further literature
Part IV: Applications
Chapter 22 Case study on a prediction of 30-day mortality
22.1 GUSTO-I study
22.1.1 Acute myocardial infarction
*22.1.2 Treatment results from GUSTO-I
22.1.3 Prognostic modeling in GUSTO-I
22.2 General considerations of model development
22.2.1 Research question and intended application
22.2.2 Outcome and predictors
22.2.3 Study design and analysis
22.3 Seven modeling steps in GUSTO-I
22.3.1 Preliminary
22.3.2 Coding of predictors
22.3.3 Model specification
22.3.4 Model estimation
22.3.5 Model performance
22.3.6 Model validation
22.3.7 Presentation
22.3.8 Predictions
22.4 Validity
22.4.1 Internal validity: overfitting
22.4.2 External validity: generalizability
22.4.3 Summary points
22.5 Translation into clinical practice
22.5.1 Score chart for choosing thrombolytic therapy
22.5.2 From predictions to decisions
22.6 Concluding remarks
Chapter 23 Case study on survival analysis: prediction of cardiovascular events
23.1 Prognosis in the SMART study
*23.1.1 Patients in SMART
23.2 General considerations in SMART
23.2.1 Research question and intended application
23.2.2 Outcome and predictors
23.2.3 Study design and analysis
23.3 Preliminary modeling steps in the SMART cohort
23.3.1 Patterns of missing values
23.3.2 Imputation of missing values
23.3.3 R code
23.4 Coding of predictors
23.4.1 Extreme values
23.4.2 Transforming continuous predictors
23.4.3 Combining predictors with similar effects
23.4.4 R code
23.5.1 A full model
23.5.2 Impact of imputation
23.5.3 R code for full model and imputation variants
23.6 Model selection and estimation
23.6.1 Stepwise selection
23.6.2 LASSO for selection with imputed data
23.7 Model performance and internal validation
23.7.1 Estimation of optimism in performance
23.7.2 Model presentation
23.7.3 R code for presentations
23.8 Concluding remarks
Chapter 24 Overall lessons and data sets
24.1 Sample size
24.1.1 Model selection, estimation, and sample size
24.1.2 Calibration improvement by penalization
24.1.3 Poorer performance with more predictors
24.1.4 Model selection with noise predictors
24.1.5 Potential solutions
24.1.6 R code for model selection and penalization
24.2 Validation
24.2.1 Examples of internal and external validation
24.3 Subject matter knowledge versus machine learning
24.3.1 Exploiting subject matter knowledge
24.3.2 Machine learning and Big Data
24.4 Reporting of prediction models and risk of bias assessments
24.4.1 Reporting guidelines
24.4.2 Risk of bias assessment
24.5 Data sets
24.5.1 GUSTO-I prediction models
24.5.2 SMART case study
24.5.3 Testicular cancer case study
24.5.4 Abdominal aortic aneurysm case study
24.6 Concluding remarks
References