Stijn Vansteelandt
Ghent University
& London School of Hygiene and Tropical Medicine
Title: Machine learning for the evaluation of treatment effects: challenges, solutions and improvements
Abstract
The evaluation of treatment effects from observational
studies typically requires adjustment for high-dimensional confounding. This is
the result of a lack of comparability between treated and untreated subjects in
possibly many (pre-treatment) factors that are also related to outcome. While
such adjustment is routinely achieved via parametric modelling, it is not
entirely satisfactory as model misspecification is likely, and even relatively
minor misspecifications over the observed data range may induce large bias in
the treatment effect estimate. Over the past two decades, there has therefore
been growing interest in the use of machine learning methods to assist this
task. This is not surprising if one considers the enormous contributions that
the machine learning literature has offered on how to predict outcomes based on
possibly high-dimensional predictors or features. In this talk, I will
therefore focus on the use of machine learning for the evaluation of (causal)
treatment effects. This turns out to be a challenging task: while the
prediction performance of a given machine learning algorithm can be measured by
contrasting observed and predicted outcomes, such evaluation becomes impossible
when machine learning is used for treatment effect estimation since the true
treatment effect is always unknown. I will demonstrate
that naive use of existing machine learning algorithms is problematic for
treatment evaluation and explain why that is the case. I will next give a
gentle introduction to pioneering work on Targeted Learning and on Double
Machine Learning, and will discuss improvements that we have made to these
techniques. Throughout the talk, machine learning will be
understood in the broad sense, as any algorithm that uses data to learn a
suitable model for those data, thus including (though not limited to) routine
variable selection procedures. The talk is based on joint work with Oliver
Dukes (Ghent University) and will be accessible to attendees without a detailed
understanding of machine learning algorithms.
CV
Stijn Vansteelandt is Professor of Statistics at Ghent University (Belgium) and Professor of
Statistical Methodology at the London School of Hygiene and Tropical Medicine
(UK). As a causal inference expert, he primarily develops methods for causal
machine learning, mediation analysis, time-varying confounding control, and for
handling intercurrent events in randomised experiments. He has authored over 150 peer-reviewed publications in
international journals on a variety of topics in biostatistics, epidemiology
and medicine, such as the analysis of longitudinal and clustered data, missing
data, mediation and moderation/interaction, instrumental variables,
family-based genetic association studies, analysis of outcome-dependent
samples, phylogenetic inference, meta-analysis, post-selection inference and
interim analysis. He is currently Associate Editor of the Journal of the Royal
Statistical Society (Series B) and has previously served as Co-Editor of
Biometrics, the flagship journal of the International Biometric
Society, and as Associate Editor for the journals Biometrics, Biostatistics,
Epidemiology, Epidemiologic Methods and the Journal of Causal Inference.