Statistician, University of Groningen
On needles and haystacks: Finding answers in complex health data
Abstract
In this era of automated systems, datasets are becoming increasingly complex: many (correlated) variables, multilevel structures, multiple outcomes, measurement error, and missing values. Such large, complex datasets provide new opportunities for health researchers to answer increasingly complex research questions, for instance on the early detection and prevention of diseases and the advancement of healthy ageing.
To answer these increasingly complex research questions, we need an interdisciplinary research mentality and a sound methodological foundation. In my research, I combine these two aspects by developing Bayesian methodology. Bayesian methodology provides a framework for integrating different disciplines. The informative prior distributions used for this integration also provide the means of modelling the complex relations in health data. I will provide a few examples from health research that motivate the development of sophisticated Bayesian methodology.
Many health-related research questions boil down to some form of variable selection: finding a few variables or groups of variables in a complex dataset that are highly predictive of possibly multiple outcomes. I will present a recent research project in which I developed a Bayesian variable selection (BVS) methodology for food-borne disease outbreaks. The BVS methodology incorporates missing value imputation and a misclassification correction. In an application to the 2012 Salmonella Thompson outbreak in the Netherlands, the method is shown to outperform the frequentist alternatives.