On Friday March 9, the Van Dantzig Seminar on statistics will host lectures by Edward George (Wharton School, University of Pennsylvania) and Barry Schouten (Utrecht University).
The program is as follows (titles and abstracts below):
14.30 – 14.35 opening
14.35 – 15.35 Edward George (Wharton School, University of Pennsylvania)
15.35 – 15.50 break
15.50 – 16.50 Barry Schouten (Utrecht University)
16.50 – drinks
Location: University of Amsterdam, Science Park 904, room A1.04. Attendance is free.
The Van Dantzig Seminar is a nationwide series of lectures in statistics that features renowned international and local speakers from the full breadth of the statistical sciences. The name honours David van Dantzig (1900-1959), who was the first modern statistician in the Netherlands and professor in the “Theory of Collective Phenomena” (i.e. statistics) in Amsterdam. The seminar convenes four to six times a year at varying locations and is financially supported by, among others, the STAR cluster and the Section Mathematical Statistics of the VVS-OR.
TITLES AND ABSTRACTS OF PRESENTATIONS
Edward George
Wharton School, University of Pennsylvania
Mortality Rate Estimation and Standardization for Public Reporting: Medicare’s Hospital Compare
Bayesian models are increasingly fit to large administrative data sets and then used to make individualized recommendations. In particular, Medicare’s Hospital Compare webpage provides information to patients about specific hospital mortality rates for a heart attack or Acute Myocardial Infarction (AMI). Hospital Compare’s current recommendations are based on a random-effects logit model with a random hospital indicator and patient risk factors. Except for the largest hospitals, these recommendations or predictions are not individually checkable against data, because data from smaller hospitals are too limited. Before individualized Bayesian recommendations, people derived general advice from empirical studies of many hospitals; e.g., prefer hospitals of type 1 to type 2 because the observed mortality rate is lower at type 1 hospitals. Here we calibrate these Bayesian recommendation systems by checking, out of sample, whether their predictions aggregate to give correct general advice derived from another sample. This process of calibrating individualized predictions against general empirical advice leads to substantial revisions in the Hospital Compare model for AMI mortality, revisions that hierarchically incorporate information about hospital volume, nursing staff, medical residents, and the hospital’s ability to perform cardiovascular procedures. And for the ultimate purpose of meaningful public reporting, predicted mortality rates must then be standardized to adjust for patient-mix variation across hospitals. Such standardization can be accomplished with counterfactual mortality predictions for any patient at any hospital. It is seen that indirect standardization, as currently used by Hospital Compare, fails to adequately control for differences in patient risk factors and systematically underestimates mortality rates at the low volume hospitals. As a viable alternative, we propose a full population direct standardization which yields correctly calibrated mortality rates devoid of patient-mix variation. (This is joint research with Veronika Rockova, Paul Rosenbaum, Ville Satopaa and Jeffrey Silber).
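To make the two standardization schemes in the abstract concrete, here is a minimal, self-contained sketch. The simulated patients, the toy risk model, and the effect sizes are illustrative assumptions; in the actual application, the counterfactual mortality predictions would come from the fitted hierarchical model rather than a known formula.

```python
# Toy comparison of indirect vs. full-population direct standardization of
# hospital mortality rates. All numbers and the risk model are invented for
# illustration; they are not the Hospital Compare model or its data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
effect = {"A": -0.3, "B": 0.0, "C": 0.3}      # toy hospital "quality" effects
n = 30_000

# Simulate patients; hospital C treats an older, higher-risk case mix.
patients = pd.DataFrame({
    "hospital": rng.choice(list(effect), size=n, p=[0.5, 0.3, 0.2]),
    "age": rng.normal(70, 10, size=n),
})
patients.loc[patients["hospital"] == "C", "age"] += 5

def p_death(age, hospital_effect):
    """Toy risk model: P(death | age, hospital effect)."""
    return 1 / (1 + np.exp(-(-3.0 + 0.05 * (age - 70) + hospital_effect)))

patients["died"] = rng.random(n) < p_death(
    patients["age"], patients["hospital"].map(effect))
national_rate = patients["died"].mean()

def indirect_rate(h):
    # Classical indirect standardization: observed deaths over the deaths
    # expected for the hospital's OWN patients under a reference (average)
    # hospital effect, rescaled by the national rate.
    own = patients[patients["hospital"] == h]
    expected = p_death(own["age"], 0.0).sum()
    return national_rate * own["died"].sum() / expected

def direct_rate(h):
    # Full-population direct standardization: average counterfactual mortality
    # of EVERY patient had they all been treated at hospital h, so differences
    # in patient mix cancel across hospitals.
    return p_death(patients["age"], effect[h]).mean()

for h in effect:
    print(f"hospital {h}: indirect {indirect_rate(h):.4f}, "
          f"direct {direct_rate(h):.4f}")
```

In this toy setup the direct rates differ across hospitals only through the hospital effect, because every hospital is evaluated on the same full patient population, whereas the indirect rates are computed from each hospital's own patients and can therefore still be affected by case-mix differences.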
Barry Schouten
Utrecht University
Does balancing survey response reduce bias?
Due to a steady decrease in response rates over the last 20 years, survey institutions have begun to explore adaptive survey designs. These designs are similar to dynamic treatment regimes in clinical trials and adapt the treatment of sample units based on auxiliary data. Such auxiliary data may be available at the onset of the survey, but may also be collected during data collection. The common objective of such designs is to balance response on a relevant subset of the auxiliary data without increasing the variance of estimators, subject to cost constraints. Adaptive survey designs have been criticized for balancing on information that may just as well be used in the estimation stage afterwards, i.e. without having to go through the burden of more complex data-collection logistics. In the paper, I will discuss conditions under which balancing is effective even when the available information is also employed in the estimation stage. The discussion views auxiliary variables as randomly selected from the pool of all possible variables on a population. The paper and discussion are applicable to missing data in general. It is shown that randomly selected auxiliary variables provide insight into the consequences of missing data, regardless of the missing-data mechanism.
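As a rough illustration of what it means to balance response on auxiliary data, the sketch below simulates a uniform and an adaptive design and reports an R-indicator-style balance measure. The population, the response propensities, and the targeted extra follow-up effort are invented for the example and are not taken from the paper.

```python
# Toy sketch of "balancing response on auxiliary data" in an adaptive survey
# design. The data-generating process and the simple balance measure below
# are illustrative assumptions, not the setup used in the talk.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Auxiliary variable known for the whole sample (e.g. age group 0/1/2),
# correlated with the survey variable y.
x = rng.integers(0, 3, size=n)
y = 10 + 2 * x + rng.normal(0, 1, size=n)

base_propensity = np.array([0.3, 0.5, 0.7])[x]       # group 0 responds least

def respond(propensity):
    return rng.random(n) < propensity

def balance(resp):
    # R-indicator-style measure: 1 - 2 * sd of group response rates;
    # closer to 1 means response is more evenly spread over the groups.
    rates = np.array([resp[x == g].mean() for g in range(3)])
    return 1 - 2 * rates.std()

# Uniform design: the same effort for every sample unit.
resp_uniform = respond(base_propensity)

# Adaptive design: extra follow-up effort targeted at low-propensity groups,
# chosen (for this toy) so that expected response rates become roughly equal.
extra_effort = np.array([0.4, 0.2, 0.0])[x]
resp_adaptive = respond(np.clip(base_propensity + extra_effort, 0, 1))

for name, resp in [("uniform", resp_uniform), ("adaptive", resp_adaptive)]:
    print(f"{name:8s} response rate {resp.mean():.2f}  "
          f"balance {balance(resp):.2f}  "
          f"respondent mean {y[resp].mean():.2f} (population {y.mean():.2f})")
```

In this toy, response under the uniform design is concentrated in the group with high y, so the respondent mean overshoots the population mean, while the adaptive design evens out the group response rates; whether such balancing still pays off once the auxiliary information is also used for weighting at the estimation stage is exactly the question the talk addresses.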