TUESDAY 28TH
8:30 am - 9:00 am : Welcome, introduction and opening remarks
RELIABILITY THEME
9:00 am - 9:10 am :
9:10 am - 10:10 am :
10:10 am - 10:35 am :
10:35 am - 11:00 am :
11:00 am - 11:30 am : Networking break (with student posters)
11:30 am - 12:30 pm : Scientific roundtable on reliability (research and industry)
12:30 pm - 2:00 pm : Lunch break
CONFIDENTIALITY THEME
2:00 pm - 2:10 pm :
2:10 pm - 3:10 pm :
3:10 pm - 3:35 pm :
3:35 pm - 4:00 pm :
4:00 pm - 4:30 pm : Networking break
4:30 pm - 5:30 pm : Theme 2 - Scientific roundtable on confidentiality (research and industry)
WEDNESDAY 29TH
8:30 am - 9:00 am : Welcome, introduction and opening remarks
ROBUSTNESS THEME
9:00 am - 9:10 am :
9:10 am - 10:10 am :
Title: An automatic check of finite-sample robustness: can removing a small amount of data change the conclusions? Abstract: Practitioners often analyze a sample of data with the aim of applying the findings to a new population. For example, if economists conclude from observed data that microcredit is effective in reducing poverty, policymakers may decide to distribute microcredit in other locations or in future years. Generally, the original data do not constitute a perfect random sample of the population where the policy is applied - but researchers may feel comfortable generalizing anyway, as long as deviations from random sampling are small and the corresponding impact on conclusions is also small. Conversely, researchers might worry if a very small proportion of the data sample was responsible for the original conclusion. We therefore propose a method for assessing the sensitivity of statistical conclusions to the removal of a very small fraction of the data set. Since it is not computationally feasible to check every small subset of the data by hand, we propose an approximation based on the classical influence function. Our method is automatically computable for common estimators. We provide finite-sample error bounds on the performance of the approximation and an exact, inexpensive lower bound on the sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, does not vanish asymptotically, and is not influenced by misspecification. Empirically, we find that many data analyses are robust, but that the conclusions of several influential economics papers can be overturned by removing (much) less than 1% of the data.
10:10 am - 10:40 am :
10:40 am - 11:00 am :
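The influence-function approximation described in the robustness keynote abstract can be illustrated with a small sketch (a toy numpy illustration under simplifying assumptions, not the speakers' actual implementation): for ordinary least squares, dropping observation i shifts the coefficient vector by approximately -(X'X)^{-1} x_i e_i, so the effect of dropping any small set of points can be estimated by summing these per-point influence scores instead of refitting on every subset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

# Ordinary least squares with an intercept
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# First-order influence scores: dropping observation i shifts beta by
# approximately -(X'X)^{-1} x_i * resid_i (leverage terms are ignored).
influence = (XtX_inv @ X.T * resid).T   # shape (n, 2); column 1 = slope
slope_infl = influence[:, 1]

# Drop the k points whose removal most increases the slope estimate,
# i.e. the k most negative influence scores (1% of the sample here).
k = 5
drop = np.argsort(slope_infl)[:k]
approx_change = -slope_infl[drop].sum()

# Compare the linear approximation against an exact refit without those points.
keep = np.setdiff1d(np.arange(n), drop)
beta_refit = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
exact_change = beta_refit[1] - beta[1]
print(approx_change, exact_change)
```

With only 1% of the sample removed, the first-order approximation tracks the exact refit closely; the contribution described in the abstract is making this check automatic and bounding its error for general estimators.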
11:00 am - 11:30 am : Networking break (with student posters)
11:30 am - 12:30 pm : Scientific roundtable on robustness (research and industry)
12:30 pm - 2:00 pm : Lunch break
EXPLAINABILITY THEME
2:00 pm - 2:10 pm :
2:10 pm - 3:10 pm :
Title: Friends don't let friends deploy black-box models: The importance of intelligibility in machine learning Abstract: Every dataset is imperfect, often in surprising ways that are difficult to anticipate. Unfortunately, most machine learning methods are black boxes and provide little insight into what they have learned. We have developed a glass-box learning method called the EBM (Explainable Boosting Machine) that is as accurate as black-box methods such as gradient-boosted trees, random forests, and neural networks, while being even more intelligible than linear models such as logistic regression. In my talk, I will give an introduction to glass-box learning and EBMs, along with a number of case studies in which glass-box models discover surprising flaws in the data that need to be corrected before deployment, but that would not have been discovered with black-box learning methods. Every dataset is imperfect - you need glass-box machine learning to detect and correct its defects.
3:10 pm - 3:40 pm :
Title: Why Random Forests Work and Why That's a Problem
3:40 pm - 4:00 pm :
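The additive glass-box models described in the explainability keynote can be sketched in miniature (a toy numpy illustration under simplifying assumptions; production EBMs, e.g. in the InterpretML library, use bagged shallow trees, cyclic gradient boosting, and automatic interaction detection). The core idea is one learned shape function per feature, fit by repeatedly regressing the residuals on each feature in turn, so the final model is a sum of per-feature curves that can be plotted and inspected.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_bins = 1000, 3, 16

# Synthetic data with a nonlinear additive signal (feature 2 is pure noise)
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Pre-compute each sample's bin index for every feature
edges = np.linspace(-1, 1, n_bins + 1)
bins = np.clip(np.digitize(X, edges) - 1, 0, n_bins - 1)

intercept = y.mean()
shapes = np.zeros((d, n_bins))      # learned shape function per feature
pred = np.full(n, intercept)
lr = 0.2

# Cyclic boosting: on each pass, nudge every feature's shape function
# toward the current residuals, one bin at a time.
for _ in range(50):
    for j in range(d):
        resid = y - pred
        for b in range(n_bins):
            mask = bins[:, j] == b
            if mask.any():
                step = lr * resid[mask].mean()
                shapes[j, b] += step
                pred[mask] += step

mse_model = np.mean((y - pred) ** 2)
mse_baseline = np.mean((y - intercept) ** 2)
print(mse_model, mse_baseline)
```

Because a prediction is just the intercept plus one `shapes[j]` value per feature, plotting each row of `shapes` shows exactly what was learned for that feature - which is how glass-box models surface data flaws before deployment.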
4:00 pm - 4:30 pm : Networking break
4:30 pm - 5:30 pm : Theme 2 - Scientific roundtable on explainability (research and industry)
4:30 pm - 5:30 pm : Closing event