Colloquium - Programming Languages Techniques for Controlling Generalization Errors in Adaptive Data Analysis

Speaker

Marco Gaboardi (Boston University)

Abstract

Data analysts aim to guarantee that the result of a data analysis run on sample data does not differ too much from the result one would obtain by running the analysis over the entire population. To achieve this goal, they have developed several techniques to control the generalization errors of their data analyses. In this talk, I will discuss how programming language techniques can help data analysts design adaptive data analyses with low generalization error. An adaptive data analysis can be seen as a process composed of multiple queries interrogating some data, where the choice of which query to run next may depend on the results of previous queries. When queries are arbitrarily composed, the errors of individual queries can propagate through the chain of queries and lead to high generalization error. To address this issue, data analysts are designing techniques that guarantee bounds not only on the generalization error of single queries, but also on the generalization error of the composed analysis.

In my talk, I will first present a programming model for adaptive data analyses based on a simple imperative programming language that can integrate different techniques for controlling the generalization error. I will then introduce a program analysis for this language that, given an input program implementing an adaptive data analysis, generates an upper bound on the total number of queries that the data analysis will run and, more interestingly, an upper bound on the depth of the chain of queries implemented by the input program. These two measures can be used to select the right technique to guarantee a bound on the generalization error of the input data analysis. I will then discuss limitations and potential future work.

Based on joint work with Jiawen Liu (Boston University), Weihao Qu (Boston University), Deepak Garg (MPI-SWS), and Jonathan Ullman (Northeastern University).

Bio

Marco Gaboardi is an associate professor at Boston University. Prior to joining Boston University, he was an assistant professor at the University at Buffalo, SUNY, and at the University of Dundee, Scotland. Marco received his PhD from the University of Torino, Italy, and the Institut National Polytechnique de Lorraine, France. He has been a visiting scholar at the University of Pennsylvania, at Harvard University’s CRCS center, and at the Simons Institute at UC Berkeley. He is a recipient of the NSF CAREER award and of an EU Marie Curie Fellowship. Marco's research is in programming languages, formal verification, and differential privacy.

(Zoom passcode: 583807)

Friday, October 22, 2021 4:00pm to 5:00pm

Virtual Event

Matthieu Biger

Individuals with disabilities are encouraged to attend all University of Iowa–sponsored events. If you are a person with a disability who requires a reasonable accommodation in order to participate in this program, please contact Matthieu Biger in advance at 319-335-0713 or matthieu-biger@uiowa.edu.