User-centric Systems for Data Science (CS 599 L1)
This project is maintained by jliagouris
Welcome to CS 599 L1: User-centric Systems for Data Science, Fall 2022
Understanding the behavior of data processing pipelines is hard. Questions like “Why does the system return certain results?” and “Why is the execution slow?” arise all too often in data analytics. Answering such questions is still a cumbersome task that requires a considerable amount of resources as well as manual work by experts.
The course focuses on algorithmic techniques and system design principles that help humans get meaningful insights into complex data processing pipelines. In the first part of the course, we will discuss methods for explaining computation outputs, including approaches from databases and recommendation systems. In the second part, we will discuss state-of-the-art approaches to interpretable machine learning, such as generalized additive models, LIME, and SHAP. In the third part, we will focus on techniques that help users understand execution performance. We will discuss traditional and causal profiling, end-to-end tracing, and critical path analysis.
At the end of the semester, successful students will have a solid understanding of:
The course will be self-contained. Each of the three parts will begin with an introductory lecture on the concepts needed to understand the related research papers. Students must have strong programming skills (C/Python) and basic knowledge of data structures, algorithms, and computer systems (CS 112, CS 210, or equivalent experience).