UCDS

User-centric Systems for Data Science (CS 599 L1)

This project is maintained by jliagouris

Special Dates

Make sure to become familiar with the Official Semester Dates. Some of the critical Semester Dates are:

Quiz Dates

Attendance

Students are expected to attend each lecture in person, in accordance with the BU safety guidelines. All course material will be posted on Piazza. Ultimately, students are responsible for their own learning and, thus, for keeping up with the material.

Tentative lecture schedule

| Date | Topic | Note |
|---|---|---|
| 09/06 | Lec 0: Course introduction | |
| 09/08 | Lec 1: Database Concepts | Overview of basic concepts in Part I |
| 09/13 | Lec 2: Data Provenance | Read: Provenance: What's Next? |
| 09/15 | Lec 3: Introduction to Ray | Read: Ray: A Distributed Framework for Emerging AI Applications (Ray is the system we will be using in the assignments) |
| 09/20 | Lec 4: Discussion on Assignments #1 and #2 | Read: Explaining Collaborative Filtering Recommendations |
| 09/22 | Hacking Day | |
| 09/27 | Lec 5: Explaining Non-Answers | Read: Why not? |
| 09/29 | Lec 6: Data Causality | Read: Causality in Databases |
| 10/04 | Lec 7: Dataflow Provenance | Read: Explaining outputs in modern data analytics; Optional: Provenance for generalized Map and Reduce workflows |
| 10/04 | DUE DATE: Assignment #1 | |
| 10/06 | Lec 8: Discussion on Assignment #1; Quiz #1 (during lecture) | Common mistakes in Assignment #1 |
| 10/11 | No Lecture | Substitute Monday |
| 10/13 | Lec 9: Machine Learning Concepts | Overview of basic concepts in Part II |
| 10/18 | Lec 10: Generalized Additive Models | Read: Intelligible Models for Classification and Regression; Watch: The Science Behind InterpretML: Explainable Boosting Machine |
| 10/20 | Lec 11: Explaining Classification Results (LIME) | Read: “Why should I trust you?” Explaining the predictions of any classifier; Watch: The Science Behind InterpretML: LIME |
| 10/21 | DUE DATE: Assignment #2 | |
| 10/25 | Lec 12: Interpreting Model Predictions (SHAP) | Watch: The Science Behind InterpretML: SHAP; Optional: A unified approach to interpreting model predictions |
| 10/27 | Lec 13: Discussion on Assignment #3; Quiz #2 (during lecture) | |
| 11/01 | Lec 14: Guest Lecture by Bojan Karlas (Harvard Medical School) | Data Systems for Managing and Debugging Machine Learning Development Workflows |
| 11/03 | Lec 15: Distributed Systems Concepts | Overview of basic concepts in Part III |
| 11/08 | Lec 16: Causal Profiling | Read: Coz: finding code that counts with causal profiling; Optional: SOSP'15 talk |
| 11/10 | Lec 17: Distributed System Tracing | Read: Dapper: A large-scale distributed systems tracing infrastructure; Optional: X-Trace: A pervasive network tracing framework |
| 11/11 | DUE DATE: Assignment #3 | |
| 11/15 | Lec 18: Distributed System Tracing (cont.); Discussion on Assignment #4 | Read: Pivot Tracing: Dynamic causal monitoring for distributed systems |
| 11/17 | Hacking Day | Office hours during lecture time |
| 11/22 | Lec 19: Blocked Time Analysis | Read: Making sense of performance in data analytics frameworks; Optional: Scalability! But at what COST? |
| 11/24 | No Lecture | Thanksgiving Recess |
| 11/29 | Lec 20: Critical Path Analysis | Read: SnailTrail: Generalizing critical paths for online analysis of distributed dataflows |
| 12/01 | Lec 21: Guest Lecture | TBA |
| 12/06 | Lec 22: Log-based Performance Analysis | Read: The Mystery Machine: End-to-end performance analysis of large-scale internet services |
| 12/08 | Lec 23: Black-Box Performance Analysis; Quiz #3 (during lecture) | Read: Performance debugging for distributed systems of black boxes |
| 12/09 | DUE DATE: Assignment #4 | |