HT 2026 · Department of Economics

Applied Machine Learning & AI for Economics

Elodie Chervin · Daniel Barbosa

About the Course

This course bridges the gap between traditional econometrics and modern data science, offering both a theoretical understanding of machine learning and the practical skills to apply it. We examine how techniques like supervised and unsupervised learning, natural language processing, and the emerging field of Causal ML allow economists to tackle large, complex datasets. We also explore the transformative role of AI and Large Language Models in social science research.

Level
3rd-year Economics undergraduates
Language
Python (no prior experience required)
Assessment
Weekly Questions (40%) + Presentations (30%) + Report (30%)

Prerequisites

Students should have completed a course in econometrics or statistics covering multivariate regression and hypothesis testing. No prior programming experience is required, but students are expected to invest time and effort, in and outside class, in learning the basics of Python.

Weekly Schedule

00

Pre-Course Preparation

Mandatory Pre-Readings

  • James, G., Witten, D., Hastie, T. & Tibshirani, R. (2021). An Introduction to Statistical Learning (ISL). Read Chapter 1 and Chapter 2 (up to Section 2.1.3).
  • Cunningham, S. (2021). Causal Inference: The Mixtape. Read Chapter 2, focusing on Sections 2.1–2.4, 2.7–2.17, and 2.24–2.25.
  • ISL, Chapter 3 (Sections 3.1–3.4 inclusive).

Questions

  1. Prediction vs. Inference: Explain the fundamental difference between Prediction (forecasting a future outcome) and Inference (understanding the causal effect). Provide one economic example where pure prediction is sufficient, and one where causal inference is required.
  2. Parametric vs. Non-parametric: Define "parametric" and "non-parametric" models within a statistical context. Why might an economist actively choose a rigid parametric model over a flexible non-parametric one?
  3. Multiple Variable Regression: In a multivariate regression model, we interpret a coefficient as the effect of X "holding all other variables constant." At a high level, why does this interpretation become practically difficult when your dataset contains hundreds of overlapping variables?
  4. ML in real life: Consider a real-world scenario where a firm uses a Machine Learning algorithm trained on historical data to automatically screen loan applications or job candidates. What are the economic or ethical risks of blindly deploying this model without understanding its internal logic?
  5. OLS Geometry: Using equation (3.4) in ISL Chapter 3, argue that in the case of simple linear regression, the least squares line always passes through the point (x̄, ȳ).
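
For students who want a first taste of Python before Week 1, the property in Question 5 can be checked numerically. The sketch below is only an illustration: the simulated data, the seed, and all variable names are assumptions made for this example and are not part of the course materials. It computes the closed-form least squares estimates for simple linear regression and evaluates the fitted line at the sample mean of x. A numerical check like this does not replace the algebraic argument the question asks for.

```python
import numpy as np

# Simulated data: the coefficients 2.0 and 0.5 are arbitrary illustration values.
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=10.0, scale=3.0, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

# Sample means of x and y.
x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates for simple linear regression.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Evaluate the fitted line at the sample mean of x.
fitted_at_x_bar = beta0_hat + beta1_hat * x_bar

print(f"y_bar           = {y_bar:.6f}")
print(f"fitted at x_bar = {fitted_at_x_bar:.6f}")
print("passes through (x_bar, y_bar):", np.isclose(fitted_at_x_bar, y_bar))
```

Changing the seed, the sample size, or the simulated coefficients should leave the final check unchanged up to floating-point error, which is a useful hint about why the result holds for any dataset.
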
01

Regression & Regularisation

02

Classification & Validation

03

Trees & Ensembles

04

Unsupervised Learning

05

Causal ML

06

Text as Data (NLP)

07

Deep Learning & AI Foundations

08

Large Language Models in Economics

Assessment

Weekly Questions (40%): Conceptual and practical questions assigned each week to consolidate learning.

Presentations (30%): Students will present a mini data project that applies the course methods (2 per week during Weeks 5, 6, 7, and 8).

Report (30%): A written report due alongside the presentation, containing all reproducible Python code used for the application.