TT 2026 · Department of Economics

Applied Machine Learning & AI for Economics

Elodie Chervin
·
Daniel Barbosa

About the Course

This course bridges the gap between traditional econometrics and modern data science, offering both a theoretical understanding of machine learning and the practical skills to apply it. We examine how techniques like supervised and unsupervised learning, natural language processing, and the emerging field of Causal ML allow economists to tackle large, complex datasets. We also explore the transformative role of AI and Large Language Models in social science research.

🎓
Level
3rd-year Economics undergraduates
🐍
Language
Python (no prior experience required)
📝
Assessment
Weekly Questions (40%) + Presentations (30%) + Report (30%)

Prerequisites

Students should have completed a course in econometrics or statistics covering multivariate regression and hypothesis testing. No prior programming experience is required, but students are expected to invest time and effort in learning the basics of Python, both in and outside class.

Weekly Schedule

00

Pre-Course Preparation


Mandatory Pre-Readings

  • James, G., Witten, D., Hastie, T. & Tibshirani, R. (2021). An Introduction to Statistical Learning. (Download ISL PDF). Read Chapter 1 and Chapter 2 (up to section 2.1.3).
  • Cunningham, S. (2021). Causal Inference: The Mixtape. (Read Chapter 2 Online). Focus on Sections 2.1–2.4, 2.7–2.17, and 2.24–2.25.
  • ISL, Chapter 3 (Sections 3.1–3.4 inclusive).

Questions

  1. Prediction vs. Inference: Explain the fundamental difference between Prediction (forecasting a future outcome) and Inference (understanding a causal effect). Provide one economic example where pure prediction is sufficient, and one where causal inference is required.
  2. Parametric vs. Non-parametric: Define "parametric" and "non-parametric" models within a statistical context. Why might an economist actively choose a rigid parametric model over a flexible non-parametric one?
  3. Multiple Variable Regression: In a multivariate regression model, we interpret a coefficient as the effect of X "holding all other variables constant." At a high level, why does this interpretation become practically difficult when your dataset contains hundreds of overlapping variables?
  4. ML in real life: Consider a real-world scenario where a firm uses a Machine Learning algorithm trained on historical data to automatically screen loan applications or job candidates. What are the economic or ethical risks of blindly deploying this model without understanding its internal logic?
  5. OLS Geometry: Using ISL Section 3.4, show that in the case of simple linear regression, the least squares line always passes through the point (x̄, ȳ).
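Question 3 can be made concrete with a short simulation. The sketch below (made-up data, illustrative only) builds two nearly identical regressors; OLS then struggles to attribute the effect to either variable individually, even though their combined effect is well estimated.

```python
import numpy as np

# Illustrative sketch: when two regressors are almost copies of each other,
# "holding the other constant" barely ever happens in the data, and the
# individual OLS coefficients become very unstable.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # almost a copy of x1
y = x1 + x2 + rng.normal(size=n)      # true combined effect is 2

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The individual coefficients beta[1] and beta[2] vary wildly across
# simulated samples, but their sum stays close to the true value of 2.
print(beta[1], beta[2], beta[1] + beta[2])
```

Re-running with different seeds shows the individual coefficients swinging while their sum barely moves, which is exactly the practical difficulty the question points at.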
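Question 5 asks for an algebraic argument, but the claim is easy to check numerically. A minimal sketch on simulated data (the coefficients 2 and 3 are arbitrary choices):

```python
import numpy as np

# Numeric check (not a proof): fit simple OLS and verify the fitted line
# passes through (x-bar, y-bar).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

# Least-squares slope and intercept (ISL Section 3.1 formulas)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()   # the intercept formula already encodes the result

y_hat_at_xbar = beta0 + beta1 * x.mean()
print(np.isclose(y_hat_at_xbar, y.mean()))  # True
```

The intercept formula β̂₀ = ȳ − β̂₁x̄ is the whole argument: evaluating the line at x = x̄ returns ȳ by construction.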
01

Regression & Regularisation

Download Slides (PDF)
02

Classification & Validation

Download Slides (PDF)

Preparation Questions

  1. The "Naive" Assumption in NLP: Consider using a Multinomial Naive Bayes classifier for Sentiment Analysis (predicting if a tweet is positive or negative). What assumption does the "Naive" in Naive Bayes refer to? Provide a real-world example of a short phrase or sentence where this "naive" assumption dramatically fails, leading the classifier to make the wrong prediction.
  2. Naive Bayes Calculation: Assume the following likelihoods for each word being part of a positive or negative movie review, and equal prior probabilities for each class.
    Word      P(word | pos)   P(word | neg)
    I              0.09            0.16
    always         0.07            0.06
    like           0.29            0.06
    foreign        0.04            0.15
    films          0.08            0.11
    What class will Naive Bayes assign to the sentence "I always like foreign films"?
  3. The Cost of Being Wrong in Medical AI: Consider an AI system deployed in a hospital to screen patients for a rare but aggressive form of cancer. The model makes two types of mistakes: False Positives (an unnecessary, stressful biopsy) and False Negatives (sending a sick patient home without treatment). Are these errors equally costly? Why does a single overarching "Accuracy" metric fail to capture the real-world utility and ethical implications of this AI system? If you were the developer, how would you mathematically "tune" the classifier to prioritize saving lives, even if it means more false alarms?
  4. LLMs and Data Leakage: Consider a tech company training a new Large Language Model (LLM) to act as a coding assistant. They evaluate the model on a test set of challenging 2024 coding problems and achieve a 95% success rate. However, the model was pre-trained on the entire internet, inadvertently including the solutions to those very problems. If this model is deployed to users writing completely new code, what will happen? Why is it mathematically dangerous to evaluate an AI on data it has already seen?
  5. The Bootstrap Intuition: You have trained a complex statistical model to predict regional housing prices, but your manager asks: "What is the margin of error for these predictions?" Unlike simple linear regression, complex non-linear models don't have a neat mathematical formula for standard errors. Without collecting more data, how could you use the data you already have, along with computing power, to simulate "new" datasets and estimate this uncertainty?
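A hand calculation answers Question 2, but it is easy to check in Python. The sketch below hard-codes the likelihood table from the question and compares log-scores (with equal priors, the prior terms cancel out of the comparison):

```python
import math

# Naive Bayes score for each class: sum of log-likelihoods of the words.
# Equal priors mean the comparison depends only on the likelihoods.
p_pos = {"I": 0.09, "always": 0.07, "like": 0.29, "foreign": 0.04, "films": 0.08}
p_neg = {"I": 0.16, "always": 0.06, "like": 0.06, "foreign": 0.15, "films": 0.11}

sentence = ["I", "always", "like", "foreign", "films"]
log_pos = sum(math.log(p_pos[w]) for w in sentence)
log_neg = sum(math.log(p_neg[w]) for w in sentence)

prediction = "positive" if log_pos > log_neg else "negative"
print(prediction)
```

Working in log-space avoids multiplying many small probabilities, which underflows quickly on longer documents.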
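Question 3 hints at threshold tuning. The toy sketch below uses invented probabilities rather than a real model, but it shows the mechanism: lowering the decision cut-off trades false negatives for false positives.

```python
import numpy as np

# Invented screening data: 1 = has cancer, and the model's predicted
# probability of cancer for each patient.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
p_cancer = np.array([0.9, 0.45, 0.3, 0.4, 0.2, 0.1, 0.35, 0.05])

def confusion(threshold):
    """Count (false negatives, false positives) at a given cut-off."""
    y_pred = (p_cancer >= threshold).astype(int)
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # sick patients sent home
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # unnecessary biopsies
    return fn, fp

print(confusion(0.5))   # default threshold: (2, 0) -- two patients missed
print(confusion(0.25))  # lower threshold:   (0, 2) -- no misses, more alarms
```

Choosing the threshold is an economic decision about the relative cost of each error, not a purely statistical one.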
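Question 5 describes the bootstrap. A minimal sketch, assuming fabricated house-price data and using the sample median as a stand-in for the "complex model" (the same recipe applies to any fitted statistic):

```python
import numpy as np

# Bootstrap: resample the data with replacement, recompute the statistic
# each time, and use the spread of those recomputed values as an estimate
# of its standard error.
rng = np.random.default_rng(42)
prices = rng.lognormal(mean=12.5, sigma=0.4, size=200)  # fake regional prices

B = 2000
boot_medians = np.empty(B)
for b in range(B):
    resample = rng.choice(prices, size=prices.size, replace=True)
    boot_medians[b] = np.median(resample)

se_hat = boot_medians.std(ddof=1)  # bootstrap standard error of the median
print(se_hat)
```

No new data is collected; computing power substitutes for the repeated sampling that a standard-error formula implicitly assumes.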
03

Trees & Ensembles

04

Unsupervised Learning

05

Causal ML

06

Text as Data (NLP)

07

Deep Learning & AI Foundations

08

Large Language Models in Economics

Assessment

Weekly Questions (40%): Conceptual and practical questions assigned each week to consolidate learning.

Presentations (30%): Students will present a mini data project applying the course methods (two presentations per week during Weeks 5–8).

Report (30%): A written report due alongside the presentation, containing all reproducible Python code used for the application.