Bridging econometrics and data science — theory, Python, and real-world applications.
This course bridges the gap between traditional econometrics and modern data science, offering both a theoretical understanding of machine learning and the practical skills to apply it. We examine how techniques like supervised and unsupervised learning, natural language processing, and the emerging field of Causal ML allow economists to tackle large, complex datasets. We also explore the transformative role of AI and Large Language Models in social science research.
Students should have completed a course in econometrics or statistics covering multivariate regression and hypothesis testing. No prior programming experience is required, but students are expected to invest time and effort, both in and outside class, in learning the basics of Python.
From econometrics to ML. Python setup. The prediction vs. inference distinction.
Slides (coming soon)
Linear regression as ML. Ridge, Lasso, and Elastic Net. Regularisation and variable selection.
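As a small preview of this session, the sketch below shows how regularisation shrinks a coefficient. It assumes the simplest possible setting: one centred feature, no intercept, and the penalised objectives sum((y - b*x)^2) + lam*b^2 for Ridge and sum((y - b*x)^2) + lam*|b| for Lasso. The function names and the toy data are illustrative assumptions, not course material.

```python
def ridge_slope(x, y, lam):
    """Closed-form Ridge estimate for one centred feature and no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # The penalty lam enters the denominator, pulling the slope towards zero.
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Lasso estimate in the same setting, obtained by soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    sign = 1.0 if sxy >= 0 else -1.0
    # If |sxy| <= lam / 2 the coefficient is set exactly to zero.
    return sign * max(abs(sxy) - lam / 2.0, 0.0) / sxx


# Toy data (made up for illustration), roughly y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

for lam in (0.0, 1.0, 10.0, 50.0):
    print(lam, round(ridge_slope(x, y, lam), 3), round(lasso_slope(x, y, lam), 3))
```

Ridge shrinks the slope smoothly as the penalty grows, whereas Lasso can set it exactly to zero, which is why Lasso is associated with variable selection.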
Slides (coming soon)
Classification: logistic regression, decision trees, random forests, gradient boosting.
Slides (coming soon)
Bias–variance trade-off. Cross-validation. Hyperparameter tuning.
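To make the idea of cross-validation concrete, here is a minimal sketch of k-fold splitting written from scratch. It assumes contiguous, unshuffled folds and uses the training-fold mean as a deliberately trivial "model"; the helper names `k_fold_indices` and `cv_mse_of_mean` are hypothetical and chosen only for this illustration.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    # Spread the remainder over the first n % k folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cv_mse_of_mean(y, k):
    """Cross-validated mean squared error of predicting the training-fold mean."""
    squared_errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test)
    return sum(squared_errors) / len(squared_errors)


# Toy outcome values (made up for illustration).
y = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 5.0, 1.5, 2.0, 4.5]
print(cv_mse_of_mean(y, k=5))
```

Every observation is used for testing exactly once and never appears in its own training fold. In applied work the rows would usually be shuffled before splitting, unless the data are ordered in time.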
Slides (coming soon)
Clustering (k-means, hierarchical). Dimensionality reduction (PCA).
Slides (coming soon)
Bag-of-words, TF-IDF, embeddings. Sentiment analysis. Topic modelling.
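The bag-of-words and TF-IDF ideas can be sketched in a few lines of plain Python. This version assumes whitespace tokenisation, term frequency as a share of the document's tokens, and the unsmoothed inverse document frequency log(N / df). The function name `tf_idf` and the example sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using unsmoothed TF-IDF."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)  # the bag-of-words representation
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (made up for illustration).
docs = ["the rate rose", "the rate fell", "the unemployment rose"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, {term: round(value, 3) for term, value in w.items()})
```

A term that appears in every document receives a weight of zero, while rarer terms are weighted more heavily. Library implementations often apply smoothing or normalisation, so their numbers may differ from this sketch.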
Slides (coming soon)
Large Language Models. Prompt engineering. AI-assisted research. Course wrap-up.
Slides (coming soon)
Final Assignment (70%): Apply ML methods from the course to an economic dataset. Submit a short analytical report with reproducible Python code.
Presentation (30%): A short presentation on a course-related topic or application, scheduled during the final weeks of term.