An advanced-level course on modern statistical and machine learning techniques for economic research. Students apply tools such as ARIMA, logistic regression, PCA, random forests, and SHAP to real-world data, developing the skills to analyze time-dependent and high-dimensional datasets in Python.
This course builds on foundational knowledge of statistics, including key concepts such as probability, distributions, sampling, hypothesis testing, analysis of variance, and regression analysis. It is designed for students who have already gained experience in applying these basic techniques to economic problems. In this second-level course, learners advance to contemporary analytical methods used in modern economic research and data science, focusing on more complex, high-dimensional, and time-dependent data.
The course is designed to equip students with hands-on experience in Python while deepening their understanding of statistical reasoning and machine learning. Students will work with real datasets to build time series models (e.g., ARIMA), perform logistic regression and generalized linear modeling with regularization (lasso, ridge), apply dimensionality reduction techniques such as PCA and factor analysis, and experiment with clustering as well as supervised learning models such as random forests and gradient boosting.
Rather than separating theory from practice, all sessions follow an integrated format in which students immediately apply concepts through coding and group analysis. The aim is to enable students to carry out independent, robust, and clearly communicated economic analyses that meet both academic and applied research standards.
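To give a flavor of the hands-on format, the following minimal sketch illustrates the ridge and lasso regularization mentioned above in the simplest possible setting: a single centered predictor, where both estimators have closed-form solutions. It is not taken from the course materials. The function names and the toy numbers are assumptions made for illustration only, and the sketch uses the Python standard library rather than any particular modeling package.

```python
# Toy illustration of ridge vs. lasso shrinkage with one centered predictor.
# The data below are made-up numbers for illustration, not course data.

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def ols_slope(x, y):
    """Least-squares slope for centered x and y."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx

def ridge_slope(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def lasso_slope(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| (soft-thresholding)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam, 0.0)   # pull the covariance term toward zero
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

x = center([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = center([1.2, 1.9, 3.2, 3.8, 5.1, 6.1])

# Compare the three estimates as the penalty grows.
for lam in (0.0, 5.0, 20.0):
    print(lam, ols_slope(x, y), ridge_slope(x, y, lam), lasso_slope(x, y, lam))
```

In this setting the ridge estimate shrinks toward zero as the penalty grows but never reaches it, whereas the lasso estimate becomes exactly zero once the penalty exceeds the absolute value of the cross-product of x and y. This is the reason lasso is often used for variable selection.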
At the end of the course, the learner will be able to:
• Apply and interpret time series models for forecasting economic data
• Build and evaluate classification models using logistic regression and regularization
• Conduct dimensionality reduction using PCA and factor analysis
• Use clustering and other unsupervised learning techniques for pattern discovery
• Implement and compare supervised machine learning algorithms (e.g., random forest, SVM)
• Interpret model results and communicate findings effectively
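As an illustration of the first outcome above, the sketch below fits a first-order autoregressive model, which is the special case ARIMA(1,0,0), by least squares and produces a short forecast. It is a minimal sketch under stated assumptions rather than course material: the helper names `fit_ar1` and `forecast_ar1` are hypothetical, the series is simulated rather than real economic data, and in practice a library implementation would normally be used instead of hand-written code.

```python
import random

def fit_ar1(series):
    """Estimate c and phi in y_t = c + phi * y_{t-1} + e_t by least squares."""
    prev = series[:-1]   # y_{t-1}
    curr = series[1:]    # y_t
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted equation forward, setting future shocks to zero."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts

# Simulated series (assumed parameters: c = 2.0, phi = 0.6, unit-variance noise).
rng = random.Random(42)
series = [5.0]
for _ in range(199):
    series.append(2.0 + 0.6 * series[-1] + rng.gauss(0.0, 1.0))

c_hat, phi_hat = fit_ar1(series)
forecasts = forecast_ar1(series[-1], c_hat, phi_hat, steps=4)
print("estimated c:", c_hat, "estimated phi:", phi_hat)
print("4-step forecast:", forecasts)
```

The estimated `phi` measures persistence: values close to 1 mean that shocks die out slowly. When the absolute value of `phi` is below 1, the multi-step forecasts converge toward the long-run mean c / (1 - phi).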
Students should have completed an introductory statistics course, or possess equivalent prior knowledge. Required competencies include understanding basic statistical principles (e.g., mean, variance), probability theory, distributions, sampling and estimation, hypothesis testing, analysis of variance (ANOVA), and linear regression. Students should also be comfortable interpreting statistical results and reasoning with real-world economic data. No prior programming experience is assumed, but a willingness to learn and use Python is essential.
Compulsory reading
• James et al. (2023). An Introduction to Statistical Learning with Applications in Python
• Kenett et al. (2022). Modern Statistics: A Computer-Based Approach with Python
• McClave et al. (2017). Statistics for Business and Economics
Recommended reading
• Müller & Guido (2016). Introduction to Machine Learning with Python
• Hyndman & Athanasopoulos (2021). Forecasting: Principles and Practice
• Xiao (2022). Artificial Intelligence Programming with Python
• Lectures
• Group assignments
• Term paper project
• Python-based exercises
• Weekly consultations
Transcript of records
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.