Course Syllabus: BDE

Topics on Big Data Econometrics

Lectures Given at WISE/SOE Xiamen University, April 2019

Kuan-Pin Lin
Professor Emeritus of Economics
Portland State University

Introduction

Economic data come in different forms and structures. Cross sections, time series, and panel data are familiar in economics. Based on economic theory and statistical methods, econometrics addresses issues of parameter estimation and causal inference. With advances in information technology and rapid growth in the size and scale of data collection, econometric analysis now faces the challenge of using massive datasets, or big data. In particular, data analytics based on machine learning is considered from the perspective of applied econometric analysis.

There are two directions of research in Big Data Econometrics. First, as the size of data grows without bound (N → ∞), methods of data exploration, visualization, and analysis are called on to meet the demand for policy evaluation and predictive applications. Second, with a broader scope of information and more granular data collection, modern econometric analysis involves high-dimensional controls or covariates (that is, p > N). In the first direction, econometricians are open to the methodologies of machine learning and data mining: techniques such as bagging, boosting, random forests, and neural networks are available in addition to traditional regression and classification methods. For a high-dimensional econometric model, regularization by LASSO and related methods is used. Model selection and evaluation based on cross-validation are recommended. As the size of economic data and the problem dimensions grow, so does the demand for parallel processing, and it makes sense to explore a least-cost option for parallel computing in the cloud.
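
To make the high-dimensional case concrete, the following is a minimal sketch of LASSO with the penalty chosen by K-fold cross-validation when p > N. It is not taken from the course materials. The simulated data, the function names (soft_threshold, lasso_cd, cv_lasso), the penalty grid, and the fixed iteration count are all assumptions made for illustration, and the regressors are assumed to be mean zero so that no intercept is fitted.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Approximately minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1
    by cyclic coordinate descent (fixed number of sweeps, no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * X_j'X_j for each column
    r = y.copy()                        # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (excluding j)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)   # keep residual in sync with b
            b[j] = b_new
    return b

def cv_lasso(X, y, lams, k=5, seed=0):
    """Return the penalty with the lowest K-fold out-of-sample MSE,
    together with the average MSE for each candidate penalty."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    mse = np.zeros(len(lams))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for i, lam in enumerate(lams):
            b = lasso_cd(X[train_idx], y[train_idx], lam)
            err = y[test_idx] - X[test_idx] @ b
            mse[i] += np.mean(err ** 2) / k
    return lams[int(np.argmin(mse))], mse

# Simulated example (hypothetical data): N = 50 observations, p = 100 covariates,
# only the first three covariates have nonzero coefficients.
rng = np.random.default_rng(42)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

lams = np.logspace(-2, 0, 8)            # candidate penalties (an arbitrary grid)
best_lam, cv_mse = cv_lasso(X, y, lams)
b_hat = lasso_cd(X, y, best_lam)
selected = np.flatnonzero(b_hat)        # indices of covariates kept by LASSO

print("chosen penalty:", best_lam)
print("number of selected covariates:", selected.size)
print("selected indices:", selected)
```

Ordinary least squares is not uniquely defined here because p > N, whereas the L1 penalty sets most coefficients exactly to zero and cross-validation picks the penalty by out-of-sample fit. In applied work a library implementation with a convergence check would normally replace the hand-written loop.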

Topics

Case Studies (subject to change)

The Economist's Big Mac Price Index (1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)
Surviving Titanic (Varian, 2014)
Home Mortgage Disclosure Act (HMDA) (Varian, 2014)
Wine Price in Vancouver BC, Canada (2.1, 2.2, 2.3, 2.4, 2.4a, 2.4b, 2.4c)
Kaggle-Walmart Sales Forecasting (3.1, 3.2, 3.3, 3.4)
"I Just Ran Two Million Regressions" (Sala-i-Martin, 1997) (4.1, 4.2, 4.3, 4.4)
Predicting House Prices (Mullainathan, et al., 2017) (5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.10, 5.11)
Credit Card Default Prediction (Yeh and Lien, 2009) (6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 6.10, 6.11)
Exchange Rate Manipulation? (7.1, 7.2)

Expectations

A data project is required of everyone taking this course. A one-page project proposal is due on or before May 1 for approval. The final report, limited to 10 pages, is due on or before May 15.
Your grade is based solely on this data project, which should preferably be about the Chinese economy or an extension of the above case studies. The project is evaluated on its originality, creativity, and consistency with the course contents.

(Last updated: 4/28/2019)