Topics on Big Data Econometrics
Lectures Given at WISE/SOE Xiamen University, April 2019
Kuan-Pin Lin
Professor Emeritus of Economics
Portland State University
Introduction
Economic data observations come in different forms and structures. Data structures such as cross sections,
time series, and panel data are familiar in economics. Based on economic theory and statistical methods,
econometrics addresses issues of parameter estimation and causal inference.
With advances in information technology and the rapid growth in the size and scale of data collection,
econometric analysis now faces the challenge of using massive datasets, or big data.
In particular, data analytics based on machine learning is considered from the perspective of
applied econometric analysis.
There are two directions of research on Big Data Econometrics. First, as the size of data keeps
growing (N -> ∞), methods of data exploration, visualization, and analysis are called upon
to meet the demand for policy evaluation and predictive applications.
Second, with a broader scope of information and more granular data collection,
modern econometric analysis involves high-dimensional controls or covariates
(that is, p > N). In the first direction, econometricians are open to the methodologies of machine
learning and data mining. Techniques such as bagging, boosting, random forests, and neural networks are available
in addition to traditional regression and classification methods. In the second direction, for a high-dimensional
econometric model, regularization methods such as LASSO and its variants are used. Model selection and evaluation
based on cross-validation are recommended.
With the increasing size of economic data and problem dimensions, and the resulting demand for parallel processing,
it makes sense to explore a least-cost option for parallel computing in the cloud.
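The regularization and cross-validation ideas mentioned above can be illustrated with a minimal sketch. It is not taken from the lecture materials: the data are simulated, the function names (soft_threshold, lasso_cd, cv_lasso, simulate) are hypothetical, the penalty grid is arbitrary, and no intercept is fitted because the simulated variables have mean zero. It uses cyclic coordinate descent for the LASSO objective (1/(2N))·||y − Xb||² + λ·||b||₁ with p > N, and picks λ by 5-fold cross-validation.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # covariance of column j with the partial residual (column j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of the linear fit X @ beta."""
    return sum((yi - sum(b * x for b, x in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def cv_lasso(X, y, lambdas, k=5):
    """Return the penalty with the lowest k-fold CV error, plus all CV errors."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    scores = {}
    for lam in lambdas:
        err = 0.0
        for test_idx in folds:
            held = set(test_idx)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            beta = lasso_cd(X_tr, y_tr, lam)
            err += mse([X[i] for i in test_idx], [y[i] for i in test_idx], beta)
        scores[lam] = err / k
    return min(scores, key=scores.get), scores

def simulate(n, p, seed=0):
    """Simulated sparse model (an assumption): only the first 3 of p covariates matter."""
    rng = random.Random(seed)
    beta_true = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta_true, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y, beta_true

X, y, beta_true = simulate(n=30, p=40)       # p > N
lambdas = [0.05, 0.1, 0.2, 0.5, 1.0]         # arbitrary illustrative grid
best_lam, cv_scores = cv_lasso(X, y, lambdas)
beta_hat = lasso_cd(X, y, best_lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("lambda chosen by 5-fold CV:", best_lam)
print("indices of nonzero coefficients:", selected)
```

The sketch favors readability over speed. For real data one would more likely rely on an established implementation, such as the glmnet package in R, and standardize the covariates before fitting.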
Topics
Case Studies (subject to change)
- The Economist's Big Mac Price Index
(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)
- Surviving Titanic (Varian, 2014)
- Home Mortgage Disclosure Act (HMDA) (Varian, 2014)
- Wine Price in Vancouver BC, Canada
(2.1, 2.2, 2.3, 2.4, 2.4a, 2.4b, 2.4c)
- Kaggle-Walmart Sales Forecasting
(3.1, 3.2, 3.3, 3.4)
- "I Just Ran Two Million Regressions" (Sala-i-Martin, 1997)
(4.1, 4.2, 4.3, 4.4)
- Predicting House Prices (Mullainathan and Spiess, 2017)
(5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.10, 5.11)
- Credit Card Default Prediction (Yeh and Lien, 2009)
(6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 6.10, 6.11)
- Exchange Rate Manipulation?
(7.1, 7.2)
Suggested Readings
- Christian Kleiber and Achim Zeileis, Applied Econometrics with R,
Springer-Verlag, New York, 2008.
- Florian Heiss, Using R for Introductory Econometrics, CreateSpace, 2016.
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani,
An Introduction to Statistical Learning with Applications in R, Springer, 2013.
- Hadley Wickham and Garrett Grolemund,
R for Data Science,
O'Reilly Media, Inc., 2017.
- Scott Burger,
Introduction to Machine Learning with R: Rigorous Mathematical Analysis,
O'Reilly Media, Inc., 2018.
- Darren Cook, Practical Machine Learning with H2O,
O'Reilly Media, Inc., 2017.
- Hal R. Varian, Big Data: New Tricks for Econometrics,
Journal of Economic Perspectives 28:2 (3-28), Spring 2014.
- Alexandre Belloni, Victor Chernozhukov, and Christian Hansen,
High-Dimensional Methods and Inference on Structural and Treatment Effects,
Journal of Economic Perspectives 28:2 (29-50), Spring 2014.
- Sendhil Mullainathan and Jann Spiess,
Machine Learning: An Applied Econometric Approach,
Journal of Economic Perspectives 31:2 (87-106), Spring 2017.
- Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and Whitney Newey,
Double Machine Learning for Treatment and Causal Parameters,
Econometrics Journal 21:1 (C1-C68), 2018. doi: 10.1111/ectj.12097
Expectation
- A data project is required of everyone taking this course. A one-page project proposal is due on or before May 1 for approval.
The final report, limited to 10 pages, is due on or before May 15.
- Your grade is based solely on this data project,
which should preferably be about the Chinese economy or an extension of the above case studies.
Your project is evaluated on its originality, creativity, and consistency with the course contents.
Copyright©
Kuan-Pin Lin
(Last updated: 4/28/2019)