Topics on Big Data Econometrics

May 7, 8, 10, 12, 14, 15 (TBA)
Kuan-Pin Lin
Professor of Economics
Portland State University and WISE/SOE Xiamen University

Introduction

Economic data come in different forms and structures. Data structures such as cross sections, time series, and panel data are familiar in economics. Based on economic theory and statistical methods, econometrics addresses issues of causal inference among economic variables. The goal of econometric analysis is reliable prediction and better decision making. With advances in information technology and the rapid growth of data collection in size and scale, econometric analysis now faces the challenge of using massive datasets, or big data.

There are two directions of new research in Big Data Econometrics. First, as the size of data keeps growing (N->∞), new methods of data exploration, visualization, and analysis are called for to meet the demands of policy evaluation and predictive applications. Second, with a broader scope of information and more granular data collection, the emerging econometric analysis involves high-dimensional controls or covariates (that is, p>N, more covariates than observations). In the first direction, econometricians are open to methodologies from machine learning and data mining. Techniques such as bagging, boosting, and random forests are considered, in addition to more traditional classification and cross validation. The new methods must be consistent with the random data generating process (DGP) underlying econometric theory. For a high-dimensional econometric model, regularization methods such as LASSO and related techniques are used, with the aim of selecting variables with minimum bias. With the increasing size of economic data and problem dimensions, and the growing demand for parallel processing, it makes sense to consider cloud computing as a least-cost option.
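To make the high-dimensional case (p>N) concrete, the following is a minimal sketch of LASSO estimation by cyclic coordinate descent on simulated data. It is an illustration only, not the course's own code: the function names (soft_threshold, lasso_cd, lambda_max, lasso_objective), the simulated design with N=20 and P=50, and the penalty level of 10% of the largest useful penalty are all assumptions chosen for this example. The sketch omits the intercept and does not standardize the covariates, which applied work would normally do.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_objective(X, y, beta, lam):
    """(1/(2n)) * residual sum of squares + lam * L1 norm of beta."""
    n, p = len(X), len(beta)
    rss = sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2
              for i in range(n))
    return rss / (2 * n) + lam * sum(abs(b) for b in beta)


def lambda_max(X, y):
    """Smallest penalty at which every coefficient is exactly zero."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n)) / n)
               for j in range(p))


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize the LASSO objective by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip an all-zero column
            # Covariance of column j with the partial residual
            # (the residual with covariate j's contribution added back).
            rho = (sum(X[i][j] * resid[i] for i in range(n)) / n
                   + col_sq[j] * beta[j])
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated data (an assumption for illustration): more covariates than
# observations, with only the first two covariates entering the true model.
random.seed(0)
N, P = 20, 50
X = [[random.gauss(0, 1) for _ in range(P)] for _ in range(N)]
true_beta = [3.0, -2.0] + [0.0] * (P - 2)
y = [sum(X[i][j] * true_beta[j] for j in range(P)) + random.gauss(0, 0.1)
     for i in range(N)]

lam = 0.1 * lambda_max(X, y)  # penalty level picked arbitrarily here
beta_hat = lasso_cd(X, y, lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("nonzero coefficients:", len(selected), "of", P)
```

Ordinary least squares has no unique solution when p>N, while the L1 penalty sets many coefficients exactly to zero and so performs variable selection. In practice the penalty would be chosen by cross validation rather than fixed in advance, and the shrinkage bias in the selected coefficients is one reason post-selection refitting is often considered.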

Suggested Readings

Topics

Subject to time constraints and program revision, the following introductory topics and case studies will be selected and discussed during this short course:
  1. Economic Data Analysis Using R (Part 1, Part 2)
  2. Econometric Computing in the Cloud
  3. State Space Time Series Analysis and Forecasting (Part 1, Part 2)
  4. Case Studies:
    1. The Economist's Big Mac Price Index (1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8)
    2. Wine Price in Vancouver BC, Canada (2.1, 2.2, 2.3, 2.4)
    3. Kaggle-Walmart Sales Forecasting (3.1, 3.2, 3.3, 3.4)
    4. "I Just Ran Two Million Regressions" (4.1, 4.2, 4.3, 4.4)
    5. Credit Scoring and Default Prediction (5.1, 5.2)
    6. Chinese Yuan and Stock Market (6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8)

Expectation

A data project is required of everyone taking this course. A one-page project proposal is due on or before May 15 for approval. The final report of the project is then due on or before June 15. Your grade is based solely on this data project, preferably on a topic about the Chinese economy. Your project is evaluated on its originality, creativity, and economic content.
Copyright© Kuan-Pin Lin
(Last updated: 04/15/2016)