Econometric analysis of accounting choice
Explore issues of (broad) interest raised in colloquium, papers, or formative ideas from a causal perspective.
1. Describe the general problem.
2. Identify a specific (causal) research question.
3. Formulate a research design.
4. Discuss interpretation of potential results.
Closely attend to the role of probability assignment and counterfactuals in devising an identification strategy and data analysis.
1. Identify a topic of interest.
2. Apply the above outline to the subject.
3. Simulate data and apply your research design.
4. If you have archival data, subject these data to your vetted design and interpret the results; otherwise, interpret the results for the simulated data.
"No successful mechanical algorithm for discovering causal or structural models has yet been produced, and it is unlikely that one will ever be found. At the same time, it is unlikely that the quest for a mechanical algorithm for determining causality from data will ever be abandoned. The tension between the use of tacit knowledge and formal algorithmic methods is likely to be a permanent feature of empirical research in economics. It arises because in most empirical studies there is always more knowledge about a problem being studied than appears in the sampling distributions of the measured variables being analyzed or in well-specified Bayesian priors. The best empirical work in economics uses economic theory as a framework for integrating all of the available evidence, tacit and algorithmic, to tell a convincing story."
– James Heckman, Quarterly Journal of Economics 115, no. 1 (Feb. 2000), p. 46.
The objectives are (i) to build a foundation for meaningful discussions, (ii) to complement econometrics (and statistics) courses (in my view, coursework provides an opportunity to survey what's out there – this is an opportunity to develop a deeper understanding), and (iii) to explore implications for accounting.
Below is a list of potential topics for the summer seminar. We surely will not cover them all (some will naturally get combined in the course of our discussions). It goes without saying that this is an ambitious, time- and energy-taxing endeavor, but the seminar is purely optional (it's an opportunity, not an obligation). If you decide to participate, I would like you to choose a topic and a tentative date for discussion. I believe you'll get more out of the seminar if you actively participate; therefore, my plan for this summer is to give you some latitude in leading the discussion. Don't think you have to exhaust the topic; that's not possible, as many of these topics warrant a course of their own. Do not expect closure.
There is some coverage of each of these issues in my monograph, Accounting and causal effects: Econometric challenges. But don't think you have to limit yourself to these pages. The monograph is posted below, along with some sample R programs and some supplemental material developed (and still under development) since the monograph was completed.
As you know, linear algebra is an indispensable tool for these explorations. Amongst the supplemental materials is an appendix (not listed with the topics below) which includes a brief survey of some important concepts from linear algebra along with some examples.
Make use of the best tools at your disposal (spreadsheets, Mathematica, Matlab, etc.). I find R, the open source statistical software, to be invaluable, especially for computationally intensive work like McMC simulation. Since it's open source, it evolves rapidly, and many of the contributors have fabulous programming skills that make R extraordinarily efficient. I plan to post a few sample programs to, hopefully, ease the learning curve, but expect to discover your own tricks for gaining efficiency (sadly, not my forte).
Treatment effects (a special case of causal effects) receive a substantial amount of attention in the topics below as well as in the monograph. Some key features (illustrated in the Tuebingen-style and supplementary pages examples) are:
– omitted, correlated variables (personally, I devote 99% of my efforts to this problem; in this case it leads to potential selection bias)
– counterfactuals & inferring population-level parameters (this distinguishes the treatment effect problem from, say, a run-of-the-mill endogenous regressor problem and is a source of controversy among scientists; see Dawid)
– common support (how confident are you extrapolating outside the relevant range?)
– there are numerous definitions of treatment effects (average treatment effect, average treatment effect on the treated, average treatment effect on the untreated, local average treatment effect, marginal treatment effect, policy-relevant treatment effect, ambiguity of treatment effects identified by linear IV, etc.); not all of them are accessible without full support, and they are not all equally well suited to the problem at hand
– choice of instruments is even more challenging than usual, and greater care is required in interpreting the results
– higher explanatory power (in either the selection or outcome equations) does not necessarily lead to a well-identified model
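The first two features above (omitted, correlated variables and counterfactuals) can be illustrated with a small simulation. The following is only a minimal sketch in Python (standard library), not one of the posted R programs; the data-generating process, the coefficient values, and the function name `simulate_selection` are all assumptions made for illustration. Because both potential outcomes are simulated, the average treatment effect (ATE) and the average treatment effect on the treated (ATT) can be computed directly and compared with the naive difference in observed means.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulate_selection(n=20000, seed=1):
    """Simulate potential outcomes with selection on an omitted variable u.

    Hypothetical design: the treatment effect is 2 + 0.5*u, and units with
    larger u are more likely to select treatment.
    """
    rng = random.Random(seed)
    effects, treated_effects = [], []
    y_treated, y_untreated = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # omitted, correlated variable
        y0 = 1.0 + u + rng.gauss(0.0, 1.0)   # outcome without treatment
        y1 = y0 + 2.0 + 0.5 * u              # outcome with treatment
        d = u + rng.gauss(0.0, 1.0) > 0.0    # selection depends on u
        effects.append(y1 - y0)
        if d:
            treated_effects.append(y1 - y0)
            y_treated.append(y1)             # y0 is the unobserved counterfactual
        else:
            y_untreated.append(y0)           # y1 is the unobserved counterfactual
    return {
        "ate": mean(effects),
        "att": mean(treated_effects),
        "naive": mean(y_treated) - mean(y_untreated),
    }


if __name__ == "__main__":
    for name, value in simulate_selection().items():
        print(f"{name}: {value:.3f}")
```

In this design the ATE is 2 by construction, the ATT is larger because units with high u both select treatment and benefit more from it, and the naive comparison is larger still because it also picks up the difference in untreated outcomes between the two groups (selection bias).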
Linear models (OLS, GLS, FWL, etc.)
Equilibrium earnings management (chapters 2 & 3)
Fixed effects & differences-in-differences and random effects & random coefficients
Linear instrumental variables (IV)
Discrete choice models
Maximum likelihood estimation (James-Stein shrinkage estimators)
Nonparametric & semiparametric regression
Bootstrapping & posterior simulation
McMC (Markov chain Monte Carlo) simulation (chapters 7 & 12 and supplemental materials Bayesian notes)
Strategic choice models
Treatment effects (chapter 8 Tuebingen-style examples)
Ignorable treatment effects (chapter 9 Tuebingen-style examples)
Identification strategies: exogenous dummy variable regression, nonparametric regression, propensity score & propensity score matching, control functions, regression discontinuity design, etc. (mean conditional independence (ignorability) vis-a-vis conditional independence (strong ignorability))
Asset revaluation regulation (limitations of outcome measures for inferring welfare effects; chapters 2 & 9)
Nonignorable (IV) treatment effects (chapter 10 Tuebingen-style examples)
Generalized Roy model interpretation
Identification strategies: endogenous dummy variable IV regression, propensity score IV, ordinate control function IV, inverse Mills control function IV, LATE and linear IV, etc.
IV control functions & projections examples (ANOVA, ANCOVA to control functions examples in supplemental materials projections notes)
Continuous treatment & (correlated) random coefficients
Regulated report precision (nonignorable versus ignorable identification strategy effectiveness; chapters 2 & 10)
Marginal treatment effects
Identification and LIV (local instrumental variables)
Regulated report precision (apparent nonnormality & marginal treatment effects; chapters 2 & 11)
Bayesian treatment effects
Identification via McMC data augmentation
Identification & McMC IV (restricted) data augmentation (supplemental materials Bayesian notes)
Regulated report precision (marginal & average treatment effects as well as policy-relevant treatment effects; chapters 2 & 12)
Informed priors & Bayesian data analysis (probability as logic)
Maximum entropy probability assignment & Jaynes' widget problem
Conjugate families (supplemental materials Bayesian notes)
Inverting financial statements (to recover transactions)
Smooth accruals (valuation and performance evaluation implications)
Earnings management (stochastic & selective manipulation)
Quantum production, synergy, and (product market) strategic disclosure
Other topics of interest to you
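To make one of the listed topics concrete, the contrast between OLS and linear IV with an endogenous regressor can be sketched on simulated data. This is a minimal sketch in Python (standard library), under an assumed data-generating process: the names `simulate_iv` and `cov`, the coefficient values, and the single-instrument design are illustrative choices, not taken from the monograph. The instrument z shifts x but is independent of the omitted variable u, so the ratio cov(z, y) / cov(z, x) recovers the slope while the OLS slope does not.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_iv(n=20000, beta=1.0, seed=2):
    """Return (ols_slope, iv_slope) for a simulated endogenous-regressor model."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                 # instrument
        ui = rng.gauss(0.0, 1.0)                 # omitted variable
        xi = zi + ui + rng.gauss(0.0, 1.0)       # regressor depends on z and u
        yi = beta * xi + 2.0 * ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    ols_slope = cov(x, y) / cov(x, x)            # inconsistent: x is correlated with u
    iv_slope = cov(z, y) / cov(z, x)             # simple IV estimator with one instrument
    return ols_slope, iv_slope


if __name__ == "__main__":
    ols_slope, iv_slope = simulate_iv()
    print(f"OLS slope: {ols_slope:.3f}  IV slope: {iv_slope:.3f}  (true slope: 1.0)")
```

Here the slope is homogeneous, so IV identifies it directly. With heterogeneous responses, what linear IV identifies is more ambiguous, which is the concern raised in the treatment effects discussion above.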
Table of contents for Accounting and causal effects: Econometric challenges
- including OLS, GLS, FWL, and IV estimators
- including MLE, James-Stein shrinkage estimators, and nonlinear regression
- employed as propensity score in treatment effect analysis
- employed with Heckman's MTE
- including McMC Bayesian analysis
- including widget example
- including inferring transactions from financial statements
- including convergence in probability, convergence in distribution, and rates of convergence
Errata (corrections for the 2010 Springer printed version)
Supplemental materials: (most of this is work in progress – expect updates)
1. Probability as logic (http://bayes.wustl.edu/etj/science.pdf.html)
2. Accounting, managerial experimentation and causal effects
stewardship and the search for a better “mouse trap”
ANOVA, ANCOVA, linear regression
double residual regression (FWL)
limited common support
regression discontinuity designs
dynamic treatment effects
LATE & 2SLS (linear) IV
continuous treatment effects
Bayes’ theorem & consistent reasoning
maximum entropy probability assignment
posterior simulation & conjugate families
independent & conditional posterior simulation
Markov chain Monte Carlo (McMC) simulation
irreducibility, time reversibility, & stationarity
Metropolis-Hastings (MH) algorithm
data augmented sampler for Probit
random walk MH logit
uniform data augmented Gibbs sampler for logit
ANOVA, ANCOVA, regression
data augmented Gibbs sampler for treatment effects
data augmented IV restricted Gibbs sampler for treatment effects
Chib’s ATE and LATE
extensions (via bounding) to ATT and ATUT
expected values (point and partial identification)
respecting stochastic dominance
missing covariates and outcomes
missing at random
missing by choice
stochastic dominance and treatment effects
various instrumental variable strategies (missing at random, statistical independence, means missing at random, mean independence)
monotone instrumental variable strategies (mean monotonicity, exogenous treatment selection, monotone treatment selection, monotone treatment response, MM-MTR, MTR-MTS)
quantile treatment effects (point and partial identification)
Extrapolation and the mixing problem
Appendix: bounds on spreads
linear algebra basics
fundamental theorem of linear algebra
exactly, under-, and over-identified
singular value decomposition (SVD) & pseudo inverse
some determinant identities (& LU factorization)
multivariate normal theory
generalized least squares (GLS)
two stage least squares IV (2SLS-IV)
seemingly unrelated regression (SUR)
maximum likelihood estimation of discrete choice models
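As an illustration of the random walk MH logit item in the supplemental materials list, a random walk Metropolis-Hastings sampler for a two-parameter logit can be sketched as follows. This is a minimal sketch in Python (standard library), not the posted R source files; the simulated data, the N(0, 10^2) priors, the step size, the burn-in length, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate_logit(n=300, b0=-0.5, b1=1.0, seed=42):
    """Draw (x, y) pairs from a logit model with hypothetical coefficients."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Logit log-likelihood plus independent normal log-priors (up to a constant)."""
    b0, b1 = beta
    lp = -(b0 * b0 + b1 * b1) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # y*eta - log(1 + exp(eta)), written to avoid overflow
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def rw_metropolis_logit(xs, ys, n_draws=3000, step=0.15, seed=7):
    """Random walk MH: returns the list of draws and the acceptance rate."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    lp = log_posterior(beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_draws):
        prop = (beta[0] + rng.gauss(0.0, step), beta[1] + rng.gauss(0.0, step))
        lp_prop = log_posterior(prop, xs, ys)
        # Symmetric proposal, so the MH ratio reduces to the posterior ratio.
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            beta, lp = prop, lp_prop
            accepted += 1
        draws.append(beta)                       # a rejected proposal repeats the current value
    return draws, accepted / n_draws


def posterior_mean(draws, burn_in=500):
    """Average the draws after discarding an initial burn-in segment."""
    kept = draws[burn_in:]
    return tuple(sum(d[i] for d in kept) / len(kept) for i in range(2))


if __name__ == "__main__":
    xs, ys = simulate_logit()
    draws, rate = rw_metropolis_logit(xs, ys)
    print("posterior means:", posterior_mean(draws), "acceptance rate:", rate)
```

The step size governs the usual trade-off: small steps are accepted often but explore slowly, while large steps are rarely accepted. In practice one would also inspect trace plots and run several chains before trusting the draws.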
R Statistical Computing Package
R is a freely available, open-source implementation of the S language (S-PLUS is the commercial implementation).
Follow this link: R statistical computing package
R sample programs
Tutorials: R tutorial
find your favorite on the web
source files for data augmented Gibbs sampler bivariate probit:
you will need to add the bayesm library from the CRAN website if it’s not included in your base installation
source files for Metropolis-Hastings bivariate logit (self-contained code and code utilizing the MCMCpack library, which you may need to add)
source files for data augmented Gibbs sampler bivariate selection analysis of treatment effects:
you will need to add the MASS library from the CRAN website if it’s not included in your base installation
Another Bayesian McMC strategy for identifying treatment effects (perhaps, with panel data) can be found in Chib’s papers posted below:
Shin, 1994, “News management and the value of firms,” RAND Journal of Economics, 25(1), 58-71.
Shin, 2003, “Disclosures and asset returns,” Econometrica, 71(1), 105-133.
Shin, 2006, “Disclosure risk and price drift,” Journal of Accounting Research, 44(2), 351-379.
Arya and Mittendorf, “The interaction between corporate tax structure and disclosure policy,” Annals of Finance, forthcoming.
Arya and Mittendorf, “Input markets and the strategic organization of the firm,” Foundations and Trends in Accounting, 5(1), 1-97.