SAS users repeatedly run into the limitations of SAS's most basic econometric procedures.
PROC GLM is great for absorbing a single fixed effect but cannot cluster standard errors or instrument endogenous covariates.
PROC SYSLIN is great for two-stage least squares regressions but cannot cluster standard errors or absorb high-dimensional fixed effects.
PROC SURVEYREG is great for (one-way) clustering of standard errors but cannot absorb high-dimensional fixed effects or instrument endogenous covariates.
For those who use SAS, which still has some advantages for handling large datasets, I have written the %FELM (Fixed Effects Linear Models) macro, which combines absorption of multiway fixed effects, up to 3-way clustering and 2SLS instrumental-variable regressions.
This macro is inspired by Simen Gaure's excellent felm function in the lfe package (https://www.rdocumentation.org/packages/lfe/versions/2.8-3/topics/felm) and Sergio Correia's excellent reghdfe (http://scorreia.com/software/reghdfe/). I have also reused the %ClusteredTSLS macro written by Tanguy Brachet, which clusters the standard errors of an IV regression (https://www.researchgate.net/publication/303288577_Clustered_IV), and the %REG2DSE macro from Mark Ma, which offers two-way clustering for a simple OLS regression (https://sites.google.com/site/markshuaima/home/two-way-clustered-standard-errors-and-sas-code).
The %FELM macro computes
a) one-way or multiway (experimental) fixed effects regressions for
- OLS models
- 2SLS IV models
Multiway regression is based on the Method of Alternating Projections to sweep out multiple group effects from the normal equations before estimating the remaining coefficients with OLS.
For the moment, multiway fixed effects are swept out by looping over PROC STANDARD and PROC COMPARE. This is not very efficient and remains very slow, so beware (a PROC IML version is wanted). Check the maximum number of iterations and the threshold to avoid tying up the computer for too long.
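To make the loop described above more concrete, here is a minimal sketch of a single sweep for two fixed effects. It is only an illustration and not the macro's actual code: the dataset WORK.A, the variables lwage, profit, firm and individual, the helper variable _row and all intermediate dataset names are assumptions. The stopping check uses PROC COMPARE's absolute-difference criterion, which is simpler than the convergence criterion documented for the threshold= option.

```sas
/* Illustrative sketch only; this is not the %FELM source code.        */
/* Assumes WORK.A holds lwage, profit and the fixed-effect identifiers */
/* firm and individual (all hypothetical names).                       */

/* Tag each row so the data can be put back in a common order. */
data work.prev;
  set work.a;
  _row = _n_;
run;

/* Projection 1: subtract firm means from each variable. */
proc sort data=work.prev out=work.s1;
  by firm;
run;
proc standard data=work.s1 mean=0 out=work.d1;
  by firm;
  var lwage profit;
run;

/* Projection 2: subtract individual means from the result. */
proc sort data=work.d1 out=work.s2;
  by individual;
run;
proc standard data=work.s2 mean=0 out=work.cur;
  by individual;
  var lwage profit;
run;

/* Compare the output of the sweep with its input, row by row. */
proc sort data=work.cur;
  by _row;
run;
proc compare base=work.prev compare=work.cur
             method=absolute criterion=0.000000001 noprint;
  id _row;
  var lwage profit;
run;
%let rc = &sysinfo;   /* must be read right after PROC COMPARE */

/* Bit 4096 of SYSINFO is set when at least one value differs by more */
/* than the criterion, i.e. another sweep is needed.                  */
%put NOTE: values still changing = %sysfunc(band(&rc, 4096));
```

A full implementation would feed WORK.CUR back in as the next input and repeat until the check passes or a maximum number of iterations is reached. Each sweep sorts the data twice, which hints at why this approach is slow on large datasets.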
b) Multi-way Robust clustered standard errors (up to 3 clusters) for
- Simple OLS
- One way or multiway FE OLS
- IV regression
Following Cameron et al. (2011), the variance-covariance matrix V[B] of the parameters B is estimated as follows for two-way clustering:
V[B] = V_g[B] + V_h[B] - V_g*h[B]
and for three-way clustering:
V[B] = V_g[B] + V_h[B] + V_i[B] - V_g*h[B] - V_h*i[B] - V_g*i[B] + V_g*h*i[B]
(with g, h and i the three clustering variables; a product such as g*h denotes clustering on the intersection of g and h)
Cf. Cameron, A.C., J.B. Gelbach and D.L. Miller (2011) "Robust inference with multiway clustering", Journal of Business & Economic Statistics 29(2):238--249.
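The two-way formula can be illustrated with plain PROC SURVEYREG runs. The sketch below is not the macro's code and makes several assumptions: the dataset WORK.A and the variables lwage, profit, firm and individual are hypothetical, only the diagonal of V[B] (the squared standard errors) is combined, and PROC SURVEYREG applies its own default degrees-of-freedom adjustment, so the numbers need not match what %FELM reports.

```sas
/* Illustrative sketch only; this is not the %FELM source code.    */
/* Assumes WORK.A holds lwage, profit, firm and individual (all    */
/* hypothetical names).                                            */

/* V_g: cluster on firm */
proc surveyreg data=work.a;
  cluster firm;
  model lwage = profit;
  ods output ParameterEstimates=work.pe_g;
run;

/* V_h: cluster on individual */
proc surveyreg data=work.a;
  cluster individual;
  model lwage = profit;
  ods output ParameterEstimates=work.pe_h;
run;

/* V_g*h: listing both variables clusters on their combinations */
proc surveyreg data=work.a;
  cluster firm individual;
  model lwage = profit;
  ods output ParameterEstimates=work.pe_gh;
run;

/* Combine the three variances coefficient by coefficient.        */
/* The three tables list the parameters in the same order, so a   */
/* one-to-one merge (no BY statement) lines them up.              */
data work.se_twoway;
  merge work.pe_g (keep=Parameter Estimate StdErr rename=(StdErr=se_g))
        work.pe_h (keep=Parameter StdErr rename=(StdErr=se_h))
        work.pe_gh(keep=Parameter StdErr rename=(StdErr=se_gh));
  var_twoway = se_g**2 + se_h**2 - se_gh**2;
  /* The combined variance is not guaranteed to be positive. */
  if var_twoway > 0 then se_twoway = sqrt(var_twoway);
run;
```

The three-way case follows the same pattern with seven runs, one for each term of the three-way formula.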
%felm(data=, dependent=, var=, fe=, class=, format=, cluster=, endogenous=, instruments=, weight=, multi=, threshold=0.000000001, maxiter=1000, nestedness=YES, deldata=YES);
- data: name of the dataset. It is strongly recommended to use a dataset already in the WORK library.
- dependent: name of the dependent variable
- var: list of independent variables (which should also include those listed in the class= option)
- fe: list of fixed effects to demean. For multiway fixed effects, since the procedure is very slow, prefer dummies when the number of dummies is small enough to be estimated with PROC SURVEYREG (<500).
- class: class variables as in a SAS class instruction. (Only for simple multiway clustering OLS)
- format: format as in a SAS format instruction. (Only for simple multiway clustering OLS)
- cluster: list of variables for multi-way clustering (max=3)
- endogenous: list of endogenous variables in an IV model
- instruments: list of instruments in an IV model (no need to add the other control variables)
- weight: SAS weight variable (if needed)
- multi: 1 if multiple observations in intersection of clusters, 0 otherwise. If empty (default), the macro will determine its value.
- threshold: convergence threshold for sweeping out the fixed effects (demeaning). Default=0.000000001. Convergence criterion at iteration k:
Max_j(Max_i(D[k,j,i] - D[k-1,j,i])) - Min_j(Min_i(D[k,j,i] - D[k-1,j,i])) < threshold, with j indexing variables and i indexing observations
- maxiter: Maximum number of iterations for sweeping out the fixed effects via the Method of Alternating Projections. Default=1000.
- nestedness: If YES (default), checks whether the first three fixed effects are nested within the first three clusters, corrects the VCOV matrix accordingly, and uses Min_i(G_i-1) as the degrees of freedom for Student's t-test (approximately as in Stata's reghdfe, which also applies further corrections). The macro does not check for possible nestedness of other variables listed in the var= option, or of the fourth and subsequent fixed effects. (However, I recommend using only a small number of fixed effects.) If NO, no correction for nestedness is applied and N-K (fixed effects included) is used as the degrees of freedom for Student's t-test (as in R's felm). For any other value, both methods are printed.
- deldata: if YES (default), deletes all datasets produced by the macro (except the final regression results), notably the demeaned data, at the end of the macro.
- Categorical variables are not (yet) accepted in fixed effects or IV estimations (except as fixed effects and cluster variables). They are only accepted for simple 2- or 3-way clustering. Dummies for categorical variables should be computed before running the macro.
- Interactions should be listed as follows: var1*var2. They will be turned automatically into a new variable : var1_x_var2. As a consequence, do not use "_x_" in your variable names.
- SAS variable lists such as V1-V9 or THISVAR--THATVAR are accepted in VAR= and INSTRUMENTS=
- In IV regressions, endogenous variables listed in ENDOGENOUS= option should not be listed in the VAR= option.
- In the option INSTRUMENTS= list only the instruments. No need to include the control variables
- Class categorical variables with fixed effects or IV regressions ==> forthcoming.
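As an illustration of the notes above on categorical variables and interactions, the following sketch builds dummies in a DATA step before calling the macro. The dataset WORK.A, the three-level variable sector (values 1, 2, 3) and the variables age and tenure are assumptions made for this example only.

```sas
/* Hypothetical dataset WORK.A with a categorical variable sector  */
/* taking the values 1, 2 and 3; all names are illustrative.       */
data work.a2;
  set work.a;
  sector_2 = (sector = 2);   /* 1 if sector = 2, else 0            */
  sector_3 = (sector = 3);   /* sector 1 is the reference level    */
run;

/* age*tenure is turned by the macro into a variable age_x_tenure. */
%felm(data=a2, dependent=lwage,
      var=age tenure age*tenure sector_2 sector_3,
      fe=firm, cluster=firm);
```

Note that a comparison such as (sector = 2) evaluates to 0 when sector is missing, so observations with a missing category may need to be handled before the dummies are created.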
*Two-way fixed effects and 2-way clustering;
%felm(data=a, dependent=lwage, var=profit seniority, fe=firm individual, cluster=firm individual);
*Instrumental variable, one-way fixed effect and two-way clustering;
%felm(data=a, dependent=lwage, var=age, endogenous=education, instruments=month_birth, fe=town, cluster=town year);
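Since multiway demeaning can be slow, a further call may be useful to show how the convergence settings and the other options fit together. This is only a sketch: the dataset and variable names (including year as a third cluster) and the particular threshold and maxiter values are arbitrary assumptions, not recommendations.

```sas
*Two-way fixed effects, three-way clustering, looser convergence settings;
%felm(data=a, dependent=lwage, var=profit seniority,
      fe=firm individual, cluster=firm individual year,
      threshold=0.000001, maxiter=200,
      nestedness=NO, deldata=NO);
```

With deldata=NO the intermediate datasets, notably the demeaned data, are kept after the macro ends, which can help when checking whether the demeaning converged.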