help xtset
help xttab
help xtreg
Problem Set 5 (SOLUTIONS)
This problem set will revise some of the material covered in Handout 5 on panel data models. This will require you to familiarize yourself with Stata’s panel-data commands.
You will be using a dataset that comes with Stata: psidextract.dta. The data is a corrected version of the PSID sample in Cornwell and Rupert (1988), found in Baltagi and Khanti-Akom (1990). It includes a sample of 595 individuals observed for the years 1976-82.
Preamble
Create a do-file for this problem set and include a preamble that sets the directory and opens the data. For example,
clear
//or, to remove all stored values (including macros, matrices, scalars, etc.)
*clear all
* Replace $rootdir with the relevant path on your local hard drive.
cd "$rootdir/problem-sets/ps-5"
cap log close
log using problem-set-5-log.txt, replace
use problem-set-5-data.dta, clear
C:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\website\warwick
> -ec910\problem-sets\ps-5
-------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\we
> bsite\warwick-ec910\problem-sets\ps-5\problem-set-5-log.txt
log type: smcl
opened on: 11 Nov 2024, 12:44:56
(PSID wage data 1976-82 from Baltagi and Khanti-Akom (1990))
Questions
1. Set the unit identifier and time variable using xtset. Note, you can also use tsset for this task. This will allow you to use xt package commands.
xtset id t
Panel variable: id (strongly balanced)
Time variable: t, 1 to 7
Delta: 1 unit
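As a small sketch of the tsset alternative mentioned in the question: declaring the panel with tsset should leave the same settings in place, and the stored results can be inspected afterwards. The display line is purely illustrative.

```stata
* Equivalent declaration using tsset (panel variable first, then time variable)
tsset id t
* Inspect what Stata has stored about the panel structure
display "panel: `r(panelvar)'   time: `r(timevar)'   balance: `r(balanced)'"
```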
2. Describe and summarise the variables in the dataset using the normal describe and summarize commands.
des
sum id t lwage ed exper weeks south
Contains data from problem-set-5-data.dta
Observations: 4,165 PSID wage data 1976-82 from
Baltagi and Khanti-Akom (1990)
Variables: 14 11 Nov 2024 11:19
(_dta has notes)
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
exper float %9.0g years of full-time work
experience
weeks float %9.0g weeks worked
occup float %9.0g occupation; occ==1 if in a
blue-collar occupation
industry float %9.0g industry; ind==1 if working in a
manufacturing industry
south float %9.0g residence; south==1 if in the
South area
smsa float %9.0g smsa==1 if in the Standard
metropolitan statistical area
ms float %9.0g marital status
female float %9.0g female or male
union float %9.0g if wage set be a union contract
educ float %9.0g years of education
black float %9.0g black
lwage float %9.0g log wage
id float %9.0g
t float %9.0g
-------------------------------------------------------------------------------
Sorted by: id t
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
id | 4,165 298 171.7821 1 595
t | 4,165 4 2.00024 1 7
lwage | 4,165 6.676346 .4615122 4.60517 8.537
educ | 4,165 12.84538 2.787995 4 17
exper | 4,165 19.85378 10.96637 1 51
-------------+---------------------------------------------------------
weeks | 4,165 46.81152 5.129098 5 52
south | 4,165 .2902761 .4539442 0 1
3. Describe and summarise the variables in the dataset using the panel commands: xtdescribe and xtsum. Comment on the information provided.
xtdescribe
xtsum id t lwage ed exper weeks south
id: 1, 2, ..., 595 n = 595
t: 1, 2, ..., 7 T = 7
Delta(t) = 1 unit
Span(t) = 7 periods
(id*t uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
7 7 7 7 7 7 7
Freq. Percent Cum. | Pattern
---------------------------+---------
595 100.00 100.00 | 1111111
---------------------------+---------
595 100.00 | XXXXXXX
Variable | Mean Std. dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
id overall | 298 171.7821 1 595 | N = 4165
between | 171.906 1 595 | n = 595
within | 0 298 298 | T = 7
| |
t overall | 4 2.00024 1 7 | N = 4165
between | 0 4 4 | n = 595
within | 2.00024 1 7 | T = 7
| |
lwage overall | 6.676346 .4615122 4.60517 8.537 | N = 4165
between | .3942387 5.3364 7.813596 | n = 595
within | .2404023 4.781808 8.621092 | T = 7
| |
educ overall | 12.84538 2.787995 4 17 | N = 4165
between | 2.790006 4 17 | n = 595
within | 0 12.84538 12.84538 | T = 7
| |
exper overall | 19.85378 10.96637 1 51 | N = 4165
between | 10.79018 4 48 | n = 595
within | 2.00024 16.85378 22.85378 | T = 7
| |
weeks overall | 46.81152 5.129098 5 52 | N = 4165
between | 3.284016 31.57143 51.57143 | n = 595
within | 3.941881 12.2401 63.66867 | T = 7
| |
south overall | .2902761 .4539442 0 1 | N = 4165
between | .4489462 0 1 | n = 595
within | .0693042 -.5668667 1.147419 | T = 7
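To make the between and within rows of the xtsum output concrete, the following sketch reproduces them by hand for lwage. The variable and scalar names (mlwage, wlwage, first, sd_between, sd_within) are illustrative and not part of the dataset. The between standard deviation is computed over the 595 individual means, and the within standard deviation over the deviations from those means, re-centred on the overall mean as xtsum does.

```stata
* Between and within variation of lwage, computed by hand
bysort id (t): egen double mlwage = mean(lwage)       // individual mean of lwage
quietly summarize lwage
generate double wlwage = lwage - mlwage + r(mean)     // deviation from individual mean, re-centred
bysort id (t): generate byte first = (_n == 1)        // marks one row per individual

summarize mlwage if first                             // Std. dev. here is the "between" value
scalar sd_between = r(sd)
summarize wlwage                                      // Std. dev. here is the "within" value
scalar sd_within = r(sd)

drop mlwage wlwage first
```

The same logic explains why educ has zero within variation (it is constant for each individual), and why t has zero between variation (every individual is observed in the same seven periods).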
4. Use the commands xttab and xttrans, freq to describe transitions over time in the variable south.
xttab south
xttrans south, freq
Overall Between Within
south | Freq. Percent Freq. Percent Percent
----------+-----------------------------------------------------
0 | 2956 70.97 428 71.93 98.66
1 | 1209 29.03 182 30.59 94.90
----------+-----------------------------------------------------
Total | 4165 100.00 610 102.52 97.54
(n = 595)
residence; |
south==1 | residence; south==1
if in the | if in the South area
South area | 0 1 | Total
-----------+----------------------+----------
0 | 2,527 8 | 2,535
| 99.68 0.32 | 100.00
-----------+----------------------+----------
1 | 8 1,027 | 1,035
| 0.77 99.23 | 100.00
-----------+----------------------+----------
Total | 2,535 1,035 | 3,570
| 71.01 28.99 | 100.00
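The transition counts can be cross-checked with a short sketch that flags every year-to-year change in south within an individual. The names moved and n_moves are illustrative. Given the xttrans table above (8 moves in each direction out of 3,570 transitions), the count of changes should be 16.

```stata
* Flag year-to-year changes in south within each individual
bysort id (t): generate byte moved = (south != south[_n-1]) if _n > 1
tabulate moved                  // 0 = stayed, 1 = changed region
count if moved == 1
scalar n_moves = r(N)
drop moved
```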
5. Create the variable: expsq=exper^2/1000. Why would you scale the variable in this way?
gen expsq = exper*exper/1000
6. Estimate the following model using the pooled OLS, between-group, feasible GLS, within-group, LSDV, and first-difference estimators. For the first-difference estimator, you can define a first-difference in Stata using the time-series operator: D.variable.
\[
\ln(Wage_{it}) = \beta_1 + \beta_2 Exper_{it} + \beta_3 Exper^2_{it} + \beta_4 Weeks_{it} + \beta_5 Eduyrs_{it} + \varepsilon_{it}
\]
With each model, store the results using estimates store. For example,
* clear existing stored estimates
est clear
* Pooled OLS
regress lwage exper expsq weeks ed
est store OLS
* alternatively,
eststo OLS: regress lwage exper expsq weeks ed
est clear
* Pooled-OLS
eststo OLS: regress lwage exper expsq weeks ed
* Between-group
eststo BG: xtreg lwage exper expsq weeks ed, be
* Feasible-GLS
eststo FGLS: xtreg lwage exper expsq weeks ed, re theta
* Within-group
eststo WG: xtreg lwage exper expsq weeks ed, fe
* LSDV
eststo LSDV: areg lwage exper expsq weeks ed, absorb(id)
* First-difference
eststo FD: reg D.(lwage exper expsq weeks), noconst
Source | SS df MS Number of obs = 4,165
-------------+---------------------------------- F(4, 4160) = 411.62
Model | 251.491445 4 62.8728612 Prob > F = 0.0000
Residual | 635.413457 4,160 .152743619 R-squared = 0.2836
-------------+---------------------------------- Adj R-squared = 0.2829
Total | 886.904902 4,164 .212993492 Root MSE = .39082
------------------------------------------------------------------------------
lwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .044675 .0023929 18.67 0.000 .0399838 .0493663
expsq | -.715631 .0527938 -13.56 0.000 -.8191351 -.6121268
weeks | .005827 .0011827 4.93 0.000 .0035084 .0081456
educ | .0760407 .0022266 34.15 0.000 .0716754 .080406
_cons | 4.907961 .0673297 72.89 0.000 4.775959 5.039963
------------------------------------------------------------------------------
Between regression (regression on group means) Number of obs = 4,165
Group variable: id Number of groups = 595
R-squared: Obs per group:
Within = 0.1357 min = 7
Between = 0.3264 avg = 7.0
Overall = 0.2723 max = 7
F(4,590) = 71.48
sd(u_i + avg(e_i.)) = .324656 Prob > F = 0.0000
------------------------------------------------------------------------------
lwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .038153 .0056967 6.70 0.000 .0269647 .0493412
expsq | -.631272 .1256812 -5.02 0.000 -.8781089 -.384435
weeks | .0130903 .0040659 3.22 0.001 .0051048 .0210757
educ | .0737838 .0048985 15.06 0.000 .0641632 .0834044
_cons | 4.683039 .2100989 22.29 0.000 4.270407 5.095672
------------------------------------------------------------------------------
Random-effects GLS regression Number of obs = 4,165
Group variable: id Number of groups = 595
R-squared: Obs per group:
Within = 0.6340 min = 7
Between = 0.1716 avg = 7.0
Overall = 0.1830 max = 7
Wald chi2(4) = 3012.45
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .82280511
------------------------------------------------------------------------------
lwage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .0888609 .0028178 31.54 0.000 .0833382 .0943837
expsq | -.772565 .0622619 -12.41 0.000 -.894596 -.6505339
weeks | .0009658 .0007433 1.30 0.194 -.000491 .0024226
educ | .1117099 .0060572 18.44 0.000 .0998381 .1235818
_cons | 3.829366 .0936336 40.90 0.000 3.645848 4.012885
-------------+----------------------------------------------------------------
sigma_u | .31951859
sigma_e | .15220316
rho | .81505521 (fraction of variance due to u_i)
------------------------------------------------------------------------------
note: educ omitted because of collinearity.
Fixed-effects (within) regression Number of obs = 4,165
Group variable: id Number of groups = 595
R-squared: Obs per group:
Within = 0.6566 min = 7
Between = 0.0276 avg = 7.0
Overall = 0.0476 max = 7
F(3, 3567) = 2273.74
corr(u_i, Xb) = -0.9107 Prob > F = 0.0000
------------------------------------------------------------------------------
lwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .1137879 .0024689 46.09 0.000 .1089473 .1186284
expsq | -.4243693 .0546316 -7.77 0.000 -.5314816 -.317257
weeks | .0008359 .0005997 1.39 0.163 -.0003399 .0020116
educ | 0 (omitted)
_cons | 4.596396 .0389061 118.14 0.000 4.520116 4.672677
-------------+----------------------------------------------------------------
sigma_u | 1.0362039
sigma_e | .15220316
rho | .97888036 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(594, 3567) = 53.12 Prob > F = 0.0000
note: educ omitted because of collinearity.
Linear regression, absorbing indicators Number of obs = 4,165
Absorbed variable: id No. of categories = 595
F(3, 3567) = 2273.74
Prob > F = 0.0000
R-squared = 0.9068
Adj R-squared = 0.8912
Root MSE = 0.1522
------------------------------------------------------------------------------
lwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .1137879 .0024689 46.09 0.000 .1089473 .1186284
expsq | -.4243693 .0546316 -7.77 0.000 -.5314816 -.317257
weeks | .0008359 .0005997 1.39 0.163 -.0003399 .0020116
educ | 0 (omitted)
_cons | 4.596396 .0389061 118.14 0.000 4.520116 4.672677
------------------------------------------------------------------------------
F test of absorbed indicators: F(594, 3567) = 53.118 Prob > F = 0.000
Source | SS df MS Number of obs = 3,570
-------------+---------------------------------- F(3, 3567) = 337.12
Model | 33.3371458 3 11.1123819 Prob > F = 0.0000
Residual | 117.57812 3,567 .032962747 R-squared = 0.2209
-------------+---------------------------------- Adj R-squared = 0.2202
Total | 150.915266 3,570 .042273184 Root MSE = .18156
------------------------------------------------------------------------------
D.lwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
exper |
D1. | .1170654 .0063106 18.55 0.000 .1046927 .1294381
|
expsq |
D1. | -.5321208 .1392741 -3.82 0.000 -.8051857 -.259056
|
weeks |
D1. | -.0002683 .0005648 -0.47 0.635 -.0013757 .0008392
------------------------------------------------------------------------------
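Because expsq is defined as exper^2/1000, the marginal effect of experience in this model is \(\beta_2 + 2\beta_3 Exper_{it}/1000\), so the fitted wage profile peaks at \(-1000\beta_2/(2\beta_3)\). The sketch below computes that turning point from the stored pooled OLS results using nlcom, which also reports a delta-method standard error. The label peak is illustrative. With the pooled OLS coefficients reported above this works out to roughly 31 years of experience.

```stata
* Turning point of the experience profile implied by pooled OLS
estimates restore OLS
nlcom (peak: -1000*_b[exper]/(2*_b[expsq]))
matrix peak = r(b)              // keep the point estimate for later use
```

Replacing OLS with another stored name (for example WG) gives the turning point implied by that estimator.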
7. Using the formula from Handout 5, replicate the value of \(\theta\) reported above by the FGLS estimator. Note, you will need to use the stored values of \(\sigma^2_{\varepsilon}\) and \(\sigma^2_{\alpha}\).
qui xtreg lwage exper expsq weeks ed, re theta
display "theta = " 1 - sqrt(e(sigma_e)^2 / (7*e(sigma_u)^2+e(sigma_e)^2))
theta = .82280511
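Written out, the display command evaluates the expression below, where Stata's e(sigma_e) and e(sigma_u) play the roles of \(\hat\sigma_{\varepsilon}\) and \(\hat\sigma_{\alpha}\), and \(T = 7\). This assumes the formula in Handout 5 takes the same form as the one coded above; the handout itself is not reproduced here. The numbers are the rounded values from the FGLS output.

\[
\hat\theta = 1 - \sqrt{\frac{\hat\sigma^2_{\varepsilon}}{T\hat\sigma^2_{\alpha} + \hat\sigma^2_{\varepsilon}}}
= 1 - \sqrt{\frac{0.15220^2}{7 \times 0.31952^2 + 0.15220^2}}
\approx 1 - \sqrt{\frac{0.02317}{0.73781}}
\approx 0.8228
\]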
8. Make a table of the computed estimates. You can either use estimates table or esttab. The latter is part of the estout package, which you may need to install: ssc install estout.
esttab OLS BG FGLS, se scalar(N r2 r2_o r2_b r2_w sigma_u sigma_e rho) mtitle("OLS" "BG" "FGLS")
esttab WG LSDV FD, se scalar(N r2 r2_o r2_b r2_w sigma_u sigma_e rho) rename(D.exper exper D.expsq expsq D.weeks weeks) mtitle("WG" "LSDV" "FD")
------------------------------------------------------------
(1) (2) (3)
OLS BG FGLS
------------------------------------------------------------
exper 0.0447*** 0.0382*** 0.0889***
(0.00239) (0.00570) (0.00282)
expsq -0.716*** -0.631*** -0.773***
(0.0528) (0.126) (0.0623)
weeks 0.00583*** 0.0131** 0.000966
(0.00118) (0.00407) (0.000743)
educ 0.0760*** 0.0738*** 0.112***
(0.00223) (0.00490) (0.00606)
_cons 4.908*** 4.683*** 3.829***
(0.0673) (0.210) (0.0936)
------------------------------------------------------------
N 4165 4165 4165
r2 0.284 0.326
r2_o 0.272 0.183
r2_b 0.326 0.172
r2_w 0.136 0.634
sigma_u 0.320
sigma_e 0.152
rho 0.815
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
------------------------------------------------------------
(1) (2) (3)
WG LSDV FD
------------------------------------------------------------
exper 0.114*** 0.114*** 0.117***
(0.00247) (0.00247) (0.00631)
expsq -0.424*** -0.424*** -0.532***
(0.0546) (0.0546) (0.139)
weeks 0.000836 0.000836 -0.000268
(0.000600) (0.000600) (0.000565)
educ 0 0
(.) (.)
_cons 4.596*** 4.596***
(0.0389) (0.0389)
------------------------------------------------------------
N 4165 4165 3570
r2 0.657 0.907 0.221
r2_o 0.0476
r2_b 0.0276
r2_w 0.657
sigma_u 1.036
sigma_e 0.152
rho 0.979
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
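For comparison, a minimal sketch of the built-in estimates table alternative mentioned in the question. The display formats are illustrative choices. Unlike the esttab calls above, no renaming is applied here, so the first-difference coefficients are listed under their differenced names rather than being aligned with the levels coefficients.

```stata
* Built-in alternative to esttab: coefficients, standard errors, N and R-squared
estimates table OLS BG FGLS WG LSDV FD, b(%9.4f) se(%9.4f) stats(N r2)
```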
9. Perform a Hausman test comparing the results of the FGLS and WG estimators. You should use the hausman command, with the option sigmamore. Be sure to get the order of the estimates correct. What do you learn from the test?
hausman WG FGLS, sigmamore
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| WG FGLS Difference Std. err.
-------------+----------------------------------------------------------------
exper | .1137879 .0888609 .0249269 .0012778
expsq | -.4243693 -.772565 .3481957 .0284727
weeks | .0008359 .0009658 -.0001299 .0001108
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 1513.02
Prob > chi2 = 0.0000
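The test statistic, degrees of freedom, and p-value are left behind in r() and can be retrieved as in the sketch below. With chi2(3) = 1513.02 and a p-value of 0.0000, the null hypothesis that the WG and FGLS coefficients differ only by sampling error is rejected. This suggests the individual effects are correlated with the regressors, so the FGLS (random-effects) estimates are inconsistent and the WG estimates are preferred.

```stata
* Re-run the test quietly and report the stored results
quietly hausman WG FGLS, sigmamore
display "chi2(" r(df) ") = " %8.2f r(chi2) "   p-value = " %6.4f r(p)
```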
10. Estimate FGLS for the model below:
\[
\begin{aligned}
\ln(Wage_{it}) =& \beta_1 + \beta_2 Exper_{it} + \beta_3 Exper^2_{it} + \beta_4 Weeks_{it} + \beta_5 Eduyrs_{it} \\
&+ \gamma_2 \overline{Exper}_{i} + \gamma_3 \overline{Exper^2}_{i} + \gamma_4 \overline{Weeks}_{i}+\varepsilon_{it}
\end{aligned}
\]
You will need to manually create the variables \(\{\overline{Exper}_{i}, \overline{Exper^2}_{i},\overline{Weeks}_{i}\}\), the individual-level averages of each variable. This is referred to as the Mundlak correction. Once you have estimated the model, repeat the Hausman test comparing these results with those of the WG estimator. What is the significance of the Mundlak correction?
foreach var in exper expsq weeks{
bys id: egen av`var' = mean(`var')
}
eststo MUN: xtreg lwage exper expsq weeks ed avexper avexpsq avweeks, re theta
esttab WG LSDV FD MUN, se scalar(N r2 r2_o r2_b r2_w sigma_u sigma_e rho) rename(D.exper exper D.expsq expsq D.weeks weeks) mtitle("WG" "LSDV" "FD" "Mundlack")
hausman MUN FGLS, sigmamore
Random-effects GLS regression Number of obs = 4,165
Group variable: id Number of groups = 595
R-squared: Obs per group:
Within = 0.6566 min = 7
Between = 0.3264 avg = 7.0
Overall = 0.4160 max = 7
Wald chi2(7) = 7107.12
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .82280511
------------------------------------------------------------------------------
lwage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
exper | .1137879 .0024689 46.09 0.000 .1089489 .1186268
expsq | -.4243693 .0546316 -7.77 0.000 -.5314452 -.3172934
weeks | .0008359 .0005997 1.39 0.163 -.0003395 .0020112
educ | .0737838 .0048985 15.06 0.000 .0641829 .0833846
avexper | -.0756349 .0062087 -12.18 0.000 -.0878036 -.0634662
avexpsq | -.2069027 .1370415 -1.51 0.131 -.4754991 .0616937
avweeks | .0122544 .0041099 2.98 0.003 .0041991 .0203097
_cons | 4.683039 .2100989 22.29 0.000 4.271253 5.094826
-------------+----------------------------------------------------------------
sigma_u | .31951859
sigma_e | .15220316
rho | .81505521 (fraction of variance due to u_i)
------------------------------------------------------------------------------
----------------------------------------------------------------------------
(1) (2) (3) (4)
WG LSDV FD Mundlack
----------------------------------------------------------------------------
exper 0.114*** 0.114*** 0.117*** 0.114***
(0.00247) (0.00247) (0.00631) (0.00247)
expsq -0.424*** -0.424*** -0.532*** -0.424***
(0.0546) (0.0546) (0.139) (0.0546)
weeks 0.000836 0.000836 -0.000268 0.000836
(0.000600) (0.000600) (0.000565) (0.000600)
educ 0 0 0.0738***
(.) (.) (0.00490)
avexper -0.0756***
(0.00621)
avexpsq -0.207
(0.137)
avweeks 0.0123**
(0.00411)
_cons 4.596*** 4.596*** 4.683***
(0.0389) (0.0389) (0.210)
----------------------------------------------------------------------------
N 4165 4165 3570 4165
r2 0.657 0.907 0.221
r2_o 0.0476 0.416
r2_b 0.0276 0.326
r2_w 0.657 0.657
sigma_u 1.036 0.320
sigma_e 0.152 0.152
rho 0.979 0.815
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Note: the rank of the differenced variance matrix (3) does not equal the number
of coefficients being tested (4); be sure this is what you expect, or
there may be problems computing the test. Examine the output of your
estimators for anything unexpected and possibly consider scaling your
variables so that the coefficients are on a similar scale.
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| MUN FGLS Difference Std. err.
-------------+----------------------------------------------------------------
exper | .1137879 .0888609 .0249269 .0012778
expsq | -.4243693 -.772565 .3481957 .0284727
weeks | .0008359 .0009658 -.0001299 .0001108
educ | .0737838 .1117099 -.0379262 .0009972
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 1513.02
Prob > chi2 = 0.0000
(V_b-V_B is not positive definite)
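A common way to read the augmented regression (stored as MUN) that sidesteps the rank warning above is a joint Wald test on the individual means: if the random-effects assumption held, the coefficients \(\gamma_2, \gamma_3, \gamma_4\) would all be zero. The sketch below is offered as an alternative check under that interpretation, not as a replacement for the hausman output. Note also that, in the output above, the coefficients on exper, expsq and weeks in the augmented regression coincide with the WG estimates.

```stata
* Joint test that the coefficients on the individual means are zero
estimates restore MUN
test avexper avexpsq avweeks
```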
11. Export the results as a single CSV/Excel file. You can use esttab for .csv or outreg2 for .xlsx.
esttab using "problem-set-5-results.csv", replace se scalar(N r2 r2_o r2_b r2_w sigma_u sigma_e rho) rename(D.exper exper D.expsq expsq D.weeks weeks) mtitle("OLS" "BG" "FGLS" "WG" "LSDV" "FD" "Mundlack")
(output written to problem-set-5-results.csv)
Postamble
log close
name: <unnamed>
log: C:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\we
> bsite\warwick-ec910\problem-sets\ps-5\problem-set-5-log.txt
log type: smcl
closed on: 11 Nov 2024, 12:44:59
-------------------------------------------------------------------------------