Problem Set 6 (SOLUTIONS)
The purpose of the first part of this problem set is to estimate, interpret the results, and compare the results across different binary dependent variable models. In the second part you will estimate and compare different specifications of an endogenous selection model.
First part will be discussed in week 9 and the second part in week 10 of this term.
The data file for this exercise is on Moodle: mus16data.dta. It is a subset of the data used by P. Deb, M. Munkin and P.K. Trivedi (2006): “Bayesian Analysis of Two-Part Model with Endogeneity”, Journal of Applied Econometrics, 21, 1081-1100. Data is for 2001 and comes from the Medical Expenditure Survey. Sample has 3,328 observations.
The main outcome variable of interest is ambulatory expenditure (ambexp) and the regressors are given below.
Since the expenditure data is skewed, we will be using the logged expenditure variable as our dependent variable. You should read Cameron A.C. and Trivedi, P.K. Micro-econometrics using Stata to see the pros and cons regarding whether to log the dependent variable or not.
Note, there is one individual who has an expenditure=1 and this will get coded as 0 when variable is logged. Since it is only one individual, we will ignore the problem by not doing anything. If there are many individuals like this, you will need to see whether you can say why this might be the case.
Dependent variable
ambexp: Ambulatory medical expenditures (excluding dental and outpatient mental). There are 526 individuals with zero expenditure. There is one individual who has expenditure=$1. I am going to assume that this individual did not spend any money.lambexp: ln(ambexp) givenambexp> 0 ; missing otherwisedambexp: 1 ifambexp> 0 and 0 otherwise (binary indicator)
Regressors
ins: health insurance measures, either PPO or HMO type insurancetotchr: health status measures: number of chronic diseasesage: age age in years/10female: 1 for females, zero otherwiseeduc: years of schooling of decision makerblhisp: either black or Hispanicincome: income in USD/1000
Preamble
Create a do-file for this problem set and include a preamble that sets the directory and opens the data. For example,
clear
//or, to remove all stored values (including macros, matrices, scalars, etc.)
*clear all
* Replace $rootdir with the relevant path to on your local harddrive.
cd "$rootdir/problem-sets/ps-6"
cap log close
log using problem-set-6-log.txt, replace
use mus16data.dta, clearC:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\website\warwick
> -ec910\problem-sets\ps-6
-------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\we
> bsite\warwick-ec910\problem-sets\ps-6\problem-set-6-log.txt
log type: smcl
opened on: 19 Nov 2024, 17:40:40
Questions
Part 1
1.1. Obtain and comment on the descriptive statistics for ambexp, lambexp, age, female, educ, blhisp, totchr, ins, income.
su dambexp ambexp lambexp age female educ blhisp totchr ins income
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
dambexp | 3,328 .8419471 .3648454 0 1
ambexp | 3,328 1386.519 2530.406 0 49960
lambexp | 2,802 6.555066 1.41073 0 10.81898
age | 3,328 4.056881 1.121212 2.1 6.4
female | 3,328 .5084135 .5000043 0 1
-------------+---------------------------------------------------------
educ | 3,328 13.40565 2.574199 0 17
blhisp | 3,328 .3085938 .4619824 0 1
totchr | 3,328 .4831731 .7720426 0 5
ins | 3,328 .3650841 .4815261 0 1
income | 3,328 36.80485 26.70121 -90.05 237.301
1.2. Estimate a LP, Probit and a Logit model to explain dambexp. Store the \(\beta\) coefficients and report them in a table.
global xlist age i.female educ i.blhisp totchr i.ins income //Define regressor list $xlist
bys dambexp: su $xlist
** LPM
eststo LPM: reg dambexp $xlist, robust // heterosk needs to be corrected
** probit
eststo probit: probit dambexp $xlist
** logit
eststo logit: logit dambexp $xlist
esttab LPM probit logit, se scalar(N r2 ll) mtitle("LPM" "Probit" "Logit") title(Estimated Coefficients)
-------------------------------------------------------------------------------
-> dambexp = 0
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
age | 526 3.695627 1.076467 2.1 6.4
|
female |
0 | 526 .7338403 .4423695 0 1
1 | 526 .2661597 .4423695 0 1
|
educ | 526 12.48859 2.697241 0 17
|
blhisp |
0 | 526 .5171103 .5001828 0 1
-------------+---------------------------------------------------------
1 | 526 .4828897 .5001828 0 1
|
totchr | 526 .0912548 .3074311 0 2
|
ins |
0 | 526 .6977186 .4596837 0 1
1 | 526 .3022814 .4596837 0 1
|
income | 526 31.63409 23.17116 0 166.78
-------------------------------------------------------------------------------
-> dambexp = 1
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
age | 2,802 4.124697 1.116641 2.1 6.4
|
female |
0 | 2,802 .4461099 .4971761 0 1
1 | 2,802 .5538901 .4971761 0 1
|
educ | 2,802 13.5778 2.513906 0 17
|
blhisp |
0 | 2,802 .7241256 .4470336 0 1
-------------+---------------------------------------------------------
1 | 2,802 .2758744 .4470336 0 1
|
totchr | 2,802 .5567452 .809943 0 5
|
ins |
0 | 2,802 .6231263 .4846893 0 1
1 | 2,802 .3768737 .4846893 0 1
|
income | 2,802 37.77552 27.20742 -90.05 237.301
Linear regression Number of obs = 3,328
F(7, 3320) = 69.43
Prob > F = 0.0000
R-squared = 0.1276
Root MSE = .34114
------------------------------------------------------------------------------
| Robust
dambexp | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0216413 .0056048 3.86 0.000 .0106521 .0326304
1.female | .1394928 .0119061 11.72 0.000 .1161487 .1628368
educ | .0143544 .0025774 5.57 0.000 .009301 .0194078
1.blhisp | -.0889738 .0143088 -6.22 0.000 -.1170288 -.0609188
totchr | .0830088 .0057913 14.33 0.000 .0716539 .0943637
1.ins | .0364663 .0119311 3.06 0.002 .0130733 .0598592
income | .000443 .0002215 2.00 0.046 8.70e-06 .0008774
_cons | .4485313 .0430509 10.42 0.000 .3641224 .5329402
------------------------------------------------------------------------------
Iteration 0: Log likelihood = -1452.4289
Iteration 1: Log likelihood = -1218.0426
Iteration 2: Log likelihood = -1195.6199
Iteration 3: Log likelihood = -1195.5158
Iteration 4: Log likelihood = -1195.5158
Probit regression Number of obs = 3,328
LR chi2(7) = 513.83
Prob > chi2 = 0.0000
Log likelihood = -1195.5158 Pseudo R2 = 0.1769
------------------------------------------------------------------------------
dambexp | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0868152 .0274556 3.16 0.002 .0330032 .1406272
1.female | .6635053 .0609648 10.88 0.000 .5440164 .7829941
educ | .061884 .012039 5.14 0.000 .038288 .0854801
1.blhisp | -.3657835 .0619095 -5.91 0.000 -.4871239 -.2444432
totchr | .7957496 .0712174 11.17 0.000 .656166 .9353332
1.ins | .169107 .0629296 2.69 0.007 .0457673 .2924467
income | .0026773 .0013105 2.04 0.041 .0001088 .0052458
_cons | -.6686471 .1941247 -3.44 0.001 -1.049125 -.2881698
------------------------------------------------------------------------------
Iteration 0: Log likelihood = -1452.4289
Iteration 1: Log likelihood = -1238.914
Iteration 2: Log likelihood = -1194.7039
Iteration 3: Log likelihood = -1192.8189
Iteration 4: Log likelihood = -1192.8089
Iteration 5: Log likelihood = -1192.8089
Logistic regression Number of obs = 3,328
LR chi2(7) = 519.24
Prob > chi2 = 0.0000
Log likelihood = -1192.8089 Pseudo R2 = 0.1787
------------------------------------------------------------------------------
dambexp | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .1618629 .0496641 3.26 0.001 .0645229 .2592028
1.female | 1.226724 .1130182 10.85 0.000 1.005212 1.448235
educ | .1080498 .0210874 5.12 0.000 .0667192 .1493804
1.blhisp | -.6666381 .1093362 -6.10 0.000 -.8809332 -.452343
totchr | 1.554664 .1499606 10.37 0.000 1.260746 1.848581
1.ins | .296996 .1133154 2.62 0.009 .0749019 .5190901
income | .00462 .0024332 1.90 0.058 -.000149 .009389
_cons | -1.26832 .3407805 -3.72 0.000 -1.936237 -.600402
------------------------------------------------------------------------------
Estimated Coefficients
------------------------------------------------------------
(1) (2) (3)
LPM Probit Logit
------------------------------------------------------------
main
age 0.0216*** 0.0868** 0.162**
(0.00560) (0.0275) (0.0497)
0.female 0 0 0
(.) (.) (.)
1.female 0.139*** 0.664*** 1.227***
(0.0119) (0.0610) (0.113)
educ 0.0144*** 0.0619*** 0.108***
(0.00258) (0.0120) (0.0211)
0.blhisp 0 0 0
(.) (.) (.)
1.blhisp -0.0890*** -0.366*** -0.667***
(0.0143) (0.0619) (0.109)
totchr 0.0830*** 0.796*** 1.555***
(0.00579) (0.0712) (0.150)
0.ins 0 0 0
(.) (.) (.)
1.ins 0.0365** 0.169** 0.297**
(0.0119) (0.0629) (0.113)
income 0.000443* 0.00268* 0.00462
(0.000222) (0.00131) (0.00243)
_cons 0.449*** -0.669*** -1.268***
(0.0431) (0.194) (0.341)
------------------------------------------------------------
N 3328 3328 3328
r2 0.128
ll -1139.1 -1195.5 -1192.8
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
1.3. Estimate the Marginal Effect at the Mean for each model, and report them in a table. You will want to use the estpost margins post-estimation command, with the relevant option for MEM. Pay special attention to the treatment of discrete regressors. Hint: check to see any differences in the estimated MEs based on whether you use factor notation; for example, i.female vs female.
est clear
** LPM
qui reg dambexp $xlist, robust
estpost margins, dydx(*) atmean
est store LPM
** probit
qui probit dambexp $xlist
estpost margins, dydx(*) atmean
est store probit
** logit
qui logit dambexp $xlist
estpost margins, dydx(*) atmean
est store logit
esttab LPM probit logit, se scalar(N r2 ll) mtitle("LPM" "Probit" "Logit") title(Marginal Effects at the Mean)
Conditional marginal effects Number of obs = 3,328
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
At: age = 4.056881 (mean)
0.female = .4915865 (mean)
1.female = .5084135 (mean)
educ = 13.40565 (mean)
0.blhisp = .6914063 (mean)
1.blhisp = .3085938 (mean)
totchr = .4831731 (mean)
0.ins = .6349159 (mean)
1.ins = .3650841 (mean)
income = 36.80485 (mean)
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0216413 .0056048 3.86 0.000 .0106521 .0326304
1.female | .1394928 .0119061 11.72 0.000 .1161487 .1628368
educ | .0143544 .0025774 5.57 0.000 .009301 .0194078
1.blhisp | -.0889738 .0143088 -6.22 0.000 -.1170288 -.0609188
totchr | .0830088 .0057913 14.33 0.000 .0716539 .0943637
1.ins | .0364663 .0119311 3.06 0.002 .0130733 .0598592
income | .000443 .0002215 2.00 0.046 8.70e-06 .0008774
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Conditional marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: Pr(dambexp), predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
At: age = 4.056881 (mean)
0.female = .4915865 (mean)
1.female = .5084135 (mean)
educ = 13.40565 (mean)
0.blhisp = .6914063 (mean)
1.blhisp = .3085938 (mean)
totchr = .4831731 (mean)
0.ins = .6349159 (mean)
1.ins = .3650841 (mean)
income = 36.80485 (mean)
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0152201 .004837 3.15 0.002 .0057396 .0247005
1.female | .1184629 .0112862 10.50 0.000 .0963423 .1405835
educ | .0108492 .0021305 5.09 0.000 .0066736 .0150249
1.blhisp | -.0701607 .0130287 -5.39 0.000 -.0956964 -.0446249
totchr | .1395073 .0102098 13.66 0.000 .1194966 .1595181
1.ins | .0288089 .0104412 2.76 0.006 .0083445 .0492733
income | .0004694 .0002295 2.05 0.041 .0000195 .0009192
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Conditional marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: Pr(dambexp), predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
At: age = 4.056881 (mean)
0.female = .4915865 (mean)
1.female = .5084135 (mean)
educ = 13.40565 (mean)
0.blhisp = .6914063 (mean)
1.blhisp = .3085938 (mean)
totchr = .4831731 (mean)
0.ins = .6349159 (mean)
1.ins = .3650841 (mean)
income = 36.80485 (mean)
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0135771 .0042044 3.23 0.001 .0053365 .0218176
1.female | .1068865 .0107295 9.96 0.000 .085857 .1279159
educ | .0090632 .0018074 5.01 0.000 .0055208 .0126056
1.blhisp | -.062462 .0116115 -5.38 0.000 -.0852201 -.039704
totchr | .1304052 .0093686 13.92 0.000 .112043 .1487674
1.ins | .0241537 .0089994 2.68 0.007 .0065151 .0417922
income | .0003875 .0002043 1.90 0.058 -.0000128 .0007879
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Marginal Effects at the Mean
------------------------------------------------------------
(1) (2) (3)
LPM Probit Logit
------------------------------------------------------------
age 0.0216*** 0.0152** 0.0136**
(0.00560) (0.00484) (0.00420)
0.female 0 0 0
(.) (.) (.)
1.female 0.139*** 0.118*** 0.107***
(0.0119) (0.0113) (0.0107)
educ 0.0144*** 0.0108*** 0.00906***
(0.00258) (0.00213) (0.00181)
0.blhisp 0 0 0
(.) (.) (.)
1.blhisp -0.0890*** -0.0702*** -0.0625***
(0.0143) (0.0130) (0.0116)
totchr 0.0830*** 0.140*** 0.130***
(0.00579) (0.0102) (0.00937)
0.ins 0 0 0
(.) (.) (.)
1.ins 0.0365** 0.0288** 0.0242**
(0.0119) (0.0104) (0.00900)
income 0.000443* 0.000469* 0.000388
(0.000222) (0.000230) (0.000204)
------------------------------------------------------------
N 3328 3328 3328
r2 0.128
ll -1139.1 -1195.5 -1192.8
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
1.4. Estimate the Average Marginal Effect at the Mean for each model, and report them in a table.You will want to use the estpost margins post-estimation command, with the relevant option for AME.
est clear
** LPM
qui reg dambexp $xlist, robust
estpost margins, dydx(*)
est store LPM
** probit
qui probit dambexp $xlist
estpost margins, dydx(*)
est store probit
** logit
qui logit dambexp $xlist
estpost margins, dydx(*)
est store logit
esttab LPM probit logit, se scalar(N r2 ll) mtitle("LPM" "Probit" "Logit") title(Average Marginal Effects)
Average marginal effects Number of obs = 3,328
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0216413 .0056048 3.86 0.000 .0106521 .0326304
1.female | .1394928 .0119061 11.72 0.000 .1161487 .1628368
educ | .0143544 .0025774 5.57 0.000 .009301 .0194078
1.blhisp | -.0889738 .0143088 -6.22 0.000 -.1170288 -.0609188
totchr | .0830088 .0057913 14.33 0.000 .0716539 .0943637
1.ins | .0364663 .0119311 3.06 0.002 .0130733 .0598592
income | .000443 .0002215 2.00 0.046 8.70e-06 .0008774
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Average marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: Pr(dambexp), predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0173895 .0054832 3.17 0.002 .0066426 .0281365
1.female | .132906 .0118133 11.25 0.000 .1097524 .1560596
educ | .0123957 .0023862 5.19 0.000 .0077189 .0170725
1.blhisp | -.0777517 .0137954 -5.64 0.000 -.1047901 -.0507133
totchr | .1593929 .0139062 11.46 0.000 .1321371 .1866486
1.ins | .033324 .0121638 2.74 0.006 .0094834 .0571646
income | .0005363 .0002621 2.05 0.041 .0000225 .0010501
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Average marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: Pr(dambexp), predict()
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0181007 .0055261 3.28 0.001 .0072697 .0289317
1.female | .1357958 .0117423 11.56 0.000 .1127813 .1588102
educ | .0120829 .0023248 5.20 0.000 .0075264 .0166394
1.blhisp | -.0795676 .0136638 -5.82 0.000 -.1063481 -.0527871
totchr | .1738536 .0164518 10.57 0.000 .1416087 .2060985
1.ins | .0326182 .0121752 2.68 0.007 .0087552 .0564812
income | .0005166 .0002718 1.90 0.057 -.000016 .0010493
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Average Marginal Effects
------------------------------------------------------------
(1) (2) (3)
LPM Probit Logit
------------------------------------------------------------
age 0.0216*** 0.0174** 0.0181**
(0.00560) (0.00548) (0.00553)
0.female 0 0 0
(.) (.) (.)
1.female 0.139*** 0.133*** 0.136***
(0.0119) (0.0118) (0.0117)
educ 0.0144*** 0.0124*** 0.0121***
(0.00258) (0.00239) (0.00232)
0.blhisp 0 0 0
(.) (.) (.)
1.blhisp -0.0890*** -0.0778*** -0.0796***
(0.0143) (0.0138) (0.0137)
totchr 0.0830*** 0.159*** 0.174***
(0.00579) (0.0139) (0.0165)
0.ins 0 0 0
(.) (.) (.)
1.ins 0.0365** 0.0333** 0.0326**
(0.0119) (0.0122) (0.0122)
income 0.000443* 0.000536* 0.000517
(0.000222) (0.000262) (0.000272)
------------------------------------------------------------
N 3328 3328 3328
r2 0.128
ll -1139.1 -1195.5 -1192.8
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
1.5. Check to see how well the prodbit model predicts the outcome using the estat classification post-estimation command.
qui probit dambexp age i.female educ i.blhisp totchr i.ins income
estat classification
Probit model for dambexp
-------- True --------
Classified | D ~D | Total
-----------+--------------------------+-----------
+ | 2768 469 | 3237
- | 34 57 | 91
-----------+--------------------------+-----------
Total | 2802 526 | 3328
Classified + if predicted Pr(D) >= .5
True D defined as dambexp != 0
--------------------------------------------------
Sensitivity Pr( +| D) 98.79%
Specificity Pr( -|~D) 10.84%
Positive predictive value Pr( D| +) 85.51%
Negative predictive value Pr(~D| -) 62.64%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 89.16%
False - rate for true D Pr( -| D) 1.21%
False + rate for classified + Pr(~D| +) 14.49%
False - rate for classified - Pr( D| -) 37.36%
--------------------------------------------------
Correctly classified 84.89%
--------------------------------------------------
1.6. Construct and interpret the LR test for the omission of income in the probit model. Do this in two ways: (1) using the post estimation lrtest; (2) manually recreate (1)’s results (both test-statistic and p-value).
est clear
** Remove income from xlist
global xlist age i.female educ i.blhisp totchr i.ins
eststo modU: qui probit dambexp $xlist income
scalar logl_U = e(ll)
eststo modR: qui probit dambexp $xlist
scalar logl_R = e(ll)
lrtest modU modR
** Replicate
scalar stat = 2 * (logl_U - logl_R)
scalar pval = chi2tail(1,stat)
scalar list
Likelihood-ratio test
Assumption: modR nested within modU
LR chi2(1) = 4.30
Prob > chi2 = 0.0382
pval = .03817363
stat = 4.297269
logl_R = -1197.6644
logl_U = -1195.5158
Part 2
Estimate the following models for lambexp treating the selection into non-zero lambexp value as endogenous using, both Heckman 2-step method and also MLE.
In the main data lambexp is missing for values of ambexp=0. Before proceeding,
replace lambexp = 0 if ambexp==0(526 real changes made)
This will correction will also treat observations with ambexp=1 as equivalent to =0; however, this is only a single observation.
** Remove income from xlist
global xlist age i.female educ i.blhisp totchr i.ins2.1. Estimate the Heckman 2-step estimator and store the results. In addition, store the Mills ratio as a separate variable. Use income as the excluded variable. This means that income appears in the selection equation, but NOT the main equation.
eststo heck_2sW: heckman lambexp $xlist, select(dambexp = $xlist income) twostep mills(mills_a)
Heckman selection model -- two-step estimates Number of obs = 3,328
(regression model with sample selection) Selected = 2,802
Nonselected = 526
Wald chi2(6) = 193.43
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lambexp |
age | .2024668 .0242202 8.36 0.000 .1549961 .2499374
1.female | .2921341 .0725756 4.03 0.000 .1498886 .4343796
educ | .0123889 .0115682 1.07 0.284 -.0102844 .0350622
1.blhisp | -.1828659 .0653449 -2.80 0.005 -.3109396 -.0547922
totchr | .5006332 .0485548 10.31 0.000 .4054675 .5957988
1.ins | -.0465097 .0529742 -0.88 0.380 -.1503373 .0573179
_cons | 5.288927 .288522 18.33 0.000 4.723435 5.85442
-------------+----------------------------------------------------------------
dambexp |
age | .0868152 .0274556 3.16 0.002 .0330032 .1406272
1.female | .6635053 .0609648 10.88 0.000 .5440165 .7829941
educ | .061884 .012039 5.14 0.000 .038288 .0854801
1.blhisp | -.3657835 .0619095 -5.91 0.000 -.4871239 -.2444432
totchr | .7957496 .0712174 11.17 0.000 .656166 .9353332
1.ins | .169107 .0629296 2.69 0.007 .0457673 .2924467
income | .0026773 .0013105 2.04 0.041 .0001088 .0052458
_cons | -.6686471 .1941247 -3.44 0.001 -1.049125 -.2881698
-------------+----------------------------------------------------------------
/mills |
lambda | -.4637133 .2825997 -1.64 0.101 -1.017598 .090172
-------------+----------------------------------------------------------------
rho | -0.35907
sigma | 1.2914258
------------------------------------------------------------------------------
2.2. Replicate these results by applying the following steps: (1) estimate the selection equation using a probit model; (2) create the mills ratio; (3) compare your mills ratio with the one stored above; (4) estimate the main equation, including the mills ratio.
probit dambexp $xlist income
predict index, xb
gen mills = normalden(index)/normal(index)
compare mills mills_a
reg lambexp $xlist mills
Iteration 0: Log likelihood = -1452.4289
Iteration 1: Log likelihood = -1218.0426
Iteration 2: Log likelihood = -1195.6199
Iteration 3: Log likelihood = -1195.5158
Iteration 4: Log likelihood = -1195.5158
Probit regression Number of obs = 3,328
LR chi2(7) = 513.83
Prob > chi2 = 0.0000
Log likelihood = -1195.5158 Pseudo R2 = 0.1769
------------------------------------------------------------------------------
dambexp | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0868152 .0274556 3.16 0.002 .0330032 .1406272
1.female | .6635053 .0609648 10.88 0.000 .5440164 .7829941
educ | .061884 .012039 5.14 0.000 .038288 .0854801
1.blhisp | -.3657835 .0619095 -5.91 0.000 -.4871239 -.2444432
totchr | .7957496 .0712174 11.17 0.000 .656166 .9353332
1.ins | .169107 .0629296 2.69 0.007 .0457673 .2924467
income | .0026773 .0013105 2.04 0.041 .0001088 .0052458
_cons | -.6686471 .1941247 -3.44 0.001 -1.049125 -.2881698
------------------------------------------------------------------------------
---------- Difference ----------
Count Minimum Average Maximum
------------------------------------------------------------------------
mills<mills_a 1660 -5.02e-08 -8.91e-09 -1.75e-14
mills>mills_a 1668 7.75e-15 8.64e-09 5.45e-08
----------
Jointly defined 3328 -5.02e-08 -1.10e-10 5.45e-08
----------
Total 3328
Source | SS df MS Number of obs = 3,328
-------------+---------------------------------- F(7, 3320) = 164.14
Model | 6325.70678 7 903.672398 Prob > F = 0.0000
Residual | 18278.1125 3,320 5.50545557 R-squared = 0.2571
-------------+---------------------------------- Adj R-squared = 0.2555
Total | 24603.8193 3,327 7.39519666 Root MSE = 2.3464
------------------------------------------------------------------------------
lambexp | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .1628716 .0410937 3.96 0.000 .0823001 .2434431
1.female | .1898127 .1257893 1.51 0.131 -.0568198 .4364451
educ | -.0016555 .0199775 -0.08 0.934 -.0408251 .037514
1.blhisp | -.1086689 .1107824 -0.98 0.327 -.3258776 .1085397
totchr | .4202052 .0839337 5.01 0.000 .2556382 .5847723
1.ins | -.0686172 .0899672 -0.76 0.446 -.2450141 .1077796
mills | -4.592193 .4582359 -10.02 0.000 -5.490646 -3.693739
_cons | 5.904903 .5004624 11.80 0.000 4.923657 6.886149
------------------------------------------------------------------------------
2.3 Estimate the marginal effects of the selection equation. You can do this using the margins command, with predict() option psel. This should correspond to a probit model estimation above.
qui heckman lambexp $xlist, select(dambexp = $xlist income) twostep
margins, dydx(*) predict(psel)
Average marginal effects Number of obs = 3,328
Model VCE: Conventional
Expression: Pr(dambexp), predict(psel)
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0173895 .0054832 3.17 0.002 .0066426 .0281365
1.female | .132906 .0118133 11.25 0.000 .1097524 .1560596
educ | .0123957 .0023862 5.19 0.000 .0077189 .0170725
1.blhisp | -.0777517 .0137954 -5.64 0.000 -.1047901 -.0507133
totchr | .1593929 .0139062 11.46 0.000 .1321371 .1866486
1.ins | .033324 .0121638 2.74 0.006 .0094834 .0571646
income | .0005363 .0002621 2.05 0.041 .0000225 .0010501
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
2.4. Estimate the Maximum Likelihood version of the Heckmann correction (with an excluded variable) and store the results.
eststo heck_mlW: heckman lambexp $xlist, select(dambexp = $xlist income) nolog mills(mills_a_mle)
Heckman selection model Number of obs = 3,328
(regression model with sample selection) Selected = 2,802
Nonselected = 526
Wald chi2(6) = 288.88
Log likelihood = -5836.219 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lambexp |
age | .2119749 .0230072 9.21 0.000 .1668816 .2570682
1.female | .3481441 .0601142 5.79 0.000 .2303223 .4659658
educ | .018716 .0105473 1.77 0.076 -.0019563 .0393883
1.blhisp | -.2185714 .0596687 -3.66 0.000 -.3355199 -.101623
totchr | .53992 .0393324 13.73 0.000 .4628299 .61701
1.ins | -.0299871 .0510882 -0.59 0.557 -.1301182 .0701439
_cons | 5.044056 .2281259 22.11 0.000 4.596938 5.491175
-------------+----------------------------------------------------------------
dambexp |
age | .0879359 .027421 3.21 0.001 .0341917 .14168
1.female | .6626649 .0609384 10.87 0.000 .5432278 .7821021
educ | .0619485 .0120295 5.15 0.000 .0383711 .0855258
1.blhisp | -.3639377 .0618734 -5.88 0.000 -.4852073 -.2426682
totchr | .7969518 .0711306 11.20 0.000 .6575383 .9363653
1.ins | .1701367 .0628711 2.71 0.007 .0469117 .2933618
income | .0027078 .0013168 2.06 0.040 .000127 .0052886
_cons | -.6760546 .1940288 -3.48 0.000 -1.056344 -.2957652
-------------+----------------------------------------------------------------
/athrho | -.1313456 .1496292 -0.88 0.380 -.4246134 .1619222
/lnsigma | .2398173 .0144598 16.59 0.000 .2114767 .268158
-------------+----------------------------------------------------------------
rho | -.1305955 .1470772 -.4008098 .1605217
sigma | 1.271017 .0183786 1.235501 1.307554
lambda | -.1659891 .1878698 -.5342072 .2022291
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 0.91 Prob > chi2 = 0.3406
2.5 Compute the marginal effects of each regressor for: (1) probability of selection; (2) the expected value of the outcome; and (3) the expected value of the outcome, conditional on selection. You will need to use the post-estimation command margins, dydx(*) predict() with predict options: psel, yexpected, and ycond.
qui heckman lambexp $xlist, select(dambexp = $xlist income) nolog
margins, dydx(*) predict(psel)
margins, dydx(*) predict(yexpected)
margins, dydx(*) predict(ycond)
Average marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: Pr(dambexp), predict(psel)
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .0176149 .0054761 3.22 0.001 .006882 .0283479
1.female | .1327517 .0118078 11.24 0.000 .1096089 .1558945
educ | .0124093 .0023845 5.20 0.000 .0077357 .0170828
1.blhisp | -.0773377 .0137795 -5.61 0.000 -.1043449 -.0503305
totchr | .159642 .013898 11.49 0.000 .1324024 .1868817
1.ins | .033526 .0121515 2.76 0.006 .0097095 .0573425
income | .0005424 .0002634 2.06 0.039 .0000262 .0010586
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Average marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: E(lambexp*|Pr(dambexp)), predict(yexpected)
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .2897346 .0381846 7.59 0.000 .2148942 .364575
1.female | 1.14081 .0832165 13.71 0.000 .9777089 1.303911
educ | .0941877 .0167238 5.63 0.000 .0614096 .1269658
1.blhisp | -.6692709 .095073 -7.04 0.000 -.8556106 -.4829312
totchr | 1.463455 .0890935 16.43 0.000 1.288834 1.638075
1.ins | .1863924 .0856166 2.18 0.029 .0185871 .3541978
income | .0034285 .0016722 2.05 0.040 .0001511 .0067059
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Average marginal effects Number of obs = 3,328
Model VCE: OIM
Expression: E(lambexp|Zg>0), predict(ycond)
dy/dx wrt: age 1.female educ 1.blhisp totchr 1.ins income
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .2165793 .0222037 9.75 0.000 .1730609 .2600977
1.female | .3826391 .0486389 7.87 0.000 .2873087 .4779695
educ | .0219597 .0097532 2.25 0.024 .0028438 .0410755
1.blhisp | -.2385954 .0551014 -4.33 0.000 -.3465921 -.1305986
totchr | .5816493 .0379133 15.34 0.000 .5073406 .6559579
1.ins | -.0212273 .0499484 -0.42 0.671 -.1191243 .0766697
income | .0001418 .0001762 0.80 0.421 -.0002036 .0004872
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
2.6. Now re-estimate the two-step and MLE approach without an excluded variable, storing the results each time. This means that the same set of regressors enter both equations. i.e. include income in the outcome equation.
global xlist age i.female educ i.blhisp totchr i.ins income
eststo heck_2sWO: heckman lambexp $xlist, select(dambexp = $xlist) twostep mills(mills_b)
eststo heck_mlWO: heckman lambexp $xlist, select(dambexp = $xlist) nolog mills(mills_b_mle)
Heckman selection model -- two-step estimates Number of obs = 3,328
(regression model with sample selection) Selected = 2,802
Nonselected = 526
Wald chi2(7) = 192.92
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lambexp |
age | .2043022 .0244086 8.37 0.000 .1564622 .2521422
1.female | .2786877 .0750154 3.72 0.000 .1316602 .4257151
educ | .0141631 .0118462 1.20 0.232 -.0090551 .0373812
1.blhisp | -.1797416 .0656337 -2.74 0.006 -.3083812 -.051102
totchr | .4938391 .049539 9.97 0.000 .3967445 .5909337
1.ins | -.0461181 .053113 -0.87 0.385 -.1502176 .0579815
income | -.0007456 .0010158 -0.73 0.463 -.0027367 .0012454
_cons | 5.306311 .2901551 18.29 0.000 4.737617 5.875004
-------------+----------------------------------------------------------------
dambexp |
age | .0868152 .0274556 3.16 0.002 .0330032 .1406272
1.female | .6635053 .0609648 10.88 0.000 .5440165 .7829941
educ | .061884 .012039 5.14 0.000 .038288 .0854801
1.blhisp | -.3657835 .0619095 -5.91 0.000 -.4871239 -.2444432
totchr | .7957496 .0712174 11.17 0.000 .656166 .9353332
1.ins | .169107 .0629296 2.69 0.007 .0457673 .2924467
income | .0026773 .0013105 2.04 0.041 .0001088 .0052458
_cons | -.6686471 .1941247 -3.44 0.001 -1.049125 -.2881698
-------------+----------------------------------------------------------------
/mills |
lambda | -.5087361 .2894687 -1.76 0.079 -1.076084 .0586121
-------------+----------------------------------------------------------------
rho | -0.39250
sigma | 1.2961455
------------------------------------------------------------------------------
Heckman selection model Number of obs = 3,328
(regression model with sample selection) Selected = 2,802
Nonselected = 526
Wald chi2(7) = 285.98
Log likelihood = -5836.09 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lambexp |
age | .2137594 .0232969 9.18 0.000 .1680983 .2594205
1.female | .342293 .0615522 5.56 0.000 .2216528 .4629332
educ | .0202746 .0110032 1.84 0.065 -.0012913 .0418406
1.blhisp | -.2185104 .0598099 -3.65 0.000 -.3357357 -.1012852
totchr | .5375964 .0398453 13.49 0.000 .459501 .6156918
1.ins | -.0287728 .0511856 -0.56 0.574 -.1290946 .0715491
income | -.0005026 .000989 -0.51 0.611 -.0024411 .0014359
_cons | 5.041712 .229726 21.95 0.000 4.591458 5.491967
-------------+----------------------------------------------------------------
dambexp |
age | .0878613 .0274099 3.21 0.001 .034139 .1415837
1.female | .6628035 .060929 10.88 0.000 .5433848 .7822223
educ | .0617998 .0120332 5.14 0.000 .0382152 .0853844
1.blhisp | -.3636885 .0618724 -5.88 0.000 -.4849562 -.2424207
totchr | .7968988 .0711265 11.20 0.000 .6574934 .9363041
1.ins | .1699645 .0628669 2.70 0.007 .0467476 .2931815
income | .0027483 .0013209 2.08 0.037 .0001595 .0053372
_cons | -.675346 .1939739 -3.48 0.000 -1.055528 -.295164
-------------+----------------------------------------------------------------
/athrho | -.1419126 .1535634 -0.92 0.355 -.4428913 .1590661
/lnsigma | .240186 .0146925 16.35 0.000 .2113892 .2689828
-------------+----------------------------------------------------------------
rho | -.1409675 .1505118 -.4160382 .157738
sigma | 1.271486 .0186813 1.235393 1.308633
lambda | -.1792382 .1924853 -.5565025 .1980261
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 1.02 Prob > chi2 = 0.3122
2.7. Create a table that reports the four models alongside one another and compare the results.
esttab heck_2sW heck_2sWO heck_mlW heck_mlWO, se scalar(N) mtitle("2-step,w/" "2-step,w/o" "ML,w/" "ML,w/o") title(Heckman Selection Models) drop(0.*)
Heckman Selection Models
----------------------------------------------------------------------------
(1) (2) (3) (4)
2-step,w/ 2-step,w/o ML,w/ ML,w/o
----------------------------------------------------------------------------
lambexp
age 0.202*** 0.204*** 0.212*** 0.214***
(0.0242) (0.0244) (0.0230) (0.0233)
1.female 0.292*** 0.279*** 0.348*** 0.342***
(0.0726) (0.0750) (0.0601) (0.0616)
educ 0.0124 0.0142 0.0187 0.0203
(0.0116) (0.0118) (0.0105) (0.0110)
1.blhisp -0.183** -0.180** -0.219*** -0.219***
(0.0653) (0.0656) (0.0597) (0.0598)
totchr 0.501*** 0.494*** 0.540*** 0.538***
(0.0486) (0.0495) (0.0393) (0.0398)
1.ins -0.0465 -0.0461 -0.0300 -0.0288
(0.0530) (0.0531) (0.0511) (0.0512)
income -0.000746 -0.000503
(0.00102) (0.000989)
_cons 5.289*** 5.306*** 5.044*** 5.042***
(0.289) (0.290) (0.228) (0.230)
----------------------------------------------------------------------------
dambexp
age 0.0868** 0.0868** 0.0879** 0.0879**
(0.0275) (0.0275) (0.0274) (0.0274)
1.female 0.664*** 0.664*** 0.663*** 0.663***
(0.0610) (0.0610) (0.0609) (0.0609)
educ 0.0619*** 0.0619*** 0.0619*** 0.0618***
(0.0120) (0.0120) (0.0120) (0.0120)
1.blhisp -0.366*** -0.366*** -0.364*** -0.364***
(0.0619) (0.0619) (0.0619) (0.0619)
totchr 0.796*** 0.796*** 0.797*** 0.797***
(0.0712) (0.0712) (0.0711) (0.0711)
1.ins 0.169** 0.169** 0.170** 0.170**
(0.0629) (0.0629) (0.0629) (0.0629)
income 0.00268* 0.00268* 0.00271* 0.00275*
(0.00131) (0.00131) (0.00132) (0.00132)
_cons -0.669*** -0.669*** -0.676*** -0.675***
(0.194) (0.194) (0.194) (0.194)
----------------------------------------------------------------------------
/mills
lambda -0.464 -0.509
(0.283) (0.289)
----------------------------------------------------------------------------
/
athrho -0.131 -0.142
(0.150) (0.154)
lnsigma 0.240*** 0.240***
(0.0145) (0.0147)
----------------------------------------------------------------------------
N 3328 3328 3328 3328
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Postamble
log close name: <unnamed>
log: C:\Users\neil_\OneDrive - University of Warwick\Documents\EC910\we
> bsite\warwick-ec910\problem-sets\ps-6\problem-set-6-log.txt
log type: smcl
closed on: 19 Nov 2024, 17:40:52
-------------------------------------------------------------------------------