Notes on adapting Dix‐Carneiro and Kovak (2017) code - jamiefogel/Networks GitHub Wiki
Miscellaneous Notes
- It seems that they have the variable idade (age) in all years. In our data we have fx_etaria, which is a categorical version of age, for 1993 and earlier. I believe their idade variable is also categorical for years <= 1993, and they do some recoding to get categorical age in 0b_Panel_1986_2010.do. I think we can make their code consistent with our data by simply loading fx_etaria pre-1994 and then renaming it to idade.
- mmc is their microregion code. They do some manipulating to ensure consistency over time; I'm not yet sure how this works.
- rtr_kume_main is the trade shock variable that is the independent variable of interest in equation (3). It comes from the data set Data/rtc_kume.dta. That data set is uniquely identified by mmc and contains several different RTR measures. "Kume" refers to the fact that the tariff data come from Kume et al. (2003).
- Data/rtc_kume.dta is created in line 144 of DixCarneiro_Kovak_2017/Codes_Other/figure_2.do:
save ../Data/rtc_kume, replace
It builds upon the following input files:
  - ../Data_Census/code_sample. I think this is one of the files that was corrupted when trying to download the replication package. We don't actually need it to produce Data/rtc_kume.dta, because it is used to create ../Data/lambda (saved on line 50 of figure_2.do). So I can just comment out lines 18-50 and load ../Data/lambda on line 51.
  - ../Data_Other/theta_indmatch
  - ../Data/tariff_chg_kume
- The line
gen rtr_kume_main = -rtc_kume_main
occurs in lots of do files.
  - In figure_2.do, rtc_kume_main is defined using
rename rtc_kume_t_theta_1990_1995 rtc_kume_main
(https://github.com/jamiefogel/Networks/blob/832d4c59e0b4fcb6bd3fb292c31067d9856631c8/Code/DixCarneiro_Kovak_2017/Codes_Other/figure_2.do#L138:149).
Breakdown of how the RTR variable is created in figure_2.do
- ../Data_Census/code_sample.dta is processed to create ../Data/lambda.dta. This data set is uniquely identified by mmc and indmatch. It contains the variable lambda, which is the share of regional labor initially allocated to industry i. If we sum lambda by region (mmc), it produces a value of 1 for each region:
collapse (sum) lambda, by(mmc)
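Here is a minimal sketch of what lambda is and of the sum-to-one check, written in plain Python. It assumes lambda is simply employment in each (mmc, indmatch) cell divided by total employment in the mmc. The employment counts and function names are hypothetical. figure_2.do may also apply weights or sample restrictions to code_sample.dta that are not reproduced here.

```python
from collections import defaultdict


def regional_shares(emp):
    """lambda[(mmc, indmatch)] = employment in the cell / total employment in mmc.

    emp: dict mapping (mmc, indmatch) -> employment (hypothetical input format).
    """
    totals = defaultdict(float)
    for (mmc, _ind), e in emp.items():
        totals[mmc] += e
    return {(mmc, ind): e / totals[mmc] for (mmc, ind), e in emp.items()}


def sum_by_region(lam):
    """Rough Python analogue of: collapse (sum) lambda, by(mmc)."""
    sums = defaultdict(float)
    for (mmc, _ind), share in lam.items():
        sums[mmc] += share
    return dict(sums)


# Made-up employment counts for two regions; indmatch 99 = nontradables.
emp = {
    (1, 1): 30.0, (1, 2): 20.0, (1, 99): 50.0,
    (2, 1): 10.0, (2, 99): 90.0,
}
lam = regional_shares(emp)
region_sums = sum_by_region(lam)  # each value should be 1 (up to rounding)
```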
- Merge on "thetas", which I think are what the paper calls φ (equation 1):
merge m:1 indmatch using ../Data_Other/theta_indmatch
This data set is uniquely identified by indmatch. The variable theta has a min of 0.32, a max of 0.89, and a mean of 0.63. In the paper, φ is the cost share of nonlabor factors, so it makes sense that theta ranges from 0.32 to 0.89 with a mean of 0.63.
- Creates the betas (weights on the trade shocks in equation 1; analogous to the shares in a Bartik instrument) in a variety of ways. These variables are saved in ../Data/beta_indmatch.dta.
  - "including nontradables, without theta adjustment". In this case the betas are just equal to the lambdas. I am guessing we will want to use this measure for simplicity, as long as the results are approximately the same.
  - "including nontradables, with theta adjustment" [note that the comment on line 66 says "including nontradables, without theta adjustment", but I'm almost positive it should be "including nontradables, with theta adjustment"].
  - "omitting nontradables, without theta adjustment"
    - I believe that nontradables are indmatch==99.
  - "omitting nontradables, with theta adjustment"
    - I think this is their preferred spec.
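The four beta variants differ only in two ways: which industries enter the sum, and whether lambda is divided by theta. The sketch below rests on two assumptions. First, the paper's weights take the form beta_ri = (lambda_ri / phi_i) / sum_j (lambda_rj / phi_j), with theta playing the role of phi. Second, "omitting nontradables" means dropping indmatch==99 before renormalizing. That is my reading and should be checked against figure_2.do and the paper. The inputs and the `betas` helper are hypothetical. One useful sanity check: without the theta adjustment and with nontradables included, this formula reduces the betas to the lambdas.

```python
NONTRADABLE = 99  # indmatch code for the nontradable aggregate (per these notes)


def betas(lam, theta, with_theta, omit_nontradables):
    """Regional weights beta[(mmc, indmatch)] for one of the four variants.

    Assumed formula: beta_ri = w_ri / sum_j w_rj, where w_ri = lambda_ri / theta_i
    with the theta adjustment and w_ri = lambda_ri without it. The sum runs over
    the industries kept in the variant, so beta sums to 1 within each region.
    """
    weights = {}
    for (mmc, ind), share in lam.items():
        if omit_nontradables and ind == NONTRADABLE:
            continue
        weights[(mmc, ind)] = share / theta[ind] if with_theta else share
    totals = {}
    for (mmc, _ind), w in weights.items():
        totals[mmc] = totals.get(mmc, 0.0) + w
    return {(mmc, ind): w / totals[mmc] for (mmc, ind), w in weights.items()}


# Made-up inputs: one region, two tradable industries plus nontradables.
lam = {(1, 1): 0.3, (1, 2): 0.2, (1, 99): 0.5}
theta = {1: 0.5, 2: 0.8, 99: 0.6}  # hypothetical values, including one for 99
# Keys are (with_theta, omit_nontradables): the 2 X 2 grid of variants.
variants = {
    (with_theta, omit): betas(lam, theta, with_theta, omit)
    for with_theta in (False, True)
    for omit in (False, True)
}
```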
- Merge the tariff changes from Kume et al. (../Data/tariff_chg_kume, created in figure_1.do) onto the betas data from above. tariff_chg_kume.dta is uniquely identified by indmatch. There is no value for nontradables (indmatch==99).
- Create weighted (by beta) averages of the tariff changes.
  - Does this for the 4 combinations of {theta, no theta} X {omitting nontradables, including nontradables}.
  - Does this for the tariff changes and also for something else called erp, which also comes from ../Data/tariff_chg_kume and is renamed to rec_kume_main. The erp variables replace nominal tariffs with "Effective Rates of Protection." Effective rates of protection capture the overall effect of liberalization on producers in a given industry, accounting for tariff changes on industry inputs and outputs. According to Appendix B.7, the tariffs and erp are correlated 0.99, so I'm happy ignoring erp.
- The preferred RTR measure rtc_kume_main is a renamed version of rtc_kume_t_theta_1990_1995. This corresponds to RTR_r in equation (2), up to sign (rtr_kume_main = -rtc_kume_main). It (i) does the "theta adjustment" (φ in the paper) and (ii) omits nontradables.
- The preferred variable is rtc_kume_t_theta_1990_1995. It is renamed to rtc_kume_main and then used in 1_Main_Regressions_Earnings.do (renamed from rtr_kume_main to rtc_kume_main in line 99 of 1_Main_Regressions_Earnings.do), and analogously in 2_Main_Regressions_Employment.do.
Questions/next steps:
- Their measure of region is some sort of time-consistent micro region. Need to figure out how to map this to our micro regions and/or codemuns.
  - I think the answer is to use the data set DixCarneiro_Kovak_2017/Data_Other/rais_codemun_to_mmc_1970_2010.dta
    - Update: yes, this has a 99% match rate with our RAIS data set.
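The 99% figure does not say whether it is a share of observations or a share of municipalities. The two can differ a lot if the unmatched municipalities are large. (In Stata, tabulating _merge after the merge gives the observation-level rate.) Below is a small stdlib Python sketch that reports both rates, plus the unmatched codes worth eyeballing. The codes are invented, and the crosswalk is represented as a plain dict rather than the .dta file.

```python
def match_rates(rais_codemuns, crosswalk):
    """Share of RAIS observations, and of distinct codemuns, found in the crosswalk.

    rais_codemuns: list with one codemun per RAIS observation.
    crosswalk: dict codemun -> mmc (stand-in for rais_codemun_to_mmc_1970_2010.dta).
    Returns (row_rate, municipality_rate, sorted list of unmatched codemuns).
    """
    if not rais_codemuns:
        raise ValueError("no RAIS observations")
    row_rate = sum(c in crosswalk for c in rais_codemuns) / len(rais_codemuns)
    distinct = set(rais_codemuns)
    muni_rate = sum(c in crosswalk for c in distinct) / len(distinct)
    unmatched = sorted(distinct - set(crosswalk))
    return row_rate, muni_rate, unmatched


# Made-up codes: four observations match, one municipality is missing.
crosswalk = {110001: 11001, 110002: 11002, 110004: 11002}
rais = [110001, 110001, 110002, 110004, 999999]
row_rate, muni_rate, unmatched = match_rates(rais, crosswalk)
```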
- How does their industry measure indmatch map to notions of industry that we have?
  - From Appendix A.2: "Establishment industry is reported using the Subsetor IBGE classification, which includes 12 manufacturing industries, 2 primary industries, 11 nontradable industries, and 1 other/ignored... A less aggregate industry classification (CNAE) is available from 1994 onward, but we need a consistent classification from 1986-2010, so we use Subsetor IBGE." We have the corresponding variable subs_ibge, which we should pull and start using.
  - Note that the variable subs_ibge is used as a control in the regional earnings premia regressions on RAIS but is not used outside of RAIS. Thus, as far as I can tell, it doesn't directly map to indmatch.
  - I believe that indmatch corresponds to the "Consistent Industry Classification Across Censuses and Tariff Data" in Appendix Table A.1. All nontradables are combined into a single industry: indmatch==99.
  - The question is, how do I map indmatch to something in RAIS? Code/DixCarneiro_Kovak_2017/Data_Census/code_sample_1970.do might be helpful. So might Code/DixCarneiro_Kovak_2017/Data_Other/Data_Other_Descriptions.txt and https://github.com/jamiefogel/Networks/blob/832d4c59e0b4fcb6bd3fb292c31067d9856631c8/Code/DixCarneiro_Kovak_2017/Data_Census/code_sample_1980.do#L31, among others.
  - They provide tariffs by subs_ibge in kume_subsibge.dta. That data set is uniquely identified by subsibge and year. It also has a variable subsibge_rais that is a 1:1 mapping with subsibge. I believe it corresponds to the different encoding of subs_ibge in their version of RAIS. They only have 14 subsectors in kume_subsibge.dta. I believe these correspond to only the tradable sectors, which would be consistent with the 12 manufacturing plus 2 primary industries in the Appendix A.2 quote. My tentative plan is to just use the tariff changes in ../Data/tariff_chg_kume_subsibge (derived from kume_subsibge.dta) rather than those in ../Data/tariff_chg_kume (used in figure_2.do).
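Two claims above are easy to re-check after loading the file: that kume_subsibge.dta is uniquely identified by subsibge and year, and that subsibge_rais is 1:1 with subsibge. Stata's isid does the first check. Here is a stdlib-only Python sketch of both. The rows and codes are invented for illustration and are not the real subsector codes.

```python
from collections import defaultdict


def is_unique_key(rows, key_cols):
    """True if no two rows share the same values of key_cols (like Stata's isid)."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_one_to_one(rows, a, b):
    """True if column a determines column b and column b determines column a."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for row in rows:
        a_to_b[row[a]].add(row[b])
        b_to_a[row[b]].add(row[a])
    return all(len(v) == 1 for v in a_to_b.values()) and all(
        len(v) == 1 for v in b_to_a.values()
    )


# Hypothetical rows; codes and tariffs are illustrative only.
rows = [
    {"subsibge": 2, "subsibge_rais": 902, "year": 1990, "tariff": 0.30},
    {"subsibge": 2, "subsibge_rais": 902, "year": 1995, "tariff": 0.12},
    {"subsibge": 3, "subsibge_rais": 903, "year": 1990, "tariff": 0.25},
]
key_ok = is_unique_key(rows, ["subsibge", "year"])
mapping_ok = is_one_to_one(rows, "subsibge", "subsibge_rais")
```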
- Our regional earnings premia do not match theirs. Check to see how correlated they are. If highly correlated, then hopefully we can just ignore discrepancies.
- I checked the regressions in 1a_RegionalEarningsPremia_jsf and they have 24 industries and we have 26. This might be part of the discrepancy.
  - SOLVED. Our sample sizes are slightly different from theirs, but the regional earnings premia are correlated >0.99, so I'm calling this good enough. Their subs_ibge variable is coded differently from ours, and they drop a couple of values. Because of the coding difference, we were not dropping these values. The following code resolves the issue:
// Recode our 2-digit subs_ibge values to the 4-digit codes used in their version of RAIS
replace subs_ibge = "9999" if subs_ibge=="26" // This one is sketchy
replace subs_ibge = "5822" if subs_ibge=="23"
replace subs_ibge = "4405" if subs_ibge=="01"
replace subs_ibge = "4509" if subs_ibge=="12"
replace subs_ibge = "5824" if subs_ibge=="22"
replace subs_ibge = "1101" if subs_ibge=="25"
replace subs_ibge = "4517" if subs_ibge=="08"
replace subs_ibge = "4618" if subs_ibge=="14"
replace subs_ibge = "4516" if subs_ibge=="02"
replace subs_ibge = "4508" if subs_ibge=="05"
replace subs_ibge = "4514" if subs_ibge=="07"
replace subs_ibge = "4507" if subs_ibge=="09"
replace subs_ibge = "4515" if subs_ibge=="06"
replace subs_ibge = "4510" if subs_ibge=="04"
replace subs_ibge = "2202" if subs_ibge=="17"
replace subs_ibge = "4512" if subs_ibge=="10"
replace subs_ibge = "4511" if subs_ibge=="03"
replace subs_ibge = "4506" if subs_ibge=="13"
replace subs_ibge = "4513" if subs_ibge=="11"
replace subs_ibge = "5823" if subs_ibge=="18"
replace subs_ibge = "3304" if subs_ibge=="15"
replace subs_ibge = "5825" if subs_ibge=="20"
replace subs_ibge = "5820" if subs_ibge=="19"
replace subs_ibge = "5821" if subs_ibge=="21"
replace subs_ibge = "2203" if subs_ibge=="16"
replace subs_ibge = "5719" if subs_ibge=="24"
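The replace statements above amount to a lookup table. Below they are written as a dictionary, a straight transcription of those lines; the helper names are hypothetical. In this form, two properties are easy to verify. First, the mapping is one-to-one, so no two of our codes collapse into one of theirs. Second, no target code is also a source code. That second property is why running the replaces one after another, or running them twice, cannot chain one recode into another.

```python
# Transcription of the Stata replace statements: our code -> their code.
SUBS_IBGE_RECODE = {
    "26": "9999",  # flagged as sketchy in the Stata code
    "23": "5822", "01": "4405", "12": "4509", "22": "5824", "25": "1101",
    "08": "4517", "14": "4618", "02": "4516", "05": "4508", "07": "4514",
    "09": "4507", "06": "4515", "04": "4510", "17": "2202", "10": "4512",
    "03": "4511", "13": "4506", "11": "4513", "18": "5823", "15": "3304",
    "20": "5825", "19": "5820", "21": "5821", "16": "2203", "24": "5719",
}


def recode_subs_ibge(code):
    """Map one of our 2-digit subs_ibge strings to the 4-digit coding.

    Codes not in the table are returned unchanged, like replace ... if in Stata.
    """
    return SUBS_IBGE_RECODE.get(code, code)


def check_recode(table):
    """Return (is_one_to_one, no_source_is_a_target) for a recode table."""
    targets = list(table.values())
    one_to_one = len(set(targets)) == len(targets)
    no_chaining = not (set(table) & set(targets))
    return one_to_one, no_chaining


recode_ok = check_recode(SUBS_IBGE_RECODE)
```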
- Merge iotas and gammas onto some sort of longitudinal data, e.g. whatever we use for the regional earnings premia regressions.
- See what the match rate is.
- Start trying to run regressions