- We have learned two "sources" of endogeneity:
  1. Omitted variables
  2. Errors in variables
- In this handout, we will learn another source of endogeneity: simultaneity.

- In econometrics, "endogeneity" usually means that an explanatory variable is correlated with the error term.
- In simultaneous equations models, endogeneity means that the observed variable is determined by an equilibrium. For example, an observed quantity is determined by the equilibrium between demand and supply.
- When a variable is endogenous in the simultaneous equations sense, it is usually also endogenous in the econometric sense (i.e., correlated with
the error term). We will see this soon.

- Consider the following model describing the equilibrium quantity of labor (in hours) in the agricultural sector of a country.
    Labor supply: $h_s = \alpha_1 w + \beta_1 z_1 + u_1$
    Labor demand: $h_d = \alpha_2 w + \beta_2 z_2 + u_2$
  $h_s$ is the hours of labor supplied, and $h_d$ is the hours of labor demanded. These quantities depend on the wage rate, $w$, and other factors, $z_1$ and $z_2$.

- $z_1$ could be the wage rate in the manufacturing sector. If the manufacturing wage increases, people move to the manufacturing sector, reducing hours worked in the agricultural sector. $z_1$ is called the observed supply shifter; $u_1$ is called the unobserved supply shifter.
- $z_2$ could be the agricultural land area. The more land is available, the greater the demand for labor. $z_2$ is the observed demand shifter; $u_2$ is the unobserved demand shifter.

- Demand and supply describe entirely different relationships.
- The observed labor quantity and wage rate are determined by the equilibrium between these two equations.
    The equilibrium: $h_s = h_d$

- Suppose you have country-level data. Then, for each country, we observe only the equilibrium hours and wage rate.
    Supply: $h_i = \alpha_1 w_i + \beta_1 z_{i1} + u_{i1}$
    Demand: $h_i = \alpha_2 w_i + \beta_2 z_{i2} + u_{i2}$
  where $i$ is the country subscript. These two equations constitute a simultaneous equations model (SEM). They are called the structural equations, and $\alpha_1, \beta_1, \alpha_2, \beta_2$ are called the structural parameters.

- In the SEM framework, $h_i$ and $w_i$ are endogenous variables because they are determined by the equilibrium between the two equations.
- In contrast, $z_{i1}$ and $z_{i2}$ are exogenous variables
because they are determined outside of the model.
- $u_1$ and $u_2$ are called the structural errors.
- One more important point: without $z_1$ or $z_2$, there is no way to tell which equation is demand and which is supply.

- Consider the following simultaneous equations model.
    $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$ ......(1)
    $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$ ......(2)
  In this model, $y_1$ and $y_2$ are endogenous variables, since they are determined by the equilibrium between the two equations. $z_1$ and $z_2$ are exogenous variables.

- Since $z_1$ and $z_2$ are determined outside of the model, we assume that $z_1$ and $z_2$ are uncorrelated with both structural errors.
- Thus, by definition, the
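exogenous variables in an SEM are exogenous in the econometric sense as well.
- In addition, the two structural errors, $u_1$ and $u_2$, are assumed to be uncorrelated with each other.

- Now solve equations (1) and (2) for $y_1$ and $y_2$ (assuming $\alpha_1 \alpha_2 \neq 1$). You get the following reduced form equations.
    $y_1 = \pi_{11} z_1 + \pi_{12} z_2 + v_1$
    $y_2 = \pi_{21} z_1 + \pi_{22} z_2 + v_2$
  where
    $\pi_{11} = \beta_1 / (1 - \alpha_1 \alpha_2)$
    $\pi_{12} = \alpha_1 \beta_2 / (1 - \alpha_1 \alpha_2)$
    $v_1 = (u_1 + \alpha_1 u_2) / (1 - \alpha_1 \alpha_2)$
    $\pi_{21} = \alpha_2 \beta_1 / (1 - \alpha_2 \alpha_1)$
    $\pi_{22} = \beta_2 / (1 - \alpha_2 \alpha_1)$
    $v_2 = (\alpha_2 u_1 + u_2) / (1 - \alpha_2 \alpha_1)$
  The $\pi$ parameters are called the reduced form parameters.

- You can check that $v_1$ and $v_2$ are uncorrelated with $z_1$ and $z_2$. Therefore, you can estimate the reduced form parameters by OLS (just apply OLS separately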
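for each equation).

- However, you cannot estimate the structural equations with OLS. For example, consider the first structural equation.
    $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
  Since $z_1$ and $z_2$ are uncorrelated with $u_1$, and $u_1$ and $u_2$ are uncorrelated with each other, only the reduced form error $v_2 = (\alpha_2 u_1 + u_2)/(1 - \alpha_2 \alpha_1)$ contributes to the covariance between $y_2$ and $u_1$:
    $Cov(y_2, u_1) = \frac{\alpha_2}{1 - \alpha_2 \alpha_1} E(u_1^2) = \frac{\alpha_2}{1 - \alpha_2 \alpha_1} \sigma_1^2 \neq 0$
  Thus, $y_2$ is correlated with $u_1$ (assuming that $\alpha_2 \neq 0$). In other words, $y_2$ is endogenous in the econometric sense.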
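The covariance result can be checked numerically. The following is a minimal simulation sketch, not part of the original handout: the parameter values (`a1 = 0.5`, `b1 = 1.0`, `a2 = 0.8`, `b2 = 1.0`), the standard normal draws, the sample size, and the seed are all illustrative assumptions. It generates data from the reduced form of equations (1) and (2), then compares the OLS estimate of $\alpha_1$ from the structural equation with its true value, and the sample covariance of $y_2$ and $u_1$ with $\alpha_2 \sigma_1^2 / (1 - \alpha_2 \alpha_1)$. Only the Python standard library is used.

```python
import random


def ols_two_regressors(y, x1, x2):
    """OLS of y on x1 and x2 without an intercept (all series are mean zero by construction)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations by Cramer's rule.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=20000, a1=0.5, b1=1.0, a2=0.8, b2=1.0, seed=0):
    """Simulate the two-equation SEM; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    d = 1.0 - a1 * a2  # must be nonzero for the reduced form to exist
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u1 = [rng.gauss(0, 1) for _ in range(n)]  # sigma_1^2 = 1
    u2 = [rng.gauss(0, 1) for _ in range(n)]
    # Reduced form: equilibrium values of y1 and y2.
    y1 = [(b1 * p + a1 * b2 * q + e1 + a1 * e2) / d
          for p, q, e1, e2 in zip(z1, z2, u1, u2)]
    y2 = [(a2 * b1 * p + b2 * q + a2 * e1 + e2) / d
          for p, q, e1, e2 in zip(z1, z2, u1, u2)]
    a1_ols, _ = ols_two_regressors(y1, y2, z1)    # structural equation (1) by OLS
    pi21_ols, _ = ols_two_regressors(y2, z1, z2)  # reduced form for y2 by OLS
    cov_sample = sum(p * q for p, q in zip(y2, u1)) / n  # means are zero by construction
    return {
        "a1_true": a1, "a1_ols": a1_ols,
        "pi21_true": a2 * b1 / d, "pi21_ols": pi21_ols,
        "cov_theory": a2 / d, "cov_sample": cov_sample,
    }


result = simulate()
for name, value in result.items():
    print(f"{name:>10s} = {value:.4f}")
```

With these assumed values the covariance is positive, so the OLS estimate of $\alpha_1$ should come out above the true value, while the reduced form coefficient $\pi_{21}$ is estimated consistently by OLS.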

- Thus, endogenous variables in an SEM are usually endogenous in the econometric sense as well.
- Thus, you cannot apply OLS to the structural equations.
- $Cov(y_2, u_1) = \frac{\alpha_2}{1 - \alpha_2 \alpha_1} \sigma_1^2$ can be used to predict the direction of the bias. If it is positive, the OLS estimate of $\alpha_1$ will be biased upward. If it is negative, it will be
biased downward.
- The formula above does not carry over to more general models, but we can use it as a guide to the direction of the bias.

- Suppose that you are interested in estimating the effect of police force size on the city murder rate.
- Notice that the "supply" of murder would be a function of
police size. But the demand for police is a function of the murder rate.

- Thus, the observed murder rate and police size are determined simultaneously by the following model.
    $(\text{Murder}) = \alpha_1 (\text{Police}) + \beta_{10} + \beta_1 (\text{Income per capita}) + u_1$ ......(3)
    $(\text{Police}) = \alpha_2 (\text{Murder}) + \beta_{20} + \beta_2 (\text{other vars}) + u_2$ ......(4)
  All the variables are city-level
variables. (Murder) is the number of murders per capita. (Police) is the number of police officers per capita. We are interested in estimating the effect of police on the murder rate: equation (3).

- However, since the murder rate and the police force are determined simultaneously, (Police) is endogenous in equation
(3). Thus the OLS estimate of $\alpha_1$ is biased. Question: What would be the direction of the bias?

- When we learned OLS, a parameter was said to be identified when the explanatory variable is not correlated with the error. In the 2SLS chapter, we learned how to identify the parameter (i.e., eliminate the bias) by applying the IV method.
- In
SEM, the term "identification" is used slightly differently.

- Consider the following model describing supply and demand.
    Supply: $q = \alpha_1 p + \beta_1 z_1 + u_1$
    Demand: $q = \alpha_2 p + u_2$
  Note that the supply curve has an observed supply shifter $z_1$, but the demand curve has no observed demand shifter. Given data on $q$, $p$ and $z_1$, which equation can
be estimated? That is, which is an identified equation?

[Figure: one demand curve and several supply curves. The location of the supply curve differs depending on the value of $z_1$. The data points are the intersections. Notice that the data points trace out the demand curve. Thus, it is the demand equation that can be estimated.]

- Because there is an observed supply shifter $z_1$ which is not
contained in the demand equation, we can identify the demand equation.
- It is the presence of an exogenous variable in the supply equation that allows us to estimate the demand equation.
- In SEM, "identification" refers to which equation can be estimated.

- Now turn to a more general case.
    $y_1 = \beta_{10} + \alpha_1 y_2 + \beta_{11} z_{11} + \dots + \beta_{1k} z_{1k} + u_1$
    $y_2 = \beta_{20} + \alpha_2 y_1 + \beta_{21} z_{21} + \dots + \beta_{2l} z_{2l} + u_2$
  $(z_{11}, \dots, z_{1k})$ and $(z_{21}, \dots, z_{2l})$ may contain the same variables, but may contain different variables as well. When one equation contains exogenous variables not contained in the other equation, this means that we have imposed exclusion restrictions.

- The condition for identification is the
following.
  The condition for identification: The first equation is identified if and only if the second equation contains at least one exogenous variable (with a nonzero coefficient) that is excluded from the first equation.

- The above condition has two components. First, at least one exogenous variable should be excluded from the first equation (order condition). Second, the excluded variable should have a nonzero coefficient in the second equation (rank condition).
- The identification condition for the second equation is just a mirror image of this statement.

- Labor supply of married working women.
    Labor supply equation:
    $\text{hours} = \alpha_1 \text{lwage} + \beta_{10} + \beta_{11} \text{educ} + \beta_{12} \text{age} + \beta_{13} (\text{kids} < 6) + \beta_{14} (\text{NonWifeIncome}) + u_1$
    Wage offer equation:
    $\text{lwage} = \alpha_2 \text{hours} + \beta_{20} + \beta_{21} \text{educ} + \beta_{22} \text{exp} + \beta_{23} \text{exp}^2 + u_2$
  In the model, hours and lwage are endogenous variables. All other variables are exogenous. (Thus, we are ignoring the endogeneity of educ arising from omitted ability.)

- Suppose that you
are interested in estimating the first equation.
- Since exp and exp^2 are excluded from the first equation, the order condition is satisfied for the first equation. The rank condition is that at least one of exp and exp^2 has a nonzero coefficient in the second equation. Assuming that the rank condition
is satisfied, the first equation is identified.
- In a similar way, you can see that the second equation is also identified.

- Once we have determined that an equation is identified, we can estimate it by two stage least squares (2SLS).

- Consider the labor supply example again. You are interested in estimating the first equation.
    $\text{hours} = \alpha_1 \text{lwage} + \beta_{10} + \beta_{11} \text{educ} + \beta_{12} \text{age} + \beta_{13} (\text{kids} < 6) + \beta_{14} (\text{NonWifeIncome}) + u_1$
    $\text{lwage} = \alpha_2 \text{hours} + \beta_{20} + \beta_{21} \text{educ} + \beta_{22} \text{exp} + \beta_{23} \text{exp}^2 + u_2$
- Suppose that the first equation is identified (both the order and rank conditions are satisfied).
- lwage is correlated with $u_1$. Thus, OLS cannot be used.

- However, exp and exp^2 can be used as instruments for lwage in the first equation.
- Why? First, exp and exp^2 are uncorrelated with $u_1$ by assumption of the model (instrument exogeneity is satisfied). Second, exp and exp^2 are correlated with lwage by the rank condition (instrument relevance is satisfied).

- In general, you can use the excluded exogenous variables as the instruments.

- Consider the following simultaneous equations model.
    $\text{hours} = \alpha_1 \text{lwage} + \beta_{10} + \beta_{11} \text{educ} + \beta_{12} \text{age} + \beta_{13} (\text{kids} < 6) + \beta_{14} (\text{NonWifeInc}) + u_1$
    $\text{lwage} = \alpha_2 \text{hours} + \beta_{20} + \beta_{21} \text{educ} + \beta_{22} \text{exp} + \beta_{23} \text{exp}^2 + u_2$
  Q1: Which equation(s) is/are identified?
  Q2: Estimate the identified equation(s).

OLS (hours equation)

    . reg hours lwage educ age kidslt6 nwifeinc, robust

    Linear regression                         Number of obs =     428
                                              F(5, 422)     =    2.46
                                              Prob > F      =  0.0324
                                              R-squared     =  0.0361
                                              Root MSE      =  766.63

                 |              Robust
           hours |     Coef.    Std. Err.      t    P>|t|    [95% Conf. Interval]
        ---------+---------------------------------------------------------------
           lwage | -2.046796    82.02275    -0.02   0.980   -163.2708    159.1772
            educ |  -6.62187    18.43784    -0.36   0.720   -42.86331    29.61957
             age |  .5622541    5.360839     0.10   0.917   -9.975019    11.09953
         kidslt6 | -328.8584     126.681    -2.60   0.010   -577.8629   -79.85399
        nwifeinc | -5.918459    3.385146    -1.75   0.081   -12.57231    .7353893
           _cons |  1523.775    309.4226     4.92   0.000    915.5734    2131.976

2SLS (hours equation)

    . ivregress 2sls hours educ age kidslt6 nwifeinc (lwage = exper expersq), robust

    Instrumental variables (2SLS) regression  Number of obs =     428
                                              Wald chi2(5)  =   12.60
                                              Prob > chi2   =  0.0274
                                              R-squared     =       .
                                              Root MSE      =  1344.7

                 |              Robust
           hours |     Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
        ---------+---------------------------------------------------------------
           lwage |  1639.556    593.3108     2.76   0.006    476.6879    2802.423
            educ | -183.7513    67.78742    -2.71   0.007   -316.6122   -50.89039
             age | -7.806092    10.48746    -0.74   0.457   -28.36114    12.74896
         kidslt6 | -198.1543    208.4247    -0.95   0.342   -606.6592    210.3506
        nwifeinc | -10.16959    5.287486    -1.92   0.054   -20.53287    .1936911
           _cons |  2225.662    603.0964     3.69   0.000    1043.615    3407.709

    Instrumented: lwage
    Instruments:  educ age kidslt6 nwifeinc exper expersq

OLS (lwage equation)

    . reg lwage hours educ exper expersq, robust

    Linear regression                         Number of obs =     428
                                              F(4, 423)     =   20.24
                                              Prob > F      =  0.0000
                                              R-squared     =  0.1601
                                              Root MSE      =   .6659

                 |              Robust
           lwage |     Coef.    Std. Err.      t    P>|t|    [95% Conf. Interval]
        ---------+---------------------------------------------------------------
           hours | -.0000565    .0000654    -0.86   0.388   -.0001852    .0000721
            educ |  .1062139    .0133269     7.97   0.000    .0800187    .1324091
           exper |  .0447035    .0152503     2.93   0.004    .0147277    .0746793
         expersq | -.0008585    .0004166    -2.06   0.040   -.0016773   -.0000397
           _cons | -.4619955    .2113449    -2.19   0.029   -.8774124   -.0465786

2SLS (lwage equation)

    . ivregress 2sls lwage (hours = age kidslt6 nwifeinc) educ exper expersq, robust

    Instrumental variables (2SLS) regression  Number of obs =     428
                                              Wald chi2(4)  =   83.56
                                              Prob > chi2   =  0.0000
                                              R-squared     =  0.1257
                                              Root MSE      =  .67545

                 |              Robust
           lwage |     Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
        ---------+---------------------------------------------------------------
           hours |  .0001259    .0002924     0.43   0.667   -.0004472     .000699
            educ |    .11033    .0148178     7.45   0.000    .0812877    .1393723
           exper |  .0345824    .0185052     1.87   0.062   -.0016872    .0708519
         expersq | -.0007058    .0004265    -1.65   0.098   -.0015418    .0001302
           _cons | -.6557254    .4097655    -1.60   0.110   -1.458851    .1474001

    Instrumented: hours
    Instruments:  educ exper expersq age kidslt6 nwifeinc

- In the previous slides, the exogenous variables excluded from the equation were called the instruments.
- In SEM (and in the usual IV method too), people often refer to all the exogenous variables (regardless of whether they are included or excluded) as the instruments. The instruments that are excluded from the equation are then specifically called the excluded instruments.

- Consider the following SEM for panel data.
    $y_{it1} = \alpha_1 y_{it2} + z_{it1} \beta_1 + (a_{i1} + u_{it1})$
    $y_{it2} = \alpha_2 y_{it1} + z_{it2} \beta_2 + (a_{i2} + u_{it2})$
- The notation $z_{it1} \beta_1$ is shorthand for $\beta_{11} z_{it11} + \dots + \beta_{1k} z_{it1k}$. The same holds for $z_{it2} \beta_2$.
- Because of the fixed effect terms $a_{i1}$ and $a_{i2}$, the z-variables are correlated with the composite error terms. Therefore, the excluded exogenous variables cannot be used as instruments unless we do something.

- To apply 2SLS, we should first (i) first-difference or (ii) time-demean the equations.
    First-differenced version:
    $\Delta y_{it1} = \alpha_1 \Delta y_{it2} + \Delta z_{it1} \beta_1 + \Delta u_{it1}$
    $\Delta y_{it2} = \alpha_2 \Delta y_{it1} + \Delta z_{it2} \beta_2 + \Delta u_{it2}$
    Time-demeaned (fixed effect) version:
    $\ddot{y}_{it1} = \alpha_1 \ddot{y}_{it2} + \ddot{z}_{it1} \beta_1 + \ddot{u}_{it1}$
    $\ddot{y}_{it2} = \alpha_2 \ddot{y}_{it1} + \ddot{z}_{it2} \beta_2 + \ddot{u}_{it2}$
- Then $\Delta z_{it1}, \Delta z_{it2}$ (or $\ddot{z}_{it1}, \ddot{z}_{it2}$) are not correlated with the error term, because the fixed effects have been removed. Thus we can apply the 2SLS method.
- The estimation procedure is the same. First, determine which equation is identified. Then, use the excluded exogenous variables as the instruments in the 2SLS method.

- The effect of prison population
on the violent crime rate (Levitt 1996).
- This paper addresses the following question: To what extent would an increase in the prison population decrease violent crime?

- Consider the following model.
    $\log(\text{crime}_{it}) = \theta_t + \alpha_1 \log(\text{prison}_{it}) + z_{it1} \beta_1 + a_{i1} + u_{it1}$ ......(4)
  (Crime): the number of violent crimes per capita.
  (Prison): the prison population per capita.
  $\theta_t$: intercepts (different for each year: just include year dummies).
  $z_1$: police per capita, log of income per capita, unemployment rate, the proportions of the population that are Black and that live in metropolitan areas, and age distributions.

- First-difference the equation to eliminate the fixed effect $a_{i1}$.
    $\Delta \log(\text{crime}_{it}) = \xi_t + \alpha_1 \Delta \log(\text{prison}_{it}) + \Delta z_{it1} \beta_1 + \Delta u_{it1}$ ......(5)
- Even after eliminating the fixed effect, there is still simultaneous equations bias, because the prison population is determined by the crime rate as well.

- The simultaneity can be expressed in the SEM framework as:
    $\Delta \log(\text{crime}_{it}) = \xi_{t1} + \alpha_1 \Delta \log(\text{prison}_{it}) + \Delta z_{it1} \beta_1 + \Delta u_{it1}$ ......(6)
    $\Delta \log(\text{prison}_{it}) = \xi_{t2} + \alpha_2 \Delta \log(\text{crime}_{it}) + (\text{Exogenous vars}) + \Delta u_{it2}$ ......(7)
  (Exogenous vars) in equation (7) could contain $\Delta z_{it1}$. However, in order to identify the crime equation (6), (Exogenous vars) should contain variables that are not included in the crime equation. What could such a variable be?

- Levitt (1996) used the overcrowding liti