Chapter 4. Multiple Regression Analysis: Estimation and Hypothesis Testing

Multiple Regression Analysis
y = b0 + b1 x1 + b2 x2 + ... + bk xk + u
Estimation and Inference

Parallels with Simple Regression
Yi = b0 + b1 Xi1 + b2 Xi2 + ... + bk Xik + ui
- b0 is still the intercept.
- b1 through bk are all called slope parameters, also called partial regression coefficients; any coefficient bj denotes the change in Y when Xj changes, holding all the other independent variables fixed.
- u is still the error term (or disturbance).
- We still minimize the sum of squared residuals, so there are k+1 first-order conditions.

Obtaining OLS Estimates
In the general case with k independent variables, we seek the estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ in the equation
$$\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_k X_k.$$
Therefore, we minimize the sum of squared residuals
$$\sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_k X_{ik}\right)^2.$$
From the first-order conditions we get k+1 linear equations in the k+1 unknowns $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$:
$$\sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_k X_{ik}\right) = 0$$
$$\sum_{i=1}^{n} X_{i1}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_k X_{ik}\right) = 0$$
$$\sum_{i=1}^{n} X_{i2}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_k X_{ik}\right) = 0$$
$$\vdots$$
$$\sum_{i=1}^{n} X_{ik}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_k X_{ik}\right) = 0$$
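A minimal computational sketch (not part of the original slides, using synthetic data and illustrative variable names): the k+1 normal equations above can be written in matrix form as (X'X) beta_hat = X'y and solved directly.

    # Solving the k+1 normal equations with NumPy on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 3
    X = rng.normal(size=(n, k))                  # k regressors
    beta_true = np.array([1.0, 0.5, -2.0, 3.0])  # intercept plus k slopes
    u = rng.normal(size=n)                       # error term
    y = beta_true[0] + X @ beta_true[1:] + u

    Xmat = np.column_stack([np.ones(n), X])      # add the constant column
    # Normal equations: (X'X) beta_hat = X'y, i.e. the k+1 first-order conditions
    beta_hat = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)
    print(beta_hat)                              # should be close to beta_true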
Obtaining OLS Estimates (cont.)
The estimated equation above is called the OLS regression line, or the sample regression function (SRF). It is only an estimate, not the true relationship. The true relationship is the population regression line, which we do not observe; we can only estimate it, and using a different sample we would get a different estimated line. The population regression line is
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u, \qquad E(Y \mid X_1, X_2, \dots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k.$$

Interpreting Multiple Regression
From $\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_k X_k$ we have
$$\Delta \hat Y = \hat\beta_1 \Delta X_1 + \hat\beta_2 \Delta X_2 + \cdots + \hat\beta_k \Delta X_k,$$
so holding $X_2, \dots, X_k$ fixed implies $\Delta \hat Y = \hat\beta_1 \Delta X_1$. That is, each coefficient has a ceteris paribus interpretation.
An Example (Wooldridge, p. 76)
The determination of wage (dollars per hour) from:
- educ: years of education
- exper: years of labor market experience
- tenure: years with the current employer
The relationship between wage and educ, exper, tenure:
wage = b0 + b1 educ + b2 exper + b3 tenure + u
The estimated equation is:
wage = -2.873 + 0.599 educ + 0.022 exper + 0.169 tenure
A "Partialling Out" Interpretation

Consider the case where k = 2, i.e.
$$\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2.$$
Then
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1} Y_i}{\sum_{i=1}^{n} \hat r_{i1}^2},$$
where the $\hat r_{i1}$ are the residuals from the estimated regression of X1 on X2.

The previous equation implies that regressing Y on X1 and X2 gives the same effect of X1 as regressing Y on the residuals from a regression of X1 on X2. This means that only the part of Xi1 that is uncorrelated with Xi2 is being related to Yi, so we are estimating the effect of X1 on Y after X2 has been "partialled out".

Recall the estimated equation:
wage = -2.873 + 0.599 educ + 0.022 exper + 0.169 tenure
Now, first regress educ on exper and tenure to partial out the effects of exper and tenure, then regress wage on the residuals from that regression. Do we get the same result?
educ = 13.575 - 0.0738 exper + 0.048 tenure   (denote its residuals by resid)
wage = 5.896 + 0.599 resid
The coefficient on resid in the second regression is exactly the same as the coefficient on educ in the first estimated equation.
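An illustrative sketch (not from the slides, using synthetic data rather than the wage data set) of the partialling-out result: the coefficient on X1 from the full regression equals the coefficient from regressing Y on the residuals of X1 on X2.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    x2 = rng.normal(size=n)
    x1 = 0.6 * x2 + rng.normal(size=n)            # X1 correlated with X2
    y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

    def ols(y, X):
        # OLS with an added constant; returns the coefficient vector
        X = np.column_stack([np.ones(len(y))] + list(X))
        return np.linalg.lstsq(X, y, rcond=None)[0]

    b_full = ols(y, [x1, x2])                      # [b0_hat, b1_hat, b2_hat]
    g = ols(x1, [x2])                              # regress X1 on X2
    resid = x1 - (g[0] + g[1] * x2)                # partialled-out X1
    b_partial = ols(y, [resid])                    # regress Y on the residuals
    print(b_full[1], b_partial[1])                 # the two slope estimates agree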
Goodness-of-Fit: R2

We can think of each observation as being made up of an explained part and an unexplained part,
$$Y_i = \hat Y_i + \hat e_i.$$
We then define the following:
- $\sum (Y_i - \bar Y)^2$ is the total sum of squares (TSS)
- $\sum (\hat Y_i - \bar Y)^2$ is the explained sum of squares (ESS)
- $\sum \hat e_i^2$ is the residual sum of squares (RSS)
Then TSS = ESS + RSS.

How do we think about how well our sample regression line fits our sample data? We can compute the fraction of the total sum of squares (TSS) that is explained by the model, and call this the R-squared of the regression:
R2 = ESS/TSS = 1 - RSS/TSS
We can also think of R2 as being equal to the squared correlation coefficient between the actual $Y_i$ and the fitted values $\hat Y_i$:
$$R^2 = \frac{\left[\sum_i (Y_i - \bar Y)(\hat Y_i - \bar{\hat Y})\right]^2}{\sum_i (Y_i - \bar Y)^2 \, \sum_i (\hat Y_i - \bar{\hat Y})^2}.$$
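A quick numerical check (added for illustration, not from the slides) that R2 = 1 - RSS/TSS coincides with the squared correlation between Y and the fitted values:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
    print(r2, r2_corr)   # the two numbers coincide up to floating-point error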
- R2 can never decrease when another independent variable is added to a regression, and it usually increases.
- Because R2 will usually increase with the number of independent variables, it is not a good way to compare models.
- The wage determination model can be used to show that adding another independent variable increases the value of R2.
- R2 is simply an estimate of how much of the variation in Y is explained by X1, X2, ..., Xk.
- Recall that R2 will always increase as more variables are added to the model.
- The adjusted R2 takes into account the number of variables in a model, and may decrease:
$$\bar R^2 = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)} = 1 - \frac{\hat\sigma_u^2}{\hat\sigma_y^2}, \qquad \text{so} \qquad \bar R^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1},$$
where $\hat\sigma_u^2 = RSS/(n-k-1)$ and $\hat\sigma_y^2 = TSS/(n-1)$.
- Most packages will give you both R2 and adj-R2.
- You can compare the fit of two models (with the same Y) by comparing the adj-R2:
wage = -3.391 + 0.644 educ + 0.070 exper      adj-R2 = 0.2222
wage = -2.222 + 0.569 educ + 0.190 tenure     adj-R2 = 0.2992
- You cannot use the adj-R2 to compare models with different Y's (e.g. Y vs. ln(Y)):
wage = -3.391 + 0.644 educ + 0.070 exper      adj-R2 = 0.2222
log(wage) = 0.404 + 0.087 educ + 0.026 exper  adj-R2 = 0.3059
Because the variances of the two dependent variables are different, the comparison between them makes no sense.
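A small helper (added for illustration, not part of the slides) that computes R2 and the adjusted R2 from the formulas above, given Y and a regressor matrix:

    import numpy as np

    def r2_and_adj_r2(y, X):
        """X: (n, k) matrix of regressors WITHOUT the constant column."""
        n, k = X.shape
        Xmat = np.column_stack([np.ones(n), X])
        beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]
        rss = np.sum((y - Xmat @ beta_hat) ** 2)
        tss = np.sum((y - y.mean()) ** 2)
        r2 = 1 - rss / tss
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        return r2, adj_r2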
Assumptions for Unbiasedness

- The population model is linear in parameters: Y = b0 + b1 X1 + b2 X2 + ... + bk Xk + u.
- We can use a sample of size n, (Xi1, Xi2, ..., Xik, Yi): i = 1, 2, ..., n, from the population model, so that the sample model is Yi = b0 + b1 Xi1 + b2 Xi2 + ... + bk Xik + ui, with Cov(ui, Xi) = 0 and E(ui Xi) = 0, i = 1, 2, ..., n.
- Zero conditional mean: E(u | X1, X2, ..., Xk) = 0, implying that all of the explanatory variables are exogenous. Writing X = (X1, X2, ..., Xk), this is E(u | X) = 0, which reduces to E(u) = 0 if the independent variables X are not random variables.
- No perfect collinearity: none of the X's is constant, and there are no exact linear relationships among them. This is the new additional assumption relative to simple regression.
About multicollinearity
The assumption does allow the independent variables to be correlated; they just cannot be perfectly linearly related. For example, both of the following are allowed:
- Student performance: colGPA = b0 + b1 hsGPA + b2 ACT + b3 skipped + u
- Consumption function: consum = b0 + b1 inc + b2 inc^2 + u
But the following is invalid:
log(consum) = b0 + b1 log(inc) + b2 log(inc^2) + u
In this case we cannot estimate the regression coefficients b1 and b2, because log(inc^2) is an exact linear function of log(inc), as shown below.
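A short derivation (added here for clarity; it is not on the original slide) of why this specification violates the no-perfect-collinearity assumption:
$$\log(inc^2) = 2\log(inc) \quad\Rightarrow\quad \beta_1 \log(inc) + \beta_2 \log(inc^2) = (\beta_1 + 2\beta_2)\log(inc),$$
so the data can only identify the combination $\beta_1 + 2\beta_2$, not $\beta_1$ and $\beta_2$ separately.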
Unbiasedness of OLS estimation
Under the three assumptions above, we have
$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \dots, k.$$
We prove the result only for $\hat\beta_1$; the proof for the other parameters is virtually identical. Using the partialling-out representation, first write $\hat\beta_1$ as
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1} Y_i}{\sum_{i=1}^{n} \hat r_{i1}^2},$$
where the $\hat r_{i1}$ are the residuals from regressing X1 on X2, ..., Xk. Substituting $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + u_i$ into the numerator gives
$$\sum_i \hat r_{i1} Y_i = \beta_0 \sum_i \hat r_{i1} + \beta_1 \sum_i \hat r_{i1} X_{i1} + \beta_2 \sum_i \hat r_{i1} X_{i2} + \cdots + \beta_k \sum_i \hat r_{i1} X_{ik} + \sum_i \hat r_{i1} u_i.$$
Because the $\hat r_{i1}$ are OLS residuals, they sum to zero and are orthogonal to each of X2, ..., Xk, and $\sum_i \hat r_{i1} X_{i1} = \sum_i \hat r_{i1}^2$. Therefore
$$\hat\beta_1 = \beta_1 + \frac{\sum_i \hat r_{i1} u_i}{\sum_i \hat r_{i1}^2},$$
and taking expectations (conditional on the X's), since E(ui) = 0,
$$E(\hat\beta_1) = \beta_1 + \frac{\sum_i \hat r_{i1} E(u_i)}{\sum_i \hat r_{i1}^2} = \beta_1.$$
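An illustrative Monte Carlo check (added, not from the slides) that the OLS estimates are unbiased under the stated assumptions: across many simulated samples, the average estimate is close to the true coefficient vector.

    import numpy as np

    rng = np.random.default_rng(3)
    beta_true = np.array([1.0, 0.5, -2.0])   # intercept and two slopes
    n, reps = 100, 2000
    estimates = np.empty((reps, 3))

    for r in range(reps):
        X = rng.normal(size=(n, 2))
        y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)
        Xmat = np.column_stack([np.ones(n), X])
        estimates[r] = np.linalg.lstsq(Xmat, y, rcond=None)[0]

    print(estimates.mean(axis=0))            # close to [1.0, 0.5, -2.0]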
Too Many or Too Few Variables

What happens if we include variables in our specification that don't belong? There is no effect on the unbiasedness of our parameter estimates; OLS remains unbiased. What if we exclude a variable from our specification that does belong? OLS will usually be biased.

Including an irrelevant variable. Suppose we specify the model as
Y = b0 + b1 X1 + b2 X2 + b3 X3 + u,
and this model satisfies the three assumptions, but X3 has no effect on Y after controlling for X1 and X2; that is, the true model satisfies
E(Y | X1, X2, X3) = E(Y | X1, X2) = b0 + b1 X1 + b2 X2.
The estimated model including X3 is
$$\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3,$$
and the estimated parameters are still unbiased, so including X3 does no harm to unbiasedness.

Omitted variable bias. Suppose the true model is
Y = b0 + b1 X1 + b2 X2 + u,
but we estimate the short regression $\tilde Y = \tilde\beta_0 + \tilde\beta_1 X_1$, so that
$$\tilde\beta_1 = \frac{\sum_i (X_{i1} - \bar X_1) Y_i}{\sum_i (X_{i1} - \bar X_1)^2}.$$
Recalling the true model, $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + u_i$, and substituting into the numerator, the estimate becomes
$$\tilde\beta_1 = \frac{\beta_1 \sum_i (X_{i1} - \bar X_1)^2 + \beta_2 \sum_i (X_{i1} - \bar X_1) X_{i2} + \sum_i (X_{i1} - \bar X_1) u_i}{\sum_i (X_{i1} - \bar X_1)^2}.$$
Since E(u) = 0, taking expectations we have
$$E(\tilde\beta_1) = \beta_1 + \beta_2 \frac{\sum_i (X_{i1} - \bar X_1) X_{i2}}{\sum_i (X_{i1} - \bar X_1)^2}.$$
Consider the regression of X2 on X1, $\tilde X_2 = \tilde\delta_0 + \tilde\delta_1 X_1$; then
$$\tilde\delta_1 = \frac{\sum_i (X_{i1} - \bar X_1) X_{i2}}{\sum_i (X_{i1} - \bar X_1)^2}, \qquad \text{so} \qquad E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1.$$
There are two cases where the estimate is unbiased:
- If b2 = 0, so that X2 does not appear in the true model;
- If $\tilde\delta_1 = 0$, then $\tilde\beta_1$ is unbiased for b1.
Otherwise, the sign of the bias $\beta_2 \tilde\delta_1$ is summarized in the table below (a simulation sketch follows the table):

                 Corr(X1, X2) > 0    Corr(X1, X2) < 0
    b2 > 0       positive bias       negative bias
    b2 < 0       negative bias       positive bias
One-Sided Alternatives (cont.)
Testing H0: bj = 0 against H1: bj > 0. For a chosen significance level a, the critical value c is the (1 - a) percentile of the t(n-k-1) distribution, so that $P(t_{\hat\beta_j} > c \mid H_0) = a$. We reject H0 when the t statistic $t_{\hat\beta_j} = \hat\beta_j / se(\hat\beta_j)$ exceeds c, and fail to reject otherwise.
[Figure: t distribution with the rejection region of area a to the right of the critical value c; "fail to reject" to the left, "reject" to the right.]

Wage determination (Wooldridge, p. 123):
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
            (0.104) (0.007)      (0.0017)       (0.003)
n = 526, R2 = 0.316
Is the return to exper, controlling for educ and tenure, zero in the population, against the alternative that it is positive?
H0: bexper = 0 vs. H1: bexper > 0
- The t statistic is t = 0.0041/0.0017 ≈ 2.41.
- The degrees of freedom: df = n - k - 1 = 526 - 3 - 1 = 522.
- The 5% critical value is 1.645.
- The t statistic is larger than the critical value, i.e., 2.41 > 1.645.
- So we reject the null hypothesis: the return to exper is indeed positive.
[Figure: rejection region to the right of 1.645, with 5% of the area in the tail.]
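A quick check of the one-sided test above using SciPy (added for illustration; the coefficient and standard error are the ones reported on the slide):

    from scipy import stats

    b_exper, se_exper = 0.0041, 0.0017
    df = 526 - 3 - 1
    t_stat = b_exper / se_exper
    crit = stats.t.ppf(0.95, df)           # 5% one-sided critical value (about 1.645)
    p_one_sided = 1 - stats.t.cdf(t_stat, df)
    print(t_stat, crit, p_one_sided)       # t ~ 2.41 > 1.645, so reject H0 at 5%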
Does school size affect student performance?
- math10: math test scores, a measure of student performance
- totcomp: average annual teacher compensation
- staff: the number of staff per one thousand students
- enroll: student enrollment, a measure of school size
The model equation:
math10 = b0 + b1 totcomp + b2 staff + b3 enroll + u
H0: b3 = 0, H1: b3 < 0
The t statistic on enroll is -0.91, which is greater than the 5% critical value of -1.645, so we cannot reject the null hypothesis.
[Figure: rejection region to the left of -1.645; the observed t statistic -0.91 lies outside it.]

Because the t distribution is symmetric, testing H1: bj < 0 is straightforward: the critical value is just the negative of before. We can reject the null if the t statistic < -c; if the t statistic > -c, we fail to reject the null.

Two-Sided Alternatives
For Yi = b0 + b1 Xi1 + ... + bk Xik + ui, test H0: bj = 0 against H1: bj ≠ 0. For a two-sided test, we set the critical value based on a/2 and reject H0 if the absolute value of the t statistic is greater than c, i.e., reject when $|t_{\hat\beta_j}| = |\hat\beta_j| / se(\hat\beta_j) > c$; otherwise fail to reject.
[Figure: t distribution with rejection regions of area a/2 in each tail, beyond -c and c.]

- Unless otherwise stated, the alternative is assumed to be two-sided.
- If we reject the null, we typically say "Xj is statistically significant at the 100a% level".
- If we fail to reject the null, we typically say "Xj is statistically insignificant at the 100a% level".
Example. Variables:
- colGPA: college GPA
- skipped: the average number of lectures missed per week
- ACT: achievement test score
- hsGPA: high school GPA
The estimated model:
colGPA = 1.39 + 0.412 hsGPA + 0.015 ACT - 0.083 skipped
         (0.33) (0.094)       (0.011)    (0.026)
n = 141, R2 = 0.234
H0: bskipped = 0, H1: bskipped ≠ 0
df: n - k - 1 = 137, so the critical value is approximately 1.96. The t statistic is |-0.083/0.026| = 3.19 > 1.96, so we reject the null hypothesis: bskipped is statistically significantly different from zero.
[Figure: two-sided rejection regions beyond -1.96 and 1.96; the observed t statistic -3.19 falls in the left rejection region.]
A more general form of the t statistic recognizes that we may want to test something like H0: bj = aj. In this case, the appropriate t statistic is
$$t = \frac{\hat\beta_j - a_j}{se(\hat\beta_j)},$$
where aj = 0 for the standard test.

Example. Variables:
- crime: the annual number of crimes on college campuses
- enroll: student enrollment, a measure of the size of the college
The regression model: log(crime) = b0 + b1 log(enroll) + u
Is b1 = 1? That is, H0: b1 = 1, H1: b1 > 1.
log(crime) = -6.63 + 1.27 log(enroll)
             (1.03)  (0.11)
n = 97, R2 = 0.585
df: n - k - 1 = 95, and the one-sided critical value at 5% is about 1.645. The t statistic is (1.27 - 1)/0.11 ≈ 2.45 > 1.645, so we reject the null hypothesis: the evidence indicates that b1 > 1.
Another way to use classical statistical testing is to construct a confidence interval using the same critical value as was used for a two-sided test. A 100(1-a)% confidence interval is defined as
$$\hat\beta_j \pm c \cdot se(\hat\beta_j),$$
where c is the (1 - a/2) percentile of a t(n-k-1) distribution.

An alternative to the classical approach is to ask, "what is the smallest significance level at which the null would be rejected?" So, compute the t statistic, and then look up what percentile it is in the appropriate t distribution; this is the p-value. The p-value is the probability of observing a t statistic as extreme as the one we did, if the null were true.
- Most computer packages will compute the p-value for you, assuming a two-sided test.
- If you really want a one-sided alternative, just divide the two-sided p-value by 2.
- Stata provides the t statistic, p-value, and 95% confidence interval for H0: bj = 0, in the columns labeled "t", "P>|t|" and "95% Conf. Interval", respectively. A sketch of the same computations in Python follows.
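An illustrative helper (not from the slides) reproducing the quantities a package reports for one coefficient: the t statistic, the two-sided p-value, and the confidence interval, given the estimate, its standard error, and the degrees of freedom n - k - 1.

    from scipy import stats

    def t_pvalue_ci(b_hat, se, df, level=0.95):
        t_stat = b_hat / se                                  # test of H0: beta_j = 0
        p_two_sided = 2 * (1 - stats.t.cdf(abs(t_stat), df))
        c = stats.t.ppf(1 - (1 - level) / 2, df)             # two-sided critical value
        return t_stat, p_two_sided, (b_hat - c * se, b_hat + c * se)

    # e.g. the skipped coefficient from the colGPA example above:
    print(t_pvalue_ci(-0.083, 0.026, 137))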
Testing a Linear Combination
Suppose that instead of testing whether b1 is equal to a constant, you want to test whether it is equal to another parameter, that is, H0: b1 = b2, or b1 - b2 = 0. Use the same basic procedure for forming a t statistic:
$$t = \frac{\hat\beta_1 - \hat\beta_2}{se(\hat\beta_1 - \hat\beta_2)}.$$
Since $se(\hat\beta_1 - \hat\beta_2) = \sqrt{Var(\hat\beta_1 - \hat\beta_2)}$ and
$$Var(\hat\beta_1 - \hat\beta_2) = Var(\hat\beta_1) + Var(\hat\beta_2) - 2\,Cov(\hat\beta_1, \hat\beta_2),$$
we have
$$se(\hat\beta_1 - \hat\beta_2) = \sqrt{s_1^2 + s_2^2 - 2 s_{12}},$$
where $s_{12}$ is an estimate of $Cov(\hat\beta_1, \hat\beta_2)$.

So, to use this formula we need $s_{12}$, which standard output does not report. Many packages will have an option to get it, or will just perform the test for you. In Stata, after reg Y X1 X2 ... Xk you would type test X1=X2 to get a p-value for the test. More generally, you can always restate the problem to get the test you want.
Example: Suppose you are interested in the effect of campaign expenditures on outcomes. The model is
voteA = b0 + b1 log(expendA) + b2 log(expendB) + b3 prtystrA + u
and the hypothesis is H0: b1 = -b2, or H0: q1 = b1 + b2 = 0. Since b1 = q1 - b2, substitute in and rearrange:
voteA = b0 + q1 log(expendA) + b2 [log(expendB) - log(expendA)] + b3 prtystrA + u
This is the same model as the original, but now you get a standard error for b1 + b2 = q1 directly from the basic regression. Any linear combination of parameters can be tested in a similar manner. Other examples of hypotheses about a single linear combination of parameters: b1 = 1 + b2; b1 = 5 b2; b1 = -(1/2) b2; etc. (A statsmodels sketch of such a test follows.)
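An illustrative sketch (synthetic data, hypothetical variable names; not from the slides) of testing a linear combination such as H0: b1 = b2 with statsmodels, which handles the covariance term s12 internally:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 400
    x1, x2, x3 = rng.normal(size=(3, n))
    y = 1.0 + 0.8 * x1 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)

    X = sm.add_constant(np.column_stack([x1, x2, x3]))   # columns: const, x1, x2, x3
    results = sm.OLS(y, X).fit()

    # H0: b1 - b2 = 0, expressed as a restriction vector on (const, b1, b2, b3)
    print(results.t_test([0, 1, -1, 0]))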
Multiple Linear Restrictions
Everything we have done so far has involved testing a single linear restriction (e.g. b1 = 0 or b1 = b2). However, we may want to jointly test multiple hypotheses about our parameters. A typical example is testing "exclusion restrictions": we want to know whether a group of parameters are all equal to zero.

Now the null hypothesis might be something like H0: bk-q+1 = 0, ..., bk = 0. The alternative is just H1: H0 is not true. We can't just check each t statistic separately, because we want to know whether the q parameters are jointly significant at a given level; it is possible for none of them to be individually significant at that level.

To do the test we need to estimate the "restricted model" without Xk-q+1, ..., Xk included, as well as the "unrestricted model" with all the X's included. Intuitively, we want to know whether the change in RSS is big enough to warrant inclusion of Xk-q+1, ..., Xk:
$$F = \frac{(RSS_r - RSS_{ur})/q}{RSS_{ur}/(n-k-1)},$$
where r denotes the restricted model and ur the unrestricted model.
The F statistic is always positive, since the RSS from the restricted model can't be less than the RSS from the unrestricted model. Essentially, the F statistic measures the relative increase in RSS when moving from the unrestricted to the restricted model. Here q = the number of restrictions = dfr - dfur, and n - k - 1 = dfur.

To decide whether the increase in RSS when we move to the restricted model is "big enough" to reject the exclusions, we need to know the sampling distribution of our F statistic. Not surprisingly, F ~ F(q, n-k-1), where q is referred to as the numerator degrees of freedom and n - k - 1 as the denominator degrees of freedom.
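A small helper (added for illustration, not from the slides) that computes the F statistic and its p-value from the restricted and unrestricted residual sums of squares, the number of restrictions q, and the unrestricted degrees of freedom n - k - 1:

    from scipy import stats

    def f_test_from_rss(rss_r, rss_ur, q, df_ur):
        F = ((rss_r - rss_ur) / q) / (rss_ur / df_ur)
        p_value = 1 - stats.f.cdf(F, q, df_ur)
        return F, p_value

    # e.g. the baseball-salary example below: q = 3, df_ur = 353 - 5 - 1 = 347
    print(f_test_from_rss(198.311, 183.186, 3, 347))   # F is about 9.55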
The F statistic (cont.)
[Figure: density f(F) of the F distribution, with area 1-a below the critical value c and the rejection region of area a to its right.]
Reject H0 at the chosen significance level a if F > c; otherwise fail to reject.

An example. The regression model:
log(salary) = b0 + b1 years + b2 gamesyr + b3 bavg + b4 hrunsyr + b5 rbisyr + u
- salary: the 1993 total salary
- years: years in the league
- gamesyr: average games played per year
- bavg: career batting average
- hrunsyr: home runs per year
- rbisyr: runs batted in per year
The null hypothesis is H0: b3 = 0, b4 = 0, b5 = 0, which is called a multiple hypotheses test or joint hypotheses test. The alternative hypothesis is H1: H0 is not true.
The unrestricted model:
log(salary) = 11.19 + 0.0689 years + 0.0126 gamesyr + 0.00098 bavg + 0.0144 hrunsyr + 0.0108 rbisyr
              (0.29)  (0.0689)       (0.0026)         (0.00110)      (0.0161)         (0.0072)
n = 353, RSS = 183.186, R2 = 0.6278
The restricted model:
log(salary) = 11.22 + 0.0713 years + 0.0202 gamesyr
              (0.11)  (0.0125)       (0.0013)
n = 353, RSS = 198.311, R2 = 0.5971
The number of restrictions is q = 3, and the degrees of freedom of the unrestricted model is 353 - 5 - 1 = 347. Then the F statistic is
$$F_{3,347} = \frac{(198.311 - 183.186)/3}{183.186/347} \approx 9.55,$$
which is well above the 5% critical value of the F distribution with 3 and 347 degrees of freedom (about 2.6), so we can reject the null hypothesis.

Because the RSSs may be large and unwieldy, an alte