Heteroskedasticity: Testing and Correction, lecture slides (.ppt), 163文库


Chapter 9 Heteroskedasticity: Testing and Correction

Contents
What is heteroskedasticity?
Why worry about heteroskedasticity?
How to test for heteroskedasticity?
Corrections for heteroskedasticity

What is heteroskedasticity?

Recall that the homoskedasticity assumption implies that, conditional on the explanatory variables, the variance of the unobserved error $u$ is constant:

$\mathrm{Var}(u \mid X) = \sigma^2$ (homoskedasticity)

If this is not true, that is, if the variance of $u$ differs across values of the Xs, then the errors are heteroskedastic:

$\mathrm{Var}(u_i \mid X_i) = \sigma_i^2$ (heteroskedasticity)

[Figure: homoskedasticity. Conditional densities f(Y|X) at X1, X2, X3 around the regression line E(Y|X) = β0 + β1X]

[Figure: example of heteroskedasticity. Conditional densities f(Y|X) at X1, X2, X3 around the regression line E(Y|X) = β0 + β1X]

Heteroskedasticity is more common in cross-section data, because individual units differ in their characteristics.

Consider a cross-section study of family income and expenditures. It seems plausible that low-income individuals spend at a rather steady rate, while the spending patterns of high-income families are relatively volatile.

If we examine sales of a cross section of firms in one industry, the error terms associated with very large firms might have larger variances than those associated with smaller firms; sales of larger firms might be more volatile than sales of smaller firms.

[Figure: four scatter plots of Y against X: homoskedasticity; variance increasing with X; variance decreasing with X; complicated heteroskedasticity]

Data: 18 industries (sales and rdexp in million dollars)

ind            sales      rdexp     profit
packing        6375.3     62.5      185.1
nonbank        11626.4    92.9      1569.5
service        14655.1    178.3     274.8
metal          21896.2    258.4     2828.1
house          26408.3    494.7     225.9
manufacture    32405.6    1083      3751.9
leisure        35107.7    1620.6    2884.1
paper          40295.4    421.7     4645.7
food           70761.6    509.2     5036.4
nurse          80552.8    6620.1    13869.9
space          95294      3918.6    4487.8
consumption    101314.1   1595.3    10278.9
electronics    116141.3   6107.5    8787.3
chemistry      122315.7   4454.1    16438.8
polymer        141649.9   3163.8    9761.4
computer       175025.8   13210.7   19774.5
fuel           230614.5   1703.8    22626.6
auto           293543     9528.2    18415.4

[Figure: scatter plot of R&D expenditure (million dollars) against sales (million dollars)]

[Figure: R&D expenditure (million dollars) and fitted values against sales (million dollars)]

Why Worry About Heteroskedasticity?

The consequences of heteroskedasticity

OLS estimates are still unbiased and consistent, even if we do not assume homoskedasticity. Take the simple regression as an example:

$Y = \beta_0 + \beta_1 X + u$

We know the OLS estimator of $\beta_1$ is

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2}$$

so that

$$E(\hat\beta_1) = \beta_1 + E\left[\frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2}\right] = \beta_1$$

The consequences of heteroskedasticity, cont.

The R2 and adj-R2 are unaffected by heteroskedasticity. Because RSS and TSS are not affected by heteroskedasticity, neither are R2 and adj-R2:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}, \qquad \bar R^2 = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)}$$

The consequences of heteroskedasticity, cont.

The standard errors of the estimates are biased if we have heteroskedasticity:

$$\mathrm{Var}(\hat\beta_1) = \mathrm{Var}\left(\beta_1 + \frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2}\right) = \frac{\sum_i (X_i - \bar X)^2 \, \mathrm{Var}(u_i)}{\left[\sum_i (X_i - \bar X)^2\right]^2}$$

Under heteroskedasticity, $\mathrm{Var}(u_i) = \sigma_i^2$, which is not constant; therefore

$$\mathrm{Var}(\hat\beta_1) = \frac{\sum_i (X_i - \bar X)^2 \sigma_i^2}{\left[\sum_i (X_i - \bar X)^2\right]^2}$$

However, the usual OLS estimate of the variance of $\hat\beta_1$ is $\hat\sigma^2 / \sum_i (X_i - \bar X)^2$. So, in this case, the OLS estimates of the variances of the coefficients are biased.

The consequences of heteroskedasticity, cont.

The OLS estimators are not efficient; that is, they no longer have the smallest variance among linear unbiased estimators.

If the standard errors are biased, we cannot use the usual t statistics or F statistics for drawing inferences. That is, the t test, the F test and the confidence intervals based on them are no longer valid.

In a word, when heteroskedasticity is present, we cannot use the t test and F test as usual; otherwise we get misleading results.

Summary of the consequences of heteroskedasticity

OLS estimates are still unbiased and consistent.
The R2 and adj-R2 are unaffected by heteroskedasticity.
The standard errors of the estimates are biased.
The OLS estimates are not efficient.
As a result, the t test, the F test and the confidence intervals are not valid.

How to test for heteroskedasticity?

Residual plot

In OLS estimation, we use the residual $e_i$ to estimate the random error term $u_i$; therefore, we can check whether $u_i$ is heteroskedastic by examining $e_i$. We plot a scatter graph of $e_i^2$ against X.

Residual plot, cont.

[Figure: five panels (a) to (e) plotting e² against X; panel (a) shows homoskedasticity]

Residual plot, cont.

If there is more than one independent variable, we should plot the squared residuals against each independent variable separately.

There is a shortcut when there is more than one independent variable: plot the residuals against the fitted values $\hat Y$, because $\hat Y$ is just a linear combination of all the Xs.

Residual plot: example 9.2

[Figure: residuals against sales (million dollars), and e² against sales (million dollars)]

Park test

If heteroskedasticity exists, the variance of the error term $u_i$, $\sigma_i^2$, may be related to some of the independent variables. Therefore, we can test whether $\sigma_i^2$ is related to any of the explanatory variables. If it is, there is heteroskedasticity; if it is not, there is no evidence of heteroskedasticity.

For example, for the simple regression model:

$\ln(\sigma_i^2) = \beta_0 + \beta_1 \ln(X_i) + v_i$

Procedure of Park test

1. Regress the dependent variable (Y) on the independent variables (Xs).
2. Get the residuals of this first regression, $e_i$, and their squares, $e_i^2$.
3. Take $\ln(e_i^2)$ as the dependent variable and the logs of the original independent variables as explanatory variables, and run a new regression: $\ln(e_i^2) = \beta_0 + \beta_1 \ln(X_i) + v_i$
4. Test H0: $\beta_1 = 0$ against H1: $\beta_1 \neq 0$.

If we cannot reject the null hypothesis, the test finds no evidence of heteroskedasticity, and we keep the assumption of homoskedasticity.

Park test: Example

Let us take example 9.2 as an example.

First, regress R&D expenditure (rdexp) on sales (sales). We get

rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
N = 18, R2 = 0.4783, Adj-R2 = 0.4457, F(1,16) = 14.67

Second, get the residuals ($e_i$) of the regression.

Third, regress $\ln(e_i^2)$ on ln(sales). We get

ln(e_i^2) = 1.216 ln(sales)
se = (0.057)
p = (0.000)
R2 = 0.9637, Adj-R2 = 0.9615

Finally, we test whether the slope of the second regression equals zero. From the p-value of the parameter, at the 5% significance level we can reject the null hypothesis. Therefore, heteroskedasticity exists in the first regression.

Note: the Park test is not a good test for heteroskedasticity because of the particular specification of its auxiliary regression, whose error term $v_i$ may itself be heteroskedastic.

Glejser test

The essence of the Glejser test is the same as that of the Park test. But Glejser suggests using the following regressions to detect heteroskedasticity of u:

$|e_i| = \beta_0 + \beta_1 X_i + v_i$
$|e_i| = \beta_0 + \beta_1 \sqrt{X_i} + v_i$
$|e_i| = \beta_0 + \beta_1 (1/X_i) + v_i$
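The auxiliary regressions of the Park and Glejser tests can be sketched in Python as follows. This is only an illustration under stated assumptions: the `DATA` list copies the sales and rdexp columns of the 18-industry table, `simple_ols` is a hypothetical helper written for this sketch, and the square-root form is included as the usual Glejser variant. The Park regression here includes an intercept, as in the stated auxiliary model, so its numbers need not match the slide output.

```python
import math

# (sales, rdexp) in million dollars, copied from the 18-industry table
DATA = [
    (6375.3, 62.5), (11626.4, 92.9), (14655.1, 178.3), (21896.2, 258.4),
    (26408.3, 494.7), (32405.6, 1083.0), (35107.7, 1620.6), (40295.4, 421.7),
    (70761.6, 509.2), (80552.8, 6620.1), (95294.0, 3918.6), (101314.1, 1595.3),
    (116141.3, 6107.5), (122315.7, 4454.1), (141649.9, 3163.8),
    (175025.8, 13210.7), (230614.5, 1703.8), (293543.0, 9528.2),
]


def simple_ols(x, y):
    """OLS of y on a constant and x. Returns (b0, b1, se_b1, t_b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)  # usual (homoskedastic) formula
    return b0, b1, se_b1, b1 / se_b1


sales = [s for s, _ in DATA]
rdexp = [r for _, r in DATA]

# Step 1: original regression and its residuals e_i
b0, b1, _, _ = simple_ols(sales, rdexp)
resid = [r - b0 - b1 * s for s, r in zip(sales, rdexp)]

# Step 2 (Park): regress ln(e_i^2) on ln(X_i)
park = simple_ols([math.log(s) for s in sales],
                  [math.log(e * e) for e in resid])

# Step 2 (Glejser): regress |e_i| on X_i, sqrt(X_i) and 1/X_i in turn
abs_e = [abs(e) for e in resid]
glejser = {
    "X": simple_ols(sales, abs_e),
    "sqrt(X)": simple_ols([math.sqrt(s) for s in sales], abs_e),
    "1/X": simple_ols([1.0 / s for s in sales], abs_e),
}

print(f"OLS: rdexp = {b0:.2f} + {b1:.4f} sales")
print(f"Park: slope = {park[1]:.3f}, t = {park[3]:.2f}")
for name, (_, slope, _, t) in glejser.items():
    print(f"Glejser on {name}: slope = {slope:.4g}, t = {t:.2f}")
```

In each auxiliary regression the slope's t statistic is compared with the critical value of the t distribution with n - 2 = 16 degrees of freedom; a large absolute value points to heteroskedasticity.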

As in the Park test, we test H0: $\beta_1 = 0$ against H1: $\beta_1 \neq 0$ in the chosen auxiliary regression. If we can reject the null hypothesis, there is evidence of heteroskedasticity; otherwise, we do not reject homoskedasticity.

Glejser test: Example

First, regress R&D expenditure (rdexp) on sales (sales). We get

rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
N = 18, R2 = 0.4783, Adj-R2 = 0.4457, F(1,16) = 14.67

Second, get the residuals ($e_i$) of the regression.

Third, regress $|e_i|$ on 1/sales. We get

|e_i| = 2273.65 - 1992500 (1/sales)
se = (604.69) (12300000)
p = (0.002) (0.125)

Finally, test whether the slope is zero. The p-value of the slope is larger than the 5% significance level, so we cannot reject the null hypothesis: this specification finds no evidence of heteroskedasticity.

The White Test

The White test is a more general test, which allows for nonlinearities by using the squares and cross products of all the Xs. For example, with k = 3:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$

$e^2 = \delta_0 + \delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_1^2 + \delta_5 X_2^2 + \delta_6 X_3^2 + \delta_7 X_1 X_2 + \delta_8 X_1 X_3 + \delta_9 X_2 X_3 + v$

Use an F or LM statistic to test whether all the $X_j$, $X_j^2$ and $X_j X_h$ are jointly significant, that is, test H0: $\delta_1 = \delta_2 = \dots = \delta_9 = 0$ against H1: H0 is not true. If we can reject H0, heteroskedasticity exists.

To test H0: $\delta_1 = \delta_2 = \dots = \delta_9 = 0$, we can use the F test learned in chapter 4. Let $R^2$ stand for the goodness of fit of the auxiliary regression:

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$

We can also use the LM test:

$LM = nR^2 \sim \chi^2_k$ (asymptotically, under H0)

where n is the number of observations and k is the number of restrictions.

The White Test: Example 9.2

First, regress R&D expenditure (rdexp) on sales (sales) and profits (profits). We get

rdexp = -13.93 + 0.0126 sales + 0.2398 profits
se = (991.997) (0.018) (0.1986)
p = (0.989) (0.496) (0.246)
n = 18, R2 = 0.5245, Adj-R2 = 0.4611, F = 8.27

Second, get the residuals e from the regression above.

Third, regress e² on sales, profits, sales², profits² and sales × profits:

e² = 693735.5 + 135.00 sales - 1965.7 profits - 0.0027 sales² - 0.116 profits² + 0.050 sales × profits
N = 18, R2 = 0.8900, F(5,12) = 19.42, Prob > F = 0.0000

Finally, test H0: $\delta_1 = \delta_2 = \delta_3 = \delta_4 = \delta_5 = 0$. The p-value of the F test is 0.0000, so we can reject H0. Also, $LM = nR^2 = 18 \times 0.89 = 16.02 > \chi^2_{0.05}(5) = 11.07$, so the LM test rejects H0 as well. So heteroskedasticity exists in the first regression.

Alternate form of the White test

The full White regression can get unwieldy pretty quickly.

Consider that the fitted values from OLS, $\hat y$, are a function of all the Xs. Thus $\hat y^2$ is a function of the squares and cross products, and $\hat y$ and $\hat y^2$ can proxy for all of the $X_j$, $X_j^2$ and $X_j X_h$.

So regress the squared residuals on $\hat y$ and $\hat y^2$ and use the $R^2$ to form an F or LM statistic.

Note that we are only testing 2 restrictions now.

The procedure of the special case of the White test

1. Regress Y on X1, X2, ..., Xk and get the residuals $e_i$.
2. Calculate $\hat y$ and $\hat y^2$ (in Stata: predict ybar, xb and gen ybarsq = ybar^2).
3. Regress e² on $\hat y$ and $\hat y^2$, and test the joint hypothesis that both slope coefficients are zero.
4. Use the F statistic or the LM test to test the null hypothesis of homoskedasticity.

Example: White test in the wage determination equation

First, estimate the model by OLS without considering heteroskedasticity:

$\widehat{wage}$ = -2.87 + 0.599 educ + 0.022 exper + 0.139 tenure

Calculate the residuals of the regression, $e_i$, and the fitted values of wage, $\widehat{wage}$, and from them $e_i^2$ and $\widehat{wage}^2$.

Regress $e_i^2$ on $\widehat{wage}$ and $\widehat{wage}^2$. We get

$e_i^2$ = 7.36 - 2.86 $\widehat{wage}$ + 0.49 $\widehat{wage}^2$
se = (5.62) (1.76) (0.125)
n = 526, R2 = 0.0984, F = 28.55, Prob > F = 0.000

Test H0: $\delta_1 = \delta_2 = 0$. F test: F = 28.55 with Prob > F = 0.000. LM test: $LM = nR^2 = 526 \times 0.0984 \approx 51.76 > 5.99 = \chi^2_{0.05}(2)$. Both reject H0.

[Figure: e² against fitted values]

Corrections for Heteroskedasticity

Known variances, $\mathrm{Var}(u_i \mid X) = \sigma_i^2$

The original model is

$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i$

Divide both sides by $\sigma_i$. The new disturbance is $u_i^* = u_i / \sigma_i$, and

$\mathrm{Var}(u_i^*) = \mathrm{Var}(u_i / \sigma_i) = \mathrm{Var}(u_i) / \sigma_i^2 = 1$

So the new model is

$Y_i / \sigma_i = \beta_0 (1/\sigma_i) + \beta_1 X_{i1}/\sigma_i + \dots + \beta_k X_{ik}/\sigma_i + u_i / \sigma_i$

that is,

$Y^* = \beta_0 X_0^* + \beta_1 X_1^* + \dots + \beta_k X_k^* + u^*$, with $X_0^* = 1/\sigma_i$

We can estimate the new model with OLS; this is called weighted least squares (WLS). But usually we do not know the variances.

Case of form being known up to a multiplicative constant

Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid X) = \sigma^2 h(X)$, where the trick is to figure out what $h(X) \equiv h_i$ looks like.

$E(u_i / \sqrt{h_i} \mid X) = 0$, because $h_i$ is only a function of X, and $\mathrm{Var}(u_i / \sqrt{h_i} \mid X) = \sigma^2$, because we know $\mathrm{Var}(u \mid X) = \sigma^2 h_i$.

So, if we divide the whole equation by $\sqrt{h_i}$, we have a model where the error is homoskedastic.

Case 1: h(X) = X

The simple regression model is

$Y_i = \beta_0 + \beta_1 X_i + u_i$

Suppose $u_i$ is heteroskedastic with variance $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i$. Then we divide both sides of the original model by $\sqrt{X_i}$ and get a new model:

$Y_i / \sqrt{X_i} = \beta_0 / \sqrt{X_i} + \beta_1 X_i / \sqrt{X_i} + u_i / \sqrt{X_i}$

Rewrite it as

$Y_i / \sqrt{X_i} = \beta_0 / \sqrt{X_i} + \beta_1 \sqrt{X_i} + v_i$ (*)

$\mathrm{Var}(v_i) = \mathrm{Var}(u_i / \sqrt{X_i}) = \mathrm{Var}(u_i) / X_i = \sigma^2$, which is homoskedastic. Therefore, the new equation (*) can be estimated using OLS.

[Figure: e² against X]

Example 9.6 (textbook 2e, p. 233)

We have shown that heteroskedasticity exists in the R&D expenditure model. Now we assume that the variance of the error term changes with the independent variable sales, that is,

$\mathrm{Var}(u_i) = \sigma^2 \, sales_i$

The original model is

$rdexp_i = \beta_0 + \beta_1 sales_i + u_i$

The transformed model is

$rdexp_i / \sqrt{sales_i} = \beta_0 (1/\sqrt{sales_i}) + \beta_1 \sqrt{sales_i} + v_i$, where $v_i = u_i / \sqrt{sales_i}$

The estimate of the transformed model is

rdexp/√sales = -246.73 (1/√sales) + 0.0368 √sales

that is,

rdexp = -246.73 + 0.0368 sales
se = (381.16) (0.0071)
t = (-0.65) (5.17)
n = 18, R2 = 0.6923, Adj-R2 = 0.6538, F = 18.00

WLS command: reg rdexp sales [aweight=1/sales]

The estimate of the original model is

rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
t = (0.19) (3.83)
N = 18, R2 = 0.4783, Adj-R2 = 0.4457, F(1,16) = 14.67
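The WLS estimate for the R&D example can be checked with a short Python sketch. It is an illustration, not the textbook's code: `DATA` copies the sales and rdexp columns of the 18-industry table, and the two helper functions are hypothetical names introduced here. The sketch computes the estimate twice, once by minimising the weighted sum of squares with weights 1/sales and once by running OLS on the transformed variables, to show that the two routes coincide.

```python
import math

# (sales, rdexp) in million dollars, copied from the 18-industry table
DATA = [
    (6375.3, 62.5), (11626.4, 92.9), (14655.1, 178.3), (21896.2, 258.4),
    (26408.3, 494.7), (32405.6, 1083.0), (35107.7, 1620.6), (40295.4, 421.7),
    (70761.6, 509.2), (80552.8, 6620.1), (95294.0, 3918.6), (101314.1, 1595.3),
    (116141.3, 6107.5), (122315.7, 4454.1), (141649.9, 3163.8),
    (175025.8, 13210.7), (230614.5, 1703.8), (293543.0, 9528.2),
]


def weighted_simple_ols(x, y, w):
    """Minimise sum_i w_i * (y_i - b0 - b1 * x_i)^2. Returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def ols_two_regressors_no_const(z1, z2, y):
    """OLS of y on z1 and z2 without intercept, via 2x2 normal equations."""
    a11 = sum(v * v for v in z1)
    a12 = sum(u * v for u, v in zip(z1, z2))
    a22 = sum(v * v for v in z2)
    c1 = sum(u * v for u, v in zip(z1, y))
    c2 = sum(u * v for u, v in zip(z2, y))
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - c2 * a12) / det, (a11 * c2 - a12 * c1) / det


sales = [s for s, _ in DATA]
rdexp = [r for _, r in DATA]

# Unweighted OLS (all weights equal to one) for comparison
ols_b0, ols_b1 = weighted_simple_ols(sales, rdexp, [1.0] * len(sales))

# Route 1: WLS with weights 1/sales, i.e. assuming Var(u_i) = sigma^2 * sales_i
wls_b0, wls_b1 = weighted_simple_ols(sales, rdexp, [1.0 / s for s in sales])

# Route 2: OLS on the transformed model
#   rdexp/sqrt(sales) = b0 * (1/sqrt(sales)) + b1 * sqrt(sales) + v
root = [math.sqrt(s) for s in sales]
t_b0, t_b1 = ols_two_regressors_no_const(
    [1.0 / r for r in root], root, [y / r for y, r in zip(rdexp, root)]
)

print(f"OLS: rdexp = {ols_b0:.2f} + {ols_b1:.4f} sales")
print(f"WLS: rdexp = {wls_b0:.2f} + {wls_b1:.4f} sales")
print(f"Transformed OLS: b0 = {t_b0:.2f}, b1 = {t_b1:.4f}")
```

The transformed regression has no constant term: the intercept of the original model becomes the coefficient on 1/√sales, and the slope becomes the coefficient on √sales.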

Compare the results of the two estimations. What do you find?

Case 2: h(X) = X²

The simple regression model is

$Y_i = \beta_0 + \beta_1 X_i + u_i$

Suppose $u_i$ is heteroskedastic with variance $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i^2$. Then we divide both sides of the original model by $X_i$ and get a new model:

$Y_i / X_i = \beta_0 / X_i + \beta_1 X_i / X_i + u_i / X_i$

Rewrite it as

$Y_i / X_i = \beta_0 / X_i + \beta_1 + v_i$ (*)

$\mathrm{Var}(v_i) = \mathrm{Var}(u_i / X_i) = \mathrm{Var}(u_i) / X_i^2 = \sigma^2$, which is homoskedastic. Therefore, the new equation (*) can be estimated using OLS.

[Figure: e² against X]

Generalized Least Squares

Estimating the transformed equation by OLS is an example of generalized least squares (GLS).

GLS will be BLUE in this case, because the transformed equation satisfies the Gauss-Markov assumptions.

GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.

The sum of squared residuals in the transformed variables is

$$\sum_{i=1}^{n} \left( y_i^* - b_0 x_{i0}^* - b_1 x_{i1}^* - \dots - b_k x_{ik}^* \right)^2 = \sum_{i=1}^{n} \left( \frac{y_i}{\sqrt{h_i}} - \frac{b_0}{\sqrt{h_i}} - b_1 \frac{x_{i1}}{\sqrt{h_i}} - \dots - b_k \frac{x_{ik}}{\sqrt{h_i}} \right)^2 = \sum_{i=1}^{n} \frac{1}{h_i} \left( y_i - b_0 - b_1 x_{i1} - \dots - b_k x_{ik} \right)^2$$

More on WLS

Let us consider the wage determination equation

$wage_{i,e} = \beta_0 + \beta_1 educ_{i,e} + \beta_2 age_{i,e} + \beta_3 tenure_{i,e} + u_{i,e}$

where i denotes a particular firm and e denotes an employee within the firm. Assume the above equation satisfies the Gauss-Markov assumptions; then we can estimate it, given a sample of individuals across various firms.

But suppose we only have the average values of wages, education, age and tenure by firm. That is, individual-level data are not available. Let $\overline{wage}_i$, $\overline{educ}_i$, $\overline{age}_i$ and $\overline{tenure}_i$ denote average wages, average education, average age and average tenure for the people at firm i. Then the original equation can be transformed to

$\overline{wage}_i = \beta_0 + \beta_1 \overline{educ}_i + \beta_2 \overline{age}_i + \beta_3 \overline{tenure}_i + \bar u_i$

More on WLS, cont.

If the original equation at the individual level satisfies the homoskedasticity assumption, then the firm-level (transformed) equation will be heteroskedastic, unless all firms have the same number of employees.

If $\mathrm{Var}(u_{i,e}) = \sigma^2$ for all i and e, then $\mathrm{Var}(\bar u_i) = \sigma^2 / m_i$, where $m_i$ is the number of employees in firm i.

In this case, the most efficient procedure is WLS, with weights equal to the number of employees at the firm ($1/h_i = m_i$). This ensures that larger firms receive more weight. This gives us an efficient way of estimating the parameters in the individual-level model when we only have averages at the firm level.

More on WLS, cont.

A similar weighting arises when we are using per capita data at the city, county, state, or country level.

If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population.

Therefore, weighted least squares with weights equal to the population is appropriate.

Summary of WLS

WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
In most cases, we will not know the form of the heteroskedasticity.
An example where we do is when the data are aggregated but the model is at the individual level.
We want to weight each aggregate observation by the inverse of its error variance; for group averages, this means weighting by the number of individuals in the group.

Feasible GLS

More typical is the case where you do not know the form of the heteroskedasticity. In this case, you need to estimate $h(x_i)$.

Typically, we start with the assumption of a fairly flexible model, such as Var(u|x
