Introductory Econometrics, 4th Edition (English): Complete Lecture Slides

Economics 20 - Prof. Anderson

Welcome to Economics 20
What is Econometrics?

Why study Econometrics?
- Rare in economics (and many other areas without labs!) to have experimental data.
- Need to use nonexperimental, or observational, data to make inferences.
- Important to be able to apply economic theory to real-world data.

Why study Econometrics?
- An empirical analysis uses data to test a theory or to estimate a relationship.
- A formal economic model can be tested.
- Theory may be ambiguous as to the effect of some policy change; we can use econometrics to evaluate the program.

Types of Data: Cross Sectional
- Cross-sectional data is a random sample.
- Each observation is a new individual, firm, etc., with information at a point in time.
- If the data is not a random sample, we have a sample-selection problem.

Types of Data: Panel
- Can pool random cross sections and treat them similarly to a normal cross section. Will just need to account for time differences.
- Can follow the same random individual observations over time, known as panel data or longitudinal data.

Types of Data: Time Series
- Time series data has a separate observation for each time period, e.g. stock prices.
- Since it is not a random sample, there are different problems to consider.
- Trends and seasonality will be important.

The Question of Causality
- Simply establishing a relationship between variables is rarely sufficient.
- Want the effect to be considered causal.
- If we've truly controlled for enough other variables, then the estimated ceteris paribus effect can often be considered to be causal.
- Can be difficult to establish causality.

Example: Returns to Education
- A model of human capital investment implies that getting more education should lead to higher earnings.
- In the simplest case, this implies an equation like
  $Earnings = \beta_0 + \beta_1 \, education + u$

Example: (continued)
- The estimate of $\beta_1$ is the return to education, but can it be considered causal?
- While the error term, $u$, includes other factors affecting earnings, we want to control for as much as possible.
- Some things are still unobserved, which can be problematic.

The Simple Regression Model
$y = \beta_0 + \beta_1 x + u$

Some Terminology
- In the simple linear regression model, where $y = \beta_0 + \beta_1 x + u$, we typically refer to $y$ as the Dependent Variable, or Left-Hand Side Variable, or Explained Variable, or Regressand.

Some Terminology, cont.
- In the simple linear regression of $y$ on $x$, we typically refer to $x$ as the Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate, or Control Variable.

A Simple Assumption
- The average value of $u$, the error term, in the population is 0. That is, $E(u) = 0$.
- This is not a restrictive assumption, since we can always use $\beta_0$ to normalize $E(u)$ to 0.

Zero Conditional Mean
- We need to make a crucial assumption about how $u$ and $x$ are related.
- We want it to be the case that knowing something about $x$ does not give us any information about $u$, so that they are completely unrelated. That is, $E(u|x) = E(u) = 0$, which implies $E(y|x) = \beta_0 + \beta_1 x$.

[Figure: $E(y|x) = \beta_0 + \beta_1 x$ as a linear function of $x$, where for any $x$ (shown at $x_1$ and $x_2$) the distribution of $y$, $f(y)$, is centered about $E(y|x)$.]

Ordinary Least Squares
- Basic idea of regression is to estimate the population parameters from a sample.
- Let $\{(x_i, y_i): i = 1, \ldots, n\}$ denote a random sample of size $n$ from the population.
- For each observation in this sample, it will be the case that $y_i = \beta_0 + \beta_1 x_i + u_i$.

[Figure: Population regression line $E(y|x) = \beta_0 + \beta_1 x$, sample data points $(x_1, y_1), \ldots, (x_4, y_4)$ and the associated error terms $u_1, \ldots, u_4$.]
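
To make the population model concrete, here is a minimal sketch that simulates a random sample from $y_i = \beta_0 + \beta_1 x_i + u_i$. The parameter values, the uniform distribution for $x$, the normal errors, and the helper name `draw_sample` are illustrative assumptions, not part of the slides. Drawing $u$ independently of $x$ is just one simple way to satisfy $E(u|x) = 0$.

```python
import random

def draw_sample(n, beta0, beta1, sigma, seed=0):
    """Simulate n draws of (x, y, u) from y = beta0 + beta1*x + u.

    Assumed for illustration only: x ~ Uniform(0, 10) and u ~ Normal(0, sigma),
    with u drawn independently of x so that E(u|x) = E(u) = 0.
    """
    rng = random.Random(seed)
    xs, ys, us = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        u = rng.gauss(0.0, sigma)
        xs.append(x)
        us.append(u)
        ys.append(beta0 + beta1 * x + u)
    return xs, ys, us

xs, ys, us = draw_sample(n=20000, beta0=1.0, beta1=0.5, sigma=2.0)

# Average error among low-x and high-x observations: both should be close to 0.
low = [u for x, u in zip(xs, us) if x < 5.0]
high = [u for x, u in zip(xs, us) if x >= 5.0]
mean_u_low = sum(low) / len(low)
mean_u_high = sum(high) / len(high)
print(f"mean of u given x < 5:  {mean_u_low:.3f}")
print(f"mean of u given x >= 5: {mean_u_high:.3f}")
```

In real data the errors `us` are never observed; they are available here only because the sample is simulated.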

Deriving OLS Estimates
- To derive the OLS estimates we need to realize that our main assumption of $E(u|x) = E(u) = 0$ also implies that $Cov(x,u) = E(xu) = 0$.
- Why? Remember from basic probability that $Cov(X,Y) = E(XY) - E(X)E(Y)$.

Deriving OLS continued
- We can write our 2 restrictions just in terms of $x$, $y$, $\beta_0$ and $\beta_1$, since $u = y - \beta_0 - \beta_1 x$:
  $E(y - \beta_0 - \beta_1 x) = 0$
  $E[x(y - \beta_0 - \beta_1 x)] = 0$
- These are called moment restrictions.

Deriving OLS using M.O.M.
- The method of moments approach to estimation implies imposing the population moment restrictions on the sample moments.
- What does this mean? Recall that for $E(X)$, the mean of a population distribution, a sample estimator of $E(X)$ is simply the arithmetic mean of the sample.

More Derivation of OLS
- We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true.
- The sample versions are as follows:
  $n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$
  $n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$

More Derivation of OLS
- Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as follows:
  $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$, or $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$

More Derivation of OLS
  $\sum_{i=1}^{n} x_i \left( y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i \right) = 0$
  $\sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar x)$
  $\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \hat\beta_1 \sum_{i=1}^{n} (x_i - \bar x)^2$

So the OLS estimated slope is
  $\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$, provided that $\sum_{i=1}^{n} (x_i - \bar x)^2 > 0$

Summary of OLS slope estimate
- The slope estimate is the sample covariance between $x$ and $y$ divided by the sample variance of $x$.
- If $x$ and $y$ are positively correlated, the slope will be positive.
- If $x$ and $y$ are negatively correlated, the slope will be negative.
- Only need $x$ to vary in our sample.

More OLS
- Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares.
- The residual, $\hat u$, is an estimate of the error term, $u$, and is the difference between the fitted line (sample regression function) and the sample point.
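
The slope and intercept formulas can be sketched in a few lines of Python. The data values and the function names `ols_simple` and `ssr` are made up for illustration. The block also checks that nudging either estimate away from the OLS value raises the sum of squared residuals, which is the "least squares" idea.

```python
def ols_simple(xs, ys):
    """Return (b0, b1) for the simple regression of y on x.

    b1 = sum((x - x_bar)*(y - y_bar)) / sum((x - x_bar)^2)
    b0 = y_bar - b1 * x_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary in the sample")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(xs, ys, b0, b1):
    """Sum of squared residuals for a candidate line y = b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample, chosen only to illustrate the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(xs, ys)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print(f"SSR at OLS estimates:  {ssr(xs, ys, b0, b1):.4f}")
print(f"SSR with slope + 0.1:  {ssr(xs, ys, b0, b1 + 0.1):.4f}")
```

Passing a constant `xs` raises an error, which mirrors the requirement that $x$ vary in the sample.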

[Figure: Sample regression line $\hat y = \hat\beta_0 + \hat\beta_1 x$, sample data points $(x_1, y_1), \ldots, (x_4, y_4)$ and the associated estimated error terms $\hat u_1, \ldots, \hat u_4$.]

Alternate approach to derivation
- Given the intuitive idea of fitting a line, we can set up a formal minimization problem.
- That is, we want to choose our parameters such that we minimize the following:
  $\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$

Alternate approach, continued
- If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first order conditions, which are the same as we obtained before, multiplied by $n$:
  $\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$
  $\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$

Algebraic Properties of OLS
- The sum of the OLS residuals is zero.
- Thus, the sample average of the OLS residuals is zero as well.
- The sample covariance between the regressors and the OLS residuals is zero.
- The OLS regression line always goes through the mean of the sample.

Algebraic Properties (precise)
  $\sum_{i=1}^{n} \hat u_i = 0$ and thus $\dfrac{\sum_{i=1}^{n} \hat u_i}{n} = 0$
  $\sum_{i=1}^{n} x_i \hat u_i = 0$
  $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$

More terminology
- We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat y_i + \hat u_i$. We then define the following:
  $\sum (y_i - \bar y)^2$ is the total sum of squares (SST)
  $\sum (\hat y_i - \bar y)^2$ is the explained sum of squares (SSE)
  $\sum \hat u_i^2$ is the residual sum of squares (SSR)
- Then SST = SSE + SSR.
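
The algebraic properties and the sum-of-squares decomposition hold exactly in any sample, so they can be checked numerically. The sketch below uses made-up data and hypothetical variable names; it is an illustration, not part of the slides.

```python
# Hypothetical sample, chosen only to illustrate the identities.
xs = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
ys = [3.0, 6.5, 7.0, 10.5, 12.0, 14.5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS estimates from the slope and intercept formulas.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]                # explained part
resid = [y - f for y, f in zip(ys, fitted)]       # unexplained part

sum_resid = sum(resid)                            # should be 0
sum_x_resid = sum(x * u for x, u in zip(xs, resid))  # should be 0

sst = sum((y - y_bar) ** 2 for y in ys)           # total sum of squares
sse = sum((f - y_bar) ** 2 for f in fitted)       # explained sum of squares
ssr = sum(u ** 2 for u in resid)                  # residual sum of squares

print(f"sum of residuals:        {sum_resid:.2e}")
print(f"sum of x_i * residual_i: {sum_x_resid:.2e}")
print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")
print(f"explained share SSE/SST: {sse / sst:.4f}")
```

The two sums print as tiny nonzero numbers rather than exact zeros because of floating-point rounding.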

Proof that SST = SSE + SSR
  $\sum (y_i - \bar y)^2 = \sum \left[ (y_i - \hat y_i) + (\hat y_i - \bar y) \right]^2$
  $= \sum \left[ \hat u_i + (\hat y_i - \bar y) \right]^2$
  $= \sum \hat u_i^2 + 2 \sum \hat u_i (\hat y_i - \bar y) + \sum (\hat y_i - \bar y)^2$
  $= SSR + 2 \sum \hat u_i (\hat y_i - \bar y) + SSE$
  and we know that $\sum \hat u_i (\hat y_i - \bar y) = 0$

Goodness-of-Fit
- How do we think about how well our sample regression line fits our sample data?
- Can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression.
- $R^2 = SSE/SST = 1 - SSR/SST$

Using Stata for OLS regressions
- Now that we've derived the formula for calculating the OLS estimates of our parameters, you'll be happy to know you don't have to compute them by hand.
- Regressions in Stata are very simple: to run the regression of y on x, just type
  reg y x

Unbiasedness of OLS
- Assume the population model is linear in parameters as $y = \beta_0 + \beta_1 x + u$.
- Assume we can use a random sample of size $n$, $\{(x_i, y_i): i = 1, 2, \ldots, n\}$, from the population model. Thus we can write the sample model $y_i = \beta_0 + \beta_1 x_i + u_i$.
- Assume $E(u|x) = 0$ and thus $E(u_i|x_i) = 0$.
- Assume there is variation in the $x_i$.

Unbiasedness of OLS (cont)
- In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter.
- Start with a simple rewrite of the formula as
  $\hat\beta_1 = \dfrac{\sum (x_i - \bar x) y_i}{s_x^2}$, where $s_x^2 \equiv \sum (x_i - \bar x)^2$

Unbiasedness of OLS (cont)
  $\sum (x_i - \bar x) y_i = \sum (x_i - \bar x)(\beta_0 + \beta_1 x_i + u_i)$
  $= \sum (x_i - \bar x) \beta_0 + \sum (x_i - \bar x) \beta_1 x_i + \sum (x_i - \bar x) u_i$
  $= \beta_0 \sum (x_i - \bar x) + \beta_1 \sum (x_i - \bar x) x_i + \sum (x_i - \bar x) u_i$

Unbiasedness of OLS (cont)
  $\sum (x_i - \bar x) = 0$, $\sum (x_i - \bar x) x_i = \sum (x_i - \bar x)^2$,
  so the numerator can be rewritten as $\beta_1 s_x^2 + \sum (x_i - \bar x) u_i$, and thus
  $\hat\beta_1 = \beta_1 + \dfrac{\sum (x_i - \bar x) u_i}{s_x^2}$
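
The rewrite $\hat\beta_1 = \beta_1 + \sum (x_i - \bar x) u_i / s_x^2$ is an exact identity in every sample, which a small simulation can confirm. The parameter values, distributions, and seed below are assumptions for illustration. The errors `us` are visible only because the data are simulated.

```python
import random

rng = random.Random(42)
beta0, beta1 = 1.0, 0.5          # "true" parameters, assumed for the simulation
n = 50

xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
us = [rng.gauss(0.0, 1.0) for _ in range(n)]       # unobservable in real data
ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]

x_bar = sum(xs) / n
s2x = sum((x - x_bar) ** 2 for x in xs)            # s_x^2 = sum of (x_i - x_bar)^2

# Estimator written as sum((x_i - x_bar) * y_i) / s_x^2.
b1_hat = sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x

# Same quantity via the decomposition beta1 + sum((x_i - x_bar) * u_i) / s_x^2.
b1_decomp = beta1 + sum((x - x_bar) * u for x, u in zip(xs, us)) / s2x

print(f"b1_hat    = {b1_hat:.6f}")
print(f"b1_decomp = {b1_decomp:.6f}")
```

The two printed numbers agree up to rounding. The gap between either of them and `beta1` is the sampling error term that the unbiasedness argument is about.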

Unbiasedness of OLS (cont)
  Let $d_i = (x_i - \bar x)$, so that
  $\hat\beta_1 = \beta_1 + \left( \dfrac{1}{s_x^2} \right) \sum d_i u_i$, then
  $E(\hat\beta_1) = \beta_1 + \left( \dfrac{1}{s_x^2} \right) \sum d_i E(u_i) = \beta_1$

Unbiasedness Summary
- The OLS estimates of $\beta_1$ and $\beta_0$ are unbiased.
- Proof of unbiasedness depends on our 4 assumptions: if any assumption fails, then OLS is not necessarily unbiased.
- Remember that unbiasedness is a description of the estimator: in a given sample we may be "near" or "far" from the true parameter.

Variance of the OLS Estimators
- Now we know that the sampling distribution of our estimate is centered around the true parameter.
- Want to think about how spread out this distribution is.
- Much easier to think about this variance under an additional assumption, so
- Assume $Var(u|x) = \sigma^2$ (Homoskedasticity).

Variance of OLS (cont)
- $Var(u|x) = E(u^2|x) - [E(u|x)]^2$
- $E(u|x) = 0$, so $\sigma^2 = E(u^2|x) = E(u^2) = Var(u)$
- Thus $\sigma^2$ is also the unconditional variance, called the error variance.
- $\sigma$, the square root of the error variance, is called the standard deviation of the error.
- Can say: $E(y|x) = \beta_0 + \beta_1 x$ and $Var(y|x) = \sigma^2$

[Figure: Homoskedastic Case. Conditional densities $f(y|x)$ at $x_1$ and $x_2$ around the line $E(y|x) = \beta_0 + \beta_1 x$.]
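
[Figure: Heteroskedastic Case. Conditional densities $f(y|x)$ at $x_1$, $x_2$ and $x_3$ around the line $E(y|x) = \beta_0 + \beta_1 x$.]

Variance of OLS (cont)
  With $d_i = (x_i - \bar x)$ and $s_x^2 = \sum (x_i - \bar x)^2$:
  $Var(\hat\beta_1) = Var\left( \beta_1 + \left( \dfrac{1}{s_x^2} \right) \sum d_i u_i \right)$
  $= \left( \dfrac{1}{s_x^2} \right)^2 Var\left( \sum d_i u_i \right) = \left( \dfrac{1}{s_x^2} \right)^2 \sum d_i^2 \, Var(u_i)$
  $= \left( \dfrac{1}{s_x^2} \right)^2 \sum d_i^2 \sigma^2 = \sigma^2 \left( \dfrac{1}{s_x^2} \right)^2 \sum d_i^2$
  $= \sigma^2 \left( \dfrac{1}{s_x^2} \right)^2 s_x^2 = \dfrac{\sigma^2}{s_x^2} = Var(\hat\beta_1)$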
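
Both the unbiasedness result and the variance formula $Var(\hat\beta_1) = \sigma^2 / s_x^2$ can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for the illustration: the parameter values, the fixed regressor values 1 to 20, normal homoskedastic errors, and 5000 replications.

```python
import random

rng = random.Random(0)
beta0, beta1, sigma = 1.0, 0.5, 2.0        # assumed "true" values
xs = [float(i) for i in range(1, 21)]      # regressors held fixed across replications
n = len(xs)
x_bar = sum(xs) / n
s2x = sum((x - x_bar) ** 2 for x in xs)    # s_x^2 = sum of (x_i - x_bar)^2

def slope(ys):
    """OLS slope for the fixed xs and a given vector of outcomes."""
    return sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x

draws = []
for _ in range(5000):
    # New homoskedastic errors each replication: Var(u|x) = sigma^2 for every x.
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    draws.append(slope(ys))

mc_mean = sum(draws) / len(draws)
mc_var = sum((b - mc_mean) ** 2 for b in draws) / len(draws)
theory_var = sigma ** 2 / s2x

print(f"mean of slope estimates: {mc_mean:.4f} (true beta1 = {beta1})")
print(f"variance of estimates:   {mc_var:.6f}")
print(f"sigma^2 / s_x^2:         {theory_var:.6f}")
```

Individual entries of `draws` can be well above or below `beta1`; it is their average and spread that match the theory.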

Variance of OLS Summary
- The larger the error variance, $\sigma^2$, the larger the variance of the slope estimate.
- The larger the variability in the $x_i$, the smaller the variance of the slope estimate.
- As a result, a larger sample size should decrease the variance of the slope estimate.
- Problem: the error variance is unknown.

Estimating the Error Variance
- We don't know what the error variance, $\sigma^2$, is, because we don't observe the errors, $u_i$.
- What we observe are the residuals, $\hat u_i$.
- We can use the residuals to form an estimate of the error variance.
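
As a sketch of how residuals can stand in for the unobserved errors, the block below simulates data with a known error variance and computes the degrees-of-freedom-corrected estimate $SSR/(n-2)$, the standard unbiased estimator in simple regression. The parameter values, distributions, and seed are assumptions for illustration.

```python
import random

rng = random.Random(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0        # assumed "true" values; sigma^2 = 4
n = 5000

xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
us = [rng.gauss(0.0, sigma) for _ in range(n)]     # not observable in practice
ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]  # what we actually observe
ssr = sum(r ** 2 for r in resid)
sum_sq_errors = sum(u ** 2 for u in us)

sigma2_hat = ssr / (n - 2)       # two parameters were estimated, hence n - 2
sigma_hat = sigma2_hat ** 0.5

print(f"SSR = {ssr:.2f}, sum of squared true errors = {sum_sq_errors:.2f}")
print(f"sigma2_hat = {sigma2_hat:.4f} (true sigma^2 = {sigma ** 2})")
print(f"sigma_hat  = {sigma_hat:.4f}")
```

SSR can never exceed the sum of squared true errors, because OLS picks the line that minimizes the sum of squares in the sample. That is the intuition for dividing by $n - 2$ rather than $n$.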

Error Variance Estimate (cont)
  $\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$
  $= (\beta_0 + \beta_1 x_i + u_i) - \hat\beta_0 - \hat\beta_1 x_i$
  $= u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i$
  Then, an unbiased estimator of $\sigma^2$ is
  $\hat\sigma^2 = \dfrac{1}{n-2} \sum \hat u_i^2 = SSR/(n-2)$

Error Variance Estimate (cont)
  $\hat\sigma = \sqrt{\hat\sigma^2}$ = Standard error of the regression
  Recall that $sd(\hat\beta_1) = \sigma / s_x$.
  If we substitute $\hat\sigma$ for $\sigma$ then we have the standard error of $\hat\beta_1$:
  $se(\hat\beta_1) = \hat\sigma / \left( \sum (x_i - \bar x)^2 \right)^{1/2}$

Multiple Regression Analysis
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$
1. Estimation

Parallels with Simple Regression
- $\beta_0$ is still the intercept.
- $\beta_1$ to $\beta_k$ are all called slope parameters.
- $u$ is still the error term (or disturbance).
- Still need to make a zero conditional mean assumption, so now assume that $E(u|x_1, x_2, \ldots, x_k) = 0$.
- Still minimizing the sum of squared residuals, so have $k+1$ first order conditions.

Interpreting Multiple Regression
  $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \ldots + \hat\beta_k x_k$, so
  $\Delta \hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \ldots + \hat\beta_k \Delta x_k$,
  so holding $x_2, \ldots, x_k$ fixed implies that
  $\Delta \hat y = \hat\beta_1 \Delta x_1$, that is, each $\beta$ has a ceteris paribus interpretation

A "Partialling Out" Interpretation
  Consider the case where $k = 2$, i.e.
  $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$, then
  $\hat\beta_1 = \sum \hat r_{i1} y_i \big/ \sum \hat r_{i1}^2$, where $\hat r_{i1}$ are the residuals from the estimated regression $\hat x_1 = \hat\gamma_0 + \hat\gamma_2 x_2$
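
The partialling-out formula can be checked against a direct two-regressor fit. In the sketch below, the data and the helper names `mean`, `demean` and `dot` are hypothetical. The direct estimate solves the two centered normal equations by Cramer's rule, which is one standard way to obtain the OLS slopes when $k = 2$.

```python
def mean(v):
    return sum(v) / len(v)

def demean(v):
    m = mean(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical sample with two correlated regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.8, 13.1, 13.9]

# (a) Direct multiple regression of y on x1 and x2 (with intercept):
#     solve the centered normal equations for the slope on x1.
d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y = dot(d1, dy), dot(d2, dy)
b1_multiple = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# (b) Partialling out: regress x1 on x2 (with intercept), keep the residuals r1,
#     then apply b1 = sum(r1_i * y_i) / sum(r1_i ** 2).
g2 = s12 / s22
g0 = mean(x1) - g2 * mean(x2)
r1 = [a - g0 - g2 * b for a, b in zip(x1, x2)]
b1_partial = dot(r1, y) / dot(r1, r1)

print(f"slope on x1, direct:          {b1_multiple:.6f}")
print(f"slope on x1, partialling out: {b1_partial:.6f}")
```

The residuals `r1` sum to zero and are uncorrelated with `x2` in the sample, which is why using `y` rather than demeaned `y` in step (b) gives the same answer.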

"Partialling Out" continued
- Previous equation implies that regressing $y$ on $x_1$ and $x_2$ gives the same effect of $x_1$ as regressing $y$ on the residuals from a regression of $x_1$ on $x_2$.
- This means only the part of $x_{i1}$ that is uncorrelated with $x_{i2}$ is being related to $y_i$, so we're estimating the effect of $x_1$ on $y$ after $x_2$ has been "partialled out".

Simple vs Multiple Reg Estimate
  Compare the simple regression $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
  with the multiple regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$.
  Generally, $\tilde\beta_1 \neq \hat\beta_1$ unless:
  $\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$) OR
  $x_1$ and $x_2$ are uncorrelated in the sample
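
A small numerical comparison illustrates the two cases. The data sets and the function name `slope_on_x1` are made up for the illustration: one sample has $x_1$ and $x_2$ exactly uncorrelated, the other has them correlated.

```python
def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_on_x1(x1, x2, y):
    """Return (simple, multiple): the slope on x1 from regressing y on x1 alone,
    and from regressing y on x1 and x2 (both regressions include an intercept)."""
    d1, d2, dy = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    simple = s1y / s11
    # Solution of the two centered normal equations for the coefficient on x1.
    multiple = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
    return simple, multiple

# Case 1 (hypothetical data): x1 and x2 are uncorrelated in the sample.
simple_u, multiple_u = slope_on_x1(
    [-1.0, -1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0, 1.0],
    [-3.8, 1.9, 0.1, 5.8],
)

# Case 2 (hypothetical data): x1 and x2 are positively correlated.
simple_c, multiple_c = slope_on_x1(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 1.0, 2.0, 3.0, 3.0],
    [6.1, 7.9, 13.0, 18.1, 19.9],
)

print(f"uncorrelated: simple = {simple_u:.4f}, multiple = {multiple_u:.4f}")
print(f"correlated:   simple = {simple_c:.4f}, multiple = {multiple_c:.4f}")
```

In the correlated case the simple regression slope also picks up part of the effect of the omitted $x_2$, so it differs from the multiple regression slope.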

Goodness-of-Fit
- We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat y_i + \hat u_i$. We then define the following:
  $\sum (y_i - \bar y)^2$ is the total sum of squares (SST)
  $\sum (\hat y_i - \bar y)^2$ is the explained sum of squares (SSE)
  $\sum \hat u_i^2$ is the residual sum of squares (SSR)
- Then SST = SSE + SSR.

Goodness-of-Fit (continued)
- How do we think about how well our sample regression line fits our sample data?
- Can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression.
- $R^2 = SSE/SST = 1 - SSR/SST$

Goodness-of-Fit (continued)
- We can also think of $R^2$ as being equal to the squared correlation coefficient between the actual $y_i$ and the values $\hat y_i$:
  $R^2 = \dfrac{\left( \sum (y_i - \bar y)(\hat y_i - \bar{\hat y}) \right)^2}{\left( \sum (y_i - \bar y)^2 \right) \left( \sum (\hat y_i - \bar{\hat y})^2 \right)}$

More about R-squared
- $R^2$ can never decrease when another independent variable is added to a regression, and usually will increase.
- Because $R^2$ will usually inc