Intermediate Econometrics, Yan Shen

The Simple Regression Model (1)
$y = \beta_0 + \beta_1 x + u$

Chapter Outline
- Definition of the Simple Regression Model
- Deriving the Ordinary Least Squares Estimates
- Mechanics of OLS
- Units of Measurement and Functional Form
- Expected Values and Variances of the OLS Estimators
- Regression through the Origin

Lecture Outline
- Some Terminology
- A Simple Assumption
- Zero Conditional Mean Assumption
- What is Ordinary Least Squares
- Deriving OLS Estimates

Some Terminology
- In the simple linear regression model $y = \beta_0 + \beta_1 x + u$, we typically refer to $y$ as the
  - Dependent Variable, or
  - Left-Hand Side Variable, or
  - Explained Variable, or
  - Regressand

Some Terminology
- In the simple linear regression of $y$ on $x$, we typically refer to $x$ as the
  - Independent Variable, or
  - Right-Hand Side Variable, or
  - Explanatory Variable, or
  - Regressor, or
  - Covariate, or
  - Control Variable

Some Terminology
- The equation $y = \beta_0 + \beta_1 x + u$ has only one nonconstant regressor, $x$. It is therefore called a simple linear regression model, a two-variable regression model, or a bivariate linear regression model.

Some Terminology
- The coefficients $\beta_0$ and $\beta_1$ are called the regression coefficients.
- $\beta_0$ is also called the constant term, the intercept term, or the intercept parameter.
- $\beta_1$ represents the marginal effect of the regressor $x$. It is also called the slope parameter.

Some Terminology
- The variable $u$ is called the error term or disturbance in the relationship.
- It represents factors other than $x$ that can affect $y$.

Some Terminology
- Meaning of linear: linear means linear in the parameters. It does not require that $y$ and $x$ have a linear relationship.
- In many cases $y$ and $x$ are related nonlinearly, but after some transformation the model is linear in the parameters.
- For example, $y = e^{\beta_0 + \beta_1 x + u}$ becomes $\log y = \beta_0 + \beta_1 x + u$ after taking logarithms.

Examples of the Simple Regression Model
- A simple wage equation, describing the relationship between years of education and wage:
  $wage = \beta_0 + \beta_1 (\text{years of education}) + u$
- $\beta_1$ measures how much wage increases when education increases by one year.

A Simple Assumption about u
- The average value of $u$, the error term, in the population is 0. That is,
  $E(u) = 0$  (2.5)
- Is it restrictive?

A Simple Assumption about u
- Suppose, for example, that $E(u) = 5$. Then $y = (\beta_0 + 5) + \beta_1 x + (u - 5)$, and the redefined error term $u - 5$ satisfies $E(u - 5) = 0$.
- This is therefore not a restrictive assumption, since we can always adjust the intercept $\beta_0$ to normalize $E(u)$ to 0.

Zero Conditional Mean Assumption
- We need to make a crucial assumption about how $u$ and $x$ are related.
- We want it to be the case that knowing something about $x$ gives us no information about the average value of $u$. That is,
  $E(u|x) = E(u)$
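
The normalization argument for $E(u) = 0$ can be checked numerically. The sketch below is only an illustration: the population values `beta0 = 2.0` and `beta1 = 0.5`, the sample size, and the distributions are assumptions chosen for the example, not values from the lecture. It draws an error term with mean 5, then rewrites the same model with intercept $\beta_0 + 5$ and error $u - 5$.

```python
import random
import statistics

random.seed(0)

# Hypothetical population values, chosen only for illustration.
beta0, beta1 = 2.0, 0.5
n = 10_000

x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(5.0, 1.0) for _ in range(n)]  # error term with mean 5, not 0
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Same model rewritten: the intercept absorbs the mean of the error term.
new_beta0 = beta0 + 5.0
new_u = [ui - 5.0 for ui in u]
y_rewritten = [new_beta0 + beta1 * xi + vi for xi, vi in zip(x, new_u)]

# Largest difference between the two ways of writing y (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(y, y_rewritten))
mean_u = statistics.fmean(u)
mean_new_u = statistics.fmean(new_u)

print("max |y - y_rewritten| =", max_gap)
print("sample mean of u      =", round(mean_u, 3))
print("sample mean of u - 5  =", round(mean_new_u, 3))
```

Both versions describe identical data; only the labels "intercept" and "error" change, which is why assuming $E(u) = 0$ costs nothing once the model contains an intercept.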

Zero Conditional Mean Assumption
- Since we have assumed $E(u) = 0$, it follows that
  $E(u|x) = E(u) = 0$  (2.6)
- What does it mean?

Zero Conditional Mean Assumption
- In the education example, suppose $u$ represents innate ability. The zero conditional mean assumption then means
  $E(ability \mid edu = 6) = E(ability \mid edu = 18) = 0$
- The average level of ability is the same regardless of years of education.
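
To make the education example concrete, the following sketch simulates two hypothetical populations. In the first, ability ($u$) is drawn independently of education, so its average is about zero at every education level. In the second, average ability rises with education, so $E(u|edu)$ changes with $edu$ and (2.6) fails even though the overall mean of $u$ is still zero. The education levels, the shift `0.1 * (edu - 12)`, and the sample sizes are made-up numbers for illustration only.

```python
import random
import statistics

random.seed(1)

edu_levels = [6, 12, 18]   # hypothetical years of education
n_per_level = 20_000       # simulated people per education level


def mean_u_by_edu(draw_u):
    """Return the sample average of u within each education level."""
    return {
        edu: statistics.fmean([draw_u(edu) for _ in range(n_per_level)])
        for edu in edu_levels
    }


# Case 1: ability is unrelated to education, so E(u | edu) = 0 for every edu.
independent = mean_u_by_edu(lambda edu: random.gauss(0.0, 1.0))

# Case 2 (assumption violated): average ability increases with education.
# The shift 0.1 * (edu - 12) averages to zero across the three levels,
# so E(u) = 0 still holds, but E(u | edu) depends on edu.
dependent = mean_u_by_edu(lambda edu: random.gauss(0.1 * (edu - 12), 1.0))

for edu in edu_levels:
    print(f"edu={edu:2d}  mean u (independent)={independent[edu]: .3f}  "
          f"mean u (dependent)={dependent[edu]: .3f}")
```

In the second case a regression of wage on education would attribute part of the effect of ability to education, which is why the assumption is called crucial.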

Zero Conditional Mean Assumption
- Question: Suppose that a score on a final exam, score, depends on classes attended (attend) and on unobserved factors that affect exam performance (such as student ability). Consider the model
  $score = \beta_0 + \beta_1 attend + u$
- When would you expect it to satisfy (2.6)?

Zero Conditional Mean Assumption
- (2.6) implies that the population regression function, $E(y|x)$, satisfies $E(y|x) = \beta_0 + \beta_1 x$.
- $E(y|x)$ is a linear function of $x$, and for any $x$ the distribution of $y$ is centered about $E(y|x)$.

[Figure: conditional distribution $f(y)$ of $y$ given $x$, drawn at $x_1 = 5$ and $x_2 = 10$, centered on the line $E(y|x) = \beta_0 + \beta_1 x$.]

Deriving the Ordinary Least Squares Estimates
- The basic idea of regression is to estimate the population parameters from a sample.
- Let $\{(x_i, y_i): i = 1, \dots, n\}$ denote a random sample of size $n$ from the population.
- For each observation in this sample, it will be the case that
  $y_i = \beta_0 + \beta_1 x_i + u_i$
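
The statement that the distribution of $y$ is centered about $E(y|x)$ can be illustrated with simulated draws at the two values used in the figure, $x = 5$ and $x = 10$. This is a sketch under assumed population values (`beta0 = 1.0`, `beta1 = 2.0`, error standard deviation 2.0), none of which come from the lecture.

```python
import random
import statistics

random.seed(2)

# Hypothetical population parameters, for illustration only.
beta0, beta1, sigma = 1.0, 2.0, 2.0


def draw_y(x):
    """Draw one y from the population model y = beta0 + beta1*x + u, with E(u|x) = 0."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)


n_draws = 5_000
x_values = (5, 10)

# Sample average of y at each x, compared with the population regression function.
cond_mean = {x: statistics.fmean([draw_y(x) for _ in range(n_draws)]) for x in x_values}
prf = {x: beta0 + beta1 * x for x in x_values}

for x in x_values:
    print(f"x={x:2d}  average of y = {cond_mean[x]:.3f}  E(y|x) = {prf[x]:.3f}")
```

Individual draws of $y$ scatter around the line because of $u$, but their average at a given $x$ settles near $\beta_0 + \beta_1 x$.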

[Figure: population regression line $E(y|x) = \beta_0 + \beta_1 x$, sample data points $(x_1, y_1), \dots, (x_4, y_4)$, and the associated error terms $u_1, \dots, u_4$.]

Deriving OLS Estimates
- To derive the OLS estimator, we need to realize that our main assumption $E(u|x) = E(u) = 0$ also implies that
  $Cov(x, u) = E(xu) = 0$
- Why? Remember from basic probability that $Cov(X, Y) = E(XY) - E(X)E(Y)$. With $E(u) = 0$ this gives $Cov(x, u) = E(xu)$, and $E(xu) = E[x\,E(u|x)] = 0$.

Deriving OLS, continued
- We can write our two restrictions just in terms of $x$, $y$, $\beta_0$ and $\beta_1$, since $u = y - \beta_0 - \beta_1 x$:
  $E(y - \beta_0 - \beta_1 x) = 0$
  $E[x(y - \beta_0 - \beta_1 x)] = 0$
- These are called moment restrictions.

Deriving OLS using M.O.M.
- The method of moments approach to estimation imposes the population moment restrictions on the sample moments.

Derivation of OLS
- We want to choose values of the parameters that ensure that the sample versions of our moment restrictions hold.
- The sample versions are as follows:
  $$n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
  $$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

Derivation of OLS
- Given the definition of a sample mean and the properties of summation, we can rewrite the first condition as
  $$\bar y = \hat\beta_0 + \hat\beta_1 \bar x, \quad \text{or} \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

Derivation of OLS
- Substituting $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ into the second sample moment condition gives
  $$\sum_{i=1}^{n} x_i \left( y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i \right) = 0$$
  $$\sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar x)$$
  $$\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \hat\beta_1 \sum_{i=1}^{n} (x_i - \bar x)^2$$

So the OLS estimated slope is
  $$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \quad \text{provided that} \quad \sum_{i=1}^{n} (x_i - \bar x)^2 > 0$$

Summary of OLS slope estimate
- The slope estimate is the sample covariance between $x$ and $y$ divided by the sample variance of $x$.
- If $x$ and $y$ are positively correlated, the slope will be positive.
- If $x$ and $y$ are negatively correlated, the slope will be negative.
- We only need $x$ to vary in our sample.

More OLS
- Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares.
- The residual, $\hat u$, is an estimate of the error term, $u$. It is the difference between the sample point and the fitted line (the sample regression function).

[Figure: sample regression line $\hat y = \hat\beta_0 + \hat\beta_1 x$, sample data points $(x_1, y_1), \dots, (x_4, y_4)$, and the associated estimated error terms (residuals) $\hat u_1, \dots, \hat u_4$.]

Alternate approach to derivation
- Given the intuitive idea of fitting a line, we can set up a formal minimization problem.
- That is, we want to choose our parameters such that we minimize the following:
  $$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

Alternate approach, continued
- If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first-order conditions, which are the same as those obtained before, multiplied by $n$:
  $$\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
  $$\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

Lecture Summary
- Introduced the simple linear regression model.
- Introduced the method of ordinary least squares for estimating the slope and intercept parameters using data from a random sample.
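
The OLS formulas from this lecture (slope equal to the sum of cross-deviations over the sum of squared deviations of $x$, intercept equal to $\bar y - \hat\beta_1 \bar x$) translate directly into a few lines of code. The sketch below is a minimal illustration on simulated data; the function name `ols_simple`, the helper `ssr`, and the data-generating values are assumptions made for the example. It then checks that the estimates satisfy both sample moment conditions and that nearby parameter values give a larger sum of squared residuals.

```python
import random
import statistics


def ols_simple(x, y):
    """Return (b0_hat, b1_hat) for the simple regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary in the sample")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


def ssr(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


random.seed(3)
n = 200
x = [random.uniform(0, 20) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]  # hypothetical population

b0_hat, b1_hat = ols_simple(x, y)
resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]

# Sample versions of the two moment conditions (both should be ~0).
moment1 = sum(resid) / n
moment2 = sum(xi * ri for xi, ri in zip(x, resid)) / n

# Least squares view: small changes to the estimates increase the SSR.
ssr_ols = ssr(x, y, b0_hat, b1_hat)
ssr_nearby = min(
    ssr(x, y, b0_hat + d0, b1_hat + d1)
    for d0 in (-0.1, 0.1)
    for d1 in (-0.01, 0.01)
)

print("b0_hat =", round(b0_hat, 3), " b1_hat =", round(b1_hat, 3))
print("moment conditions:", moment1, moment2)
print("SSR at OLS:", round(ssr_ols, 3), " smallest nearby SSR:", round(ssr_nearby, 3))
```

The `ValueError` mirrors the requirement that $x$ must vary in the sample; without variation the slope formula divides by zero.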

The Simple Regression Model (2)
$y = \beta_0 + \beta_1 x + u$

Chapter Outline
- Definition of the Simple Regression Model
- Deriving the Ordinary Least Squares Estimates
- Mechanics of OLS
- Units of Measurement and Functional Form
- Expected Values and Variances of the OLS Estimators
- Regression through the Origin

Lecture Outline
- Algebraic Properties of OLS
- Goodness of Fit
- Using Stata for OLS Regression
- Effects of Changing Units of Measurement on OLS Statistics

Mechanics of OLS
Example: CEO Salary and Return on Equity

obsno  salary  roe   salaryhat  uhat
1      1095    14.1  1224       -129
2      1001    10.9  1165       -164
3      1122    23.5  1398       -276
4       578     5.9  1072       -494
5      1368    13.8  1219        149
6      1145    20    1333       -188
7      1078    16.4  1267       -189
8      1094    16.3  1265       -171
9      1237    10.5  1157         80
10      833    26.3  1450       -617
11      567    25.9  1442       -875
12      933    26.8  1459       -526
13     1339    14.8  1237        102
14      937    22.3  1375       -439
15     2011    56.3  2005          6

Example: CEO Salary and Return on Equity
- salary: annual salary measured in $1000. In the 1990 data above, (min, mean, max) = (223, 1281, 14822).
- roe: net income/common equity, three-year average, with (min, mean, max) = (0.5, 17.18, 56.3).
- n = 209. The estimated relation is
  (estimated salary) = 963.191 + 18.501 roe

Example: CEO Salary and Return on Equity
- Interpretation of the estimates:
- 963.19: the predicted salary of a CEO when roe = 0.
- 18.5: if roe increases by one percentage point, predicted salary increases by 18.5, i.e., $18,500.
- If roe = 30, what is the estimated salary?

Algebraic Properties of OLS
- The sum of the OLS residuals is zero (p. 24).
- Thus, the sample average of the OLS residuals is zero as well.
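
The CEO example and the residual property can both be checked with a short calculation. The sketch below uses the 15 observations listed in the table and the reported full-sample estimates 963.191 and 18.501. It evaluates the fitted line at roe = 30, compares the fitted values with the table's salaryhat column, and then runs OLS on the 15 rows alone to confirm that those residuals sum to zero. The 15-row regression is only an illustration: its estimates differ from the reported ones, which are based on all 209 CEOs.

```python
import statistics

# First 15 observations as listed in the table: (salary in $1000, roe).
rows = [
    (1095, 14.1), (1001, 10.9), (1122, 23.5), (578, 5.9), (1368, 13.8),
    (1145, 20.0), (1078, 16.4), (1094, 16.3), (1237, 10.5), (833, 26.3),
    (567, 25.9), (933, 26.8), (1339, 14.8), (937, 22.3), (2011, 56.3),
]
salaryhat_table = [1224, 1165, 1398, 1072, 1219, 1333, 1267, 1265,
                   1157, 1450, 1442, 1459, 1237, 1375, 2005]

B0_FULL, B1_FULL = 963.191, 18.501  # estimates reported for the full sample (n = 209)


def fitted(roe_value):
    """Predicted salary (in $1000) from the reported full-sample line."""
    return B0_FULL + B1_FULL * roe_value


# Largest gap between the reported line and the table's (whole-number) salaryhat.
max_gap = max(abs(fitted(r) - hat) for (_, r), hat in zip(rows, salaryhat_table))

# The question posed on the slide: estimated salary when roe = 30.
salary_at_30 = fitted(30)

# OLS on the 15 listed rows only (illustration; not the full-sample regression).
salary_bar = statistics.fmean(s for s, _ in rows)
roe_bar = statistics.fmean(r for _, r in rows)
b1_sub = (sum((r - roe_bar) * (s - salary_bar) for s, r in rows)
          / sum((r - roe_bar) ** 2 for _, r in rows))
b0_sub = salary_bar - b1_sub * roe_bar
resid_sub = [s - b0_sub - b1_sub * r for s, r in rows]

print("max |fitted - table salaryhat| =", round(max_gap, 3))
print("estimated salary at roe = 30   =", round(salary_at_30, 3), "(in $1000)")
print("sum of 15-row OLS residuals    =", sum(resid_sub))
```

With the reported coefficients, roe = 30 gives 963.191 + 18.501 × 30 = 1518.221, i.e. an estimated salary of about $1,518,221.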
That is, the OLS residuals satisfy
  $$\sum_{i=1}^{n} \hat u_i = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \quad \text{and thus} \quad \frac{1}{n} \sum_{i=1}^{n} \hat u_i = 0$$

Algebraic Properties of OLS
- The sample covariance between the regressors and the OLS residuals is zero (p. 25).

Algebraic Properties of OLS
- The OLS regression line always goes through the mean of the sample:
  $$\bar y = \hat\beta_0 + \hat\beta_1 \bar x$$

Algebraic Properties of OLS
- We can think of each observation as being made up of an explained part and an unexplained part:
  $$y_i = \hat y_i + \hat u_i$$
- Then the fitted values and residuals are uncorrelated in the sample:
  $$cov(\hat y_i, \hat u_i) = 0$$

Algebraic Properties of OLS
- With $E(\cdot)$ denoting the sample average, and using $E(\hat u_i) = 0$ and $E(x_i \hat u_i) = 0$:
  $$cov(\hat y_i, \hat u_i) = E[(\hat y_i - E(\hat y_i))(\hat u_i - E(\hat u_i))] = E[(\hat y_i - E(\hat y_i))\hat u_i]$$
  $$= E(\hat y_i \hat u_i) - \bar y E(\hat u_i) = E[(\hat\beta_0 + \hat\beta_1 x_i)\hat u_i] = \hat\beta_0 E(\hat u_i) + \hat\beta_1 E(x_i \hat u_i) = 0$$

More Terminology
- Define the total sum of squares as
  $$SST = \sum_{i=1}^{n} (y_i - \bar y)^2$$

More Terminology
- SST is a measure of the total sample variation in the $y_i$; that is, it measures how spread out the $y_i$ are in the sample.
- If we divide SST by $n - 1$, we obtain the sample variance of $y$.

More Terminology
- The explained sum of squares (SSE) is defined as
  $$SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$$
- It measures the sample variation in the predicted values $\hat y_i$.

More Terminology
- The residual sum of squares (SSR) is defined as
  $$SSR = \sum_{i=1}^{n} \hat u_i^2$$
- SSR measures the sample variation in the residuals.

SST, SSR and SSE
- The total variation in $y$ can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR, i.e.
  $$SST = SSE + SSR$$

Proof that SST = SSE + SSR
  $$\sum (y_i - \bar y)^2 = \sum [(y_i - \hat y_i) + (\hat y_i - \bar y)]^2 = \sum [\hat u_i + (\hat y_i - \bar y)]^2$$
  $$= \sum \hat u_i^2 + 2 \sum \hat u_i (\hat y_i - \bar y) + \sum (\hat y_i - \bar y)^2 = SSR + 2 \sum \hat u_i (\hat y_i - \bar y) + SSE$$

Proof that SST = SSE + SSR
- Using $\sum_{i=1}^{n} \hat u_i = 0$ and $\sum_{i=1}^{n} x_i \hat u_i = 0$, one can show that $\sum_{i=1}^{n} \hat u_i (\hat y_i - \bar y) = 0$, because $\hat y_i - \bar y = \hat\beta_1 (x_i - \bar x)$.
- Therefore, SST = SSE + SSR.
- We have used the fact that the fitted values and residuals are uncorrelated in the sample.

Goodness-of-Fit
- How do we think about how well our sample regression line fits our sample data?
- We can compute the fraction of the total sum of squares (SST) that is explained by the model, and call this the R-squared of the regression:
  $$R^2 = SSE/SST = 1 - SSR/SST$$

Goodness-of-Fit
- R-squared is the ratio of the explained variation to the total variation.
- It is thus interpreted as the fraction of the sample variation in $y$ that is explained by $x$.
- The value of R-squared is always between zero and one.

Goodness-of-Fit
- In the social sciences, low R-squareds in regression equations are not uncommon, especially for cross-sectional analysis.
- It is worth emphasizing that a seemingly low R-squared does not necessarily mean that an OLS regression equation is useless.

Goodness-of-Fit
- Example 2.8, CEO Salary and Return on Equity: $R^2 = 0.0132$
- Example 2.9, Voting Outcomes and Campaign Expenditures: $R^2 = 0.856$

Using Stata for OLS regressions
- Now that we've derived the formulas for calculating the OLS estimates of our parameters, you'll be happy to know you don't have to compute them by hand.
- Regressions in Stata are very simple. To run the regression of y on x, just type