Business Statistics: A First Course, 5e, 2009 Prentice-Hall, Inc.

Chapter 12
Simple Linear Regression
Business Statistics: A First Course, Fifth Edition

Learning Objectives
In this chapter, you learn:
- How to use regression analysis to predict the value of a dependent variable based on an independent variable
- The meaning of the regression coefficients b0 and b1
- How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
- To make inferences about the slope and correlation coefficient
- To estimate mean values and predict individual values

Correlation vs. Regression
- A scatter plot can be used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  - Correlation is only concerned with the strength of the relationship
  - No causal effect is implied with correlation
  - Scatter plots were first presented in Ch. 2
  - Correlation was first presented in Ch. 3
Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Changes in Y are assumed to be related to changes in X

Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]

Types of Relationships (continued)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]

Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]

Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
- $Y_i$ = dependent variable
- $X_i$ = independent variable
- $\beta_0$ = population Y intercept
- $\beta_1$ = population slope coefficient
- $\varepsilon_i$ = random error term
- $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component

Simple Linear Regression Model (continued)
[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted with slope = $\beta_1$ and intercept = $\beta_0$; at a given $X_i$ the plot marks the observed value of Y, the predicted value of Y, and the random error $\varepsilon_i$ for this $X_i$ value]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
- $\hat{Y}_i$ = estimated (or predicted) Y value for observation i
- $X_i$ = value of X for observation i
- $b_0$ = estimate of the regression intercept
- $b_1$ = estimate of the regression slope

The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between $Y$ and $\hat{Y}$:
$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2$$

Finding the Least Squares Equation
- The coefficients b0 and b1, and other regression results in this chapter, will be found using Excel or Minitab
Formulas are shown in the text for those who are interested

Interpretation of the Slope and the Intercept
- b0 is the estimated mean value of Y when the value of X is zero
- b1 is the estimated change in the mean value of Y as a result of a one-unit change in X
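For readers who want the formulas behind the least squares method described above, the standard closed-form solution can be sketched as follows. The shorthand SSXY and SSX is introduced here for illustration; this is the usual textbook result rather than something stated on these slides.

```latex
b_1 = \frac{SSXY}{SSX}
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

Because $b_0 = \bar{Y} - b_1 \bar{X}$, the fitted line always passes through the point $(\bar{X}, \bar{Y})$.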
Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Simple Linear Regression Example: Data

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
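The slides obtain the coefficients from Excel or Minitab. As a minimal sketch of what that software is doing, the ten observations above can be fitted with the closed-form least squares formulas. The function name `least_squares_fit` and the variable names are illustrative assumptions, not part of the source.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares_fit(x, y):
    """Return (b0, b1) for the line that minimizes the sum of squared residuals.

    Hypothetical helper written for this illustration.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx            # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1


b0, b1 = least_squares_fit(square_feet, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

Rounded to five decimals, the intercept and slope agree with the values 98.24833 and 0.10977 reported in the Excel output for this example.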
Simple Linear Regression Example: Scatter Plot
[Figure: House price model: Scatter Plot]

Simple Linear Regression Example: Using Excel
[Figure]

Simple Linear Regression Example: Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Minitab Output

The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010

S = 41.3303    R-Sq = 58.1%    R-Sq(adj) = 52.8%

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Graphical Representation
[Figure: House price model: Scatter Plot and Prediction Line, house price = 98.24833 + 0.10977 (square feet), with Slope = 0.10977 and Intercept = 98.248]

Simple Linear Regression Example: Interpretation of b0
- b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
- Because a house cannot have a square footage of 0, b0 has no practical application
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Interpreting b1
- b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
- Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional one square foot of size
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.) = 98.25 + 0.1098(2000) = 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850

Simple Linear Regression Example: Making Predictions
- When using a regression model for prediction, only predict within the relevant range of data
[Figure: scatter plot and prediction line with the relevant range for interpolation marked]
Do not try to extrapolate beyond the range of observed X's

Measures of Variation
- Total variation is made up of two parts:
$$SST = SSR + SSE$$
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$
where:
- $\bar{Y}$ = mean value of the dependent variable
- $Y_i$ = observed value of the dependent variable
- $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value
Measures of Variation (continued)
- SST = total sum of squares (Total Variation)
  - Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
- SSR = regression sum of squares (Explained Variation)
  - Variation attributable to the relationship between X and Y
- SSE = error sum of squares (Unexplained Variation)
  - Variation in Y attributable to factors other than X

Measures of Variation (continued)
[Figure: at a given $X_i$, the observed value $Y_i$, the predicted value $\hat{Y}_i$ on the regression line, and the mean $\bar{Y}$, with the distances labeled $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$, and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$]
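As a rough check on the decomposition SST = SSR + SSE, the sketch below recomputes the three sums of squares for the ten-house example. It refits the line with the closed-form least squares formulas, which is an assumption about method (the slides take the results from Excel or Minitab), and all variable names are illustrative.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares slope and intercept (closed-form solution).
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar

# Predicted price for each observed house size.
predicted = [b0 + b1 * x for x in square_feet]

# Total, explained (regression), and unexplained (error) variation.
sst = sum((y - y_bar) ** 2 for y in price)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, predicted))

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The three totals should line up with the SS column of the ANOVA table in the Excel output (32600.5000, 18934.9348, and 13665.5652).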
Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
- The coefficient of determination is also called r-squared and is denoted as r²
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le r^2 \le 1$

Examples of r² Values
[Figure: two scatter plots of Y against X with every point on a straight line, each labeled r² = 1]
r² = 1
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
Examples of r² Values
[Figure: two scatter plots of Y against X with points scattered around a line]
0 < r² < 1
Weaker linear relationships between X and Y: Some but not all of the variation in Y is explained by variation in X

Examples of r² Values
[Figure: scatter plot of Y against X with no trend, labeled r² = 0]
r² = 0
No linear relationship between X and Y: The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

Simple Linear Regression Example: Coefficient of Determination, r² in Excel
From the Excel output: R Square = 0.58082, Regression SS = 18934.9348, Total SS = 32600.5000
$$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by variation in square feet

Simple Linear Regression Example: Coefficient of Determination, r² in Minitab
From the Minitab output: R-Sq = 58.1%, Regression SS = 18935, Total SS = 32600
$$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by variation in square feet

Standard Error of Estimate
- The standard deviation of the variation of observations around the regression line is estimated by
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
where:
- SSE = error sum of squares
- n = sample size

Simple Linear Regression Example: Standard Error of Estimate in Excel
From the Excel output: Standard Error = 41.33032
$$S_{YX} = 41.33032$$

Simple Linear Regression Example: Standard Error of Estimate in Minitab
From the Minitab output: S = 41.3303
$$S_{YX} = 41.33032$$

Comparing Standard Errors
[Figure: two scatter plots of Y against X, one with small $S_{YX}$ and one with large $S_{YX}$]
$S_{YX}$ is a measure of the variation of observed Y values from the regression line
The magnitude of $S_{YX}$ should always be judged relative to the size of the Y values in the sample data
e.g., $S_{YX}$ = $41.33K is moderately small relative to house prices in the $200K-$400K range
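The two summary measures above follow directly from the ANOVA sums of squares. A minimal sketch, using the values printed in the Excel output for the house price example (variable names are illustrative):

```python
import math

# Sums of squares and sample size taken from the ANOVA table of the example.
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error (residual) sum of squares
sst = 32600.5000   # total sum of squares
n = 10             # number of houses in the sample

# Coefficient of determination: share of total variation explained by X.
r_squared = ssr / sst

# Standard error of the estimate: uses n - 2 degrees of freedom because
# two coefficients (b0 and b1) are estimated from the data.
s_yx = math.sqrt(sse / (n - 2))

print(f"r^2  = {r_squared:.5f}")
print(f"S_YX = {s_yx:.5f}")
```

Since prices are recorded in $1000s, a value of S_YX near 41.33 corresponds to roughly $41,330.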
Assumptions of Regression: L.I.N.E
- Linearity
  - The relationship between X and Y is linear
- Independence of Errors
  - Error values are statistically independent
- Normality of Error
  - Error values are normally distributed for any given value of X
- Equal Variance (also called homoscedasticity)
  - The probability distribution of the errors has constant variance

Residual Analysis
$$e_i = Y_i - \hat{Y}_i$$
- The residual for observation i, $e_i$, is the difference between its observed and predicted value
- Check the assumptions of regression by examining the residuals
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical Analysis of Residuals
  - Can plot residuals vs. X
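To make the residual definition concrete, the sketch below computes $e_i = Y_i - \hat{Y}_i$ for each of the ten houses in the example. The fit uses the closed-form least squares formulas, which is an assumption about method, and the variable names are illustrative. Plotting is left out so the block stays dependency-free; charting `residuals` against `square_feet` would give the residuals vs. X plot mentioned above.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares slope and intercept (closed-form solution).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# Predicted value and residual (observed minus predicted) per observation.
predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - y_hat for y, y_hat in zip(price, predicted)]

print("obs  predicted    residual")
for i, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>3}  {y_hat:>9.5f}  {e:>10.5f}")
```

With an intercept in the model, least squares residuals sum to zero up to rounding, which is a quick sanity check on the fit.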
Residual Analysis for Linearity
[Figure: plots of Y against x and of residuals against x for two cases, Not Linear and Linear]

Residual Analysis for Independence
[Figure: plots of residuals against X for two cases, Not Independent and Independent]
Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
[Figure: normal probability plot, Percent (0 to 100) against Residual (-3 to 3)]
When using a normal probability plot, normal errors will approximately display in a straight line

Residual Analysis for Equal Variance
[Figure: plots of Y against x and of residuals against x for two cases, Non-constant variance and Constant variance]

Simple Linear Regression Example: Excel Residual Output

RESIDUAL OUTPUT
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348

Does not appear to violate any regression assumptions

Inferences About the Slope
- The standard error of the regression slope coefficient (b1) is estimated by
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$
where:
- $S_{b_1}$ = estimate of the standard error of the slope
- $S_{YX} = \sqrt{\dfrac{SSE}{n-2}}$ = standard error of the estimate

Inferences About the Slope: t Test
- t test for a population slope
  - Is there a linear relationship between X and Y?
- Nu