Quantitative Methods For Decision Makers, 3rd Edition
Chapter 10  Forecasting: Regression

Learning Objectives
By the end of this chapter you should be able to:
- Understand the principles of simple linear regression
- Be able to interpret the key statistics from a regression equation
- Be able to explain the limitations of regression in business forecasting
- Be aware of the extensions to the basic regression model

Correlation vs. Regression
- A scatter diagram can be used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
- Correlation is only concerned with the strength of the relationship; no causal effect is implied with correlation
- The sample correlation coefficient is

  r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}

  or, in computational form,

  r = \frac{\sum xy - \sum x \sum y / n}{\sqrt{\left(\sum x^2 - (\sum x)^2/n\right)\left(\sum y^2 - (\sum y)^2/n\right)}}
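To make the computational formula concrete, here is a minimal Python sketch (assuming numpy is available; the x and y values are made-up illustration data, not from the chapter) that evaluates r from the formula and checks it against numpy's built-in correlation:

```python
# Pearson correlation via the computational formula above.
# x and y are illustrative data only, not taken from the chapter.
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 10.0])
n = len(x)

num = (x * y).sum() - x.sum() * y.sum() / n
den = np.sqrt((x**2).sum() - x.sum()**2 / n) * np.sqrt((y**2).sum() - y.sum()**2 / n)
r = num / den

print(round(r, 4), round(np.corrcoef(x, y)[0, 1], 4))  # the two values should agree
```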
Types of Relationships
[Scatter-plot panels of Y against X illustrating linear and curvilinear relationships, strong and weak relationships, and no relationship.]

Introduction to Regression Analysis
Regression analysis is used to:
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain.
Independent variable: the variable used to explain the dependent variable.

Simple Linear Regression Model
- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Changes in Y are assumed to be caused by changes in X
- The population model is

  Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

  where \beta_0 is the population Y intercept, \beta_1 is the population slope coefficient, X_i is the independent variable, Y_i is the dependent variable and \varepsilon_i is the random error term; \beta_0 + \beta_1 X_i is the linear component and \varepsilon_i the random error component.

Simple Linear Regression Model (continued)
[Plot of Y against X showing the population line with intercept \beta_0 and slope \beta_1, the observed value of Y for X_i, the predicted value of Y for X_i, and the random error \varepsilon_i for that X_i value.]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:

  \hat{Y}_i = b_0 + b_1 X_i

where b_0 is the estimate of the regression intercept, b_1 is the estimate of the regression slope, \hat{Y}_i is the estimated (or predicted) Y value for observation i, and X_i is the value of X for observation i.

Least Squares Method
b_0 and b_1 are obtained by finding the values of b_0 and b_1 that minimize the sum of the squared differences between Y_i and \hat{Y}_i:

  \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2

The resulting estimates are

  b_1 = \frac{\sum xy - \sum x \sum y / n}{\sum x^2 - (\sum x)^2 / n}, \qquad b_0 = \bar{y} - b_1 \bar{x}
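The least-squares formulas translate directly into code. A minimal sketch, assuming numpy is available and using illustrative data rather than chapter data:

```python
# Least-squares estimates b0 and b1 from the formulas above,
# checked against numpy's polynomial fit.
import numpy as np

def least_squares(x, y):
    n = len(x)
    b1 = ((x * y).sum() - x.sum() * y.sum() / n) / ((x**2).sum() - x.sum()**2 / n)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data only
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

b0, b1 = least_squares(x, y)
print(b0, b1)                 # intercept and slope from the formulas
print(np.polyfit(x, y, 1))    # [slope, intercept] -- should match
```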
Interpretation of the Slope and the Intercept
- b_0 is the estimated average value of Y when the value of X is zero
- b_1 is the estimated change in the average value of Y as a result of a one-unit change in X

Using the Regression Equation
Forecasting with the regression equation:
1. is not a guaranteed outcome
2. does not guarantee the relationship will continue unchanged in the future
3. should be interpolation, not extrapolation
The equation can also be used for performance evaluation.

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
- Dependent variable (Y) = house price in $1000s
- Independent variable (X) = square feet
Sample Data for House Price Model

  House Price in $1000s (Y)    Square Feet (X)
            245                     1400
            312                     1600
            279                     1700
            308                     1875
            199                     1100
            219                     1550
            405                     2350
            324                     2450
            319                     1425
            255                     1700

Graphical Presentation
[Scatter plot of house price ($1000s) against square feet.]

Excel Output

  Regression Statistics
  Multiple R           0.76211
  R Square             0.58082
  Adjusted R Square    0.52842
  Standard Error      41.33032
  Observations              10

  ANOVA         df          SS           MS          F     Significance F
  Regression     1   18934.9348   18934.9348   11.0848     0.01039
  Residual       8   13665.5652    1708.1957
  Total          9   32600.5000

               Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
  Intercept        98.24833         58.03348   1.69296   0.12892   -35.57720   232.07386
  Square Feet       0.10977          0.03297   3.32938   0.01039     0.03374     0.18580

The regression equation is:

  house price = 98.24833 + 0.10977 (square feet)

Graphical Presentation
[Scatter plot with the fitted regression line: slope = 0.10977, intercept = 98.248.]
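As a check on the Excel output, the same coefficients can be reproduced from the sample data in the table above; a sketch assuming numpy is available:

```python
# Reproduce the intercept and slope for the house-price example.
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)  # $1000s
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

b1, b0 = np.polyfit(sqft, price, 1)    # polyfit returns [slope, intercept]
print(round(b0, 5), round(b1, 5))      # approx 98.24833 and 0.10977

# Predict the price of a 2000-square-foot house (in $1000s).
print(round(b0 + b1 * 2000, 2))        # approx 317.8; the slides show 317.85 using rounded coefficients
```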
Interpretation of the Intercept, b_0
- b_0 is the estimated average value of Y when the value of X is zero
- Here, no houses had 0 square feet, so b_0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

Interpretation of the Slope Coefficient, b_1
- b_1 measures the estimated change in the average value of Y as a result of a one-unit change in X
- Here, b_1 = 0.10977 tells us that the average value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional square foot of size

Predictions Using Regression Analysis
Predict the price for a house with 2,000 square feet:

  house price = 98.25 + 0.1098 (sq. ft.) = 98.25 + 0.1098(2000) = 317.85

The predicted price for a house with 2,000 square feet is 317.85 ($1000s) = $317,850.
Interpolation vs. Extrapolation
- When using a regression model for prediction, only predict within the relevant range of the data (the relevant range for interpolation)
- Do not try to extrapolate beyond the range of observed Xs

Measures of Variation
Further statistical evaluation of the regression equation: total variation is made up of two parts,

  SST = SSR + SSE

where

  SST = \sum (Y_i - \bar{Y})^2   (total sum of squares)
  SSR = \sum (\hat{Y}_i - \bar{Y})^2   (regression sum of squares)
  SSE = \sum (Y_i - \hat{Y}_i)^2   (error sum of squares)

and \bar{Y} is the average value of the dependent variable, Y_i the observed values of the dependent variable, and \hat{Y}_i the predicted value of Y for the given X_i value.
- SST (total sum of squares) measures the variation of the Y_i values around their mean \bar{Y}
- SSR (regression sum of squares) is the explained variation attributable to the relationship between X and Y
- SSE (error sum of squares) is the variation attributable to factors other than the relationship between X and Y

Measures of Variation (continued)
[Plot of Y against X showing, for one observation X_i, the decomposition of SST, SSR and SSE around the regression line.]
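A short sketch, assuming numpy and reusing the house-price data, that computes the three sums of squares and confirms the identity SST = SSR + SSE:

```python
# Sums of squares for the house-price regression.
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

b1, b0 = np.polyfit(sqft, price, 1)
fitted = b0 + b1 * sqft

sst = ((price - price.mean())**2).sum()    # total variation
ssr = ((fitted - price.mean())**2).sum()   # explained variation
sse = ((price - fitted)**2).sum()          # unexplained variation

print(round(sst, 1), round(ssr, 1), round(sse, 1))  # approx 32600.5, 18934.9, 13665.6
print(np.isclose(sst, ssr + sse))                   # True
```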
Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
- It is also called r-squared and is denoted r²:

  r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le r^2 \le 1

Examples of Approximate r² Values
[Scatter plots of Y against X: r² = 1, a perfect linear relationship, where 100% of the variation in Y is explained by variation in X; 0 < r² < 1, weaker linear relationships, where some but not all of the variation in Y is explained by variation in X; r² = 0, no linear relationship, where the value of Y does not depend on X and none of the variation in Y is explained by variation in X.]

Excel Output
From the house price output above,

  r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082

so 58.08% of the variation in house prices is explained by variation in square feet.

Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by

  S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}

where SSE = error sum of squares and n = sample size.
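Both statistics follow directly from the sums of squares; a minimal sketch using the values from the Excel output above (assuming numpy):

```python
# Coefficient of determination and standard error of the estimate
# for the house-price regression.
import numpy as np

sst, ssr, sse, n = 32600.5, 18934.9348, 13665.5652, 10   # from the Excel output

r2  = ssr / sst                # proportion of variation explained
syx = np.sqrt(sse / (n - 2))   # standard error of the estimate

print(round(r2, 5), round(syx, 5))   # approx 0.58082 and 41.33032
```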
Excel Output
From the house price output above, the standard error of the estimate is

  S_{YX} = 41.33032

Comparing Standard Errors
[Scatter plots with fitted lines illustrating a small S_{YX} (points close to the line) and a large S_{YX} (points widely scattered about the line).]
S_{YX} is a measure of the variation of observed Y values from the regression line.

Inferences About the Slope
The standard error of the regression slope coefficient (b_1) is estimated by

  S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}

where S_{b_1} is the estimate of the standard error of the least squares slope and S_{YX} = \sqrt{SSE/(n-2)} is the standard error of the estimate.

Comparing Standard Errors of the Slope
[Scatter plots illustrating a small S_{b_1} and a large S_{b_1}.]
S_{b_1} is a measure of the variation in the slope of regression lines from different possible samples.

Excel Output
From the house price output above, the standard error of the slope is

  S_{b_1} = 0.03297
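The slope's standard error can be checked from the square-feet data and S_{YX}; a sketch assuming numpy:

```python
# Standard error of the slope for the house-price regression.
import numpy as np

sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
syx  = 41.33032                          # standard error of the estimate (Excel output)

ssx = ((sqft - sqft.mean())**2).sum()    # sum of squared deviations of X
sb1 = syx / np.sqrt(ssx)

print(round(sb1, 5))                     # approx 0.03297
```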
Inference About the Slope: t Test
- t test for a population slope: is there a linear relationship between X and Y?
- Null and alternative hypotheses:
  H0: \beta_1 = 0 (no linear relationship)
  H1: \beta_1 \ne 0 (a linear relationship does exist)
- Test statistic:

  t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2

  where b_1 is the regression slope coefficient, \beta_1 the hypothesized slope, and S_{b_1} the standard error of the slope.

Inference About the Slope: t Test (continued)
For the house price data (as in the table above), the estimated equation is house price = 98.25 + 0.1098 (sq. ft.), so the slope of this model is 0.1098. Does square footage of the house affect its sales price?

Inferences About the Slope: t Test Example
H0: \beta_1 = 0    H1: \beta_1 \ne 0
From the Excel output, b_1 = 0.10977 and S_{b_1} = 0.03297, so

  t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938

Decision: with d.f. = 10 - 2 = 8 and \alpha/2 = 0.025, the critical values are \pm 2.3060. Since t = 3.329 falls in the upper rejection region, reject H0.
Conclusion: there is sufficient evidence that square footage affects house price.

Inferences About the Slope: t Test Example (continued)
P-value approach: from the Excel output, P(t > 3.329) + P(t < -3.329) = 0.01039 for 8 d.f. Since the P-value of 0.01039 is less than \alpha = 0.05, reject H0: there is sufficient evidence that square footage affects house price.
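The test statistic, critical value and P-value can be reproduced as follows (a sketch assuming scipy is available; b_1 and S_{b_1} are taken from the Excel output):

```python
# t test for the slope of the house-price regression.
from scipy import stats

b1, sb1, n = 0.10977, 0.03297, 10
t_stat = (b1 - 0) / sb1                        # hypothesised slope beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-tailed, alpha = 0.05
p_val  = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(round(t_stat, 3), round(t_crit, 4), round(p_val, 5))
# approx 3.329, 2.3060, 0.01039 -> reject H0
```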
Prediction Interval for an Individual Y, Given X
Confidence interval estimate for an individual value of Y given a particular X_i:

  \hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S_{YX} \sqrt{1 + h_i}, \qquad
  h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}, \qquad
  S_{YX} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}

Estimation of Individual Values: Example
Find the 95% prediction interval for an individual house with 2,000 square feet.
Predicted price: \hat{Y}_i = 317.85 ($1000s). The prediction interval estimate for Y given X = X_i is

  \hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28

The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070.
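A sketch of the interval calculation, assuming numpy and scipy and refitting the line so that the unrounded \hat{Y} is used:

```python
# 95% prediction interval for an individual house of 2,000 square feet.
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n, syx, x_new = 10, 41.33032, 2000.0            # S_YX from the Excel output

b1, b0 = np.polyfit(sqft, price, 1)
y_hat = b0 + b1 * x_new

h = 1 / n + (x_new - sqft.mean())**2 / ((sqft - sqft.mean())**2).sum()
t_crit = stats.t.ppf(0.975, df=n - 2)
margin = t_crit * syx * np.sqrt(1 + h)

print(round(margin, 2))                                    # approx 102.28
print(round(y_hat - margin, 2), round(y_hat + margin, 2))  # approx 215.50 and 420.07
```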
Estimating a Trend Using Regression
- Trend = f(time)
- Trend = a + bT

Non-Linear Regression

The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (X_i).
Multiple regression model with k independent variables:

  Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i

where \beta_0 is the Y-intercept, \beta_1, \dots, \beta_k are the population slopes and \varepsilon_i is the random error.

Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data. The multiple regression equation with k independent variables is

  \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}

with estimated intercept b_0 and estimated slope coefficients b_1, \dots, b_k. In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.

Example: 2 Independent Variables
- A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising ($100s)
- Data are collected for 15 weeks
Pie Sales Example
Multiple regression equation: Sales = b_0 + b_1(Price) + b_2(Advertising)

  Week   Pie Sales   Price ($)   Advertising ($100s)
    1        350        5.50            3.3
    2        460        7.50            3.3
    3        350        8.00            3.0
    4        430        8.00            4.5
    5        350        6.80            3.0
    6        380        7.50            4.0
    7        430        4.50            3.0
    8        470        6.40            3.7
    9        450        7.00            3.5
   10        490        5.00            4.0
   11        340        7.20            3.5
   12        300        7.90            3.2
   13        440        5.90            4.0
   14        450        5.00            3.5
   15        300        7.00            2.7

Multiple Regression Output

  Regression Statistics
  Multiple R           0.72213
  R Square             0.52148
  Adjusted R Square    0.44172
  Standard Error      47.46341
  Observations              15

  ANOVA         df          SS          MS         F     Significance F
  Regression     2   29460.027   14730.013   6.53861     0.01201
  Residual      12   27033.306    2252.776
  Total         14   56493.333

               Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
  Intercept       306.52619        114.25389   2.68285   0.01993    57.58835   555.46404
  Price           -24.97509         10.83213  -2.30565   0.03979   -48.57626    -1.37392
  Advertising      74.13096         25.96732   2.85478   0.01449    17.55303   130.70888

The Multiple Regression Equation

  Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
- b_1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price
- b_2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:

  Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies. Note that Advertising is in $100s, so $350 means that X_2 = 3.5.
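Although the chapter uses Excel, the same coefficients and prediction can be reproduced with ordinary least squares on the data in the table above; a sketch assuming numpy is available:

```python
# Reproduce the pie-sales regression coefficients and the prediction.
import numpy as np

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

X = np.column_stack([np.ones_like(price), price, adv])   # intercept, Price, Advertising
b, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(np.round(b, 3))   # approx [306.526, -24.975, 74.131]

# Predict sales at Price = $5.50 and Advertising = $350 (i.e. X2 = 3.5).
print(round(b[0] + b[1] * 5.50 + b[2] * 3.5, 2))   # approx 428.62
```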
Coefficient of Multiple Determination
- Reports the proportion of total variation in Y explained by all X variables taken together:

  r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}

Adjusted r²
- r² never decreases when a new X variable is added to the model; this can be a disadvantage when comparing models
- What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
- The adjusted r² shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

  r^2_{adj} = 1 - \left[(1 - r^2)\,\frac{n - 1}{n - k - 1}\right]

  (where n = sample size and k = number of independent variables)
- It penalizes excessive use of unimportant independent variables, is smaller than r², and is useful in comparing models
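The adjusted r² in the pie-sales output can be reproduced directly from the formula; a minimal sketch using r², n and k from the Excel output:

```python
# Adjusted r-squared for the pie-sales model.
r2, n, k = 0.52148, 15, 2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 5))   # approx 0.44172
```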
Is the Model Significant? F Test for Overall Significance of the Model
- Shows whether there is a linear relationship between all of the X variables considered together and Y
- Use the F-test statistic
- Hypotheses:
  H0: \beta_1 = \beta_2 = \dots = \beta_k = 0 (no linear relationship)
  H1: at least one \beta_i \ne 0 (at least one independent variable affects Y)

F Test for Overall Significance
Test statistic:

  F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}

where F has k (numerator) and n - k - 1 (denominator) degrees of freedom. For the pie sales model,

  F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386

F Test for Overall Significance (continued)
With 2 and 12 degrees of freedom; P-value for the F test (Significance F in the output above) = 0.01201.
H0: \beta_1 = \beta_2 = 0    H1: \beta_1 and \beta_2 not both zero
\alpha = 0.05, df_1 = 2, df_2 = 12
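A sketch of the F calculation and its P-value, assuming scipy is available and taking SSR and SSE from the ANOVA table above:

```python
# F test for overall significance of the pie-sales model.
from scipy import stats

ssr, sse, n, k = 29460.027, 27033.306, 15, 2

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
p_val = stats.f.sf(f_stat, k, n - k - 1)   # upper-tail probability

print(round(f_stat, 4), round(p_val, 5))   # approx 6.5386 and 0.01201
```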