Chapter 10 (P 227) Correlation and Regression Analysis

Relationship among variables
- Functional relationship
- Statistical relationship (correlation): Y depends on X, but is not determined by X alone. Examples: price and sales; daily high temperature and the demand for air conditioning.

Regression
Based on observed data, establish a regression equation and make statistical inferences (predictions).

What does regression do?
Solve the following problems:
- Determine whether there is a statistical relationship among variables and, if there is, give the regression equation.
- Forecast the value of another (dependent) variable from one variable or a group of (independent) variables.

Simple Linear Regression
Example: X = price, Y = sales for a kind of product. We collect data:

| X (yuan) | 70 | 80 | 90 | 100 | 110 |
|---|---|---|---|---|---|
| Y (thousand) | 11.25 | 11.28 | 11.65 | 11.70 | 12.14 |

1. Scatter plot
2. Regression equation (least squares estimation)
3. Correlation coefficient (testing the regression model)
4. Forecasting (point and interval forecasting)

Linear Regression Model
The variables are related by a linear function:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent (response) variable, $X_i$ is the independent (explanatory) variable, $\beta_0$ is the Y-intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the random error.

Sample Linear Regression Model
Observed value: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the random error (residual).
Predicted value: $\hat{Y}_i = b_0 + b_1 X_i$

The least squares method provides an estimated regression equation that minimizes the sum of squared deviations between the observed values of the dependent variable $y_i$ and the estimated values of the dependent variable $\hat{y}_i$.

Least Squares Estimation
[Figure: four observations with residuals $e_1, \dots, e_4$ measured from the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ (predicted value).]

OLS: $\min \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$

Coefficient & Equation
Sample regression equation:

$$\hat{Y}_i = b_0 + b_1 X_i$$

Slope for the estimated regression equation, P 238 (10.17):

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$$

Intercept for the estimated regression equation:

$$b_0 = \bar{Y} - b_1 \bar{X}$$
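The slope and intercept formulas (10.17) can be checked on the price and sales data from the Simple Linear Regression example. The following is a minimal sketch in plain Python; the function name `fit_simple_ols` is an illustrative choice and does not come from the slides.

```python
# Price (yuan) and sales (thousand) from the Simple Linear Regression example.
prices = [70, 80, 90, 100, 110]
sales = [11.25, 11.28, 11.65, 11.70, 12.14]


def fit_simple_ols(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x.

    Uses the computational form of the slope:
        b1 = (sum(x*y) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
        b0 = y_bar - b1 * x_bar
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_ols(prices, sales)
# Residuals e_i = Y_i - Y_hat_i; their squares are what OLS minimizes.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(prices, sales)]
sse = sum(e * e for e in residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.5f}")
```

For these five observations the fitted line works out to roughly $\hat{Y} = 9.624 + 0.022X$, so each additional yuan of price is associated with about 0.022 thousand more units sold in this sample.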
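Evaluating the Model
For the estimated equation $\hat{Y}_i = b_0 + b_1 X_i$:
- Significance test
- Coefficient of determination and standard deviation of estimation
- Residual analysis

Measures of Variation in Regression
SST = SSR + SSE
1. Total Sum of Squares (SST), P 239 (10.20): measures the variation between the observed values $Y_i$ and their mean $\bar{Y}$.
2. Sum of Squares due to Regression (SSR): variation explained by the relationship between X and Y.
3. Sum of Squares due to Error (SSE): variation caused by other factors.

Variation Measures
With $\hat{Y}_i = b_0 + b_1 X_i$:

$$SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$

Coefficient of Determination
$0 \le r^2 \le 1$

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = \frac{b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - n \bar{Y}^2}{\sum_{i=1}^{n} Y_i^2 - n \bar{Y}^2}$$

A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variation in the dependent variable y that is explained by the estimated regression equation.

Correlation Coefficient
A numerical measure of linear association between two variables that takes values between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship, and values near zero indicate the lack of a linear relationship.

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Coefficients of Determination (r²) and Correlation (r)
[Figure: four scatter plots of Y against X with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, for r = +1, r = -1, r = +0.9, and r = 0. $r^2 = 1$ when r = ±1; $r^2 = 0$ when r = 0.]

Test of Slope Coefficient for Significance
1. Tests for a linear relationship between X and Y.
2. Hypotheses:
   $H_0: \beta_1 = 0$ (no linear relationship)
   $H_1: \beta_1 \ne 0$ (linear relationship)
3. Test statistic, with $n - 2$ degrees of freedom:

$$t = \frac{b_1}{S_{b_1}}, \qquad \text{where } S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}$$

Example: Test of Slope Coefficient
- $H_0: \beta_1 = 0$; $H_1: \beta_1 \ne 0$
- $\alpha = .05$, df = 5 - 2 = 3
- Critical values: ±3.1824 (.025 in each tail; reject $H_0$ outside this range)
- Test statistic: $t = b_1 / S_{b_1} = 0.700 / 0.1915 = 3.655$
- Decision: reject $H_0$ at $\alpha = 0.05$.
- Conclusion: there is evidence of a relationship.

Multiple Regression Model
There is a linear relationship between one dependent variable and two or more independent variables.

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_P X_{Pi} + \varepsilon_i$$

where $Y_i$ is the dependent variable, $X_{1i}, \dots, X_{Pi}$ are the independent variables, $\beta_0$ is the population Y-intercept, $\beta_1, \dots, \beta_P$ are the population slopes, and $\varepsilon_i$ is the random error.

Example: New York Times
You work in the advertising department of the New York Times (NYT). You want to find out to what extent ad size (square inches) and publishing volume (thousands) influence the response to ads (hundreds). You have collected the following data:

| Response | Size | Volume |
|---|---|---|
| 1 | 1 | 2 |
| 4 | 8 | 8 |
| 1 | 3 | 1 |
| 3 | 5 | 7 |
| 2 | 6 | 4 |
| 4 | 10 | 6 |

Example (NYT): Computer Output
Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Param=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 0.0640 | 0.2599 | 0.246 | 0.8214 |
| ADSIZE | 1 | 0.2049 | 0.0588 | 3.656 | 0.0399 |
| CIRC | 1 | 0.2805 | 0.0686 | 4.089 | 0.0264 |

The three parameter estimates are $b_0$ (INTERCEP), $b_1$ (ADSIZE), and $b_2$ (CIRC); in general the last one is $b_P$.

Interpretation of Coefficients
1. Slope ($b_1$): if the publishing volume remains unchanged, then when ad size increases by one square inch, the response is expected to increase by 0.2049 hundred.
2. Slope ($b_2$): if ad size remains unchanged, then when publishing volume increases by one thousand, the response is expected to increase by 0.2805 hundred.

Evaluating the Model
1. How well does the model describe the relationship among the variables?
2. Closeness of best fit
3. Assumptions met
4. Significance of estimates
5. Correlation among variables
6. Outliers (unusual observations)

Testing Overall Significance
1. Tests whether there is a linear relationship between Y and all the independent variables taken together.
2. Uses the F statistic.
3. Hypotheses:
   $H_0: \beta_1 = \beta_2 = \dots = \beta_P = 0$ (there is no linear relationship between Y and the independent variables)
   $H_1$: at least one coefficient is not equal to 0 (at least one independent variable influences Y)

Testing Overall Significance: Computer Output
Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 9.2497 | 4.6249 | 55.440 | 0.0043 |
| Error | 3 | 0.2503 | 0.0834 | | |
| C Total | 5 | 9.5000 | | | |

The degrees of freedom are P (Model), n - P - 1 (Error), and n - 1 (C Total). The F value is MSR/MSE, and Prob > F is the p-value.

Transformations in Regression Models
- Non-linear models that can be transformed into linear models (convenient for carrying out OLS).
- Data transformation
- Multiplicative model example:

$$Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$$

$$\ln Y_i = \ln \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \ln \varepsilon_i$$

Square-Root Transformation

$$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$$

[Figure: Y plotted against $X_1$, with curves for $\beta_1 > 0$ and $\beta_1 < 0$.]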
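Returning to the NYT example, the parameter estimates and the F value in the computer output can be reproduced by hand. The sketch below is written in plain Python and only handles the two-predictor case, solving the normal equations on centered data with Cramer's rule; the data rows are as read from the slide, and the function names `fit_two_predictors` and `anova` are illustrative, not taken from any statistics package.

```python
# NYT example data as read from the slide:
# (response in hundreds, ad size in square inches, publishing volume in thousands)
rows = [(1, 1, 2), (4, 8, 8), (1, 3, 1), (3, 5, 7), (2, 6, 4), (4, 10, 6)]


def fit_two_predictors(rows):
    """Least squares estimates (b0, b1, b2) for y_hat = b0 + b1*x1 + b2*x2."""
    n = len(rows)
    y_bar = sum(r[0] for r in rows) / n
    x1_bar = sum(r[1] for r in rows) / n
    x2_bar = sum(r[2] for r in rows) / n
    # Centered sums of squares and cross-products.
    s11 = sum((r[1] - x1_bar) ** 2 for r in rows)
    s22 = sum((r[2] - x2_bar) ** 2 for r in rows)
    s12 = sum((r[1] - x1_bar) * (r[2] - x2_bar) for r in rows)
    s1y = sum((r[1] - x1_bar) * (r[0] - y_bar) for r in rows)
    s2y = sum((r[2] - x2_bar) * (r[0] - y_bar) for r in rows)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


def anova(rows, b0, b1, b2, p=2):
    """Return (SST, SSR, SSE, F) with F = MSR / MSE = (SSR/p) / (SSE/(n-p-1))."""
    n = len(rows)
    y_bar = sum(r[0] for r in rows) / n
    sst = sum((r[0] - y_bar) ** 2 for r in rows)
    sse = sum((r[0] - (b0 + b1 * r[1] + b2 * r[2])) ** 2 for r in rows)
    ssr = sst - sse
    f_value = (ssr / p) / (sse / (n - p - 1))
    return sst, ssr, sse, f_value


b0, b1, b2 = fit_two_predictors(rows)
sst, ssr, sse, f_value = anova(rows, b0, b1, b2)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, F = {f_value:.3f}")
```

The results should agree with the computer output to the printed precision: estimates of about 0.0640, 0.2049, and 0.2805, sums of squares 9.5000 = 9.2497 + 0.2503, and an F value of about 55.44. For more than two predictors, a matrix-based least squares routine would be the more practical choice.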