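Chapter 9  Regression Analysis (Regression)

Contents

    Linear regression
    Curve estimation
    Binary logistic regression
    Multinomial logistic regression
    Probit regression
    Nonlinear regression
    Weighted regression
    Two-stage least squares
    Optimal scaling regression
    Answers to exercises

Linear Regression

Formulas for simple linear regression

The test of the slope has the null hypothesis that the population regression coefficient b = 0. The t statistic for this hypothesis is

\[ t = \frac{b}{SE_b} \]

The test of the intercept has the null hypothesis that the intercept of the population regression equation a = 0. The t statistic for this hypothesis is

\[ t = \frac{a}{SE_a} \]

In these two formulas, \(SE_b\) is the standard error of the regression coefficient and \(SE_a\) is the standard error of the intercept.

Coefficient of determination \(R^2\):

\[ R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} \]

Analysis of variance:

\[ F = \frac{\text{regression mean square}}{\text{residual mean square}} = \frac{\sum (\hat{y}_i - \bar{y})^2 / p}{\sum (y_i - \hat{y}_i)^2 / (n - p - 1)} \]

[Figure: simple linear regression, schematic plots (a) to (g) of the relationship between residuals and predicted values]

The concept of multiple linear regression

Model for multiple regression analysis:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \]

Formula for the adjusted coefficient of determination:

\[ \text{Adjusted } R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2 / (n - k - 1)}{\sum (y_i - \bar{y})^2 / (n - 1)} \]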
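The formulas above can be tried on a small data set. The following is a minimal sketch, not the SPSS implementation: the data are made up, the function name `simple_linear_regression` is hypothetical, and the standard-error expressions for the slope and intercept are the usual textbook ones, which the slides do not spell out. With one predictor, p = k = 1.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b*x plus the statistics defined above."""
    n = len(x)
    p = 1  # one independent variable
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]

    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression SS
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # residual SS
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)              # total SS
    ms_res = ss_res / (n - p - 1)

    # Assumed textbook standard errors (not given in the slides).
    se_b = math.sqrt(ms_res / sxx)
    se_a = math.sqrt(ms_res * (1 / n + x_bar ** 2 / sxx))

    return {
        "a": a, "b": b,
        "t_a": a / se_a, "t_b": b / se_b,
        "r2": ss_reg / ss_tot,
        "adj_r2": 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1)),
        "F": (ss_reg / p) / ms_res,
    }

# Made-up illustration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
stats = simple_linear_regression(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With a single predictor, F equals the square of the slope's t statistic, which is a quick consistency check on any such output.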
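In the adjusted \(R^2\) formula, k is the number of independent variables and n is the number of observations.

t tests for the partial regression coefficients and the constant term:

\[ t = \frac{\text{partial regression coefficient}}{\text{standard error of the partial regression coefficient}}, \qquad t = \frac{\text{constant term}}{\text{standard error of the constant term}} \]

Converting a nonlinear relationship into a linear one

Table of variable transformation formulas

| Transformation | Condition for use | Notes |
|---|---|---|
| \(\sqrt{y}\) | \(\mathrm{Var}(e_i) \propto E(y_i)\) | The dependent variable follows a Poisson distribution |
| \(\sqrt{y} + \sqrt{y+1}\) | \(\mathrm{Var}(e_i) \propto E(y_i)\) | Some values of the dependent variable are 0 or very small |
| \(\log y\) | \(\mathrm{Var}(e_i) \propto [E(y_i)]^2\), \(y > 0\) | The values of the dependent variable span a wide range |
| \(\log(y+1)\) | \(\mathrm{Var}(e_i) \propto [E(y_i)]^2\) | Some values of the dependent variable are 0 |
| \(1/y\) | \(\mathrm{Var}(e_i) \propto [E(y_i)]^4\) | The values of the dependent variable cluster near 0, and large values appear when the independent variable drops markedly. Example: the independent variable is the dose of a drug for some disease and the dependent variable is the reaction time |
| \(1/(y+1)\) | \(\mathrm{Var}(e_i) \propto [E(y_i)]^4\) | Some values of the dependent variable are 0 |
| \(\arcsin\sqrt{y}\) | \(\mathrm{Var}(e_i) \propto E(y_i)\,(1 - E(y_i))\) | Used for binomial proportions (0 ≤ dependent variable ≤ 1) |

Common methods for converting to a linear relationship

| Transformation (y-axis, x-axis) | Regression equation |
|---|---|
| \(\log y\), \(\log x\) | \(y = a x^{\beta}\) |
| \(\log y\), \(x\) | \(y = a e^{\beta x}\) |
| \(y\), \(\log x\) | \(y = a + \beta \log x\) |
| \(1/y\), \(1/x\) | \(y = x / (a x + \beta)\) |
| \(1/y\), \(x\) | \(y = 1 / (a + \beta x)\) |
| \(y\), \(1/x\) | \(y = a + \beta (1/x)\) |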
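The "condition for use" column of the transformation table can be motivated with the first-order delta method, Var[g(y)] ≈ g'(μ)² Var(y): each transformation is the one that makes this product constant in the mean μ. The sketch below checks that numerically. It is an illustration only: the proportionality constant is assumed to be 1, the derivative is taken by central differences, and the helper names are hypothetical.

```python
import math

def delta_method_variance(g, mean, variance, h=1e-6):
    """First-order approximation Var[g(y)] ~ g'(mean)**2 * Var[y]."""
    slope = (g(mean + h) - g(mean - h)) / (2 * h)  # central difference
    return slope ** 2 * variance

def arcsin_sqrt(y):
    return math.asin(math.sqrt(y))

def reciprocal(y):
    return 1.0 / y

# (transformation name, g, assumed variance as a function of the mean, means to try)
rows = [
    ("sqrt(y)",         math.sqrt,   lambda m: m,           [2.0, 10.0, 50.0]),
    ("log(y)",          math.log,    lambda m: m ** 2,      [2.0, 10.0, 50.0]),
    ("1/y",             reciprocal,  lambda m: m ** 4,      [2.0, 10.0, 50.0]),
    ("arcsin(sqrt(y))", arcsin_sqrt, lambda m: m * (1 - m), [0.2, 0.5, 0.8]),
]

stabilized = {
    name: [delta_method_variance(g, m, var(m)) for m in means]
    for name, g, var, means in rows
}

for name, values in stabilized.items():
    # Each list should be (nearly) constant across the different means.
    print(name, [round(v, 4) for v in values])
```

Under these assumptions the transformed variance comes out as about 0.25 for the square-root and arcsine rows and about 1 for the logarithm and reciprocal rows, whatever the mean.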
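[Screenshot: Regression menu]
[Screenshot: Linear Regression main dialog box]
[Screenshot: Set Rule dialog box]
[Screenshot: Statistics dialog box]
[Screenshot: Plots dialog box]
[Screenshot: Save dialog box]
[Screenshot: Options dialog box]
[Screenshot: Simple Scatterplot dialog box]

[Figure: scatterplot example. Vertical axis: Current Salary ($0 to $140,000); horizontal axis: Beginning Salary ($0 to $80,000)]
Scatterplot of beginning salary against current salary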
Building the regression model (example output 1)

Variables Entered/Removed (a)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Beginning Salary | . | Stepwise (Criteria: Probability-of-F-to-enter=.100). |
| 2 | Employment Category | . | Stepwise (Criteria: Probability-of-F-to-enter=.100). |
| 3 | Previous Experience (months) | . | Stepwise (Criteria: Probability-of-F-to-enter=.100). |
| 4 | Months since Hire | . | Stepwise (Criteria: Probability-of-F-to-enter=.100). |
| 5 | Educational Level (years) | . | Stepwise (Criteria: Probability-of-F-to-enter=.100). |

a. Dependent Variable: Current Salary

Variables entered into or removed from the model

Building the regression model (example output 2)

Model Summary (f)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .880 (a) | .775 | .774 | $8,115.36 |
| 2 | .898 (b) | .806 | .805 | $7,540.43 |
| 3 | .909 (c) | .827 | .826 | $7,127.04 |
| 4 | .914 (d) | .836 | .835 | $6,940.23 |
| 5 | .917 (e) | .840 | .839 | $6,856.79 |

a. Predictors: (Constant), Beginning Salary
b. Predictors: (Constant), Beginning Salary, Employment Category
c. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months)
d. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months), Months since Hire
e. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months), Months since Hire, Educational Level (years)
f. Dependent Variable: Current Salary

Summary of the fitting process
Building the regression model (example output 3)

ANOVA (f)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 106831048750.124 | 1 | 106831048750.124 | 1622.118 | .000 (a) |
| 1 | Residual | 31085446686.216 | 472 | 65858997.217 | | |
| 1 | Total | 137916495436.340 | 473 | | | |
| 2 | Regression | 111136313278.118 | 2 | 55568156639.059 | 977.312 | .000 (b) |
| 2 | Residual | 26780182158.221 | 471 | 56858136.217 | | |
| 2 | Total | 137916495436.340 | 473 | | | |
| 3 | Regression | 114042988034.361 | 3 | 38014329344.787 | 748.392 | .000 (c) |
| 3 | Residual | 23873507401.978 | 470 | 50794696.600 | | |
| 3 | Total | 137916495436.340 | 473 | | | |
| 4 | Regression | 115326259146.015 | 4 | 28831564786.504 | 598.577 | .000 (d) |
| 4 | Residual | 22590236290.325 | 469 | 48166815.118 | | |
| 4 | Total | 137916495436.340 | 473 | | | |
| 5 | Regression | 115913177991.251 | 5 | 23182635598.250 | 493.084 | .000 (e) |
| 5 | Residual | 22003317445.089 | 468 | 47015635.566 | | |
| 5 | Total | 137916495436.340 | 473 | | | |

a. Predictors: (Constant), Beginning Salary
b. Predictors: (Constant), Beginning Salary, Employment Category
c. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months)
d. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months), Months since Hire
e. Predictors: (Constant), Beginning Salary, Employment Category, Previous Experience (months), Months since Hire, Educational Level (years)
f. Dependent Variable: Current Salary

Analysis of variance

Building the regression model (example output 4)

Excluded Variables (e)

| Model | Variable | Beta In | t | Sig. | Partial Correlation | Tolerance | VIF | Minimum Tolerance |
|---|---|---|---|---|---|---|---|---|
| 1 | Months since Hire | .102 (a) | 4.750 | .000 | .214 | 1.000 | 1.000 | 1.000 |
| 1 | Previous Experience (months) | -.137 (a) | -6.558 | .000 | -.289 | .998 | 1.002 | .998 |
| 1 | Employment Category | .269 (a) | 8.702 | .000 | .372 | .430 | 2.323 | .430 |
| 1 | Educational Level (years) | .172 (a) | 6.356 | .000 | .281 | .599 | 1.669 | .599 |
| 2 | Months since Hire | .096 (b) | 4.844 | .000 | .218 | .999 | 1.001 | .430 |
| 2 | Previous Experience (months) | -.145 (b) | -7.565 | .000 | -.329 | .996 | 1.004 | .430 |
| 2 | Educational Level (years) | .157 (b) | 6.202 | .000 | .275 | .596 | 1.678 | .349 |
| 3 | Months since Hire | .097 (c) | 5.162 | .000 | .232 | .999 | 1.001 | .429 |
| 3 | Educational Level (years) | .102 (c) | 3.856 | .000 | .175 | .515 | 1.940 | .339 |
| 4 | Educational Level (years) | .091 (d) | 3.533 | .000 | .161 | .512 | 1.953 | .337 |

Tolerance, VIF and Minimum Tolerance are the collinearity statistics.

a. Predictors in the Model: (Constant), Beginning Salary
b. Predictors in the Model: (Constant), Beginning Salary, Employment Category
c. Predictors in the Model: (Constant), Beginning Salary, Employment Category, Previous Experience (months)
d. Predictors in the Model: (Constant), Beginning Salary, Employment Category, Previous Experience (months), Months since Hire
e. Dependent Variable: Current Salary

Variables not in the equation during stepwise regression

Building the regression model (example output 5)

Coefficients (a)

| Model | Variable | B | Std. Error | Beta | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1928.206 | 888.680 | | 2.170 | .031 | | |
| 1 | Beginning Salary | 1.909 | .047 | .880 | 40.276 | .000 | 1.000 | 1.000 |
| 2 | (Constant) | 1036.931 | 832.051 | | 1.246 | .213 | | |
| 2 | Beginning Salary | 1.469 | .067 | .677 | 21.873 | .000 | .430 | 2.323 |
| 2 | Employment Category | 5947.000 | 683.430 | .269 | 8.702 | .000 | .430 | 2.323 |
| 3 | (Constant) | 3039.205 | 829.783 | | 3.663 | .000 | | |
| 3 | Beginning Salary | 1.467 | .063 | .676 | 23.117 | .000 | .430 | 2.323 |
| 3 | Employment Category | 6160.294 | 646.577 | .279 | 9.528 | .000 | .430 | 2.327 |
| 3 | Previous Experience (months) | -23.749 | 3.139 | -.145 | -7.565 | .000 | .996 | 1.004 |
| 4 | (Constant) | -10300.67 | 2707.813 | | -3.804 | .000 | | |
| 4 | Beginning Salary | 1.479 | .062 | .682 | 23.911 | .000 | .430 | 2.326 |
| 4 | Employment Category | 6060.446 | 629.927 | .274 | 9.621 | .000 | .429 | 2.330 |
| 4 | Previous Experience (months) | -23.789 | 3.057 | -.146 | -7.781 | .000 | .996 | 1.004 |
| 4 | Months since Hire | 163.826 | 31.739 | .097 | 5.162 | .000 | .999 | 1.001 |
| 5 | (Constant) | -15038.57 | 2992.525 | | -5.025 | .000 | | |
| 5 | Beginning Salary | 1.365 | .069 | .629 | 19.796 | .000 | .337 | 2.965 |
| 5 | Employment Category | 5859.585 | 624.945 | .265 | 9.376 | .000 | .426 | 2.349 |
| 5 | Previous Experience (months) | -19.553 | 3.250 | -.120 | -6.017 | .000 | .860 | 1.162 |
| 5 | Months since Hire | 154.698 | 31.464 | .091 | 4.917 | .000 | .992 | 1.008 |
| 5 | Educational Level (years) | 539.642 | 152.735 | .091 | 3.533 | .000 | .512 | 1.953 |

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.

a. Dependent Variable: Current Salary

Regression coefficients and test results for each model during model building

Building the regression model (example output 6)

Casewise Diagnostics (a)

| Case Number | Std. Residual | Current Salary | Predicted Value | Residual |
|---|---|---|---|---|
| 18 | 6.034 | $103,750 | $62,374.33 | $41,375.671 |
| 32 | 3.483 | $110,625 | $86,742.22 | $23,882.776 |
| 103 | 3.450 | $97,000 | $73,344.88 | $23,655.124 |
| 106 | 3.582 | $91,250 | $66,687.60 | $24,562.396 |
| 205 | -3.486 | $66,750 | $90,654.61 | -$23,904.615 |
| 218 | 6.936 | $80,000 | $32,441.54 | $47,558.463 |
| 274 | 4.505 | $83,750 | $52,858.98 | $30,891.017 |
| 446 | 3.049 | $100,000 | $79,097.06 | $20,902.945 |
| 454 | 3.713 | $90,625 | $65,166.38 | $25,458.622 |

a. Dependent Variable: Current Salary

Table of outliers for the current salary variable

Building the regression model (example output 7)

Residuals Statistics (a)

| Statistic | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | $13,966.34 | $132,960.17 | $34,419.57 | $15,654.38 | 474 |
| Std. Predicted Value | -1.307 | 6.295 | .000 | 1.000 | 474 |
| Standard Error of Predicted Value | $372.61 | $3,453.16 | $724.93 | $264.12 | 474 |
| Adjusted Predicted Value | $13,892.49 | $132,267.02 | $34,416.16 | $15,657.16 | 474 |
| Residual | -$23,904.62 | $47,558.46 | $.00 | $6,820.46 | 474 |
| Std. Residual | -3.486 | 6.936 | .000 | .995 | 474 |
| Stud. Residual | -3.611 | 6.952 | .000 | 1.003 | 474 |
| Deleted Residual | -$25,640.79 | $47,782.29 | $3.41 | $6,940.12 | 474 |
| Stud. Deleted Residual | -3.658 | 7.334 | .002 | 1.017 | 474 |
| Mahal. Distance | .399 | 118.966 | 4.989 | 6.762 | 474 |
| Cook's Distance | .000 | .158 | .003 | .012 | 474 |
| Centered Leverage Value | .001 | .252 | .011 | .014 | 474 |

a. Dependent Variable: Current Salary

Statistics for residual analysis

Building the regression model (example output 8)
[Table not recovered: statistics for identifying influential points]

Building the regression model (example output 9)
[Table not recovered: changes in the standardized regression coefficients]
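Building the regression model (example output 10)

Collinearity Diagnostics (a)

The last six columns are the variance proportions.

| Model | Dimension | Eigenvalue | Condition Index | (Constant) | Beginning Salary | Employment Category | Previous Experience (months) | Months since Hire | Educational Level (years) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.908 | 1.000 | .05 | .05 | | | | |
| 1 | 2 | .092 | 4.548 | .95 | .95 | | | | |
| 2 | 1 | 2.823 | 1.000 | .02 | .01 | .01 | | | |
| 2 | 2 | .130 | 4.662 | .86 | .03 | .23 | | | |
| 2 | 3 | .048 | 7.699 | .12 | .96 | .76 | | | |
| 3 | 1 | 3.346 | 1.000 | .01 | .01 | .01 | .03 | | |
| 3 | 2 | .484 | 2.629 | .01 | .01 | .02 | .89 | | |
| 3 | 3 | .123 | 5.220 | .85 | .02 | .22 | .07 | | |
| 3 | 4 | .047 | 8.395 | .13 | .96 | .75 | .00 | | |
| 4 | 1 | 4.263 | 1.000 | .00 | .00 | .00 | .02 | .00 | |
| 4 | 2 | .491 | 2.946 | .00 | .01 | .01 | .93 | .00 | |
| 4 | 3 | .190 | 4.739 | .02 | .05 | .17 | .05 | .02 | |
| 4 | 4 | .049 | 9.371 | .00 | .92 | .81 | .00 | .00 | |
| 4 | 5 | .007 | 24.026 | .98 | .02 | .00 | .00 | .97 | |
| 5 | 1 | 5.211 | 1.000 | .00 | .00 | .00 | .01 | .00 | .00 |
| 5 | 2 | .516 | 3.179 | .00 | .00 | .01 | .80 | .00 | .00 |
| 5 | 3 | .199 | 5.112 | .01 | .05 | .19 | .02 | .02 | .00 |
| 5 | 4 | .050 | 10.236 | .00 | .63 | .79 | .01 | .01 | .01 |
| 5 | 5 | .018 | 17.249 | .01 | .31 | .00 | .14 | .18 | .86 |
| 5 | 6 | .007 | 27.634 | .97 | .00 | .00 | .03 | .79 | .12 |

a. Dependent Variable: Current Salary

Collinearity diagnostics

[Figure: scatterplot of the predicted values of current salary against their Studentized residuals]

Curve Estimation

[Screenshot: Curve Estimation dialog box]

Curve regression models

| Model name | Regression equation | Corresponding linear regression equation |
|---|---|---|
| Linear | \(y = b_0 + b_1 t\) | |
| Quadratic | \(y = b_0 + b_1 t + b_2 t^2\) | |
| Compound | \(y = b_0 (b_1^{t})\) | \(\ln(y) = \ln(b_0) + \ln(b_1)\, t\) |
| Growth | \(y = e^{b_0 + b_1 t}\) | \(\ln(y) = b_0 + b_1 t\) |
| Logarithmic | \(y = b_0 + b_1 \ln(t)\) | |
| Cubic | \(y = b_0 + b_1 t + b_2 t^2 + b_3 t^3\) | |
| S | \(y = e^{b_0 + b_1 / t}\) | \(\ln(y) = b_0 + b_1 / t\) |
| Exponential | \(y = b_0 e^{b_1 t}\) | \(\ln(y) = \ln(b_0) + b_1 t\) |
| Inverse | \(y = b_0 + (b_1 / t)\) | |
| Power | \(y = b_0 (t^{b_1})\) | \(\ln(y) = \ln(b_0) + b_1 \ln(t)\) |
| Logistic | \(y = 1 / (1/u + b_0 (b_1^{t}))\) | \(\ln(1/y - 1/u) = \ln(b_0) + \ln(b_1)\, t\) |

[Screenshot: Save dialog box]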
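The third column of the curve model table shows that most of these models reduce to an ordinary straight-line fit after transforming y, t, or both. The sketch below follows that idea for the models that have a linear form. It is a sketch under stated assumptions, not the SPSS procedure: `fit_line`, `fit_curve` and `fit_logistic` are hypothetical names, the data are generated from known parameters, and the upper bound u of the logistic model is assumed known. Fitting on the transformed scale minimizes squared error in that scale, not in the original one.

```python
import math

def fit_line(u, v):
    """Ordinary least squares for v = c + s*u; returns (c, s)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
         / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - s * u_bar, s

def identity(z):
    return z

def reciprocal(z):
    return 1.0 / z

# model name -> (transform of y, transform of t, map (c, s) back to (b0, b1))
MODELS = {
    "linear":      (identity, identity,   lambda c, s: (c, s)),
    "logarithmic": (identity, math.log,   lambda c, s: (c, s)),
    "inverse":     (identity, reciprocal, lambda c, s: (c, s)),
    "growth":      (math.log, identity,   lambda c, s: (c, s)),
    "s":           (math.log, reciprocal, lambda c, s: (c, s)),
    "exponential": (math.log, identity,   lambda c, s: (math.exp(c), s)),
    "power":       (math.log, math.log,   lambda c, s: (math.exp(c), s)),
    "compound":    (math.log, identity,   lambda c, s: (math.exp(c), math.exp(s))),
}

def fit_curve(model, t, y):
    """Fit one of the linearizable models; returns (b0, b1)."""
    f_y, f_t, back = MODELS[model]
    c, s = fit_line([f_t(ti) for ti in t], [f_y(yi) for yi in y])
    return back(c, s)

def fit_logistic(t, y, u):
    """Logistic model with known upper bound u: ln(1/y - 1/u) = ln(b0) + ln(b1)*t."""
    c, s = fit_line(t, [math.log(1.0 / yi - 1.0 / u) for yi in y])
    return math.exp(c), math.exp(s)

# Noise-free data generated from known parameters, so the fit should recover them.
t = [1, 2, 3, 4, 5, 6]
y_compound = [3.0 * 1.2 ** ti for ti in t]
b0, b1 = fit_curve("compound", t, y_compound)

u = 10.0
y_logistic = [1.0 / (1.0 / u + 0.5 * 0.6 ** ti) for ti in t]
l0, l1 = fit_logistic(t, y_logistic, u)

print(b0, b1, l0, l1)
```

The quadratic and cubic models are already linear in their coefficients, so they need a multiple regression on t, t² and t³ rather than a transformation.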
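[Figure: curve regression example scatterplot]
Scatterplot of miles per gallon against vehicle weight

Curve regression example output 1

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .810 | .656 | .655 | 4.593 |

The independent variable is Vehicle Weight (lbs.).

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 15918.130 | 2 | 7959.065 | 377.209 | .000 |
| Residual | 8334.445 | 395 | 21.100 | | |
| Total | 24252.575 | 397 | | | |

The independent variable is Vehicle Weight (lbs.).

Coefficients

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| Vehicle Weight (lbs.) | -.012 | .002 | -1.330 | -6.094 | .000 |
| Vehicle Weight (lbs.) ** 2 | 7.60E-007 | .000 | .528 | 2.419 | .016 |
| (Constant) | 52.540 | 3.030 | | 17.337 | .000 |

Results for the quadratic model: goodness-of-fit test (Model Summary), analysis of variance of the model (ANOVA), and the fitted Quadratic model coefficients with their tests (Coefficients).

Curve regression example output 2

Results for the cubic model

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .828 | .686 | .683 | 4.399 |

The independent variable is Vehicle Weight (lbs.).

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 16629.063 | 3 | 5543.021 | 286.476 | .000 |
| Residual | 7623.513 | 394 | 19.349 | | |
| Total | 24252.575 | 397 | | | |

The independent variable is Vehicle Weight (lbs.).

Coefficients

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| Vehicle Weight (lbs.) | .033 | .008 | 3.598 | 4.286 | .000 |
| Vehicle Weight (lbs.) ** 2 | -1.4E-005 | .000 | -9.968 | -5.715 | .000 |
| Vehicle Weight (lbs.) ** 3 | 1.59E-009 | .000 | 5.655 | . | |
| (Constant) | 9.555 | 7.662 | | 1.247 | .213 |

Goodness-of-fit test (Model Summary), analysis of variance of the model (ANOVA), and the fitted Cubic model coefficients with their tests (Coefficients).

Curve regression example output 3

Results for the exponential model
[Tables not recovered: goodness-of-fit test, analysis of variance of the model, and the fitted Compound model coefficients with their tests]

Curve regression example output 4

[Figure: graphical comparison of the three models]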
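The quadratic and cubic ANOVA tables above share the same total sum of squares, so the gain from adding the cubic term can be read off as an extra-sum-of-squares F statistic. This test is not part of the printed output; the sketch below is a standard nested-model comparison computed from the table values, with illustrative variable names, and it stops short of a p-value because the Python standard library has no F distribution.

```python
import math

ss_total = 24252.575

# Regression sum of squares and residual mean square from the two ANOVA tables.
ss_reg_quadratic, ms_res_quadratic = 15918.130, 21.100
ss_reg_cubic, ms_res_cubic = 16629.063, 19.349

r2_quadratic = ss_reg_quadratic / ss_total     # Model Summary: .656
r2_cubic = ss_reg_cubic / ss_total             # Model Summary: .686

# Std. Error of the Estimate is the square root of the residual mean square.
se_quadratic = math.sqrt(ms_res_quadratic)     # Model Summary: 4.593
se_cubic = math.sqrt(ms_res_cubic)             # Model Summary: 4.399

# Extra sum of squares for one added term, tested against the cubic residual
# mean square (1 and 394 degrees of freedom).
f_change = (ss_reg_cubic - ss_reg_quadratic) / 1 / ms_res_cubic

print(f"R2: {r2_quadratic:.3f} -> {r2_cubic:.3f}, F change = {f_change:.2f}")
```

An F change of this size on 1 and 394 degrees of freedom is far beyond conventional critical values, which agrees with the drop in the standard error of the estimate from 4.593 to 4.399.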
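Binary Logistic Regression

[Not recovered: formulas for binary logistic regression]
[Figure: logistic regression curve]
[Table not recovered: indicator variable coding scheme]
[Table not recovered: deviation coding scheme]
[Screenshot: Binary Logistic Regression dialog box]
[Screenshot: Set Rule dialog box]
[Screenshot: Define Categorical Variables dialog box]
[Screenshot: Save New Variables dialog box]
[Screenshot: Options dialog box]

Case summary

Case Processing Summary

| Unweighted Cases (a) | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 874 | 72.4 |
| | Missing Cases | 333 | 27.6 |
| | Total | 1207 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 1207 | 100.0 |

a. If weight is in effect, see classification table for the total number of cases.

Dependent variable coding table

Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| No | 0 |
| Yes | 1 |

Categorical variable coding table

Categorical Variables Codings

| Variable | Category | Frequency | Parameter coding (1) | Parameter coding (2) |
|---|---|---|---|---|
| Histologic Grade | 1 | 77 | .000 | .000 |
| | 2 | 484 | 1.000 | .000 |
| | 3 | 313 | .000 | 1.000 |
| Pathological Tumor Size (Categories) | (label lost) | 625 | .000 | .000 |
| | (label lost) | 240 | 1.000 | .000 |
| | … 5 cm | 9 | .000 | 1.000 |

Classification table for the dependent variable

Classification Table (a, b)

| Step 0, Observed | Predicted Lymph Nodes? No | Predicted Lymph Nodes? Yes | Percentage Correct |
|---|---|---|---|
| Lymph Nodes? No | 655 | 0 | 100.0 |
| Lymph Nodes? Yes | 219 | 0 | .0 |
| Overall Percentage | | | 74.9 |

a. Constant is included in the model.
b. The cut value is .500

Statistics for the initial model

Variables in the Equation

| Step 0 | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Constant | -1.096 | .078 | 196.992 | 1 | .000 | .334 |

Variables outside the initial model

Variables not in the Equation

| Step 0 | Score | df | Sig. |
|---|---|---|---|
| PATHSCAT | 15.156 | 2 | .001 |
| PATHSCAT(1) | 12.701 | 1 | .000 |
| PATHSCAT(2) | 9.760 | 1 | .002 |
| PATHSIZE | 28.526 | 1 | .000 |
| AGE | 25.208 | 1 | .000 |
| TIME | .146 | 1 | .703 |
| HISTGRAD | 15.272 | 2 | .000 |
| HISTGRAD(1) | 9.674 | 1 | .002 |
| HISTGRAD(2) | 1.306 | 1 | .253 |
| Overall Statistics | 50.790 | 7 | .000 |

Chi-square tests for the initial model

Omnibus Tests of Model Coefficients

| Step 1 | Chi-square | df | Sig. |
|---|---|---|---|
| Step | 51.728 | 7 | .000 |
| Block | 51.728 | 7 | .000 |
| Model | 51.728 | 7 | .000 |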
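The step 0 (constant-only) results follow directly from the two observed counts in the classification table, and together with the omnibus chi-square they also give the final -2 log likelihood and the two pseudo R² values that the Model Summary reports (932.331, .057, .085). The sketch below reproduces that arithmetic. It uses the standard closed-form expressions for an intercept-only logistic model and the usual Cox and Snell and Nagelkerke definitions, which the slides do not state, and the variable names are illustrative.

```python
import math

n_no, n_yes = 655, 219              # observed counts, step 0 classification table
n = n_no + n_yes                    # 874 cases included in the analysis

# Constant-only model: the fitted odds are simply n_yes / n_no.
b0 = math.log(n_yes / n_no)                    # B      : -1.096
exp_b0 = n_yes / n_no                          # Exp(B) : .334
se_b0 = math.sqrt(1 / n_yes + 1 / n_no)        # S.E.   : .078
wald = (b0 / se_b0) ** 2                       # Wald   : 196.992
overall_pct = 100 * max(n_no, n_yes) / n       # Overall Percentage: 74.9

# -2 log likelihood of the constant-only model.
p = n_yes / n
neg2ll_null = -2 * (n_yes * math.log(p) + n_no * math.log(1 - p))

# The omnibus (model) chi-square is the drop in -2 log likelihood.
omnibus_chi2 = 51.728
neg2ll_final = neg2ll_null - omnibus_chi2
cox_snell = 1 - math.exp(-omnibus_chi2 / n)
nagelkerke = cox_snell / (1 - math.exp(-neg2ll_null / n))

print(f"B={b0:.3f}  Exp(B)={exp_b0:.3f}  S.E.={se_b0:.3f}  Wald={wald:.3f}")
print(f"-2LL final={neg2ll_final:.3f}  Cox & Snell={cox_snell:.3f}  Nagelkerke={nagelkerke:.3f}")
```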
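Goodness-of-fit test of the final model

Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 932.331 | .057 | .085 |

[Table not recovered: grouping of cases by predicted probability]
[Figure: normal probability plot and detrended normal probability plot]
[Figure: different types of scatterplots]

9.4 Multinomial Logistic Regression

[Not recovered: differences between paired variables]
[Table not recovered: coding scheme for race]
[Screenshot: Logistic Regression dialog box]
[Screenshot: Reference Category dialog box]
[Screenshot: SAVE dialog box]
[Screenshot: Criteria dialog box]
[Screenshot: Model dialog box]
[Screenshot: Statistics dialog box]

Common model statistics

Case Processing Summary

| | | N | Marginal Percentage |
|---|---|---|---|
| VOTE FOR CLINTON, BUSH, PEROT | Bush | 661 | 35.8% |
| | Perot | 278 | 15.1% |
| | Clinton | 908 | 49.2% |
| RESPONDENTS SEX | male | 804 | 43.5% |
| | female | 1043 | 56.5% |
| Valid | | 1847 | 100.0% |
| Missing | | 0 | |
| Total | | 1847 | |
| Subpopulation | | 2 | |

Model Fitting Information

| Model | -2 Log Likelihood | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept Only | 61.209 | | | |
| Final | 27.343 | 33.866 | 2 | .000 |

Likelihood Ratio Tests

| Effect | -2 Log Likelihood of Reduced Model | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept | 27.343 (a) | .000 | 0 | . |
| sex | 61.209 | 33.866 | 2 | .000 |

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.

a. This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.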
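The chi-square in the Model Fitting Information table is the difference between the two -2 log likelihoods, and its significance can be checked by hand here because the chi-square survival function has a closed form when df = 2. The sketch below is specific to that case: `math.exp(-x / 2)` is exact only for two degrees of freedom, and the variable names are illustrative.

```python
import math

neg2ll_intercept_only = 61.209
neg2ll_final = 27.343

# Likelihood-ratio chi-square: drop in -2 log likelihood when sex is added.
chi_square = neg2ll_intercept_only - neg2ll_final      # table: 33.866
df = 2   # one sex parameter for each of the two non-reference categories

# For df = 2 only, P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.2e}")
```

The p-value is far below .0005, which is why the table prints Sig. as .000.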
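Parameter Estimates

| VOTE FOR CLINTON, BUSH, PEROT (a) | Parameter | B | Std. Error | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B): Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|
| Bush | Intercept | -.501 | .068 | 54.067 | 1 | .000 | | | |
| Bush | sex=1 | .433 | .104 | 17.422 | 1 | .000 | 1.543 | 1.258 | 1.891 |
| Bush | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |
| Perot | Intercept | -1.511 | .098 | 235.703 | 1 | .000 | | | |
| Perot | sex=1 | .715 | .139 | 26.572 | 1 | .000 | 2.044 | 1.558 | 2.682 |
| Perot | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |

a. The reference category is: Clinton.
b. This parameter is set to zero because it is redundant.

Parameter estimates after adding the variable educ as a covariate

Parameter Estimates

| VOTE FOR CLINTON, BUSH, PEROT (a) | Parameter | B | Std. Error | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B): Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|
| Bush | Intercept | -.702 | .259 | 7.318 | 1 | .007 | | | |
| Bush | educ | .015 | .018 | .656 | 1 | .418 | 1.015 | .979 | 1.051 |
| Bush | sex=1 | .428 | .104 | 16.970 | 1 | .000 | 1.535 | 1.252 | 1.881 |
| Bush | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |
| Perot | Intercept | -1.894 | .353 | 28.859 | 1 | .000 | | | |
| Perot | educ | .027 | .024 | 1.248 | 1 | .264 | 1.028 | .980 | 1.078 |
| Perot | sex=1 | .715 | .139 | 26.396 | 1 | .000 | 2.043 | 1.556 | 2.684 |
| Perot | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |

a. The reference category is: Clinton.
b. This parameter is set to zero because it is redundant.

Statistics after adding educational degree

Parameter Estimates

| VOTE FOR CLINTON, BUSH, PEROT | Parameter | B | Std. Error | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B): Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|
| Bush | Intercept | -.760 | .738 | 1.061 | 1 | .303 | | | |
| Bush | educ | -.002 | .039 | .004 | 1 | .951 | .998 | .924 | 1.077 |
| Bush | sex=1 | .457 | .105 | 19.028 | 1 | .000 | 1.580 | 1.286 | 1.941 |
| Bush | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |
| Bush | degree=0 | -.222 | .425 | .273 | 1 | .602 | .801 | .348 | 1.843 |
| Bush | degree=1 | .376 | .281 | 1.781 | 1 | .182 | 1.456 | .839 | 2.528 |
| Bush | degree=2 | .422 | .295 | 2.042 | 1 | .153 | 1.525 | .855 | 2.719 |
| Bush | degree=3 | .419 | .211 | 3.963 | 1 | .047 | 1.521 | 1.007 | 2.298 |
| Bush | degree=4 | 0 (b) | . | . | 0 | . | . | . | . |
| Perot | Intercept | -2.846 | 1.041 | 7.469 | 1 | .006 | | | |
| Perot | educ | .035 | .054 | .420 | 1 | .517 | 1.036 | .931 | 1.153 |
| Perot | sex=1 | .764 | .141 | 29.384 | 1 | .000 | 2.146 | 1.628 | 2.829 |
| Perot | sex=2 | 0 (b) | . | . | 0 | . | . | . | . |
| Perot | degree=0 | -.267 | .640 | .174 | 1 | .677 | .766 | .218 | 2.685 |
| Perot | degree=1 | 1.036 | .409 | 6.416 | 1 | .011 | 2.818 | 1.264 | 6.283 |
| Perot | degree=2 | 1.193 | .408 | 8.560 | 1 | .003 | 3.297 | 1.483 | 7.332 |
| Perot | degree=3 | .879 | .313 | 7.869 | 1 | .005 | 2.408 | 1.303 | 4.451 |
| Perot | degree=4 | 0 (b) | . | . | 0 | . | . | . | . |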
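In the Parameter Estimates tables, Exp(B) is the odds ratio relative to the reference category (Clinton), and its 95% confidence interval is a Wald interval, exp(B ± z·SE). The sketch below recomputes the sex=1 rows of the first (sex-only) table from the printed B and Std. Error. The helper `odds_ratio_ci` is a hypothetical name, and the results differ from the table in the third decimal because the printed inputs are rounded.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(b, se, level=0.95):
    """Return (odds ratio, lower bound, upper bound) for a Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# sex=1 rows of the sex-only model: (B, Std. Error) as printed.
bush_sex1 = odds_ratio_ci(0.433, 0.104)     # table: 1.543, 1.258, 1.891
perot_sex1 = odds_ratio_ci(0.715, 0.139)    # table: 2.044, 1.558, 2.682

for label, (ratio, lower, upper) in (("Bush", bush_sex1), ("Perot", perot_sex1)):
    print(f"{label} vs Clinton, sex=1: OR={ratio:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

Both intervals exclude 1, which matches the Sig. values of .000 for the sex=1 rows.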