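Setting: a function class $\mathcal{F}$ and candidate models $M \subset \mathcal{F}$.

William of Ockham (1285–1348), from Wikipedia.

Occam's razor: "Entia non sunt multiplicanda praeter necessitatem", or: "Entities should not be multiplied unnecessarily." The explanation of any phenomenon should make as few assumptions as possible, eliminating, or "shaving off", those that make no difference in the observable predictions of the explanatory hypothesis or theory.

Example: polynomial curve fitting

Data: $Y = f(X) + \varepsilon$, with $X \sim \mathrm{Uniform}[0, 1]$ and $\varepsilon \sim N(0, 0.3^2)$; sample size $n = 10$.

Fit a polynomial of order $M$:
\[ \hat{y} = \sum_{j=0}^{M} w_j x^j \]

[Figure: polynomial fits of order 0, 1, 3, and 9.]

Root-mean-square error:
\[ E_{RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \]

[Figure: order-9 polynomial fit with $n = 15$ training samples, and with $n = 100$ training samples.]

Ridge regression: minimize
\[ RSS_{ridge}(\lambda) = \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{p} X_{ij} w_j \Big)^2 + \lambda \sum_{j=1}^{p} w_j^2 \]

[Figure: ridge regression fits.]

Textbook, Chapter 8.

Goal: choose the model $M$ that minimizes the test error. This is called model selection.

Training error:
\[ R_{tr}(M) = \frac{1}{n} \sum_{i=1}^{n} L\big(Y_i, \hat{Y}_i(M)\big) \]
Test error:
\[ R(M) = E\big[ L\big(Y, \hat{Y}(M)\big) \big] \]
The two are related through the optimism $op(M)$:
\[ R(M) = R_{tr}(M) + op(M) \]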
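The gap between training and test error in the polynomial example can be checked numerically. The following is a minimal sketch, not the slides' own experiment: the target function $\sin(2\pi x)$, the random seed, and the test-set size are assumptions chosen for illustration; only the uniform inputs, the noise level 0.3, $n = 10$, and the orders 0, 1, 3, 9 come from the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed generating process (hypothetical): y = sin(2*pi*x) + noise,
    # with x ~ Uniform[0, 1] and Gaussian noise of standard deviation 0.3.
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y

def rms_error(w, x, y):
    # w holds coefficients in increasing order: w[0] + w[1]*x + ...
    y_hat = P.polyval(x, w)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

x_train, y_train = make_data(10)     # n = 10 training samples
x_test, y_test = make_data(1000)     # large independent sample as a test-error proxy

orders = [0, 1, 3, 9]
train_rms, test_rms = [], []
for m in orders:
    w = P.polyfit(x_train, y_train, deg=m)   # least-squares fit of order m
    train_rms.append(rms_error(w, x_train, y_train))
    test_rms.append(rms_error(w, x_test, y_test))

for m, tr, te in zip(orders, train_rms, test_rms):
    print(f"M={m}: train RMS={tr:.3f}, test RMS={te:.3f}")
```

Because the polynomial models are nested, the training RMS can only go down as $M$ grows, while the test RMS need not follow it; the difference between the two is the optimism $op(M)$.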
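Optimism of the training error
\[ R(M) = R_{tr}(M) + op(M) \]
Here $R_{tr}(M)$ measures the degree of underfitting and $op(M)$ acts as a complexity penalty:
\[ op(M) = R(M) - R_{tr}(M) \]

$C_p$ statistic
\[ C_p(M) = R_{tr}(M) + 2\,\frac{p}{n}\,\hat{\sigma}^2 \]
$\hat{\sigma}^2$: estimate of the noise variance, taken from the MSE of a low-bias model (for example, the model that uses all features). $p$: number of basis functions. $n$: number of training samples.

Log-likelihood loss
\[ R(M) = -2\,E[\ell(M)], \qquad \ell(M) = \log L(M) = \sum_{i=1}^{n} \log f(Y_i \mid \hat{M}, X_i) \]
where $i$ indexes the data in the test set. On the training set,
\[ R_{tr}(M) = -2\,\ell_{tr}(M) = -2 \sum_{i=1}^{n} \log f(Y_i \mid \hat{M}, X_i), \qquad M \in \mathcal{F} \]

AIC
\[ R(M) \approx -2\,\ell_{tr}(M) + 2p \]
\[ AIC(M) = -2\,\ell_{tr}(M) + 2p = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + 2\,\frac{p}{n}\,\hat{\sigma}^2 \Big] \]
The second form holds for the Gaussian model, where the log-likelihood agrees with the squared-error loss up to a constant:
\[ \ell_{tr}(M) = -\frac{1}{2\hat{\sigma}^2} \sum_{i=1}^{n} \big( y_i - \hat{f}(x_i) \big)^2 = -\frac{n}{2\hat{\sigma}^2}\,R_{tr}(M), \qquad R_{tr}(M) = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - \hat{f}(x_i) \big)^2 \]

BIC
\[ BIC(M) = -2\,\ell_{tr}(M) + (\log n)\,p \]
For the Gaussian model, as with AIC,
\[ BIC(M) = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + (\log n)\,\frac{p}{n}\,\hat{\sigma}^2 \Big] \]

Bayesian model selection

Training data: $Z = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$. For a model $M \in \mathcal{F}$ with parameters $\theta$:
\[ f(\theta \mid Z, M) = \frac{f(Z \mid \theta, M)\, f(\theta \mid M)}{f(Z \mid M)}, \qquad f(Z \mid M) = \int f(Z \mid \theta, M)\, f(\theta \mid M)\, d\theta \]
$f(\theta \mid M)$: prior. $f(\theta \mid Z, M)$: posterior. $f(Z \mid M)$: evidence (marginal likelihood). $f(Z \mid \theta, M)$: likelihood.

For candidate models $M_m$, $m = 1, \ldots, M$, with model prior $f(M_m)$ and evidence $f(Z \mid M_m)$:
\[ f(M_m \mid Z) \propto f(M_m)\, f(Z \mid M_m) \]
Posterior odds of two models:
\[ \frac{f(M_1 \mid Z)}{f(M_2 \mid Z)} = \frac{f(M_1)}{f(M_2)} \cdot \frac{f(Z \mid M_1)}{f(Z \mid M_2)} \]
When the model priors are equal, $f(M_1) = f(M_2)$, the comparison reduces to the Bayes factor:
\[ BF(Z) = \frac{f(Z \mid M_1)}{f(Z \mid M_2)} \]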
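The Gaussian forms of $C_p$, AIC, and BIC differ only in scale and in the penalty factor, which is easy to see by evaluating them side by side. This is a minimal sketch; the function name `gaussian_criteria` and the numeric inputs are hypothetical, chosen only to illustrate the formulas.

```python
import math

def gaussian_criteria(r_tr, p, n, sigma2):
    """Return (Cp, AIC, BIC) for a Gaussian model.

    r_tr   : training mean squared error R_tr(M)
    p      : number of basis functions (parameters)
    n      : number of training samples
    sigma2 : noise-variance estimate from a low-bias model
    """
    cp = r_tr + 2.0 * (p / n) * sigma2
    aic = (n / sigma2) * (r_tr + 2.0 * (p / n) * sigma2)
    bic = (n / sigma2) * (r_tr + math.log(n) * (p / n) * sigma2)
    return cp, aic, bic

# Hypothetical values for illustration only.
cp, aic, bic = gaussian_criteria(r_tr=0.12, p=4, n=100, sigma2=0.09)
print(f"Cp={cp:.4f}, AIC={aic:.2f}, BIC={bic:.2f}")
```

In this form AIC is simply $C_p$ rescaled by $n / \hat{\sigma}^2$, so both rank models identically, while BIC replaces the factor 2 by $\log n$.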
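[Figure: the evidence $f(Z \mid M_1)$ and $f(Z \mid M_2)$ of two models plotted over the data space $Z$, with the region $C_1$ marked.]

Evidence and the Occam factor

The evidence of model $M_m$ integrates the likelihood over the parameter prior:
\[ f(Z \mid M_m) = \int f(Z \mid \theta_m, M_m)\, f(\theta_m \mid M_m)\, d\theta_m \]
If the posterior is sharply peaked at $\hat{\theta}_m$ with width $\sigma_{\theta \mid Z}$:
\[ f(Z \mid M_m) \approx \underbrace{f(Z \mid \hat{\theta}_m, M_m)}_{\text{best-fit likelihood}} \times \underbrace{f(\hat{\theta}_m \mid M_m)\, \sigma_{\theta \mid Z}}_{\text{Occam factor}} \]
In several dimensions (Laplace approximation):
\[ f(Z \mid M_m) \approx \underbrace{f(Z \mid \hat{\theta}_m, M_m)}_{\text{best-fit likelihood}} \times \underbrace{f(\hat{\theta}_m \mid M_m)\, \det{}^{-1/2}\!\Big( \frac{A}{2\pi} \Big)}_{\text{Occam factor}}, \qquad A = -\nabla\nabla \log f(\theta_m \mid Z, M_m) \]

From evidence to BIC
\[ \log f(Z \mid M_m) = \log f(Z \mid \hat{\theta}_m, M_m) - \frac{d_m}{2} \log n + O(1) \]
$f(Z \mid M_m)$: evidence. $\hat{\theta}_m$: maximum likelihood estimate. $d_m$: number of free parameters of model $M_m$. Multiplying by $-2$ gives BIC; for the Gaussian model,
\[ BIC(M) = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + (\log n)\,\frac{p}{n}\,\hat{\sigma}^2 \Big] \]
Posterior model probabilities can then be approximated from the BIC values:
\[ f(M_m \mid Z) \approx \frac{e^{-\frac{1}{2} BIC_m}}{\sum_{l=1}^{M} e^{-\frac{1}{2} BIC_l}} \]

Minimum description length (MDL)

Rissanen, J. 1978. Modeling by shortest data description. Automatica, 14, 465-471.

To transmit a message $z_i$ that occurs with probability $P(z_i)$, use a code of length
\[ l(z_i) = -\log_2 P(z_i) \]
The expected message length is bounded below by the entropy:
\[ E[l(z)] \ge -\sum_i P(z_i) \log_2 P(z_i) \]
Entropy: the lower bound on the message length.

For model selection:
\[ \text{length} = -\log P(y \mid \theta, M, X) - \log P(\theta \mid M) \]
The second term is the average message length needed to transmit the model parameters; the first is the average message length needed to transmit the discrepancy between the model and the targets.

AIC and BIC share one form, a fit term plus a complexity penalty, $-2\,\ell_{tr}(M) + d(M)$:
\[ AIC(M) = -2\,\ell_{tr}(M) + 2p, \qquad d(M) = 2p \]
\[ BIC(M) = -2\,\ell_{tr}(M) + (\log n)\,p, \qquad d(M) = p \log n \]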
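The approximation $f(M_m \mid Z) \approx e^{-BIC_m/2} / \sum_l e^{-BIC_l/2}$ is a softmax over $-BIC/2$. A minimal sketch follows; the function name `bic_posterior` and the BIC values are hypothetical, and equal model priors are assumed.

```python
import numpy as np

def bic_posterior(bic_values):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every candidate model.
    """
    bic = np.asarray(bic_values, dtype=float)
    # Subtracting the minimum leaves the ratios unchanged and avoids underflow.
    weights = np.exp(-0.5 * (bic - bic.min()))
    return weights / weights.sum()

# Hypothetical BIC values for three candidate models.
probs = bic_posterior([100.0, 102.0, 110.0])
print(np.round(probs, 4))
```

Only differences in BIC matter: a gap of 2 between two models corresponds to posterior odds of $e^{1} \approx 2.7$ in favour of the model with the smaller BIC.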
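AIC versus BIC

For the Gaussian model both criteria are on the same scale:
\[ AIC(M) = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + 2\,\frac{p}{n}\,\hat{\sigma}^2 \Big], \qquad BIC(M) = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + (\log n)\,\frac{p}{n}\,\hat{\sigma}^2 \Big] \]
They differ only in the penalty factor, $\log n$ in place of 2, so BIC penalizes complexity more heavily than AIC once $n > e^2 \approx 7.4$.

Effective number of parameters

For a linear smoother $\hat{y} = S y$, where the $n \times n$ matrix $S$ depends on the inputs $x_i$ but not on the $y_i$:
\[ d(S) = \mathrm{trace}(S) \]
Ridge regression:
\[ \hat{y}_{ridge} = X (X^T X + \lambda I)^{-1} X^T y \]
\[ df(\lambda) = \mathrm{trace}\big[ X (X^T X + \lambda I)^{-1} X^T \big] = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda} \]
where the $d_j$ are the singular values of $X$.

VC dimension

For a class of indicator functions $\mathcal{F}: \mathbb{R}^D \to \{0, 1\}$: the VC dimension of linear functions in 2D is 3, equal to the number of parameters. The VC dimension of the sine functions $\sin(\alpha x)$ is infinite, although there is only one parameter, the frequency $\alpha$.

For real-valued functions $\mathcal{F}: \mathbb{R}^D \to \mathbb{R}$, the VC dimension is that of the indicator class $\{ I(f(x) - \beta > 0) \}$, $f \in \mathcal{F}$.

Bounds on the test error, for a class with VC dimension $h$, holding with probability at least $1 - \eta$:
\[ R(M) \le R_{tr}(M) + \frac{\epsilon}{2} \Big( 1 + \sqrt{1 + \frac{4 R_{tr}(M)}{\epsilon}} \Big) \qquad \text{(classification)} \]
\[ R(M) \le \frac{R_{tr}(M)}{\big( 1 - c_3 \sqrt{\epsilon} \big)_+} \qquad \text{(regression)} \]
\[ \epsilon = c_1\, \frac{h \big( \log (c_2 n / h) + 1 \big) - \log (\eta / 4)}{n} \]
Regression: $c_1 = c_2 = c_3 = 1$. Classification: $c_1 = 4$, $c_2 = 2$ (worst case).

In both cases the bound is the training error $R_{tr}(M)$ plus a confidence term that depends on $h$ and $n$.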
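The two expressions for the ridge degrees of freedom, the trace of the smoother matrix and the sum over singular values, can be compared numerically. This is a minimal sketch on a randomly generated design matrix; the matrix size, the seed, and the $\lambda$ grid are hypothetical.

```python
import numpy as np

def ridge_df_trace(X, lam):
    # df(lambda) = trace[ X (X^T X + lambda I)^{-1} X^T ]
    p = X.shape[1]
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return float(np.trace(S))

def ridge_df_svd(X, lam):
    # df(lambda) = sum_j d_j^2 / (d_j^2 + lambda), d_j = singular values of X
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d ** 2 / (d ** 2 + lam)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # hypothetical design matrix, n=50, p=5
lams = [0.0, 1.0, 10.0, 100.0]
dfs = [ridge_df_svd(X, lam) for lam in lams]

for lam, df in zip(lams, dfs):
    print(f"lambda={lam:6.1f}: df={df:.4f} (trace form: {ridge_df_trace(X, lam):.4f})")
```

At $\lambda = 0$ the effective number of parameters equals $p$, and it shrinks toward 0 as $\lambda$ grows, which is why $df(\lambda)$ can stand in for the parameter count in a complexity penalty.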
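Cross-validation

Test error to be estimated:
\[ R(M) = E\big[ L\big( Y, \hat{f}(X) \big) \big] \]
$K$-fold cross-validation: split the data into $K$ parts. For $k = 1, \ldots, K$, fit the model on the other $K - 1$ parts, giving $\hat{f}^{-k}(x)$, and evaluate it on part $k$.

[Figure: 5-fold split of the data, fold 1 to fold 5.]

With $\kappa(i)$ the fold that contains observation $i$:
\[ CV(\hat{f}) = \frac{1}{n} \sum_{i=1}^{n} L\big( y_i, \hat{f}^{-\kappa(i)}(x_i) \big) \]
For a family of models $f(x, \lambda)$ indexed by a tuning parameter $\lambda$:
\[ CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} L\big( y_i, \hat{f}^{-\kappa(i)}(x_i, \lambda) \big), \qquad \hat{\lambda} = \arg\min_{\lambda} CV(\lambda) \]
Because each fit uses a smaller training set, cross-validation introduces bias. Typical choices: $K = n$ (leave-one-out) and $K = 5$.

Bootstrap

Training set: $Z = \{ (x_i, y_i),\ i = 1, \ldots, n \}$. Bootstrap samples, drawn with replacement: $Z^{*b} = \{ (x_i^*, y_i^*),\ i = 1, \ldots, n \}$, $b = 1, \ldots, B$.

\[ \hat{R}_{boot} = \frac{1}{B} \frac{1}{n} \sum_{b=1}^{B} \sum_{i=1}^{n} L\big( y_i, \hat{f}^{*b}(x_i) \big) \]
Leave-one-out bootstrap, with $C^{-i}$ the set of indices of the bootstrap samples that do not contain observation $i$:
\[ \hat{R}_{loo\text{-}boot} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|C^{-i}|} \sum_{b \in C^{-i}} L\big( y_i, \hat{f}^{*b}(x_i) \big) \]
The .632 estimator:
\[ \hat{R}^{(.632)} = 0.368\, R_{tr} + 0.632\, \hat{R}_{loo\text{-}boot} \]
The .632+ estimator:
\[ \hat{R}^{(.632+)} = (1 - w)\, R_{tr} + w\, \hat{R}_{loo\text{-}boot}, \qquad w = \frac{0.632}{1 - 0.368\, \hat{r}} \]
with the no-information error rate $\hat{\gamma}$ and the relative overfitting rate $\hat{r}$:
\[ \hat{\gamma} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} L\big( y_i, \hat{f}(x_{i'}) \big), \qquad \hat{r} = \frac{\hat{R}_{loo\text{-}boot} - R_{tr}}{\hat{\gamma} - R_{tr}} \]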
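The rule $\hat{\lambda} = \arg\min_\lambda CV(\lambda)$ can be sketched for ridge regression with squared-error loss. This is an illustration under stated assumptions, not the slides' experiment: the synthetic data, the $\lambda$ grid, and the helper names `ridge_fit` and `kfold_cv` are hypothetical, and for brevity the ridge fit here has no intercept and penalizes every coefficient.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Simplified ridge: no intercept, all coefficients penalized.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv(X, y, lam, K=5, seed=0):
    """CV(lambda) = (1/n) * sum_i L(y_i, f^{-kappa(i)}(x_i, lambda)), squared-error loss."""
    n = len(y)
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % K        # kappa(i): fold index of observation i
    losses = np.empty(n)
    for k in range(K):
        held_out = fold_of == k
        w = ridge_fit(X[~held_out], y[~held_out], lam)   # fit without fold k
        losses[held_out] = (y[held_out] - X[held_out] @ w) ** 2
    return float(losses.mean())

# Hypothetical synthetic data: 3 informative features out of 8.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + rng.normal(0.0, 0.5, size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = [kfold_cv(X, y, lam, K=5) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv_scores))]
print("CV scores:", np.round(cv_scores, 4), "selected lambda:", best_lambda)
```

Setting `K=len(y)` in `kfold_cv` gives leave-one-out cross-validation; the same fold assignment is reused for every $\lambda$ so that the scores are compared on identical splits.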
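Ridge regression example: criteria compared
\[ \hat{y} = S y = X (X^T X + \lambda I)^{-1} X^T y, \qquad df(\lambda) = \mathrm{trace}\big[ X (X^T X + \lambda I)^{-1} X^T \big] = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda} \]
\[ AIC(M) = R_{tr}(M) + 2\,\frac{p}{n}\,\hat{\sigma}^2, \qquad BIC(M) = R_{tr}(M) + (\log n)\,\frac{p}{n}\,\hat{\sigma}^2 \]
\[ R(M) \le \frac{R_{tr}(M)}{\big( 1 - \sqrt{\epsilon} \big)_+}, \qquad \epsilon = \frac{h \big( \log (n / h) + 1 \big) - \log (\eta / 4)}{n} \]
Minimum test error: $\lambda^* = 41.7532$, $df = 4.0366$.

Derivation of AIC via the KL divergence

Setting: the true density is $f_0(Y)$ and the candidate models form families $\mathcal{F}(k) = \{ f(Y \mid \theta_k) \}$, $k = k_1, k_2, \ldots, k_p$, with $M \subset \mathcal{F}$. Within a family, $f(Y \mid \theta^*)$ is the member closest to the truth.

[Figure: the true density $f_0(Y)$ in relation to the model families, with the closest member $f(Y \mid \theta^*)$.]

Writing $g$ for the true density, the KL divergence between $g$ and a model $f(\cdot \mid \theta)$ is
\[ L(g, f) = \int g(y) \log \frac{g(y)}{f(y \mid \theta)}\, dy = \int g(y) \log g(y)\, dy - \int g(y) \log f(y \mid \theta)\, dy = E_Y[\log g(Y)] - E_Y[\log f(Y \mid \theta)] \]
The first term is a constant $C$ with respect to $f(\cdot \mid \theta)$, so minimizing the KL divergence is the same as minimizing $-2\,E_Y[\log f(Y \mid \theta)]$.

Entropy is defined as
\[ H(Y) = -\int g(y) \log g(y)\, dy \]
The KL divergence also measures the amount of information lost when $f$ is used to approximate $g$.

Maximum likelihood is equivalent to minimizing the KL divergence (see the section on properties of the MLE).

With $\hat{\theta}(Y^n)$ estimated from the sample $Y^n = (Y_1, \ldots, Y_n)$, the target is the expected divergence
\[ E_{Y^n}\big[ L\big( g, f(\cdot \mid \hat{\theta}(Y^n)) \big) \big] = C - E_{Y^n} E_Y\big[ \log f\big( Y \mid \hat{\theta}(Y^n) \big) \big] \]
so model selection aims at
\[ \min\; -2\, E_{Y^n} E_Y\big[ \log f\big( Y \mid \hat{\theta}(Y^n) \big) \big] \]
The expected log-likelihood on new data is estimated by the sample average on the training data,
\[ E_Y\big[ \log f\big( Y \mid \hat{\theta}(Y^n) \big) \big] \approx \frac{1}{n} \sum_{i=1}^{n} \log f\big( y_i \mid \hat{\theta}(Y^n) \big) \]
which is optimistic. As $n \to \infty$, $\hat{\theta}_n \to \theta^*$ and
\[ E_{Y^n} E_Y\big[ \log f\big( Y \mid \hat{\theta}(Y^n) \big) \big] \approx E_{Y^n}\Big[ \frac{1}{n} \sum_{i=1}^{n} \log f\big( y_i \mid \hat{\theta}(Y^n) \big) \Big] - \frac{1}{n}\, \mathrm{tr}\big( J I^{-1} \big) \]
where
\[ I = -E_Y\Big[ \frac{\partial^2 \log f(Y \mid \theta^*)}{\partial \theta_i\, \partial \theta_j} \Big]_{i, j = 1, \ldots, p}, \qquad J = E_Y\Big[ \nabla_\theta \log f(Y \mid \theta^*)\; \nabla_\theta \log f(Y \mid \theta^*)^T \Big] \]
and $I$ is the Fisher information. When the model is correctly specified, $f = g$, we have $J = I$, so $J I^{-1} = I_p$ (the $p \times p$ identity) and $\mathrm{tr}(J I^{-1}) = p$.

Multiplying by $-2n$:
\[ -2n\, E_{Y^n} E_Y\big[ \log f\big( Y \mid \hat{\theta}(Y^n) \big) \big] \approx E_{Y^n}\Big[ -2 \sum_{i=1}^{n} \log f\big( y_i \mid \hat{\theta}(Y^n) \big) \Big] + 2p \]
which gives
\[ AIC(M) = -2 \sum_{i=1}^{n} \log f\big( y_i \mid \hat{\theta}(Y^n) \big) + 2p = \frac{n}{\hat{\sigma}^2} \Big[ R_{tr}(M) + 2\,\frac{p}{n}\,\hat{\sigma}^2 \Big] \]
where the last form is for the Gaussian model.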
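The statement that the KL divergence $L(g, f) = \int g \log (g / f)\, dy$ is smallest when the model matches the true density can be illustrated with Gaussians, where a closed form is available for comparison. This is a minimal sketch; the choice of a standard normal as the true density $g$, the integration grid, and the function names are assumptions made for illustration.

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def kl_numeric(mu_g, sigma_g, mu_f, sigma_f, half_width=12.0, num=200001):
    # L(g, f) = integral of g(y) * [log g(y) - log f(y)] dy, by a Riemann sum.
    y = np.linspace(mu_g - half_width * sigma_g, mu_g + half_width * sigma_g, num)
    log_g = normal_logpdf(y, mu_g, sigma_g)
    log_f = normal_logpdf(y, mu_f, sigma_f)
    dy = y[1] - y[0]
    return float(np.sum(np.exp(log_g) * (log_g - log_f)) * dy)

def kl_closed_form(mu_g, sigma_g, mu_f, sigma_f):
    # Known closed form for two univariate Gaussians.
    return float(np.log(sigma_f / sigma_g)
                 + (sigma_g ** 2 + (mu_g - mu_f) ** 2) / (2.0 * sigma_f ** 2) - 0.5)

# Assumed true density g = N(0, 1); candidate models f = N(mu, 1).
candidate_means = np.linspace(-2.0, 2.0, 41)
kl_values = [kl_numeric(0.0, 1.0, m, 1.0) for m in candidate_means]
best_mean = float(candidate_means[int(np.argmin(kl_values))])
print("KL(N(0,1) || N(1,1)) =", round(kl_numeric(0.0, 1.0, 1.0, 1.0), 6))
print("mean minimizing the KL divergence:", best_mean)
```

Since the entropy term $\int g \log g\, dy$ does not depend on the candidate, ranking the candidates by KL divergence is the same as ranking them by expected log-likelihood $E_Y[\log f(Y \mid \theta)]$, which is the quantity AIC estimates.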