Some Theoretical Issues in Machine Learning

Zongben Xu (Xi'an Jiaotong University)
Email:   Homepage: http:/

Outline
- A universality theory for linear learning machines
- A regularization theory based on error modeling
- New models and theories for sparse information processing

Part I. A universality theory for linear learning machines

A New Learning Paradigm: LtDAHP
(Learning through Deterministic Assignment of Hidden Parameters)
Zongben Xu (Xi'an Jiaotong University, Xi'an, China)
Email:   Homepage: http:/

Motivating questions
- A supervised learning problem: difficult or easy?
- Can a difficult learning problem be solved more simply?
- Is a linear machine universal?

Outline
- Some Related Concepts
- LtRAHP: Learning through Random Assignment of Hidden Parameters
- LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
- Concluding Remarks

Some Related Concepts: Supervised Learning

Supervised learning: given a finite number of input/output samples, find a function f in a machine (hypothesis space) H that approximates the unknown relation between the input and output spaces. Typical black-box applications: face recognition, social networks, stock index tracking.

Given samples D = {(x_i, y_i)}_{i=1}^m with x_i in R^d, a loss l, and a machine H, empirical risk minimization (ERM) computes

    f*_m = argmin_{f in H} E_emp(f) = argmin_{f in H} (1/m) sum_{i=1}^m l(f(x_i), y_i).

Machine (FNNs):

    H_N = { sum_{i=1}^N a_i sigma(<T_i, x>) : T_i in Omega, i = 1, 2, ..., N },
    f_FNN(x) = sum_{i=1}^N a_i sigma(<W_i, x> + theta_i).

[Figure: a one-hidden-layer network with inputs x(1), ..., x(d), hidden weights W_{j,k}, thresholds theta_{i,j}, and outputs y(1), ..., y(m).]

Some Related Concepts: HP vs BP
- Hidden parameters (HPs): determine the hidden predictors (the non-linear mechanism), e.g. the {T_i} (or {W_i, theta_i}).
- Bright parameters (BPs): determine how the hidden predictors are linearly combined (the linear mechanism), e.g. the {a_i}.

Some Related Concepts: OSL vs TSL
- One-stage learning (OSL): HPs and BPs are trained simultaneously, in one stage.
- Two-stage learning (TSL): HPs and BPs are trained separately, in two stages.

For the machine H = { sum_{i=1}^N a_i sigma(<T_i, x>) : T_i in Omega, i = 1, 2, ..., N }:

    OSL: (a*, T*) = argmin_{a, T} (1/m) sum_{i=1}^m l( sum_{j=1}^N a_j sigma(<T_j, x_i>), y_i ).

    TSL: Stage 1: T = assign(.), T = (T_1, T_2, ..., T_N);
         Stage 2: a* = argmin_{a} (1/m) sum_{i=1}^m l( sum_{j=1}^N a_j sigma(<T_j, x_i>), y_i ).

Some Related Concepts: Main Concerns
- Q1: How to specify the assign function?
  - T = assign(a): ADM
  - T = assign(omega): random assignment (LtRAHP)
  - T = assign(n): deterministic assignment (LtDAHP)
- Q2: Can TSL work?
  - Universal approximation?
  - Does it degrade the generalization ability?
  - Consistency/convergence?
  - Effectiveness and efficiency?

LtRAHP: An Overview

LtRAHP typicals: random vector functional-link networks (RVFLs), echo-state neural networks (ESNs), and extreme learning machines (ELM).
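In code, the LtRAHP recipe these machines share is short: draw the hidden parameters at random (Stage 1), then fit only the bright parameters by linear least squares (Stage 2). A minimal sketch in NumPy; the function names and the Gaussian/tanh choices are ours, not taken from any of the cited systems:

```python
import numpy as np

def ltrahp_fit(X, y, N=50, rng=None):
    """Stage 1: random assignment of hidden parameters; Stage 2: least squares."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    W = rng.normal(size=(d, N))                 # hidden weights, drawn at random
    b = rng.normal(size=N)                      # hidden thresholds, drawn at random
    H = np.tanh(X @ W + b)                      # hidden-layer output matrix
    a, *_ = np.linalg.lstsq(H, y, rcond=None)   # bright parameters by least squares
    return W, b, a

def ltrahp_predict(X, W, b, a):
    return np.tanh(X @ W + b) @ a

# usage: fit a smooth 1-D target
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(np.pi * X[:, 0])
W, b, a = ltrahp_fit(X, y, N=50, rng=0)
err = float(np.max(np.abs(ltrahp_predict(X, W, b, a) - y)))
```

Only the linear Stage 2 touches the data-fitting objective, which is what makes these machines fast to train.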
10、(ELM)(Y.H.Pao,Adaptive Pattern Recognition and Neural Networks,Reading,MA:Addison-Wesley,1989)(H.Jaeger and H.Haas.Harnessing nonlinearity:Predicting chaotics systems and saving energy in wireless communication.Science,304:78-80,2004.)(G.B.Huang,Q.Y.Zhu and C.K.Siew.Extreme learning machine:Theory a
11、nd applications.Neurocomputing,70:489-501,2006.)5L,j kWia(1)x(2)x(5)xyRandom assignmentStage 1:Stage 2:LtRAHP Training(,)assign()W111arg min,jmNjjiiaRjijjWlaxya121NNLtRAHP:Experimental EvidencesTestRMSE of UCI dataTraining timeFace Recoginition Marques et al.2012Handwritten Character Recognition Cha
12、cko et al.2012Object Recognition Xu et al.2012Experimental Support Huang et al.2006Application SupportData setsBPSVMELMTrianzines0.21970.12890.2002Housing0.12850.11800.1267Abalone0.08740.07840.0824Airelone0.04810.04290.0431Census0.06850.07460.0660Data setsBPSVMELMTrianzines0.54840.0086=d-1)Configura
13、tion Problem can be approximately solved by:(log)nnLtDAHP:Mathematical Foundations(II)lEqual-area partition(EAP)lRecursive zonal sphere partition(RZSP)http:/ Complexity:LtDAHP:FNN InstanceArchitecture of FNN1jNFNNjjjjfaxW11jknlkkFNNjjkfxa*11*nlFNNjjjkkkfxaConventional FNNsLtDAHP based FNNs *1*2()/kj
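For d = 2 the deterministic configurations above have a closed form: n equally spaced directions on the circle S^1 are simultaneously the minimal Riesz-energy and the best-packing configuration. A small sketch (the helper name is ours; for d > 2 one would fall back on an equal-area / recursive zonal sphere partition instead):

```python
import numpy as np

def circle_points(n):
    """n equally spaced unit vectors on S^1: the minimal-energy and
    best-packing configuration for d = 2."""
    t = 2 * np.pi * np.arange(n) / n
    return np.stack([np.cos(t), np.sin(t)], axis=1)   # shape (n, 2)

# eight deterministic hidden directions; neighbours are 2*pi/8 apart
W = circle_points(8)
```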
Architecture of LtDAHP: W_j in S^{d-1} (j = 1, ..., n), theta_k in S^1 (k = 1, ..., l), with N = n·l hidden nodes (so l = N/n).

    Stage 1 (deterministic assignment): W = assign(N):
        W_1, ..., W_n: minimal Riesz (d-1)-energy points on S^{d-1} (via RZSP);
        theta_1, ..., theta_l: best packing points on S^1.

LtDAHP: Learning Procedure (FNN instance)

    Stage 2 (least squares):
        a* = argmin_{a} (1/m) sum_{i=1}^m ( y_i - sum_{j=1}^n sum_{k=1}^l a_{j,k} sigma_k(<W_j, x_i>) )^2
           = argmin_{a} |Y - H a|^2,  solved by a* = H^+ Y (pseudo-inverse of the hidden-layer matrix H).
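Putting the two stages together for a d = 2 toy case, an LtDAHP-style fit might look as follows; the equally spaced directions and thresholds, the tanh activation, and all names here are our simplification of the construction sketched above, not the authors' implementation:

```python
import numpy as np

def ltdahp_features(X, n=12, l=8):
    """Deterministic hidden layer for d = 2: n directions x l thresholds."""
    t = 2 * np.pi * np.arange(n) / n
    W = np.stack([np.cos(t), np.sin(t)], axis=1)      # (n, 2): points on S^1
    theta = np.linspace(-1.0, 1.0, l)                 # (l,): equally spaced knots
    proj = X @ W.T                                    # (m, n) projections <W_j, x_i>
    # one tanh ridge feature per (direction, threshold) pair -> (m, n*l)
    return np.tanh(proj[:, :, None] - theta[None, None, :]).reshape(len(X), n * l)

def ltdahp_fit(X, y, n=12, l=8):
    """Stage 1 is the deterministic assignment above; Stage 2 is least squares."""
    H = ltdahp_features(X, n, l)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    return a

# usage: fit a smooth 2-D target on a grid
g = np.linspace(-1, 1, 20)
X = np.array([(u, v) for u in g for v in g])          # 400 sample points
y = X[:, 0] * X[:, 1]
a = ltdahp_fit(X, y)
rmse = float(np.sqrt(np.mean((ltdahp_features(X) @ a - y) ** 2)))
```

Note that nothing in Stage 1 depends on the data: the hidden layer can be precomputed once and reused across learning problems of the same dimension.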
LtDAHP: Theoretical Assessment (FNN instance)

Generalization capability of LtDAHP: if f_rho is in W^r and one takes l ~ m^{1/(2r+d)} and n ~ m^{(d-1)/(2r+d)}, then

    C_1 m^{-2r/(2r+d)} <= sup_{f_rho in W^r} E( |f_rho - f_LtDAHP|^2 ) <= C_2 m^{-2r/(2r+d)} log m,

which matches the rate of OSL:

    C_1 m^{-2r/(2r+d)} <= sup_{f_rho in W^r} E( |f_rho - f_OSL|^2 ) <= C_2 m^{-2r/(2r+d)} log m.

Generalization capability of ELM: if T is randomly fixed according to (2) and n ~ m^{d/(2r+d)}, then

    C_1 m^{-2r/(2r+d)} <= sup_{f_rho in W^r} E_T E( |f_rho - f_ELM|^2 ) <= C_2 m^{-2r/(2r+d)} log m,

where the outer expectation is over the random hidden parameters: multiple trials are required.
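The "multiple trials" point is easy to see in code: rerunning a random-assignment learner changes its error from seed to seed, while a deterministic assignment yields bit-for-bit the same model on every run. A toy illustration (our own sketch, not the experiment from the slides):

```python
import numpy as np

def fit_train_rmse(X, y, W, b):
    """Two-stage fit with given hidden parameters; returns training RMSE."""
    H = np.tanh(X @ W + b)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    return float(np.sqrt(np.mean((H @ a - y) ** 2)))

X = np.random.default_rng(0).uniform(-1, 1, size=(300, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]

# LtRAHP: hidden parameters drawn at random -> error varies across trials
rand_errs = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    W, b = rng.normal(size=(2, 40)), rng.normal(size=40)
    rand_errs.append(fit_train_rmse(X, y, W, b))

# LtDAHP-style: deterministic hidden parameters -> identical on every run
t = 2 * np.pi * np.arange(8) / 8
Wd = np.stack([np.cos(t), np.sin(t)])      # (2, 8): directions on S^1
Wd = np.repeat(Wd, 5, axis=1)              # 8 directions x 5 thresholds = 40 nodes
bd = np.tile(np.linspace(-1, 1, 5), 8)
det_errs = [fit_train_rmse(X, y, Wd, bd) for _ in range(5)]
```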
LtDAHP: Toy Simulations (FNN instance)

[Figures: training time and test error of ELM (LtRAHP) versus LtDAHP, plotted against the number of samples (m) and the number of hidden nodes (N).]

LtDAHP: Simulations on UCI Data Sets

    Data set          Training samples   Testing samples   Attributes
    Auto_Price        106                53                15
    Stock             633                317               9
    Bank (Bank8FM)    2999               1500              8
    Delta_Ailerons    3565               3564              5
    Delta_Elevators   4759               4758              6

                      Test RMSE                 Train MTM              Sparsity
    Data set          SVM     ELM     LtDAHP    SVM    ELM    LtDAHP   SVM    ELM    LtDAHP
    Auto_Price        0.0427  0.0324  0.0357    160    3.22   3.22     116.2  240.1  72.2
    Stock             0.0478  0.0347  0.0306    5.64   0.325  0.325    26.7   108.1  148.3
    Bank8FM           0.0454  0.0446  0.0421    82.1   1.42   1.42     112.9  88.4   60.5
    Delta_Ailerons    0.0422  0.0387  0.0399    60.1   2.32   2.32     169.3  56.2   48.1
    Delta_Elevators   0.0534  0.0535  0.0537    684    3.10   3.10     597.6  52.6   52.1

LtDAHP: Real World Data Experiments

Million song dataset. The Million Song Dataset (Bertin et al., 2011) poses the task of predicting the year in which a song was released from audio features associated with the song. The dataset consists of 463,715 training examples and 51,630 testing examples with d = 90. Each example is a song released between 1922 and 2011, represented as a vector of timbre information computed from the song.

    Method    Test RMSE   Train MTM   Sparsity
    ELM       10.89       1989        601
    LtDAHP    9.21        1989        512

Buzz in social media. The Buzz Prediction dataset is collected from Twitter, a well-known social network and micro-blogging platform with exponential growth and extremely fast dynamics. The task is to predict the mean number of active discussions (NAD) from d = 77 primary features, including the number of created discussions, the average number of author interactions, the average discussion length, etc. The dataset contains m = 583,250 samples, so it is a genuinely large-scale problem.

    Method    Test RMSE   Train MTM   Sparsity
    ELM       0.0037      1523        534
    LtDAHP    0.0017      1523        186

Concluding Remarks
- LtDAHP provides a very efficient way of overcoming both the high computational burden of OSL and the uncertainty difficulty in LtRAHP.
- LtDAHP establishes a new paradigm in which supervised learning problems can be solved very simply yet still effectively, by preassigning the hidden parameters and solving for the bright parameters only, without sacrificing generalization capability.
- Many problems on LtDAHP are still open and deserve further study.

Thank You!