Research on Transfer Learning Algorithms
Fuzhen Zhuang
Institute of Computing Technology, Chinese Academy of Sciences
April 18, 2019

Traditional Supervised Machine Learning (1/2) (from Prof. Qiang Yang)
- Training data: each person is described by occupation, palm-line length and a "dragon star" attribute, and labeled with a fortune.

  Occ      | Palm Lines | Dragon Star | Fortune
  Prof     | long       | T           | good
  Lawyer   | short      | F           | bad
  PhD Stu  | broken     | T           | good
  Doc      | long       | F           | bad

- A classifier trained on these data predicts "good" for the unseen instance (?, long, T).
- What if the unseen data do not follow the same distribution as the training data?

Traditional Supervised Machine Learning (2/2)
- Traditional supervised learning trains a classifier on a training set and applies it to a test set drawn from the same distribution.
- In real applications, this assumption usually cannot be satisfied!
Transfer Learning
- A practical learning scenario: classify Lenovo news with knowledge learned from HP news.
- Transfer learning is a machine learning approach that reuses existing knowledge to solve problems in different but related domains.
- It relaxes the two basic assumptions of traditional machine learning.

Transfer Learning Scenarios (1/4)
- Transfer learning scenarios are everywhere: transferring knowledge in image classification, and in news web-page classification (e.g., from HP news to Lenovo news).

Transfer Learning Scenarios (2/4): Heterogeneous Feature Spaces (from Prof. Qiang Yang)
- Training data are text, e.g., "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae." and "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit."
- Future (test) data are images of apples and bananas.
- Xin Jin, Fuzhen Zhuang, Sinno Jialin Pan, Changying Du, Ping Luo, Qing He: Heterogeneous Multi-task Semantic Feature Learning for Classification. CIKM 2015: 1847-1850.

Transfer Learning Scenarios (3/4) (from Prof. Qiang Yang)
- Sentiment classification: training and testing on Electronics reviews reaches 84.60% accuracy, while training on DVD reviews and testing on Electronics drops to 72.65%.

Transfer Learning Scenarios (4/4) (from Prof. Qiang Yang)
- Labeling every domain separately (DVD, Electronics, Book, Kitchen, Clothes, Video game, Fruit, Hotel, Tea, ...) is impractical!
Outline
- Concept Learning for Transfer Learning
  - Concept Learning based on Non-negative Matrix Tri-factorization for Transfer Learning
  - Concept Learning based on Probabilistic Latent Semantic Analysis for Transfer Learning
- Transfer Learning using Auto-encoders
  - Transfer Learning from Multiple Sources with Autoencoder Regularization
  - Supervised Representation Learning: Transfer Learning with Deep Auto-encoders

Concept Learning for Transfer Learning
Concept Learning based on Non-negative Matrix Tri-factorization for Transfer Learning
Introduction
- Many traditional learning techniques work well only under the assumption that training and test data follow the same distribution: a classifier is trained on labeled training data and applied to unlabeled test data.
- Enterprise news classification, including classes such as "Product announcement", "Business scandal" and "Acquisition":
  - HP news (product announcement): "HP's just-released LaserJet Pro P1100 printer and the LaserJet Pro M1130 and M1210 multifunction printers ... price ... performance ..."
  - Lenovo news (product announcement): "Announcement for Lenovo ThinkPad ThinkCentre price ... $150 off Lenovo K300 desktop using coupon code ... $200 off Lenovo IdeaPad U450p laptop using ... their performance ..."
- The two domains follow different distributions, so a classifier trained on HP news fails on Lenovo news!

Motivation (1/3)
- Example analysis of the HP and Lenovo product-announcement news above:
  - The word concept "Product" is indicated by LaserJet, printer, price, performance in HP news, and by ThinkPad, ThinkCentre, price, performance in Lenovo news.
  - This word concept is related to the document class "Product announcement".
  - The two domains share some common words: announcement, price, performance.

Motivation (2/3)
- Example analysis:
  - HP: LaserJet, printer, price, performance, et al.
  - Lenovo: ThinkPad, ThinkCentre, price, performance, et al.
- The words expressing the same word concept are domain-dependent.
- The association between word concepts and document classes is domain-independent: the concept "Product" indicates the class "Product announcement" in both domains.
Motivation (3/3)
- Further observations:
  - Different domains may use the same key words to express the same concept (denoted as an identical concept).
  - Different domains may also use different key words to express the same concept (denoted as an alike concept).
  - Different domains may also have their own distinct concepts (denoted as distinct concepts).
- The identical and alike concepts are used as the shared concepts for knowledge transfer.
- We try to model these three kinds of concepts simultaneously for transfer learning text classification.
Preliminary Knowledge
- Basic formula of matrix tri-factorization:

  X ≈ F S G^T

  where the input X is the word-document co-occurrence matrix, F is the word-concept (word cluster) factor, G is the document-class (document cluster) factor, and S captures the association between word concepts and document classes. A generic fitting sketch follows below.
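Purely as an illustration of how such a nonnegative tri-factorization can be fitted, the sketch below uses the standard multiplicative updates for minimizing ||X - F S G^T||_F^2 under nonnegativity. This is a generic sketch, not the algorithm from the slides; the matrix sizes, initialization and iteration count are assumptions.

```python
import numpy as np

def tri_factorize(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative tri-factorization X ~ F S G^T (generic sketch).

    X  : (n_words, n_docs) nonnegative co-occurrence matrix
    k1 : number of word concepts, k2 : number of document classes
    """
    rng = np.random.default_rng(seed)
    n_w, n_d = X.shape
    F = rng.random((n_w, k1))   # word -> word-concept factor
    S = rng.random((k1, k2))    # word-concept -> document-class association
    G = rng.random((n_d, k2))   # document -> document-class factor

    for _ in range(n_iter):
        # standard multiplicative update rules for the Frobenius loss
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
    return F, S, G
```

For MTrick- or TriTL-style transfer, described next, the source-domain G would additionally be tied to the label matrix and some factors would be shared across domains.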
Previous Method: MTrick (SDM 2010) (1/2)
- Sketch map of MTrick: the source domain is factorized as Xs ≈ Fs S Gs^T and the target domain as Xt ≈ Ft S Gt^T, with S shared between the two domains for knowledge transfer.
- MTrick considers the alike concepts.

MTrick (2/2)
- Optimization problem for MTrick (a schematic form follows below): G0 is the supervision information (the label matrix of the source domain), and the association S is shared as the bridge to transfer knowledge.
- Dual Transfer Learning (DTL, Long et al., SDM 2012) considers both identical and alike concepts.
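Schematically, and only as a hedged reconstruction from this slide (the exact regularization form and trade-off weights alpha, beta of the published formulation may differ), the MTrick-style optimization can be written as:

```latex
\min_{F_s, F_t, G_s, G_t, S \ge 0}\;
\|X_s - F_s S G_s^{\top}\|_F^2
+ \alpha\, \|X_t - F_t S G_t^{\top}\|_F^2
+ \beta\, \|G_s - G_0\|_F^2
```

Here G0 encodes the source labels, and the shared association S acts as the bridge between the two factorizations.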
Triplex Transfer Learning (TriTL) (1/5)
- Further divide the word concepts into three kinds: F1, identical concepts; F2, alike concepts; F3, distinct concepts.
- Input: s source domains Xr (1 ≤ r ≤ s) with label information, and t target domains Xr (s+1 ≤ r ≤ s+t).
- We propose the Triplex Transfer Learning framework based on matrix tri-factorization (TriTL for short).

TriTL (2/5)
- Optimization problem (a schematic form is sketched below): F1, S1 and S2 are shared as the bridge for knowledge transfer across domains.
- The supervision information is integrated through Gr (1 ≤ r ≤ s) in the source domains.
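Based on the decomposition described on this slide (shared F1 and S1 for identical concepts, domain-specific F2,r with shared S2 for alike concepts, domain-specific F3,r and S3,r for distinct concepts), the TriTL objective has the schematic form below; normalization and any additional constraints of the published formulation are omitted.

```latex
\min_{F, S, G \ge 0}\;
\sum_{r=1}^{s+t}
\Big\| X_r - \big( F_1 S_1 + F_{2,r} S_2 + F_{3,r} S_{3,r} \big) G_r^{\top} \Big\|_F^2,
\qquad G_r \text{ fixed to the label indicator for } 1 \le r \le s
```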
TriTL (3/5)
- We develop an alternately iterative algorithm to derive the solution and theoretically analyze its convergence.

TriTL (4/5)
- Classification on target domains (see the sketch after this list):
  - When 1 ≤ r ≤ s, Gr contains the label information, so we keep it unchanged during the iterations: if xi belongs to class j, then Gr(i, j) = 1, otherwise Gr(i, j) = 0.
  - After the iterations, we obtain the output Gr (s+1 ≤ r ≤ s+t) and perform classification according to Gr.
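A minimal sketch of the label handling described above, assuming each Gr is an n_r x c NumPy array: the source-domain Gr is the one-hot label indicator kept fixed, and target predictions are read off by a row-wise argmax.

```python
import numpy as np

def encode_labels(y, n_classes):
    """One-hot label indicator G_r for a source domain: G_r[i, j] = 1 iff x_i is in class j."""
    G = np.zeros((len(y), n_classes))
    G[np.arange(len(y)), y] = 1.0
    return G

def predict(G_target):
    """Classify each target document by the largest entry in its row of G_r."""
    return np.argmax(G_target, axis=1)
```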
TriTL (5/5)
- Analysis of algorithm convergence: following the convergence-analysis methodology of Lee et al. (NIPS 2001) and Ding et al. (KDD 2006), the following theorem holds.
- Theorem (Convergence): the objective function of the optimization problem is non-increasing after each round of the iterative update formulas, and hence converges.
Data Preparation (1/3)
- 20Newsgroups: four top categories, each containing four sub-categories:
  - rec: rec.autos, rec.motorcycles, rec.baseball, rec.hockey
  - sci: sci.crypt, sci.electronics, sci.med, sci.space
  - comp: comp.graphics, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
  - talk: talk.politics.misc, talk.politics.guns, talk.politics.mideast, talk.religion.misc
- Sentiment classification, four domains: books, dvd, electronics, kitchen. Randomly select two domains as sources and the rest as targets; 6 problems can be constructed.

Data Preparation (2/3)
- Construct classification tasks (traditional transfer learning setting), e.g., rec as the positive class and sci as the negative class: the source domain uses baseball and crypt, and the target domain uses autos and space.
- For a classification problem with one source domain and one target domain, 144 (= P(4,2) x P(4,2)) problems can be constructed, as enumerated below.
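The count of 144 = P(4,2) x P(4,2) can be checked by enumerating ordered (source, target) pairs of sub-categories within rec and sci; the sub-category names below are the ones listed on the 20Newsgroups slide above.

```python
from itertools import permutations

rec = ["rec.autos", "rec.motorcycles", "rec.baseball", "rec.hockey"]
sci = ["sci.crypt", "sci.electronics", "sci.med", "sci.space"]

problems = [
    # source: (rec_src positive, sci_src negative); target: a different pair
    ((rec_src, sci_src), (rec_tgt, sci_tgt))
    for rec_src, rec_tgt in permutations(rec, 2)   # 4 * 3 = 12 ordered rec pairs
    for sci_src, sci_tgt in permutations(sci, 2)   # 4 * 3 = 12 ordered sci pairs
]
print(len(problems))  # 144
```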
Data Preparation (3/3)
- Construct new transfer learning problems: the source domain is still drawn from rec (+) and sci (-), e.g., baseball and crypt, while the target domain mixes in the comp and talk categories, e.g., autos (+) and graphics (-).
- Far more problems than the 144 above can be constructed this way, and more distinct concepts may exist!
Compared Algorithms
- Traditional learning algorithms:
  - Supervised learning: Logistic Regression (LR) [David et al., 2000], Support Vector Machine (SVM) [Joachims, ICML 1999]
  - Semi-supervised learning: TSVM [Joachims, ICML 1999]
- Transfer learning methods: CoCC [Dai et al., KDD 2007], DTL [Long et al., SDM 2012]
- Classification accuracy is used as the evaluation measure.
Experimental Results (1/3)
- Sort the problems by the accuracy of LR, as a proxy for the degree of transfer difficulty: generally, a lower LR accuracy indicates a harder transfer problem, while a higher one indicates an easier problem.

Experimental Results (2/3)
- Comparisons among TriTL, DTL, MTrick, CoCC, TSVM, SVM and LR on the data set rec vs. sci (144 problems).
- TriTL performs well even when the accuracy of LR is lower than 65%.

Experimental Results (3/3)
- Results on the new transfer learning problems: we only select the problems whose LR accuracies lie in (50%, 55%] (only slightly better than random classification, thus much more difficult); this yields 65 problems.
- TriTL also outperforms all the baselines.
Conclusions
- Explicitly define three kinds of word concepts: identical concepts, alike concepts and distinct concepts.
- Propose a general transfer learning framework based on nonnegative matrix tri-factorization which simultaneously models the three kinds of concepts (TriTL).
- Extensive experiments show the effectiveness of the proposed approach, especially when distinct concepts exist.

Concept Learning for Transfer Learning
Concept Learning based on Probabilistic Latent Semantic Analysis for Transfer Learning
Motivation
- Retrospect the example: in the HP and Lenovo product-announcement news, the word concept "Product" is expressed by LaserJet, printer, price, performance in HP news and by ThinkPad, ThinkCentre, price, performance in Lenovo news; the concept indicates the document class "Product announcement", and the two domains share common words such as announcement, price and performance.
Preliminary Knowledge (1/3)
- Some notations: w denotes a word, d a document, y a document class, z a word concept, and r a domain.
- Some definitions: e.g., p(price | Product) and p(LaserJet | Product) are word distributions of a concept; p(Product | Product announcement) is a concept-class association.

Preliminary Knowledge (2/3)
- Alike concept: the word distributions of the concept differ across domains, p(w | z, r1) ≠ p(w | z, r2), e.g., p(LaserJet | Product, HP) ≠ p(LaserJet | Product, Lenovo); but the concept-class association is shared, p(z | y, r1) = p(z | y, r2), e.g., p(Product | Product announcement, HP) = p(Product | Product announcement, Lenovo).
- For the concept "Product": HP uses LaserJet, printer, announcement, price, ...; Lenovo uses ThinkPad, ThinkCentre, announcement, price, ...; both indicate the class "Product announcement".
Preliminary Knowledge (3/3)
- Dual PLSA (D-PLSA): the joint probability over all variables is
  p(w, d, z, y) = p(w | z) p(z | y) p(d | y) p(y).
- Given a data domain X, the maximum-log-likelihood problem is
  log p(X; Θ) = log Σ_Z p(Z, X; Θ),
  where Θ includes all the parameters p(w | z), p(z | y), p(d | y), p(y), and Z denotes all the latent variables.
- The proposed transfer learning algorithm based on D-PLSA is denoted HIDC.
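Spelling out the D-PLSA likelihood (a standard derivation; n(w, d) denotes the number of times word w occurs in document d and is an assumed notation, not taken from the slide):

```latex
p(w, d) = \sum_{z} \sum_{y} p(w \mid z)\, p(z \mid y)\, p(d \mid y)\, p(y),
\qquad
\log p(X; \Theta) = \sum_{w, d} n(w, d)\, \log p(w, d; \Theta)
```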
HIDC (1/3)
- Identical concept: p(w | za) p(za | y). Both the extension (word distribution) and the intension (concept-class association) are domain independent.
- Alike concept: p(w | zb, r) p(zb | y). The extension is domain dependent, while the intension is domain independent.

HIDC (2/3)
- Distinct concept: p(w | zc, r) p(zc | y, r). Both the extension and the intension are domain dependent.
- These three kinds of concepts correspond to three graphical models; one plausible spelled-out form of their joint probabilities is given below.

HIDC (3/3)
- Given s + t data domains X = {X1, ..., Xs, Xs+1, ..., Xs+t}; without loss of generality, the first s domains are source domains and the remaining t domains are target domains.
- Considering the three kinds of concepts, the log-likelihood function is
  log p(X; Θ) = log Σ_Z p(Z, X; Θ),
  where Θ includes all parameters p(w | za), p(w | zb, r), p(w | zc, r), p(za | y), p(zb | y), p(zc | y, r), p(d | y, r), p(y | r), p(r).
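One plausible spelled-out form of the three joint distributions, consistent with the factors listed on these slides (the exact factorization in the HIDC paper may differ slightly):

```latex
\begin{align*}
\text{identical:} \quad & p(w, d, z_a, y, r) = p(w \mid z_a)\, p(z_a \mid y)\, p(d \mid y, r)\, p(y \mid r)\, p(r) \\
\text{alike:}     \quad & p(w, d, z_b, y, r) = p(w \mid z_b, r)\, p(z_b \mid y)\, p(d \mid y, r)\, p(y \mid r)\, p(r) \\
\text{distinct:}  \quad & p(w, d, z_c, y, r) = p(w \mid z_c, r)\, p(z_c \mid y, r)\, p(d \mid y, r)\, p(y \mid r)\, p(r)
\end{align*}
```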
Model Solution (1/4)
- We use the EM algorithm to derive the solutions.
- E step: compute the posterior distribution of the latent concept and class variables for each word-document pair (the specific update formulas appear on the original slides; a generic sketch for the simpler D-PLSA model is given below).

Model Solution (2/4)
- M step: re-estimate all the parameters from the expected counts obtained in the E step.
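For the simpler D-PLSA model of the preliminary slides, the EM updates take the standard form below; the HIDC updates extend these with the domain variable r and the three concept types. As before, n(w, d) denotes word counts and is an assumed notation.

```latex
\begin{align*}
\text{E step:}\quad
& p(z, y \mid w, d) =
  \frac{p(w \mid z)\, p(z \mid y)\, p(d \mid y)\, p(y)}
       {\sum_{z'} \sum_{y'} p(w \mid z')\, p(z' \mid y')\, p(d \mid y')\, p(y')} \\[4pt]
\text{M step:}\quad
& p(w \mid z) \propto \sum_{d, y} n(w, d)\, p(z, y \mid w, d), \qquad
  p(z \mid y) \propto \sum_{w, d} n(w, d)\, p(z, y \mid w, d), \\
& p(d \mid y) \propto \sum_{w, z} n(w, d)\, p(z, y \mid w, d), \qquad
  p(y) \propto \sum_{w, d, z} n(w, d)\, p(z, y \mid w, d)
\end{align*}
```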
Model Solution (3/4)
- Semi-supervised EM algorithm: when r is a source domain, the label information is known, so p(d | y, r) is given and p(y | r) can be inferred:
  - p(d | y, r) = 1 / n_{y,r} if document d belongs to class y in domain r, where n_{y,r} is the number of documents of class y in domain r; otherwise p(d | y, r) = 0.
  - p(y | r) = n_{y,r} / n_r, where n_r is the number of documents in domain r.
- When r is a source domain, p(d | y, r) and p(y | r) are kept unchanged during the iterations, which supervises the optimization process (a minimal encoding sketch follows below).
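A minimal sketch of how this source-domain supervision can be encoded, assuming labels_r is a list of class indices for the documents of source domain r and n_classes is the number of classes (both names are illustrative):

```python
import numpy as np

def source_supervision(labels_r, n_classes):
    """Return p(d|y,r) of shape (n_docs, n_classes) and p(y|r) of shape (n_classes,)."""
    labels_r = np.asarray(labels_r)
    n_docs = len(labels_r)
    p_d_given_y = np.zeros((n_docs, n_classes))
    p_y = np.zeros(n_classes)
    for y in range(n_classes):
        docs_in_y = np.where(labels_r == y)[0]
        if len(docs_in_y) > 0:
            p_d_given_y[docs_in_y, y] = 1.0 / len(docs_in_y)   # p(d|y,r) = 1/n_{y,r}
        p_y[y] = len(docs_in_y) / n_docs                        # p(y|r) = n_{y,r}/n_r
    return p_d_given_y, p_y
```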
Model Solution (4/4)
- Classification for the target domains: after obtaining the final solutions of p(w | za), p(w | zb, r), p(w | zc, r), p(za | y), p(zb | y), p(zc | y, r), p(d | y, r), p(y | r), p(r), we compute the conditional class probabilities for each target document and take the most probable class as the final prediction (a plausible spelled-out form is given below).
- During the iterations, all domains share p(w | za), p(za | y) and p(zb | y), which act as the bridge for knowledge transfer.
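A plausible spelled-out form of this classification step, using the model's document-class factors (the published formulation may combine the factors differently):

```latex
p(y \mid d, r) = \frac{p(d \mid y, r)\, p(y \mid r)}{\sum_{y'} p(d \mid y', r)\, p(y' \mid r)},
\qquad
\hat{y}(d) = \arg\max_{y}\; p(y \mid d, r)
```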
Baselines
- Compared algorithms:
  - Supervised learning: Logistic Regression (LR) [David et al., 2000], Support Vector Machine (SVM) [Joachims, ICML 1999]
  - Semi-supervised learning: TSVM [Joachims, ICML 1999]
  - Transfer learning: CoCC [Dai et al., KDD 2007], CD-PLSA [Zhuang et al., CIKM 2010], DTL [Long et al., SDM 2012]
- Our method: HIDC
- Measure: classification accuracy