INSTITUTE OF COMPUTING TECHNOLOGY

Research on Transfer Learning Algorithms
Fuzhen Zhuang
Institute of Computing Technology, Chinese Academy of Sciences
April 18, 2019

Traditional Supervised Machine Learning (1/2) (from Prof. Qiang Yang)
Training data:
  Occupation   Palm lines   Dragon Star   Fortune?
  Prof         long         T             good
  Lawyer       short        F             bad
  PhD Stu      broken       T             good
  Doc          long         F             bad
Classifier applied to unseen data: ( , long, T) → good!
What if...?

Traditional Supervised Machine Learning (2/2)
- In real applications, the requirements of traditional supervised learning usually cannot be met!
[Figure: training set → classifier → test set, shown for two settings]

Transfer Learning
- Real-world learning scenario: [Figure: HP news → transfer learning → Lenovo news]
- A new machine learning approach that uses existing knowledge to solve problems in different but related domains.
- It relaxes the two basic assumptions of traditional machine learning (training and test data follow the same distribution, and enough labeled training data are available).

Transfer Learning Scenarios (1/4)
- Transfer learning scenarios are everywhere.
[Figure: transferring knowledge in image classification; transferring knowledge from HP news to Lenovo news in news web page classification]

Transfer Learning Scenarios (2/4) (from Prof. Qiang Yang)
Heterogeneous feature space:
"The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae. Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit."
Training: text. Future: images (apples, bananas).
Xin Jin, Fuzhen Zhuang, Sinno Jialin Pan, Changying Du, Ping Luo, Qing He: Heterogeneous Multi-task Semantic Feature Learning for Classification. CIKM 2015: 1847-1850.

Transfer Learning Scenarios (3/4) (from Prof. Qiang Yang)
- Training on Electronics and testing on Electronics: 84.60% accuracy.
- Training on DVD and testing on Electronics: 72.65% accuracy. Drop!

Transfer Learning Scenarios (4/4) (from Prof. Qiang Yang)
- DVD, Electronics, Book, Kitchen, Clothes, Video game, Fruit, Hotel, Tea, ...: labeling data for every domain is impractical!

Outline
- Concept Learning for Transfer Learning
  - Concept Learning based on Non-negative Matrix Tri-factorization for Transfer Learning
  - Concept Learning based on Probabilistic Latent Semantic Analysis for Transfer Learning
- Transfer Learning using Auto-encoders
  - Transfer Learning from Multiple Sources with Autoencoder Regularization
  - Supervised Representation Learning: Transfer Learning with Deep Auto-encoders

Concept Learning for Transfer Learning
Concept Learning based on Non-negative Matrix Tri-factorization for Transfer Learning

Introduction
- Many traditional learning techniques work well only under the assumption that training and test data follow the same distribution.
[Figure: Training (labeled) → Classifier → Test (unlabeled)]
- Enterprise news classification, including the classes "Product Announcement", "Business scandal", "Acquisition", ...
  - HP news: "Product announcement: HP's just-released LaserJet Pro P1100 printer and the LaserJet Pro M1130 and M1210 multifunction printers, price ... performance ..."
  - Lenovo news: "Announcement for Lenovo ThinkPad ThinkCentre price $150 off Lenovo K300 desktop using coupon code ... Lenovo ThinkPad ThinkCentre price $200 off Lenovo IdeaPad U450p laptop using ... their performance"
- HP news and Lenovo news follow different distributions, so a classifier trained on one fails on the other!

Motivation (1/3)
- Example analysis (the HP and Lenovo news above):
  - Word concept "Product": LaserJet, printer, price, performance (HP); ThinkPad, ThinkCentre, price, performance (Lenovo).
  - The two domains share some common words: announcement, price, performance.
  - The word concept "Product" is related to, and indicates, the document class "Product announcement".

Motivation (2/3)
- Example analysis:
  - HP: LaserJet, printer, price, performance, etc.
  - Lenovo: ThinkPad, ThinkCentre, price, performance, etc.
  - The words expressing the same word concept are domain-dependent.
  - The word concept "Product" indicates the document class "Product announcement": the association between word concepts and document classes is domain-independent.

Motivation (3/3)
- Further observations:
  - Different domains may use the same key words to express the same concept (identical concept).
  - Different domains may also use different key words to express the same concept (alike concept).
  - Different domains may also have their own distinct concepts (distinct concept).
- The identical and alike concepts are used as the shared concepts for knowledge transfer.
- We try to model these three kinds of concepts simultaneously for transfer learning text classification.

Preliminary Knowledge
- Basic formula of matrix tri-factorization:
$$X \approx F S G^{\top}$$
  where the input X is the word-document co-occurrence matrix, F is the word-concept matrix, S is the association between word concepts and document classes, and G is the document-class matrix.

Previous Method: MTrick (SDM 2010)
MTrick (1/2)
- Sketch map of MTrick: [Figure: source domain X_s ≈ F_s S G_s^T and target domain X_t ≈ F_t S G_t^T, with the association S shared for knowledge transfer]
- MTrick considers the alike concepts.
[Fuzhen Zhuang et al., SDM 2010]

MTrick (2/2)
- Optimization problem for MTrick: G_0 is the supervision information, and the association S is shared as the bridge to transfer knowledge.
- Dual Transfer Learning (DTL; Long et al., SDM 2012) considers both identical and alike concepts.

Triplex Transfer Learning (TriTL) (1/5)
- Further divide the word concepts into three kinds: F_1, identical concepts; F_2, alike concepts; F_3, distinct concepts.
- Input: s source domains X_r (1 ≤ r ≤ s) with label information, and t target domains X_r (s+1 ≤ r ≤ s+t).
- We propose the Triplex Transfer Learning framework based on matrix tri-factorization (TriTL for short).

TriTL (2/5)
- Optimization problem: F_1, S_1 and S_2 are shared as the bridge for knowledge transfer across domains, and the supervision information is integrated through G_r (1 ≤ r ≤ s) in the source domains.

TriTL (3/5)
- We develop an alternating iterative algorithm to derive the solution and theoretically analyze its convergence.

TriTL (4/5)
- Classification on target domains:
  - When 1 ≤ r ≤ s, G_r contains the label information, so we keep it unchanged during the iterations: if x_i belongs to class j, then G_r(i,j) = 1, else G_r(i,j) = 0.
  - After the iterations, we obtain the output G_r (s+1 ≤ r ≤ s+t) and classify the target-domain documents according to G_r.

TriTL (5/5)
- Analysis of algorithm convergence: following the convergence-analysis methodology of Lee et al., NIPS01 and Ding et al., KDD06, the following theorem holds.
- Theorem (Convergence): after each round of computing the iterative update formulas, the objective function of the optimization problem is monotonically non-increasing, so the algorithm converges.

Data Preparation (1/3)
[Figure: the 20 Newsgroups hierarchy. rec: autos, motorcycles, baseball, hockey; sci: crypt, electronics, med, space; comp: graphics, sys.ibm.pc.hardware, sys.mac.hardware, windows.x; talk: politics.misc, politics.guns, politics.mideast, religion.misc]
- 20Newsgroups: four top categories, each containing four sub-categories.
- Sentiment classification, four domains: books, dvd, electronics, kitchen. Randomly select two domains as sources and the rest as targets, which gives C(4,2) = 6 problems.

Data Preparation (2/3)
- Construct classification tasks (traditional TL), e.g., rec (+) vs. sci (-): source domain rec.baseball vs. sci.crypt; target domain rec.autos vs. sci.space.
- For the classification problem with one source domain and one target domain, we can construct P_4^2 × P_4^2 = 12 × 12 = 144 problems.

Data Preparation (3/3)
- Construct new transfer learning problems: keep the rec vs. sci source domain (e.g., rec.baseball vs. sci.crypt), but pair the target rec sub-category with one of the 8 comp or talk sub-categories (e.g., rec.autos vs. comp.graphics), giving 144 × 8 / 3 = 384 problems.
- More distinct concepts may exist!

Compared Algorithms
- Traditional learning algorithms:
  - Supervised learning: Logistic Regression (LR) [David et al., 00], Support Vector Machine (SVM) [Joachims, ICML99]
  - Semi-supervised learning: TSVM [Joachims, ICML99]
- Transfer learning methods: CoCC [Dai et al., KDD07], DTL [Long et al., SDM12]
- Classification accuracy is used as the evaluation measure.

Experimental Results (1/3)
- Sort the problems by the accuracy of LR. [Figure: degree of transfer difficulty, from harder to easier]
- Generally, a lower LR accuracy indicates a problem that is harder to transfer, while a higher one indicates an easier problem.

Experimental Results (2/3)
- Comparisons among TriTL, DTL, MTrick, CoCC, TSVM, SVM and LR on the rec vs. sci data set (144 problems).
- TriTL performs well even when the accuracy of LR is lower than 65%.

Experimental Results (3/3)
- Results on the new transfer learning problems: we select only the problems whose LR accuracy lies between 50% and 55% (only slightly better than random classification, so they are likely much more difficult), which yields 65 problems.
- TriTL also outperforms all the baselines.
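The TriTL ingredients described above (word-document matrices factorized as X_r ≈ F_r S G_r^T, an association shared across domains, source G_r fixed to the 0/1 labels, and target documents classified from G_r) can be illustrated with a small NumPy sketch. It is a simplified, hypothetical two-domain variant that is closer to MTrick than to TriTL: it shares a single S, does not split F into F_1/F_2/F_3 or S into S_1/S_2, and omits any regularization or normalization the papers may use. The updates are standard Lee-Seung-style multiplicative rules for the squared Frobenius loss, which keep the loss non-increasing; the function name `shared_s_trifactorization`, the toy data, and the argmax classification rule are assumptions.

```python
import numpy as np

EPS = 1e-12  # keeps the multiplicative updates away from division by zero


def shared_s_trifactorization(Xs, Gs, Xt, k, n_iter=200, seed=0):
    """Hypothetical simplified sketch (not the exact TriTL/MTrick algorithm).

    Minimizes ||Xs - Fs S Gs^T||^2 + ||Xt - Ft S Gt^T||^2 over nonnegative
    Fs, Ft, S, Gt, with the association S shared by both domains and the
    source document-class matrix Gs held fixed (it encodes the labels).
    Xs: (m, ns) and Xt: (m, nt) nonnegative word-document matrices over the
    same vocabulary. Gs: (ns, c) 0/1 label matrix. k: number of word concepts.
    """
    rng = np.random.default_rng(seed)
    m, nt = Xt.shape
    c = Gs.shape[1]
    Fs = rng.random((m, k))
    Ft = rng.random((m, k))
    S = rng.random((k, c))
    Gt = rng.random((nt, c))
    history = []
    for _ in range(n_iter):
        # Domain-specific word-concept matrices (alike concepts).
        Fs *= (Xs @ Gs @ S.T) / (Fs @ S @ Gs.T @ Gs @ S.T + EPS)
        Ft *= (Xt @ Gt @ S.T) / (Ft @ S @ Gt.T @ Gt @ S.T + EPS)
        # Shared association: both domains contribute to a single update.
        num = Fs.T @ Xs @ Gs + Ft.T @ Xt @ Gt
        den = Fs.T @ Fs @ S @ Gs.T @ Gs + Ft.T @ Ft @ S @ Gt.T @ Gt
        S *= num / (den + EPS)
        # Only the target document-class matrix is learned; Gs stays fixed.
        Gt *= (Xt.T @ Ft @ S) / (Gt @ S.T @ Ft.T @ Ft @ S + EPS)
        loss = (np.linalg.norm(Xs - Fs @ S @ Gs.T) ** 2
                + np.linalg.norm(Xt - Ft @ S @ Gt.T) ** 2)
        history.append(loss)
    return Gt, S, history


# Usage on random toy data (placeholder sizes, not a data set from the slides).
rng = np.random.default_rng(1)
Xs = rng.random((30, 20))
Xt = rng.random((30, 15))
Gs = np.eye(2)[rng.integers(0, 2, size=20)]  # one-hot source labels
Gt, S, history = shared_s_trifactorization(Xs, Gs, Xt, k=4)
pred = Gt.argmax(axis=1)  # assumed rule: most strongly associated class
```

Keeping Gs fixed is what injects the source labels: because S is fitted jointly against the labeled source and the unlabeled target, the columns of Gt stay aligned with the source classes, which is why a row-wise argmax is a plausible target prediction.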
Conclusions
- Explicitly define three kinds of word concepts: identical, alike and distinct concepts.
- Propose a general transfer learning framework based on nonnegative matrix tri-factorization that simultaneously models the three kinds of concepts (TriTL).
- Extensive experiments show the effectiveness of the proposed approach, especially when distinct concepts may exist.

Concept Learning based on Probabilistic Latent Semantic Analysis for Transfer Learning
Motivation
- Revisiting the example: the HP and Lenovo product-announcement news from the previous section.
  - Word concept "Product": LaserJet, printer, price, performance (HP); ThinkPad, ThinkCentre, price, performance (Lenovo).
  - The two domains share some common words: announcement, price, performance.
  - The word concept "Product" indicates the document class "Product announcement".

Preliminary Knowledge (1/3)
- Notation: d, document; y, document class; z, word concept; w, word; r, domain.
- Definitions: p(w|z), e.g., p(price | Product), p(LaserJet | Product, ...); p(z|y), e.g., p(Product | Product announcement).

Preliminary Knowledge (2/3)
- Alike concept: the concept "Product" is expressed by LaserJet, printer, announcement, price in HP news (domain r_1) and by ThinkPad, ThinkCentre, announcement, price in Lenovo news (domain r_2), and in both domains it indicates the class "Product announcement":
  - p(w|z, r_1) ≠ p(w|z, r_2), e.g., p(LaserJet | Product, HP) ≠ p(LaserJet | Product, Lenovo)
  - p(z|y, r_1) = p(z|y, r_2), e.g., p(Product | Product announcement, HP) = p(Product | Product announcement, Lenovo)

Preliminary Knowledge (3/3)
- Dual PLSA (D-PLSA): the joint probability over all variables is
$$p(w,d) = \sum_{z}\sum_{y} p(w|z)\,p(z|y)\,p(d|y)\,p(y)$$
- Given a data domain X, the maximum log-likelihood problem is
$$\log p(X;\theta) = \log \sum_{Z} p(Z, X; \theta)$$
  where θ includes all the parameters p(w|z), p(z|y), p(d|y), p(y), and Z denotes all the latent variables.
- The proposed transfer learning algorithm based on D-PLSA is denoted as HIDC.

HIDC (1/3)
- Identical concept: p(w|z_a), p(z_a|y). The extension and intension are both domain independent.
- Alike concept: p(w|z_b, r), p(z_b|y). The extension is domain dependent, while the intension is domain independent.
- Here the extension of a concept is its word distribution p(w|z), and its intension is its association with the document classes p(z|y).
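The D-PLSA factorization and the alike-concept assumption (a domain-specific extension p(w|z, r) with a shared intension p(z|y)) can be checked numerically. The NumPy sketch below builds random, normalized parameters for two hypothetical domains and evaluates p(w,d) with `np.einsum`. The sizes, the domain labels, and the helper names are illustrative assumptions. The sketch only evaluates the model and does not fit it with EM; for simplicity, both domains also share p(y).

```python
import numpy as np


def normalize_columns(a):
    """Scale each column of a nonnegative matrix so that it sums to 1."""
    return a / a.sum(axis=0, keepdims=True)


def dplsa_joint(p_w_z, p_z_y, p_d_y, p_y):
    """p(w, d) = sum_z sum_y p(w|z) p(z|y) p(d|y) p(y).

    Shapes: p_w_z (W, Z), p_z_y (Z, Y), p_d_y (D, Y), p_y (Y,) -> (W, D).
    """
    return np.einsum("wz,zy,dy,y->wd", p_w_z, p_z_y, p_d_y, p_y)


rng = np.random.default_rng(0)
n_words, n_concepts, n_classes = 6, 3, 2
# Alike concepts: the intension p(z|y) is shared by the two domains ...
p_z_y = normalize_columns(rng.random((n_concepts, n_classes)))
p_y = np.full(n_classes, 1.0 / n_classes)

joints = {}
for domain, n_docs in (("HP", 4), ("Lenovo", 5)):
    # ... while the extension p(w|z, r) differs from domain to domain.
    p_w_z_r = normalize_columns(rng.random((n_words, n_concepts)))
    p_d_y_r = normalize_columns(rng.random((n_docs, n_classes)))
    joints[domain] = dplsa_joint(p_w_z_r, p_z_y, p_d_y_r, p_y)

for domain, joint in joints.items():
    print(domain, joint.shape, round(float(joint.sum()), 6))
```

Each joint matrix sums to 1 because every conditional factor is normalized, and the two domains differ only in their word and document distributions. This is exactly the part that an alike concept allows to vary.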
HIDC (2/3)
- Distinct concept: p(w|z_c, r), p(z_c|y, r). The extension and intension are both domain dependent.
- The joint probabilities of these three graphical models.

HIDC (3/3)
- Given s+t data domains X = {X_1, ..., X_s, X_{s+1}, ..., X_{s+t}}, without loss of generality the first s domains are source domains and the remaining t domains are target domains.
- Considering the three kinds of concepts, the log-likelihood function is
$$\log p(X;\theta) = \log \sum_{Z} p(Z, X; \theta)$$
  where θ includes all the parameters p(w|z_a), p(w|z_b, r), p(w|z_c, r), p(z_a|y), p(z_b|y), p(z_c|y, r), p(d|y, r), p(y|r), p(r).

Model Solution (1/4)
- Use the EM algorithm to derive the solutions.
- E step.

Model Solution (2/4)
- M step.

Model Solution (3/4)
- Semi-supervised EM algorithm: when r is a source domain, the label information p(d|y, r) is known and p(y|r) can be inferred:
$$p(d|y,r) = \begin{cases} 1/n_{y,r}, & \text{if } d \text{ belongs to class } y \text{ in domain } r,\\ 0, & \text{otherwise,} \end{cases}$$
  where n_{y,r} is the number of documents in class y in domain r, and
$$p(y|r) = \frac{n_{y,r}}{n_r},$$
  where n_r is the number of documents in domain r.
- When r is a source domain, p(d|y, r) and p(y|r) remain unchanged during the iterations, which supervises the optimization process.

Model Solution (4/4)
- Classification for target domains: after obtaining the final solutions of p(w|z_a), p(w|z_b, r), p(w|z_c, r), p(z_a|y), p(z_b|y), p(z_c|y, r), p(d|y, r), p(y|r), p(r), we can compute the conditional probabilities and then make the final prediction.
- During the iterations, all domains share p(w|z_a), p(z_a|y) and p(z_b|y), which act as the bridge for knowledge transfer.

Baselines
- Compared algorithms:
  - Supervised learning: Logistic Regression (LG) [David et al., 00], Support Vector Machine (SVM)
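Returning to Model Solution (3/4) and (4/4): the fixed source-domain quantities can be computed directly from labels, and target documents can then be assigned a class from the learned parameters. This NumPy sketch assumes the prediction rule is the class that maximizes p(y|d, r) ∝ p(d|y, r) p(y|r). That rule follows from the factorization once words and concepts are summed out, but it is an assumption of this sketch rather than a formula taken from the slides. The function names and the toy target values are hypothetical stand-ins for parameters that EM would learn.

```python
import numpy as np


def source_supervision(labels, n_classes):
    """Fixed parameters of a source domain r in the semi-supervised EM.

    Returns p_d_y of shape (n_docs, n_classes) with
    p(d|y,r) = 1/n_{y,r} if document d is in class y, else 0,
    and p_y of shape (n_classes,) with p(y|r) = n_{y,r} / n_r.
    """
    labels = np.asarray(labels)
    onehot = np.eye(n_classes)[labels]       # (n_docs, n_classes), 0/1
    n_yr = onehot.sum(axis=0)                # number of documents per class
    p_d_y = onehot / np.maximum(n_yr, 1)     # empty classes stay all-zero
    p_y = n_yr / len(labels)
    return p_d_y, p_y


def predict(p_d_y, p_y):
    """Assumed rule: argmax_y p(y|d,r), with p(y|d,r) proportional to p(d|y,r) p(y|r)."""
    joint = p_d_y * p_y                      # broadcast p(y|r) over documents
    posterior = joint / joint.sum(axis=1, keepdims=True)
    return posterior, posterior.argmax(axis=1)


# Source domain with five labeled documents (toy labels).
p_d_y_src, p_y_src = source_supervision([0, 0, 1, 2, 1], n_classes=3)

# Toy values standing in for target-domain parameters learned by EM.
p_d_y_tgt = np.array([[0.6, 0.1],
                      [0.4, 0.9]])
p_y_tgt = np.array([0.5, 0.5])
posterior, pred = predict(p_d_y_tgt, p_y_tgt)
```

In the source domain, each column of p_d_y_src is a uniform distribution over the documents of that class. Holding these values fixed across EM iterations is what makes the algorithm semi-supervised.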