Applying Social Network Analysis to Academic Community Recommendation (lecture slides, .ppt)


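Outline
- Introduction
- Related work
- Research methods
- System development and empirical analysis
- Conclusions and suggestions

Introduction

Research Background and Motivation
- Information overload (Information Overload).
- Search engines and recommender systems have emerged as the two main tools for mitigating information overload.
- Besides their own subjective preferences, users are also easily influenced by interpersonal relationships.
- Virtual communities and social networks have become a primary source of information for many users.
- This study investigates how social networks can be used to improve the quality of information recommendation.

Research Objectives
This study builds an information recommendation system based on topic concept extraction and social network analysis, with the following goals:
- Topic concept extraction: extract the important keywords from documents and cluster them into topic concepts, in order to understand the interests and issues each user focuses on.
- Forming topic communities: represent each user's individual interests with a vector space model, combine this with the users' social network, and group highly similar users who share the same topic interests into topic communities.
- Information recommendation: based on the resulting topic communities, make personalized recommendations according to each user's topic preferences.

Related Work

Social Network Analysis
Social Network Analysis (SNA) is a set of concepts and methods for studying social structures, organizational systems, interpersonal relationships and group interactions; it grew out of sociometry. One of the best-known results in the field is the theory of six degrees of separation [40]. It originated in a letter-forwarding experiment, which found that a letter needed about six forwarding steps on average to travel from sender to recipient, i.e., two otherwise unrelated individuals could typically be linked through about five intermediaries.

Social Network Diagram
[Figure: example social network. Source: http://en.wikipedia.org/wiki/Social_network]

Social Network Analysis (Cont.)
In SNA, the three main measures for an individual actor are [21]:
- Degree: number of direct connections
- Betweenness: role of broker or gatekeeper
- Closeness centrality: who has the shortest paths to all others

$$C_D(i) = \sum_{j=1,\, j \neq i}^{N} h_{ij}$$
h_ij: the edge from vertex i to vertex j (1 if the edge exists, 0 otherwise); N: number of vertices.

$$C_B(i) = \sum_{j<k;\; j,k \neq i} \frac{g_{jk}(i)}{g_{jk}}$$
g_jk(i): the number of geodesics between j and k that contain i; g_jk: the number of geodesics between j and k.

$$C_C(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$$
d(i,j): the length of the shortest path from i to j.

Clustering Algorithms
- Partitioning methods: k-Means
- Hierarchical methods: Agglomerative, Divisive
- Model-based methods: Self-Organizing Map

Recommender Systems
The goal of a recommender system is to find, within a large body of information, the items a user is most likely to be interested in, reducing the opportunity cost of searching for them actively. Two approaches are commonly used in recommender systems today:
- Content-based recommendation
- Collaborative filtering (Collaborative Filtering)

Information Retrieval: Vector Space Model
The vector model ranks the documents according to their degree of similarity to the query, and retrieves the documents whose degree of similarity is above a threshold.
Definitions:
- The weight w_{i,j} associated with a pair (k_i, d_j) is positive and non-binary.
- The index terms in the query are also weighted: w_{i,q} is the weight associated with the pair (k_i, q), where w_{i,q} ≥ 0.
- d_j = (w_{1,j}, w_{2,j}, ..., w_{t,j}) and q = (w_{1,q}, w_{2,q}, ..., w_{t,q}), where t is the total number of index terms.
- The degree of similarity of d_j with regard to q is the cosine of the angle between the two vectors:

$$\mathrm{sim}(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^{2}}\;\sqrt{\sum_{i=1}^{t} w_{i,q}^{2}}}$$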
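The three actor-level measures defined above can be computed directly on a small unweighted co-authorship graph. The sketch below is a minimal brute-force illustration, not the implementation used in this study: it assumes an adjacency-set representation (`adj` and the helper names are hypothetical), counts geodesics with breadth-first search, and follows the formulas above, with closeness taken as the inverse of the summed distances to every reachable vertex.

```python
from collections import deque
from itertools import combinations


def bfs_paths(adj, src):
    """Hop distance and number of shortest paths (geodesics) from src."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a geodesic
                sigma[v] += sigma[u]
    return dist, sigma


def degree(adj, i):
    """C_D(i): number of direct connections of i."""
    return len(adj[i])


def closeness(adj, i):
    """C_C(i) = 1 / (sum of shortest-path lengths from i to each reachable j)."""
    dist, _ = bfs_paths(adj, i)
    total = sum(d for j, d in dist.items() if j != i)
    return 1.0 / total if total else 0.0


def betweenness(adj, i):
    """C_B(i) = sum over pairs j < k (j, k != i) of g_jk(i) / g_jk."""
    paths = {v: bfs_paths(adj, v) for v in adj}
    dist_i, sigma_i = paths[i]
    score = 0.0
    for j, k in combinations([v for v in adj if v != i], 2):
        dist_j, sigma_j = paths[j]
        if k not in dist_j or i not in dist_j:
            continue                     # no geodesic between j and k through i
        # i lies on a j-k geodesic iff d(j, i) + d(i, k) == d(j, k);
        # then g_jk(i) = (#geodesics j->i) * (#geodesics i->k).
        if dist_j[i] + dist_i[k] == dist_j[k]:
            score += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


# Toy co-authorship graph with hypothetical authors A-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
for v in sorted(adj):
    print(v, degree(adj, v), round(betweenness(adj, v), 3), round(closeness(adj, v), 3))
```

D bridges E to the rest of the graph, so it has the highest betweenness. The closeness here is the un-normalized form of the formula; a common normalized variant multiplies it by the number of other reachable vertices, and which variant produced the closeness values reported later in these slides is not stated.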
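The vector-model definition above reduces to a few lines of code: compute the cosine between each document vector d_j and the query vector q, keep the documents above the threshold, and rank them. This is a minimal sketch with hypothetical weights over t = 3 index terms; how the weights themselves are produced (for example, a TF-IDF-style scheme) is outside its scope.

```python
import math


def cosine(d, q):
    """sim(d_j, q): cosine of the angle between document and query weight vectors."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0                       # an all-zero vector matches nothing
    return dot / (norm_d * norm_q)


def retrieve(docs, q, threshold):
    """Rank documents by similarity to q, keeping only those above the threshold."""
    scored = [(name, cosine(vec, q)) for name, vec in docs.items()]
    return sorted([(n, s) for n, s in scored if s > threshold],
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical term weights (t = 3 index terms).
docs = {"d1": [1.0, 0.0, 1.0], "d2": [0.0, 1.0, 0.0], "d3": [1.0, 1.0, 1.0]}
query = [1.0, 0.0, 1.0]
for name, score in retrieve(docs, query, threshold=0.5):
    print(name, round(score, 4))
```

Because the cosine ignores vector length, a long document is not favoured merely for containing more terms; only the direction of its weight vector matters.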

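Information Retrieval: Vector Space Model (Illustration)
[Figure: normalized term-document matrix]
[Figure: vector space model illustration]

Research Methods

Corpus
The corpus consists of the journal papers collected by the National Chiao Tung University institutional repository [38]. The Title, Abstract, Keyword and Author fields are used as the data source.
http://ir.lib.nctu.edu.tw
System prototype demo

Preprocessing
- Tokenization and lowercasing
- Stopword removal
- Part-of-speech tagging
- Chunking
- Stemming
- Feature selection

Preprocessing (Cont.): worked example
Original text:
Some combinatorial characteristics of matrix multiplication on regular two-dimensional arrays are studied. From the studies, the authors are able to design many efficient varieties of the cylindrical array and the two-layered mesh array for matrix multiplication.
After tokenization and lowercasing:
some combinatorial characteristics of matrix multiplication on regular two-dimensional arrays are studied from the studies the authors are able to design many efficient varieties of the cylindrical array and the two-layered mesh array for matrix multiplication
After stopword removal:
combinatorial characteristics matrix multiplication regular two-dimensional arrays studied studies authors design efficient varieties cylindrical array two-layered mesh array matrix multiplication
After part-of-speech tagging:
combinatorial_jj characteristics_nns matrix_nn multiplication_nn regular_jj two-dimensional_jj arrays_nns studied_vbn studies_nns authors_nns design_vb efficient_jj varieties_nns cylindrical_jj array_nn two-layered_jj mesh_nn array_nn matrix_nn multiplication_nn
After chunking and stemming:
POS | Phrase | Stemmed phrase
noun | combinatorial characteristics | combinatori characterist
noun | matrix multiplication | matrix multipl
noun | regular two-dimensional arrays | regular two-dimension arrai
verb | studied | studi
noun | studies | studi
noun | author | author
verb | design | design
noun | efficient varieties | effici varieti
noun | cylindrical array | cylindr arrai
noun | two-layered mesh array | two-lay mesh arrai
noun | matrix multiplication | matrix multipl

Topic Keyword Clustering
User model → Compute semantic relatedness → Build semantic network graph → Keyword clustering → Keyword cluster labeling

User Model
TF-IAF (Term Frequency-Inverse Author Frequency) [30] is used to measure the association between users and keywords. Once TF-IAF has been computed, each user can be represented as a vector:

$$U_j = (w_{1j}, w_{2j}, \ldots, w_{mj}), \qquad w_{ij} = \begin{cases} tfiaf_{ij}, & \text{if term } i \text{ is associated with author } j \\ 0, & \text{otherwise} \end{cases}$$
m: number of keywords.

$$tfiaf_{ij} = tf_{ij} \times iaf_i, \qquad tf_{ij} = freq_{ij}, \qquad iaf_i = \log_2 \frac{N}{n_i}$$
freq_ij: frequency of term i associated with author j; N: number of authors; n_i: number of authors that use term i.

Computing Semantic Relatedness
The sentence is the unit of analysis: two keywords are considered semantically related only if they appear in the same sentence. Giving extra weight to the Title and Keyword fields makes these keyword relations more representative.

$$r(t_i, t_j) = \begin{cases} 1, & \text{if } t_i \text{ and } t_j \text{ are both in the title or keyword field} \\ \dfrac{f(t_i, t_j)}{\max\big(f(t_i), f(t_j)\big)}, & \text{otherwise} \end{cases}$$
f(t_i, t_j): co-occurrence frequency of t_i and t_j within sentences; f(t_i): frequency of t_i.

Building the Semantic Network Graph
- Each keyword is a vertex. A vertex's weight is the sum of the keyword's TF-IAF values over all users, plus the average of all the keyword's semantic relatedness values.
- Each relation between two keywords is an edge, whose weight is their semantic relatedness.
- Topic keyword clustering follows the method of [9].

$$W_i = \sum_{j=1}^{N} w_{ij} + \frac{1}{h_i} \sum_{j} r_{ij}$$
N: number of authors; h_i: the degree of vertex v_i (the inner sum runs over the h_i keywords adjacent to v_i).

[Figure: semantic network graph]
[Figure: topic keyword clustering schematic [9]]

Selecting Important Candidate Keywords
Find the vertices whose weights are larger than the average vertex weight.

Topic Keyword Clustering (Cont.): k-Nearest Neighbor Approach [19]
- For each vertex in the graph, group it with its k most similar vertices; each group forms a connected graph, called a candidate keyword group.
- Generating candidate keyword subclusters: starting from each candidate keyword group, restore the edges that originally connected directly to the vertices in the group, forming a candidate keyword subcluster, and compute each subcluster's weight with Eq. (3-6):

$$W_{G_k} = \sum_{r_{ij} \in G_k} r_{ij} \tag{3-6}$$

Keyword Clustering
[Figure: k-nearest neighbor graph approach]

Topic Keyword Clustering (Cont.): Merging Candidate Keyword Subclusters
Repeatedly merge the two subclusters with the strongest inter-connectivity, stopping once the relative inter-connectivity between subclusters falls below a threshold. The relative inter-connectivity is given by Eq. (3-7):

$$RI(G_i, G_j) = \frac{\left| W_{E(G_i, G_j)} \right|}{W_{G_i} + W_{G_j}} \tag{3-7}$$
W_{E(G_i,G_j)}: total weight of the edges connecting G_i and G_j.

[Figure: merging candidate keyword subclusters]

Topic Keyword Clustering (Cont.): Refining the Topic Keyword Clusters
- Keep the number of keywords in each subcluster within a certain range.
- If a subcluster contains fewer keywords than average but its weight is above the average weight, keep it.
- If a subcluster's weight is still below average after refinement, delete it.
The subcluster weight is given by Eq. (3-8), where CD(G) is the edge density of G and AS(G) is its average edge weight:

$$CD(G) = \frac{|E(G)|}{|V(G)|\,\big(|V(G)|-1\big)/2}, \qquad AS(G) = \frac{\sum_{r_{ij} \in E(G)} r_{ij}}{|E(G)|}, \qquad CW(G) = CD(G) \times AS(G) \tag{3-8}$$

[Figure: refining the topic keyword clusters]

Keyword Cluster Labeling
Meaningful keywords are filtered manually, and the keyword with the highest weight becomes the cluster's final label.

Building Topic Communities
- User social network
- User clustering

User Social Network

$$W_{ij} = \begin{cases} 1, & \text{if user } i \text{ is one of the authors of document } j \\ 0, & \text{otherwise} \end{cases} \qquad S = W W^{T}$$
S_ij: number of co-authored publications between author i and author j.
[Example: W, Wᵀ and S for a small author-document matrix]

User Social Network (Cont.)

$$J(U_i, U_j) = \frac{S_{ij}}{|W_i| + |W_j| - S_{ij}}, \qquad R_{ij} = \begin{cases} 1, & \text{if } i = j \\ \lambda \cdot J(U_i, U_j), & \text{otherwise} \end{cases}$$
|W_i|: number of documents authored by user i; λ: the parameter that adjusts the influence of R.

User Clustering
All user vectors form an N × m matrix U, where N is the number of users and m the number of keywords. The user correlation matrix R is multiplied by U to give a new matrix U′, the updated user vector model (a parameter adjusts the influence of R):

$$U = \begin{bmatrix} w_{11} & w_{21} & \cdots & w_{m1} \\ w_{12} & w_{22} & \cdots & w_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1N} & w_{2N} & \cdots & w_{mN} \end{bmatrix}, \qquad U' = R\,U = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1N} \\ R_{21} & R_{22} & \cdots & R_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ R_{N1} & R_{N2} & \cdots & R_{NN} \end{bmatrix} U$$

User Clustering (Cont.)
Cosine similarity is computed between each user and each topic; when it exceeds a threshold, the user is assigned to that topic.

$$U_j = (w_{1j}, w_{2j}, \ldots, w_{mj}), \quad j = 1, 2, \ldots, N$$
w_ij: the weight of keyword i associated with user j; N: number of users; m: number of keywords.

$$C_k = (t_{1k}, t_{2k}, \ldots, t_{mk}), \quad k = 1, 2, \ldots, p, \qquad t_{ik} = \begin{cases} 1, & \text{if keyword } i \in C_k \\ 0, & \text{otherwise} \end{cases}$$
p: number of clusters.

$$S_j = \big(\mathrm{sim}(U_j, C_1), \mathrm{sim}(U_j, C_2), \ldots, \mathrm{sim}(U_j, C_p)\big), \quad j = 1, 2, \ldots, N$$

Recommendation Modes
Members of a community share similar topic interests, but because of the multi-topic property [9], a user may have preferences for several topics. This gives two recommendation modes:
- Personalized recommendation (Collaborative Filtering): using a content-based method, compute the similarity between the papers written by community members and the individual member, and recommend the n most similar papers.
- Community recommendation (broadening the reading scope): analyze how community members' interests are distributed over other topics, identify the topics with a high share of preference, and recommend the n papers most related to those topics.

System Development and Empirical Analysis

System Development: System Architecture
[Figure: system architecture]

System Development: System Interface
[Screenshots: system interface, five slides]

Experimental Results: Clustering Evaluation
- First, the clusters produced by the system are grouped into classes, i.e., similar clusters are assigned to the same class.
- Each user is then classified in turn.
- Precision and recall [15] are used to evaluate the clustering:

$$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved authors}|}, \qquad \text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant authors}|}$$

Class label | Cluster labels
Network Communication | Mobile Computing, Routing Protocol, PIM-SM, Bandwidth Requests, TCP, Network Management
Artificial Intelligence | Genetic Algorithm, Network Motif, Brick Motif, Content Analysis, Neural Network, SPDNN, Divide-and-conquer Learning
Computer Graphics | Content-based Image Retrieval, Watershed Segmentation, Toboggan Approach
Information Retrieval | Semantic Query, Content Management
Computer System | Memory Cache, Parallel Algorithm
Information Security | End-to-end Security
Graph Theory | Interconnection Network
Software Engineering | Reliability Analysis

Experimental Results: Clustering Evaluation (Cont.)
Class label | # of authors
Network Communication | 111
Artificial Intelligence | 28
Information Retrieval | 7
Computer System | 6
Computer Graphics | 23
Information Security | 10
Graph Theory | 29
Software Engineering | 4
Others | 17
Total | 235

Experimental Results: Clustering Evaluation (Cont.)
[Chart: Precision and Recall vs. value]
value | Precision | Recall
0.0 | 0.7071 | 0.6271
0.1 | 0.6917 | 0.7606
0.2 | 0.6981 | 0.7785
0.3 | 0.7107 | 0.7839
0.4 | 0.7172 | 0.7817
0.5 | 0.7209 | 0.7828
0.6 | 0.7209 | 0.7828
0.7 | 0.7209 | 0.7828
0.8 | 0.7209 | 0.7828
0.9 | 0.7209 | 0.7828
1.0 | 0.7209 | 0.7828

Experimental Results: Recommendation Evaluation
- The standard deviation is 0.068; at a 95% confidence level, the confidence interval is (0.632, 0.897).
- Kappa is 0.764, and expert agreement is 0.95.
- Of the recommendations on which both experts gave the same judgment (208 in total), 187 were judged to meet the user's needs, so recommendation precision is 187/208 = 0.899.

 | Expert A: No | Expert A: Yes | Total
Expert B: No | 21 (9.6%) | 9 (4.1%) | 30 (13.7%)
Expert B: Yes | 2 (0.9%) | 187 (85.4%) | 189 (86.3%)
Total | 23 (10.5%) | 196 (89.5%) | 219

Kappa = (Observed agreement - Chance agreement) / (1 - Chance agreement)
Observed agreement = (21 + 187) / 219 ≈ 0.9498
Chance agreement = 0.105 × 0.137 + 0.895 × 0.863 ≈ 0.7868
Kappa = (0.9498 - 0.7868) / (1 - 0.7868) ≈ 0.764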
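The TF-IAF user model from the user-model slide above can be sketched as follows: every author becomes a vector over the keyword vocabulary, and keywords used by many authors are down-weighted. This is a minimal illustration under stated assumptions, not the study's code: tf is the raw frequency freq_ij, the logarithm is taken as base 2 (the base is hard to read in the extracted slide), and the input format `author_terms` is hypothetical.

```python
import math
from collections import Counter


def tf_iaf_vectors(author_terms):
    """Return (vocabulary, {author: [w_1j, ..., w_mj]}) with w_ij = tf_ij * iaf_i."""
    N = len(author_terms)                                   # number of authors
    vocab = sorted({t for terms in author_terms.values() for t in terms})
    # n_i: number of authors that use term i
    n = Counter(t for terms in author_terms.values() for t in set(terms))
    vectors = {}
    for author, terms in author_terms.items():
        freq = Counter(terms)                               # freq_ij
        vectors[author] = [
            freq[t] * math.log2(N / n[t]) if freq[t] else 0.0   # 0 if not associated
            for t in vocab
        ]
    return vocab, vectors


# Hypothetical keyword occurrences per author after preprocessing.
author_terms = {
    "author_a": ["graph", "graph", "network"],
    "author_b": ["network", "security"],
}
vocab, U = tf_iaf_vectors(author_terms)
print(vocab)
print(U)
```

Here `network` is used by every author, so its inverse author frequency is log2(1) = 0 and it contributes nothing to either vector, which is the intended effect of the IAF factor.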
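The semantic-relatedness and vertex-weight formulas from the semantic-network slides can be illustrated on toy data. In this sketch each sentence is a set of extracted keywords, f counts sentences, and pairs are only created when two keywords co-occur in a sentence. The function names are hypothetical, and `title_keyword_terms` is treated as one global set of keywords appearing in the Title or Keyword fields; the slides do not say whether that check is made per document.

```python
from collections import Counter
from itertools import combinations


def semantic_relatedness(sentences, title_keyword_terms):
    """r(t_i, t_j) for keyword pairs that co-occur in at least one sentence."""
    f = Counter()          # f(t): number of sentences containing t
    co = Counter()         # f(t_i, t_j): number of sentences containing both
    for sentence in sentences:
        f.update(sentence)
        co.update(combinations(sorted(sentence), 2))
    r = {}
    for (a, b), count in co.items():
        if a in title_keyword_terms and b in title_keyword_terms:
            r[(a, b)] = 1.0                       # boosted title/keyword relation
        else:
            r[(a, b)] = count / max(f[a], f[b])
    return r


def vertex_weights(r, tfiaf_sum):
    """W_i = (sum of keyword i's TF-IAF over users) + (average r over its h_i edges)."""
    incident = {}
    for (a, b), w in r.items():
        incident.setdefault(a, []).append(w)
        incident.setdefault(b, []).append(w)
    return {t: tfiaf_sum.get(t, 0.0) + sum(ws) / len(ws) for t, ws in incident.items()}


# Hypothetical sentences, already reduced to keyword sets.
sentences = [{"graph", "network"}, {"graph", "network", "routing"}, {"routing", "tcp"}]
r = semantic_relatedness(sentences, title_keyword_terms={"routing", "tcp"})
W = vertex_weights(r, tfiaf_sum={"graph": 2.0})
print(r)
print(W)
```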
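Equations (3-6) to (3-8) and the merge rule can be combined into a small greedy loop. This is a simplified sketch, not the method of [9]: it assumes the candidate subclusters are disjoint vertex sets, and it uses RI both to choose the pair to merge and as the stopping test, whereas the slides select by "strongest inter-connectivity" without giving that formula. All function names are hypothetical.

```python
from itertools import combinations


def internal_edges(r, g):
    """Edges (with weights r_ij) whose endpoints both lie in subcluster g."""
    return {e: w for e, w in r.items() if e[0] in g and e[1] in g}


def subcluster_weight(r, g):
    """Eq. (3-6): W_G = sum of r_ij over edges inside G."""
    return sum(internal_edges(r, g).values())


def relative_interconnectivity(r, gi, gj):
    """Eq. (3-7): weight of the edges between G_i and G_j over (W_Gi + W_Gj)."""
    between = sum(w for (a, b), w in r.items()
                  if (a in gi and b in gj) or (a in gj and b in gi))
    denom = subcluster_weight(r, gi) + subcluster_weight(r, gj)
    return between / denom if denom else 0.0


def cluster_weight(r, g):
    """Eq. (3-8): CW(G) = CD(G) * AS(G)."""
    edges = internal_edges(r, g)
    v = len(g)
    if v < 2 or not edges:
        return 0.0
    cd = len(edges) / (v * (v - 1) / 2)           # edge density
    avg_strength = sum(edges.values()) / len(edges)
    return cd * avg_strength


def merge_subclusters(r, groups, threshold):
    """Greedily merge the pair with the highest RI until the best RI < threshold."""
    groups = [set(g) for g in groups]
    while len(groups) > 1:
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda p: relative_interconnectivity(r, groups[p[0]], groups[p[1]]))
        if relative_interconnectivity(r, groups[i], groups[j]) < threshold:
            break
        groups[i] |= groups.pop(j)                # j > i, so index i stays valid
    return groups


# Hypothetical keyword graph: edge weights are semantic relatedness values.
r = {("a", "b"): 1.0, ("c", "d"): 1.0, ("b", "c"): 0.8, ("d", "e"): 0.1}
merged = merge_subclusters(r, [{"a", "b"}, {"c", "d"}, {"e"}], threshold=0.3)
print(merged)
print(round(cluster_weight(r, merged[0]), 4))
```

The first pass merges {a, b} and {c, d} (RI = 0.8 / 2 = 0.4); the remaining link to {e} is too weak relative to the merged cluster's internal weight, so merging stops.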
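The user-clustering slides (S = WWᵀ, the Jaccard-based matrix R, the update U′ = RU and the cosine assignment to topic vectors C_k) can be sketched in plain Python. The scaling parameter `lam` stands for the λ in the reconstructed R and is an assumption, as are the function names and the toy data. The example shows the effect reported in the conclusions: mixing in co-authors' interests lets a user be assigned to an extra topic.

```python
import math


def coauthor_matrix(W):
    """S = W W^T: S_ij = papers co-authored by users i and j (S_ii = |W_i|)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in W] for row_i in W]


def relation_matrix(W, lam):
    """R_ii = 1; R_ij = lam * S_ij / (|W_i| + |W_j| - S_ij) (Jaccard similarity)."""
    S = coauthor_matrix(W)
    n = len(W)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                R[i][j] = 1.0
            else:
                union = S[i][i] + S[j][j] - S[i][j]
                R[i][j] = lam * S[i][j] / union if union else 0.0
    return R


def matmul(A, B):
    """Plain-Python matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_topics(U, R, topics, threshold):
    """U' = R U; user j joins topic k when sim(U'_j, C_k) > threshold."""
    U_new = matmul(R, U)
    return [[k for k, c in enumerate(topics) if cosine(u, c) > threshold] for u in U_new]


# Hypothetical data: 3 users x 3 papers, 2 keywords, 2 topic clusters.
W = [[1, 1, 0],     # user 0 wrote papers 0 and 1
     [1, 0, 0],     # user 1 co-wrote paper 0 with user 0
     [0, 0, 1]]     # user 2 wrote paper 2 alone
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # rows U_j: user x keyword (TF-IAF)
topics = [[1, 0], [0, 1]]                  # C_k as binary keyword indicators
print(assign_topics(U, relation_matrix(W, 0.0), topics, 0.4))  # social network off
print(assign_topics(U, relation_matrix(W, 1.0), topics, 0.4))  # social network on
```

With the social network switched on, users 0 and 1 (co-authors) each pick up the other's topic, while the isolated user 2 is unchanged.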
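The precision and recall definitions from the clustering-evaluation slide apply per class: "Retrieved" is the set of authors the system placed in a class and "Relevant" is the set of authors who actually belong to it. The sketch below computes both for one hypothetical class; how the slides aggregate the per-class figures (micro- or macro-averaging) is not stated, so no aggregation is shown.

```python
def precision_recall(retrieved, relevant):
    """Precision = |rel ∩ ret| / |ret|, Recall = |rel ∩ ret| / |rel| for one class."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical class "Network Communication".
system = ["a", "b", "c", "d"]    # authors the system assigned to the class
truth = ["a", "b", "e"]          # authors who actually belong to it
print(precision_recall(system, truth))
```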
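The Kappa arithmetic on the recommendation-evaluation slide can be checked directly from the 2 × 2 agreement table. The helper below is a generic Cohen's kappa for a square table (the function name is illustrative); fed the slide's counts, it reproduces the reported observed agreement, chance agreement and Kappa.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater B, cols: rater A)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return observed, chance, (observed - chance) / (1 - chance)


# Counts from the slide: rows Expert B (No, Yes), columns Expert A (No, Yes).
table = [[21, 9],
         [2, 187]]
observed, chance, kappa = cohen_kappa(table)
print(f"observed={observed:.4f} chance={chance:.4f} kappa={kappa:.3f}")
# prints: observed=0.9498 chance=0.7868 kappa=0.764
```

Working from the raw counts rather than the rounded proportions avoids the small drift you get by plugging three-decimal values into the Kappa formula.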

Analysis of Publication Counts per Author
- The number of papers per author ranges from 1 to 41. 129 authors (55% of all authors) have only one paper in the collection, and 93% of authors have fewer than 5 papers.
[Chart: number of authors vs. number of publications]

Analysis of Publication Counts per Author (Cont.)
Name | Publications
Yu-Chee Tseng (曾煜棋) | 41
Jimmy J.M. Tan (譚建民) | 36
Lih-Hsing Hsu (徐力行) | 33
Yi-Bing Lin (林一平) | 32
Ying-Dar Lin (林盈達) | 26
Ling-Hwei Chen (陳玲慧) | 17
Chuen-Tsai Sun (孫春在) | 14
Jang-Ping Sheu (許健平) | 13
Hsin-Chia Fu (傅心家) | 11
Hao-Ren Ke (柯皓仁) | 10
Wei-Pang Yang (楊維邦) | 8
Wen-Guey Tzeng (曾文貴) | 8
Chien-Chao Tseng (曾建超) | 7
Tseng-Kuei Li (李增奎) | 7
Wen-Chih Peng (彭文志) | 6
Chang-Hsiung Tsai (蔡正雄) | 6
Deng-Jyi Chen (陳登吉) | 6
Yuan-Cheng Lai (賴源正) | 6

Co-author Analysis
- The number of authors per paper ranges from 1 to 6. Only 6 papers (3% of all papers) have a single author; 220 papers (97%) have between 2 and 6 authors.
[Chart: number of publications vs. number of co-authors]

Social Network
[Figure: co-authorship social network around Yu-Chee Tseng]

Social Network Measures
Rank | Degree | Betweenness | Closeness
1 | Yu-Chee Tseng (43) | Yu-Chee Tseng (2660.333) | Yu-Chee Tseng (0.678)
2 | Yi-Bing Lin (32) | Chien-Chao Tseng (2180.500) | Chien-Chao Tseng (0.678)
3 | Ying-Dar Lin (29) | Yi-Bing Lin (2081.333) | Ming-Feng Chang (0.678)
4 | Jimmy J.M. Tan (29) | Ming-Feng Chang (1792.000) | Chi-Fu Huang (0.677)
5 | Lih-Hsing Hsu (26) | Ying-Dar Lin (376.500) | Hsiao-Lu Wu (0.677)
6 | Hsin-Chia Fu (16) | Wen-Chih Peng (340.000) | Yuan-Ying Hsu (0.677)
7 | Jang-Ping Sheu (15) | Jimmy J.M. Tan (213.167) | Jung-Hsuan Fan (0.677)
8 | Chien-Chao Tseng (14) | Lih-Hsing Hsu (133.167) | Yi-Bing Lin (0.677)
9 | Chuen-Tsai Sun (12) | Chuen-Tsai Sun (91.000) | Hang-Wen Hwang (0.677)
10 | Hao-Ren Ke (11) | Hsin-Chia Fu (86.000) | Jang-Ping Sheu (0.677)
11 | Ling-Hwei Chen (10) | Ling-Hwei Chen (44.000) | Wen-Chih Peng (0.677)
12 | Wei-Pang Yang (8) | Jang-Ping Sheu (38.333) | Meng-Ta Hsu (0.677)
13 | Hsiao-Tien Pao (8) | Sunny S.J. Lin (36.000) | Lin-Yi Wu (0.677)
14 | Zen-Chung Shih (7) | Hao-Ren Ke (32.833) | Ming-Hour Yang (0.677)
15 | Chang-Hsiung Tsai (7) | Chi-Fu Huang (22.500) | Chih-Yu Lin (0.677)
16 | Jeu-Yih Jeng (7) | Wen-Guey Tzeng (21.333) | Sze-Yao Ni (0.677)
17 | Yeong-Yuh Xu (7) | Shi-Chun Tsai (15.333) | Wen-Hwa Liao (0.677)
18 | Deng-Jyi Chen (7) | Deng-Jyi Chen (12.000) | Shih-Lin Wu (0.677)
19 | Wen-Guey Tzeng (7) | Zen-Chung Shih (12.000) | Chih-Shun Hsu (0.677)
20 | Ming-Hour Yang (7) | Wei-Pang Yang (8.833) | Chi-He Chang (0.677)

Conclusions and Suggestions

Conclusions
This study aims to improve the effectiveness of information recommendation. Its main contribution is an information recommendation system that combines topic concept extraction with social network analysis to deliver recommendations that match user needs. The experiments and statistical analyses yield the following results:
- Topic concept extraction: 22 topic concepts were extracted from the 226 papers written by the 235 authors.
- Forming topic communities: the experiments show that the social network improves the recall of user clustering, i.e., it uncovers more related users.
- Information recommendation: recommendation precision is 0.899, indicating that the system's recommendations largely meet user needs.

Future Work
- Building a topic ontology: during topic extraction, use hierarchical clustering to represent the topic clusters as a tree, producing a topic concept hierarchy; then link topic concepts through the associations among users' topic preferences to form a topic ontology. This would help users see where they stand in the hierarchy and which research directions they could pursue next.
- Using user ratings: ratings can be explicit or implicit. Explicit ratings are subjective scores that users give according to their interest in an item; implicit ratings are usually estimated from browsing behavior. User ratings would reveal user preferences more precisely and make recommendations fit user needs better.

Future Work (Cont.)
- Extending the social network hierarchy: building on the relations among users in a shared social network, further study how information flows and spreads influence through the network. For example, the Floyd-Warshall algorithm finds the shortest path between every pair of users in the same social network; analyzing the nodes on these paths could then reveal their influence on users.
- Value-added applications of institutional repositories and database systems: build social networks from other relations (e.g., citation and co-citation).

References
1. A. Iskold, (2007) “The Art, Science and Business of Recommendation Engines.” http:/ […]
2. […] clustering: A review,” ACM Computing Surveys, vol. 31, pp. 264-323, 1999.
3. B. Krulwich & C. Burkey, “The InfoFinder agent: Learning user interests through heuristic phrase extraction,” IEEE Expert: Intelligent Systems and Their Applications, vol. 12, pp. 22-27, 1997.
4. B. Sarwar, G. Karypis, J. Konstan, & J. Riedl, “Analysis of recommendation algorithms for e-commerce,” Proceedings of the 2nd ACM Conference on Electronic Commerce, pp. 158-167, 2000.
5. D. Goldberg, D. Nichols, B. M. Oki, & D. Terry, “Using collaborative filtering to weave an information tapestry,” Communications of the ACM, vol. 35, pp. 61-70, 1992.
6. D. Koller & M. Sahami, “Hierarchically classifying documents using very few words,” Proceedings of the 14th International Conference on Machine Learning, pp. 170-178, 1997.
7. F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys, vol. 34, pp. 1-47, 2002.
8. G. Karypis, E. H. Han, & V. Kumar, “Chameleon: Hierarchical clustering using dynamic modeling,” Computer, vol. 32, pp. 68-75, 1999.
9. H. C. Chang & C. C. Hsu, “Using topic keyword clusters for automatic document clustering,” Transactions on Information and Systems, vol. 88, pp. 1852-1860, 2005.
10. H. Hotta, “User profiling system using social networks for recommendation,” Proceedings of the 8th International Symposium on Advanced Intelligent Systems, 2007.
11. H. Kautz, B. Selman, & M. Shah, “Referral Web: Combining social networks and collaborative filtering,” Communications of the ACM, vol. 40, pp. 63-65, 1997.
12. H. Sakagami & T. Kamba, “Learning personal preferences on online newspaper articles from user behaviors,” Computer Networks and ISDN Systems, vol. 29, pp. 1447-1455, 1997.
13. J. B. Schafer, J. Konstan, & J. Riedl, “Recommender systems in e-commerce,” Proceedings of the 1st ACM Conference on Electronic Commerce, pp. 158-166, 1999.
14. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, 1967.
15. J. Makhoul, F. Kubala, R. Schwartz, & R. Weischedel, “Performance measures for information extraction,” Proceedings of the DARPA Broadcast News Workshop, pp. 249-252, 1999.
16. J. Moreno, Who Shall Survive? New York: National Institute of Mental Health, 1934.
17. J. R. Tyler, D. M. Wilkinson, & B. A. Huberman, “Email as spectroscopy: Automated discovery of community structure within organizations,” Communities and Technologies, pp. 81-96, 2003.
18. J. Rucker & M. J. Polanco, “Siteseer: Personalized navigation for the web,” Communications of the ACM, vol. 40, pp. 73-76, 1997.
19. K. C. Gowda & G. Krishna, “Agglomerative clustering using the concept of mutual nearest neighbourhood,” Pattern Recognition, vol. 10, pp. 105-112, 1978.
20. K. Faust, “Comparison of methods for positional analysis: Structural and general equivalences,” Social Networks, vol. 10, pp. 313-341, 1988.
21. L. C. Freeman, “Centrality in social networks: Conceptual clarification,” Social Networks, vol. 1, pp. 215-239, 1979.
22. L. Page & S. Brin, “The anatomy of a large-scale hypertextual Web search engine,” Proceedings of the Seventh International World-Wide Web Conference, 1998.
23. L. Garton, C. Haythornthwaite, & B. Wellman, (1997) “Studying Online Social Networks,” http://jcmc.huji.ac.il/vol3/issue1/garton.html
24. M. A. Shah, “ReferralWeb: A resource location system guided by personal relations,” Master's thesis, M.I.T., 1997.
25. M. Granovetter, “The strength of weak ties: A network theory revisited,” Sociological Theory, vol. 1, pp. 201-233, 1983.
26. N. Zhong, J. Liu, & Y. Yao, “In search of the wisdom web,” Computer, vol. 35, pp. 27-31, 2002.
27. A. Papoulis, Probability, Random Variables and Stochastic Processes, 2nd ed. New York: McGraw-Hill, 1984.
28. P. Mika, “Flink: Semantic Web technology for the extraction and analysis of social networks,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 3, pp. 211-223, 2005.
29. P. Pattison, Algebraic Models for Social Networks. Cambridge University Press, 1993.
30. S. E. Chan, R. K. Pon, & A. F. Cárdenas, “Visualization and clustering of author social networks,” International Conference on Distributed Multimedia Systems Workshop on Visual Languages and Computing, pp. 30-31, 2006.
31. S. P. Borgatti, (1998) “What Is Soci…
