1、知识图谱架构知识图谱一般架构:来源自百度百科复旦大学知识图谱架构:早期知识图谱架构知识图谱一般架构:来源自百度百科架构讨论早期知识图谱架构知识抽取实体概念抽取实体概念映射关系抽取质量评估KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014A sampler of research problemsGrowth: knowledge graphs are incomplete!Link prediction: add relationsOntology
2、matching: connect graphsKnowledge extraction: extract new entities and relations from web/textValidation: knowledge graphs are not always correct!Entity resolution: merge duplicate entities, split wrongly merged onesError detection: remove false assertionsInterface: how to make it easier to access k
3、nowledge?Semantic parsing: interpret the meaning of queriesQuestion answering: compute answers using the knowledge graphIntelligence: can AI emerge from knowledge graphs?Automatic reasoning and planningGeneralization and abstraction9关系抽取 定义: 常见手段: 语义模式匹配频繁模式抽取,基于密度聚类,基于语义相似性 层次主题模型弱监督KDD 2014 Tutori
4、al on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Methods and techniquesSupervised modelsSemi-supervised modelsDistant supervision2. Entity resolutionSingle entity methodsRelational methods3. Link predictionRule-based methodsProbabilistic modelsFactorization methodsE
5、mbedding models80Not in this tutorial: Entity classification Group/expert detection Ontology alignment Object ranking1. Relation extraction:KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014 Extracting semantic relations between sets of grounded entiti
6、esNumerous variants:Undefined vs pre-determined set of relationsBinary vs n-ary relations, facet discoveryExtracting temporal informationSupervision: fully, un, semi, distant-supervisionCues used: only lexical vs full linguistic features82Relation ExtractionKobeBryantLA LakersplayForthe franchise pl
7、ayer ofonce again savedman of the match forthe Lakers”his team”Los Angeles”“Kobe Bryant,“Kobe“Kobe Bryant?KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Supervised relation extractionSentence-level labels of relation mentionsApple CEO Steve Jobs sai
8、d. = (SteveJobs, CEO, Apple)Steve Jobs said that Apple will. = NILTraditional relation extraction datasetsACE 2004MUC-7Biomedical datasets (e.g BioNLP clallenges)Learn classifiers from +/- examplesTypical features: context words + POS, dependency path betweenentities, named entity tags, token/parse-
9、path/entity distance83KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Semi-supervised relation extractionGeneric algorithm(遗传算法遗传算法)1.2.3.4.5.Start with seed triples / golden seed patternsExtract patterns that match seed triples/patternsTake the top-
10、k extracted patterns/triplesAdd to seed patterns/triplesGo to 2Many published approaches in this category:Dual Iterative Pattern Relation Extractor Brin, 98Snowball Agichtein & Gravano, 00TextRunner Banko et al., 07 almost unsupervisedDiffer in pattern definition and selection86founderOfKDD 2014 Tut
11、orial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Distantly-supervised relation extraction88Existing knowledge base + unlabeled text generate examplesLocate pairs of related entities in textHypothesizes that the relation is expressedGoogle CEO Larry Page announced
12、 that.Steve Jobs has been Apple for a while.Pixar lost its co-founder Steve Jobs.I went to Paris, France for the summer.GoogleCEOcapitalOfLarryPageFranceAppleCEOPixarSteveJobsDistant supervision: modeling hypothesesTypical architecture:1. Collect many pairs of entities co-occurring in sentences from
13、 text corpus2. If 2 entities participate in a relation, several hypotheses:1.All sentences mentioning them express it Mintz et al., 09“Barack Obama is the 44th and current President of the US.” (BO, employedBy, USA)89KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York,
14、August 24, 2014KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Sentence-level featuresLexical: words in between and around mentions and their parts-of-speech tags (conjunctive form)Syntactic: dependency parse path between mentions along withside node
15、sNamed Entity Tags: for the mentionsConjunctions of the above featuresDistant supervision is used on to lots of data sparsity of conjunctiveforms not an issue92Distant supervision: modeling hypothesesTypical architecture:1. Collect many pairs of entities co-occurring in sentences from text corpus2.
16、If 2 entities participate in a relation, several hypotheses:1.2.All sentences mentioning them express it Mintz et al., 09At least one sentence mentioning them express it Riedel et al., 10“Barack Obama is the 44th and current President of the US.” (BO, employedBy, USA)“Obama flew back to the US on We
17、dnesday.” (BO, employedBy, USA)95KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Distant supervision: modeling hypothesesTypical architecture:1. Collect many pairs of entities co-occurring in sentences from text corpus2. If 2 entities participate in
18、a relation, several hypotheses:1.2.3.All sentences mentioning them express it Mintz et al., 09At least one sentence mentioning them express it Riedel et al., 10At least one sentence mentioning them express it and 2 entities can expressmultiple relations Hoffmann et al., 11 Surdeanu et al., 12“Barack
19、 Obama is the 44th and current President of the US.” (BO, employedBy, USA)“Obama flew back tothe US justWednesday.” said.” employedBy, USA)98KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014was born in on he always (BO, (BO, bornIn,KDD 2014 Tutorial o
20、n Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Distant supervisionProsCan scale to the web, as no supervision requiredGeneralizes to text from different domainsGenerates a lot more supervision in one iterationConsNeeds high quality entity-matchingRelation-expression h
21、ypothesis can be wrongCan be compensated by the extraction model, redundancy, language modelDoes not generate negative examplesPartially tackled by matching unrelated entities101KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014104KobeBryantGasolteamma
22、tebornInplayInLeagueBlackMambaEntity resolutionLA LakersplayForplayForPau35ageKobe B.BryantVanessaL. BryantmarriedTo1978Single entityresolutionRelational entityresolutionEntity resolution / deduplication Multiple mentions of the same entity is wrong and confusing.KDD 2014 Tutorial on Constructing an
23、d Mining Web-scale Knowledge Graphs, New York, August 24, 2014Single-entity entity resolutionEntity resolution without using the relational context of entitiesMany distances/similarities for single-entity entity resolution:Edit distance (Levenshtein, etc.)Set similarity (TF-IDF, etc.)Alignment-based
24、Numeric distance between valuesPhonetic SimilarityEquality on a boolean predicateTranslation-basedDomain-specific105KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Relational entity resolution Simple strategies Enrich model with relational features r
25、icher context for matchingRelational features:Value of edge or neighboring attributeSet similarity measuresOverlap/JaccardAverage similarity between set membersAdamic/Adar: two entities are more similar if they share more items that areoverall less frequentSimRank: two entities are similar if they a
26、re related to similar objectsKatz score: two entities are similar if they are connected by shorter paths114KobeBryant1978teammatebornInplayForplayInLeagueBlackMambaLA LakersplayFor35agePauGasolKDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014KobeBryan
27、t1978teammatebornInplayForplayInLeagueBlackMambaLA LakersplayFor35agePauGasolRelational entity resolution Advanced strategiesDependency graph approaches Dong et al., 05Relational clustering Bhattacharya & Getoor, 07Probabilistic Relational Models Pasula et al., 03Markov Logic Networks Singla & Domin
28、gos, 06Probabilistic Soft Logic Broecheler & Getoor, 10115KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014LINK PREDICTION116KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014117KobeBryantLink prediction
29、NY KnicksPauGasolteammateplayInLeagueteamInLeagueopponentplayForLA LakersplayFor Add knowledge from existing graph No external source Reasoning within the graph1. Rule-based methods2. Probabilistic models3. Factorization models4. Embedding modelsKDD 2014 Tutorial on Constructing and Mining Web-scale
30、 Knowledge Graphs, New York, August 24, 2014First Order Inductive Learner FOIL learns function-free Horn clauses:118Gasolgiven positive negative examples of a concepta set of background-knowledge predicatesFOIL inductively generates a logical rule for the concept that cover all + and no -LALakerspla
31、yForplayForPauteammate(x,y) playFor(y,z) playFor(x,z)teammateKobeBryant Computationally expensive: huge search space large, costly Horn clauses Must add constraints high precision but low recall Inductive Logic Programming: deterministic and potentially problematicKDD 2014 Tutorial on Constructing a
32、nd Mining Web-scale Knowledge Graphs, New York, August 24, 2014S(KB, playFor,LAL)iplayForh(pai(KB,LAL)ipathsPath Ranking Algorithm Lao et al., 11119LALakersplayForPauGasolplayForteammateKobeBryantRandom walks on the graph are used to sample pathsPaths are weighted with probability of reaching target
33、 from sourcePaths are used as ranking experts in a scoring functionNYKnicksplayInLeagueteamInLeagueopponenth(Pa2(KB,LAL) = 0.2h(Pa1(KB,LAL) = 0.95KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014Link prediction with scoring functionsA scoring function
34、 alone does not grant a decisionThresholding: determine a threshold (KB, playFor, LAL) is True iff120S(KB, playFor,LAL)Ranking: The most likely relation between Kobe Bryant and LA Lakers is:rel argmaxrrelsS(KB,r,LAL) The most likely team for Kobe Bryant is:obj argmaxeentsS(KB, playFor,e)As prior for extraction models (cf. Knowledge Vault)No calibration of scores like probabilities