Construction and Application of Intelligent Medical Knowledge Graphs
2018/7/24

Two Technical Pillars of Artificial Intelligence
- Deep Learning
- Semantic Technology and Knowledge Graphs

Milestone Events in Artificial Intelligence
- 2011: IBM Watson, an artificial intelligence program, competed on the quiz show Jeopardy! in its first human-versus-machine contest. Watson defeated Brad Rutter, the show's biggest prize winner, and Ken Jennings, the holder of the longest winning streak, and won the first prize of 1 million US dollars. (Semantic technology and knowledge graphs)
- 2011: Palantir used big data technology to locate Osama bin Laden's hideout. Gotham: intelligence data analytics. (Data mining, semantic technology and knowledge graphs)
- 2012: Sports news from the London Olympic Games was reported automatically. (Semantic technology and knowledge graphs)
- 2016: AlphaGo defeated Lee Sedol at Go. (Deep learning)

Knowledge Graphs
A knowledge graph is a large-scale semantic network consisting of entities and concepts as well as the semantic relationships among them. Knowledge graphs are usually described using Web-oriented knowledge representation languages such as RDF, RDF Schema (RDFS), and OWL.
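As a minimal sketch of the definition above, a knowledge graph can be pictured as a set of subject-predicate-object triples over entities and concepts. The `ex:` identifiers below are hypothetical names chosen for illustration; only `rdf:type` and `rdfs:subClassOf` are standard RDF/RDFS vocabulary terms. The functions are a plain-Python stand-in for subclass reasoning, not a real RDFS or OWL reasoner.

```python
# Illustrative triples: (subject, predicate, object).
# The "ex:" identifiers are hypothetical; rdf:type and rdfs:subClassOf
# are standard RDF/RDFS vocabulary terms.
TRIPLES = {
    ("ex:Sertraline", "rdf:type", "ex:SSRI"),
    ("ex:SSRI", "rdfs:subClassOf", "ex:Antidepressant"),
    ("ex:Antidepressant", "rdfs:subClassOf", "ex:Drug"),
    ("ex:Sertraline", "ex:treats", "ex:Depression"),
}


def superclasses(concept, triples):
    """Return every concept reachable from `concept` via rdfs:subClassOf."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(entity, triples):
    """Return the asserted types of `entity` plus those implied by subclass links."""
    asserted = {o for s, p, o in triples if s == entity and p == "rdf:type"}
    inferred = set(asserted)
    for concept in asserted:
        inferred |= superclasses(concept, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(types_of("ex:Sertraline", TRIPLES)))
```

Here the entity is typed only as `ex:SSRI` in the data; the broader concepts follow from the concept hierarchy, which is the kind of semantic relationship the definition refers to.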
Linked Open Data (LOD) Cloud
[Figures: Linked Open Data cloud diagrams; year labels in the source: 2011, 2012, 2013]
- Each node represents an open data source; each arc between nodes represents links between the data sources.
- 295 datasets, 31 billion RDF statements, and 504 million RDF links (September 2011).
- Domains covered: geographic information, life sciences, encyclopedia entries, media, publishing, government information, computing and communication technology, engineering, social sciences, and more; almost everything is included.
- From http:/lod- [...] datasets, in 900,000 documents describing 8 billion entities

Main Features of Knowledge Graphs
- Standardization: RDF/RDFS/OWL
- Shareability and reusability: Linked Open Data (LOD)
- Reasoning support: OWL DL, a decidable fragment of first-order logic
- Openness: Open World Assumption, Non-unique Name Assumption
- Easy maintenance: declarative knowledge representation (compared with procedural specification in program code)

Use Cases of Knowledge Graphs
- Data integration and interoperability
- Vertical search
- Semantic search
- Knowledge representation
- Knowledge management
- Decision support

Knowledge Graph Construction Workflow
1. Identification
2. Transformation
3. Cleaning
4. Integration
5. Storage/Indexing
6. Querying/Reasoning
7. Application Workflow

Identification
- Input: requirement / system design
- Task: identification of the required data/knowledge resources
- Output: selected data/knowledge resources

Transformation
- Input: structured data (databases, spreadsheets); semi-structured data (XML, deep web data); free text (web pages, text, PDF data, etc.)
- Output: RDF triples
- Use database/spreadsheet tools to create XML data
- Use XSLT tools to convert XML data into RDF triples
- Use NLP, machine learning, and deep learning tools for free-text processing
- Knowledge/relation extraction from documents
- Identification and disambiguation of entities, concepts, and relations:
  - LSI (Latent Semantic Indexing, Deerwester 1990)
  - pLSI (probabilistic LSI, Hofmann 1999)
  - LDA (Latent Dirichlet Allocation, Blei 2003)
  - Neural network models (Word2vec, Mikolov 2013)
- Sometimes manual work on the T-box

Cleaning
- Input: RDF triples
- Output: RDF triples
- RDF editor (Hyena)
- Validator (Vapour)
- Knowledge graph cleaner (LOD Laundromat, Beek 2014)

Integration
- Input: RDF triples
- Output: RDF triples
- Entity/concept mapping
- Database schema mapping
- Ontology alignment
- Visualization tools:
  - ITM Align: semi-automated ontology alignment
  - Optima: a visual ontology alignment tool
  - CogZ: cognitive support and visualization for human-guided mapping systems
  - AgreementMaker: efficient matching for large real-world schemas and ontologies
  - Biomixer: a web-based collaborative ontology visualization tool
  - SDI (Semantic Data Integration) Tool: a semantic mapping representation and generation tool using UML for systems engineers
- Disambiguation of entities, concepts, and relations (in particular for data converted by using non-free-tree tools)

Storage/Indexing
- Input: RDF triples
- Output: indexed RDF triples
- Triple store (also called RDF store): a purpose-built database for the storage and retrieval of triples through semantic queries.
  - AllegroGraph
  - Virtuoso
  - OWLLink
  - Bigdata(R)
  - LarKC
- Graph databases: a more general structure than a triple store, using graph structures with nodes, edges, and properties to represent and store data.
  - Neo4j
  - GraphDB
  - Virtuoso
- Photo captions: OntoText, a partner in LarKC; Orri Erling, Virtuoso developer, LarKC reviewer; Ian Horrocks, LarKC reviewer; Zhisheng Huang, LarKC WP leader; Frank van Harmelen, LarKC Scientific Director

Querying and Reasoning
- Input: RDF triples + query
- Output: answers (XML/JSON/RDF/CSV)
- Query language: SPARQL
- Reasoners: RacerPro, FaCT++, Pellet, Jena, HermiT (Horrocks 2008)

Application Workflow
- Rule-based languages: Prolog, the logic programming language
- LarKC application workflow (LarKC 2011)

Medical Knowledge Graph Example: Knowledge Graphs of Depression
Knowledge Graphs of Depression (DepressionKG) is a set of integrated knowledge/data sources concerning depression. It provides the data infrastructure for exploring the relationships among the various knowledge/data sources on depression and for supporting clinical/medical decision support systems.

Knowledge Graphs of Depression
DepressionKG is represented in RDF N-Triples, a standard Semantic Web format.
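To make the N-Triples representation more concrete, the following sketch parses a simplified subset of the format and matches triple patterns against the result. It is an illustration only: the `example.org` IRIs and the sample data are hypothetical placeholders, not actual DepressionKG content, and the parser deliberately ignores blank nodes, language tags, datatypes, and escape sequences that the full format allows.

```python
import re

# Matches one simplified N-Triples statement: <s> <p> (<o> | "literal") .
# Blank nodes, language tags, datatypes, and escapes are not handled.
LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriples(text):
    """Parse simplified N-Triples text into a list of (s, p, o) tuples."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        s, p, iri, literal = match.groups()
        triples.append((s, p, iri if iri is not None else literal))
    return triples


def match_pattern(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical example data; the example.org IRIs are placeholders.
DATA = """
<http://example.org/drug/D1> <http://example.org/vocab/name> "Drug One" .
<http://example.org/trial/T1> <http://example.org/vocab/intervention> <http://example.org/drug/D1> .
"""

if __name__ == "__main__":
    graph = parse_ntriples(DATA)
    print(match_pattern(graph, p="http://example.org/vocab/intervention"))
```

Because every statement is a self-contained line, triples from different sources can simply be concatenated, and a query joins them through shared IRIs (here the drug IRI appears as the object of one triple and the subject of another). In practice a triple store and SPARQL would do this work.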
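Knowledge/Data Resources of DepressionKG
- Medical guidelines on depression
- Clinical trials on depression
- PubMed/MEDLINE literature on depression
- Wikipedia/DBpedia entries on antidepressants
- DrugBank (drug knowledge base)
- Drugbook (drug specifications)
- SIDER (drug side effect knowledge base)
- SNOMED CT, UMLS, MeSH (clinical concept terminologies)
- ICD-10, DSM-V (disease classification and diagnostic criteria)
- Gene Ontology, Protein Ontology, and others (basic life science datasets)

DepressionKG (version 0.6)

DepressionKG: Data Integration

Example
Patient C, an adult male, suffers from a mood disorder and hopes to take part in a clinical trial on depression. His clinician wants to find an ongoing trial that uses a drug intervention whose target has neurotransmitter transporter activity. This requires a search that covers both DrugBank and ClinicalTrial.

Semantic Query

More Comprehensive Examples

DepressionKG Application Example: Adverse Drug Reaction Analysis
- Between one half and two thirds of patients with depression stop taking their medication within the first six months of initial treatment. For some of them, the reason is that they cannot tolerate the adverse drug reactions.
- In clinical decision making, accurately understanding and analyzing the adverse reactions of antidepressants is an important task for clinicians.
- A single antidepressant usually has dozens of adverse reactions. In clinical work, physicians must balance efficacy against adverse reactions while also taking the treatment of the patient's physical illnesses into account, so it is hard to reach the best decision quickly and accurately.
- A structured, semantic knowledge graph can improve the efficiency and accuracy of clinical analysis.

Frequency Measure of Side Effects
- Very common (≥ 1/10)
- Common (1/100 to 1/10)
- Uncommon (1/1,000 to 1/100)
- Rare (1/10,000 to 1/1,000)
- Very rare (< 1/10,000)
- Unknown (cannot be estimated from the available data)

Prediction of Adverse Reactions in Combination Therapy
(1) Function composite model: infer the adverse reactions that may occur when several drugs are used together from the existing information on the adverse reactions each single drug may cause.
(2) Fine-grained comprehensive model: take the patient's medication and characteristics (such as age) into account, perform a comprehensive analysis, and produce a predictive result.
(3) Personalized analysis model: obtain information on the patient's receptor sensitivity from the patient's history and the drug reaction mechanisms, and make a more targeted prediction.

Prediction of Adverse Reactions in Combination Therapy: Example Drugs
- Alprazolam
- Lithium carbonate
- Quetiapine (Seroquel)
- Sodium valproate (Depakine)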
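The frequency words and the first prediction model above can be sketched in a few lines. This is only one simple reading of the function composite model, assuming that the combined profile is the union of the single-drug profiles with each drug's frequency word kept separately; the source does not say how the actual model combines frequencies. The drug and effect names and the rates are hypothetical placeholders, not pharmacological data, and each band is treated as inclusive at its lower bound.

```python
from fractions import Fraction

# Lower bounds (inclusive) of the frequency bands, highest first.
BANDS = [
    (Fraction(1, 10), "very common"),
    (Fraction(1, 100), "common"),
    (Fraction(1, 1000), "uncommon"),
    (Fraction(1, 10000), "rare"),
]


def frequency_category(rate):
    """Map an incidence rate in [0, 1] to a frequency word; None means unknown."""
    if rate is None:
        return "unknown"
    exact = Fraction(str(rate))  # go through str to avoid float rounding at band edges
    if not 0 <= exact <= 1:
        raise ValueError(f"rate must be between 0 and 1, got {rate!r}")
    for lower, label in BANDS:
        if exact >= lower:
            return label
    return "very rare"


def combined_side_effects(profiles, drugs):
    """Union the side effects of `drugs`, keeping each drug's frequency word."""
    combined = {}
    for drug in drugs:
        for effect, rate in profiles[drug].items():
            combined.setdefault(effect, {})[drug] = frequency_category(rate)
    return combined


# Hypothetical placeholder profiles; not real pharmacological data.
PROFILES = {
    "drug_a": {"effect_x": 0.2, "effect_y": 0.005},
    "drug_b": {"effect_y": 0.03, "effect_z": None},
}

if __name__ == "__main__":
    for effect, by_drug in sorted(combined_side_effects(PROFILES, ["drug_a", "drug_b"]).items()):
        print(effect, by_drug)
```

Effects listed for more than one drug (here `effect_y`) are the natural candidates for closer review. The fine-grained and personalized models would additionally need patient attributes and history, which this sketch leaves out.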