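A GWAS Analysis Platform for Complex Traits in Livestock and Poultry Based on Gene Set Enrichment Analysis, and Its Applications
Pan Yuchun, Wang Qishan
2011.8.12

I. Background

GWAS
Genome-wide association analysis (GWAS) is a strategy proposed in recent years for identifying the functional genes behind complex traits. It screens sequence variation across the whole genome for SNPs associated with a trait.
Problems:
- Tens of thousands of SNP loci must be tested, which can produce many false-positive or false-negative results.
- Significant SNPs lack a biological interpretation (such as the metabolic pathway or biological process they belong to), which makes the molecular genetic mechanism of the trait hard to explain.

GSEA-GWAS
The basic idea of gene set enrichment analysis is to use predefined gene sets (usually taken from functional annotation or from earlier experimental results) to identify the pathways and other annotation sets that are significantly associated with a trait.
Compared with single-gene analysis, gene set enrichment analysis yields more, and more biologically meaningful, gene information, which helps with both problems above.
- Mootha et al. (2003) proposed the method for expression microarray data.
- Wang Kai et al. (2007) extended it to GSEA-GWAS.

[Figure: GWAS using single marker based association test]

GSEA-GWAS analysis workflow
Flowchart elements, as labelled on the slide:
- GWAS results
- Select the set of significant SNPs from the GWAS results and apply Fisher's test
- Use all SNPs in the GWAS results for enrichment analysis
- Gene set definition; gene sets mapped to genes; genes mapped to SNPs
- Significant gene sets
- QTL mapping of gene sets; functional validation

GSEA-GWAS research progress
Genome-wide association analysis based on gene set enrichment (GSEA-GWAS) has so far been applied to resource populations in humans (random unrelated individuals or nuclear families) and in mice (highly inbred).
- http://www.nr.no/pages/gseasnp (Bioinformatics 2008)
- https://webtools.imbs.uni-luebeck.de/snptogo (Bioinformatics 2008)

II. GSEA-GWAS Platforms for Livestock and Poultry

Cattle genome SNP functional annotation and Fisher enrichment analysis platform
http:/

Supported gene sets
- KEGG Pathway: transcriptional regulation, signal transduction, metabolism
- Gene Ontology: cellular component, biological process, molecular function

Analysis workflow (cattle SNP functional annotation and Fisher enrichment analysis platform)
Diagram elements, as labelled on the slide:
- Homepage of SNPpath
- SNPpath Web Server
- Cluster for Computing
- MySQL Database
- Gene Ontology
- Path details
- Result Page
- Gene Ontology associated with traits

Analysis principle
Fisher's exact test uses the hypergeometric distribution to test whether the proportion of significant SNPs within each gene set equals the proportion of significant SNPs across the whole chip.

                                  Significant SNPs   Non-significant SNPs   Total
SNPs annotated to the gene set           a                    b             a+b
SNPs not annotated to the gene set       c                    d             c+d
Total                                   a+c                  b+d          a+b+c+d

Analysis example
Snelling et al. 2010, Journal of Animal Science
- 2013 animals
- BovineSNP50 BeadChip (50K) assay
- Traits: birth weight (BWT); BW gain from birth to weaning, adjusted to 205 days (WG); 205-day adjusted weaning weight (WW); 160-day adjusted postweaning BW gain (PWG); 365-day adjusted yearling weight (YW)

Growth trait                       Pathway ID   Pathway name                           P value
Birth weight (BWT)                 00330        Arginine and proline metabolism        0.0405
                                   00480        Glutathione metabolism                 0.0213
205-day preweaning gain (WG)       00030        Pentose phosphate pathway              0.0184
                                   00512        O-Glycan biosynthesis                  0.0637
160-day postweaning gain (PWG)     04620        Toll-like receptor signaling pathway   0.0762
                                   00330        Arginine and proline metabolism        0.0405
                                   00480        Glutathione metabolism                 0.0213
205-day weaning weight (WW)        00330        Arginine and proline metabolism        0.0405
                                   00480        Glutathione metabolism                 0.0213
365-day yearling weight (YW)       00030        Pentose phosphate pathway              0.0184
                                   00330        Arginine and proline metabolism        0.0405
                                   00480        Glutathione metabolism                 0.0213

[Figure: Published paper]

Gene set enrichment analysis platform for pig, cattle and chicken

Supported gene sets
- KEGG Pathway: transcriptional regulation, signal transduction, metabolism
- Gene Ontology: cellular component, biological process, molecular function
- The Pfam protein families database (Pfam)
- Protein domains, families and functional sites (PROSITE)

Analysis methods
- Single column of P-values: the data can be analyzed using Irizarry's method.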
- Two columns of P-values: the data can be analyzed using either the F test or the t test.
- Up to 5000 columns of P-values: the data can be analyzed using Efron's re-standardized method.

Supported species: pig, cattle, chicken

[Figure: Program homepage]

[Figure: Annotation sets]

[Figure: Enrichment analysis workflow]

Analysis example
Snelling et al. 2010. J. Anim. Sci. 88(3):837-848.
To illustrate the application of GWASknow, we analyzed the SNP data from the GWAS of growth in crossbred beef cattle, which used the BovineSNP50 BeadChip (50K) assay. Body weight (BW) gain from birth to weaning was used.

The results from the re-standardized method are shown. A total of 28 KEGG pathway gene sets passing the FDR-corrected p-value threshold of 0.01 were tabulated.

QTL mapping
http://animalgenome.org/cgi-bin/QTLdb/BT/summary
Many of the gene sets we identified make good biological sense. For example, the KEGG terms Selenoamino acid metabolism and O-Glycan biosynthesis overlap QTL described for average daily gain, body weight and feed conversion ratio.

[Figure: Related papers]

Thanks!
School of Agriculture and Biology, Department of Animal Science
Shanghai Key Laboratory of Veterinary Biotechnology
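
Worked sketch: Fisher enrichment test
The 2x2 table on the analysis-principle slide (cells a, b, c, d) can be turned into an enrichment p-value with the hypergeometric distribution. The sketch below is only an illustration and is not the platform's code. It assumes a one-sided test (over-representation of significant SNPs in the gene set), which the slides do not specify, and the function name fisher_enrichment_p and the example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test for over-representation.

    Table layout (same cells as the analysis-principle slide):
                          significant   non-significant
        in gene set            a               b
        not in gene set        c               d

    Returns P(X >= a), where X is the number of significant SNPs
    falling in the gene set under the hypergeometric null.
    Assumption: one-sided test; the slides do not say which tail is used.
    """
    n_set = a + b            # SNPs annotated to the gene set
    n_sig = a + c            # significant SNPs overall
    n_total = a + b + c + d  # all SNPs tested
    denom = comb(n_total, n_sig)
    upper = min(n_set, n_sig)
    # Sum hypergeometric probabilities for k = a .. upper.
    numer = sum(
        comb(n_set, k) * comb(n_total - n_set, n_sig - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


# Hypothetical counts, for illustration only (not taken from the slides).
p = fisher_enrichment_p(3, 1, 1, 3)
print(f"one-sided p = {p:.4f}")
```

For real analyses, an established routine such as scipy.stats.fisher_exact is usually preferable to a hand-rolled sum. The point here is only to show how the four cells enter the calculation.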