Genomic Methods for Gut Microbiota Associated with IBS and Depression, and Selected Research Progress

Research focuses of my lab
- Genome and protein sequence analysis and protein structure analysis
- Bioinformatics methods for metagenomic analysis associated with human health and environments
- Clinical data analysis and medical informatics

Intestinal Microbiota: the "Forgotten" Human Organ
- Gut microbes: about 10^14 cells, roughly 10 times the number of human cells; they encode about 150 times more genes than the human genome (Baquero, 2012).
- They influence host energy balance and immunity (Sanz, 2014).
- Mysterious: about 80% of gut microbes cannot be cultured in the lab (Kellenberger, 2001).
1. Baquero F, Nombela C. Clinical
Microbiology and Infection, 2012, 18(s4): 2-4.
2. Sanz Y, Olivares M, Moya-Pérez, et al. Pediatric Research, 2014.
3. Kellenberger E. EMBO Reports, 2001, 2(1): 5-7.

Hot Research Field
- 2008: NIH initiated the HMP (Human Microbiome Project).
- 2008: the European Commission initiated MetaHIT (Metagenomics of the Human Intestinal Tract consortium).
- Number of publications related to intestinal microbiota (Sekirov, 2009)
1. Qin J, Li R, Raes J, et al. Nature, 2010, 464(7285): 59-65.
2. Nelson K E, et al. Science (New York, NY), 2010, 328(5981): 994.
3. Gut Microbiota in Health and Disease

Gut Microbiota and Host Disease
Relationship:
a) Dysbiosis triggers pathogenesis.
b) Dysbiosis arises in parallel with pathogenesis.
c) Disease causes a shift in gut flora structure.
d) Dysbiosis aggravates disease.
Significance of Gut Flora Dysbiosis (Frank, 2011)
1. Frank D N, Zhu W, Sartor R B, et al. Investigating the biological and clinical significance of human dysbioses[J]. Trends in
Microbiology, 2011, 19(9): 427-434.

Gut Microbiota and Host Disease
- Gut Microbes and Diabetes (Sanz, 2014)
- Gut Microbes and Obesity & Metabolic Syndrome (Devaraj, 2013)
1. Sanz Y, Olivares M, Moya-Pérez, et al. Understanding the role of gut microbiome in metabolic disease risk[J]. Pediatric Research, 2014.
2. Devaraj S, Hemarajata P, Versalovic J. The human gut microbiome and body metabolism: implications for obesity and diabetes[J]. Clinical Chemistry, 2013, 59(4): 617-628.

Metagenomics: herein defined as the application of sequencing to DNA obtained from environmental or human microbial samples.
- Bypasses the limitation of sequencing pure cultures
- Short DNA reads by shotgun sequencing, especially by NGS
Several metagenome sequencing projects in 2008 (Hugenholtz et al., 2010)

Metagenomics: method to analyze gut microbiota
Sample Collection -> DNA Filter -> DNA Extraction -> PCR -> Sequencing
Metagenome Sequencing Progress (Wooley, 2010)
Metagenome Analysis
Progress (Kunin, 2008)
1. Wooley J C, Godzik A, Friedberg I. A primer on metagenomics[J]. PLoS Computational Biology, 2010, 6(2): e1000667.
2. Kunin V, Copeland A, Lapidus A, et al. A bioinformatician's guide to metagenomics[J]. Microbiology and Molecular Biology Reviews, 2008, 72(4): 557-578.

Life scientists are starting to
grapple with massive data sets, encountering challenges with handling, processing and moving information that were once the domain of astronomers and high-energy physicists. Nature 498 (13 June 2013)
The last week of April was designated Big Data Week. But in modern biology, every week is big-data week. Nature 499 (4 July 2013)

Short read data by NGS: bioinformatics challenges
Higgins G., Human Genomes and Big Data Challenges, 2013, AssureRx Health Inc.

Bioinformatics issues for metagenomics
- Sequencing
- Reads preprocessing
- Short reads assembly
- Gene prediction and annotation
- Composition estimates
- Binning/classification
- Population analysis
- Gene-centric analysis
- Data compression

Current work on bioinformatics methods for metagenomes
- DNA short-read assembly methods: MAP and IntegMAP (Bioinformatics, Zhu*, 2012; BMC Bioinformatics, Zhu*, 2015)
- Gene prediction methods: MetaTISA (Bioinformatics, Zhu*, 2009), MetaGUN (BMC Bioinformatics, Zhu*, 2013)
- Metagenomic sample comparison tool: MetaComp (2015)

De Novo Assembly Methods for DNA Short Reads in Metagenomes
Sequence assembly plays an essential role in metagenomics
- Assemble short reads (25-1000 bp) into longer contigs (from 10^2 bp to the whole chromosome) to provide more valuable genomic content, which is essential for downstream analysis such as gene finding and functional annotation.
The information contained in different lengths of genomic DNA
Reference genome-based methods:
- The comparative assembly approach, such as AMOS, uses a reference genome or closely related species
to align reads, and has been applied to facilitate assembly of short reads
- Potential bias caused by phylogenetic complexity and diversity
De novo methods:
- De novo assembly methods are still regarded as irreplaceable tools for accurately assembling the novel genomic sequences that are widespread in metagenomic sequencing data

MAP (Metagenome Assembly Program) for Sanger and 454 sequencing reads
- A de novo assembly approach based on an improved overlap/layout/consensus (OLC) strategy combined with several special algorithms
- Uses mate-pair information, making it more applicable to the shotgun DNA reads currently widely used in metagenome projects.
http:/ of MAP
Assembly results of MAP on simulated Sanger reads (800 bp)
Assembly results of MAP on simulated 454 mate pair reads (200 bp)
Results of extensive tests on simulated data show that MAP can be superior to both Celera and Phrap for typical longer reads from Sanger sequencing, and has an evident advantage over Celera, Newbler and the newest Genovo for typical shorter reads from 454 sequencing.

IntegMAP (Integrated Metagenomic Assembly Pipeline) for short reads by NGS
- Developed a de novo pipeline, IntegMAP, for integrating individual current assemblers
that complement one another's strengths in assembling metagenomic sequences
- ABySS (Simpson et al., 2009)
- CABOG (Miller et al., 2008)
- IDBA-UD (Peng et al., 2012)
- MetaVelvet (Namiki et al., 2012)
- SOAPdenovo (Li et al., 2010)
Flowchart of IntegMAP (high coverage: ABySS, IDBA-UD; low coverage: IDBA-UD, CABOG)

Comparison of
IntegMAP and other assemblies on simulated metagenomic dataset
Columns: Total cover length (Mbp); Corr. N-len at 10 Mbp (bp); Corr. N-len at 50 Mbp (bp); E-size (bp); Num. of covered genes; Total errors; Kbp/errors; Identity (%)
Rows: ABySS, k=31; ABySS, k=61; Bambus2; CABOG; IDBA-UD; MetaVelvet, k=23; MetaVelvet, k=61; SOAPdenovo, k=23; SOAPdenovo, k=61
Values (cell alignment lost): 163.8 185,12285.5 222,5813,748 11,4664,192 15,3952,370 6,5315,713 10,1428,092 14,65142,37633,99740,139 259,32011,6546,71914.112.70.998.642.155.999.899.999.599.899.799.899.999.899.9232.590,788244.8 139,195227.9 222,63147,96867,71323,97126,74714,25324,0812,4825,4163,271251 304.11,717 118.3182.85,4371,2749346898,62834576.3 121,245203.075.22,11689,8118796716,0781,92139.1
Only contigs with length ≥ 200 bp are considered. "k=23", "k=31" and "k=61" in the first column denote that the assembler was run with a k-mer size of 23 bp, 31 bp or 61 bp. Bambus 2 uses unitigs from CABOG. Total cover length denotes the total length of reference sequences that are covered by contigs. Corr. N-len denotes the corrected N-len size. E-size is also computed using corrected contigs. Only completely covered genes are counted. Errors denote the structural errors in contigs. The error rate is measured as the average distance between errors.
Identity
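The contiguity measures named in these table notes, N-len and E-size, can be sketched in a few lines of Python. The slides do not spell out the formulas, so this sketch assumes the commonly used definitions: N-len at X is the length of the contig at which the running total of contig lengths, sorted longest first, reaches X, and E-size is the sum of squared contig lengths divided by a total length (here the assembly's own total, which is an assumption). The function names and the toy contig lengths are hypothetical.

```python
def n_len(contig_lengths, target_bp):
    """Return the length of the contig at which the cumulative length
    (longest contigs first) reaches target_bp, or None if it never does."""
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target_bp:
            return length
    return None


def e_size(contig_lengths):
    """Return sum(L^2) / sum(L): the expected length of the contig that
    contains a randomly chosen assembled base (assumed denominator)."""
    total = sum(contig_lengths)
    if total == 0:
        return 0.0
    return sum(length * length for length in contig_lengths) / total


# Hypothetical toy assembly, lengths in bp.
toy_contigs = [500, 400, 300, 200, 100]
print(n_len(toy_contigs, 900))
print(round(e_size(toy_contigs), 2))
```

A "corrected" variant, as used in the table, would first split contigs at structural errors and then apply the same two functions to the resulting pieces.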
denotes the average identity of the alignments between contigs and references, where unmapped segments of contigs are not considered. Values in bold indicate the best in the column.

Assembly statistics and predicted gene number on human gut microbial metagenome dataset (Sample MH0012)
Columns: Sum of contig length (Mbp); N-len at 5 Mbp (bp); N-len at 50 Mbp (bp); E-size (bp); Non-redundant ORFs predicted; Num. of predicted complete ORFs
Rows: ABySS, k=61 (a); Bambus2; CABOG; IDBA-UD; SOAPdenovo (Qin et al. 2010) (b); IntegMAP
Values (cell alignment lost): 158.7 | 215,12550,90546,459177,46820,4698,90318,3666,427184,441336,604222,638339,336112,237226.6185.6277.2184,683119,907186,427 | 12,73841,8316,964 | 23,970 | 237.434,5188,6795,166306,657135,644 | 278.6242,60848,30339,156339,598186,997
Only contigs ≥ 500 bp are considered. Bambus 2 uses unitigs from CABOG. "k=51" denotes that MetaVelvet uses the option of k-mer size 51 and "k=31" denotes that MetaVelvet uses the option of k-mer size 31. The assembly generated by Qin et al. (2010), which was assembled by SOAPdenovo, is included. In the column of non-redundant ORFs predicted, only ORFs ≥ 100 bp are counted. The last column lists the number of complete ORFs. The ORFs are predicted by MetaGeneMark (Zhu et al., 2010). Values in bold indicate the best in the column.
(a) Assembly by ABySS was
generated from the corrected reads, from which many low-coverage reads may be excluded, because we failed to run ABySS on the mixed reads.
(b) Assembly by SOAPdenovo was directly downloaded from the publication of Qin et al. (2010).

Taking advantage of the strength of each assembler and the complementarity among
them, the IntegMAP pipeline substantially improves metagenomic assembly performance by improving assemblies at all sequencing depth levels.
Compared with individual assemblers on both synthetic and real NGS metagenomic datasets, IntegMAP produces assemblies with better cover length and contiguity at high accuracy.

Ab initio Gene Prediction in Metagenomic DNA Fragments
Accurately identifying genes from metagenomic fragments is one of the most fundamental issues
- Most fragments are very short. Many sequences in metagenomic sequencing projects remain as unassembled reads or short contigs. Therefore, many genes are incomplete, with one or both ends extending beyond the fragment. Also, a single fragment usually contains only one or two genes, so unsupervised methods for single genomes, which require many genes for model training, are inapplicable in this situation.
- The anonymous sequence problem, meaning that the source genomes of the fragments are always unknown or totally new, poses challenges for statistical model construction and feature selection.

Evidence-based method / Ab initio method
- Evidence-based methods rely on homology searches, including comparisons against known protein databases with the BLAST packages, CRITICA and Orpheus.
- Evidence-based methods can infer the functions and metabolic pathways of the predicted genes via significant targets, with high specificity.
- However, only genes with previously known homologs can be predicted by evidence-based methods, while novel genes, which are very important to metagenomic studies, will be overlooked.
- Therefore, ab initio algorithms that offer much higher sensitivity along with sufficiently high specificity are indispensable.

MetaGUN: gene prediction for metagenomic fragments based on SVM algorithm
Uses a multi-strategy approach to predict genes:
- Classifies input fragments into phylogenetic groups by a k-mer based sequence binning method.
- Identifies protein coding sequences for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation
initiation site (TIS) scores and open reading frame (ORF) length as input patterns.
- Then adjusts TISs by employing MetaTISA.

Flowchart of MetaGUN
Flowchart steps: Input with metagenomic data; Binning based on k-mers; RPS-BLAST for domain; Universal module; Novel gene module; ORF identification; TIS relocating by MetaTISA; Output
To identify protein-coding sequences, MetaGUN builds the universal module and the novel gene module. The former is based on a set of representative species, while the latter is designed to find potential functional DNA sequences with conserved domains.

MetaGUN's performance on simulated metagenomic data
- Simulated metagenomic shotgun sequences
- Simulated fragments from 50 prokaryotic genomes
- 4 kinds of read length
- Sn = TP/(TP+FN), Sp = TP/(TP+FP), Hm = 2·Sn·Sp/(Sn+Sp)
For longer fragments, MetaGUN performs better than all other tools

Application to human gut microbiome samples
Two samples of human gut microbiome from two healthy
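The three accuracy measures defined on the simulated-data slide (Sn, Sp and their harmonic mean Hm) can be computed directly from counts of true positives, false positives and false negatives. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from the MetaGUN evaluation. Note that Sp as defined on the slide, TP/(TP+FP), is the quantity often called precision elsewhere.

```python
def gene_prediction_accuracy(tp, fp, fn):
    """Return (Sn, Sp, Hm) using the slide's definitions:
    Sn = TP/(TP+FN), Sp = TP/(TP+FP), Hm = 2*Sn*Sp/(Sn+Sp).
    Ratios with a zero denominator are reported as 0.0."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tp / (tp + fp) if (tp + fp) else 0.0
    hm = 2 * sn * sp / (sn + sp) if (sn + sp) else 0.0
    return sn, sp, hm


# Hypothetical counts: 90 genes predicted correctly, 10 spurious, 30 missed.
sn, sp, hm = gene_prediction_accuracy(tp=90, fp=10, fn=30)
print(f"Sn={sn:.3f} Sp={sp:.3f} Hm={hm:.3f}")
```

Because Hm is a harmonic mean, it stays low whenever either Sn or Sp is low, which is why it is a convenient single number for comparing gene finders.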
humans (Gill et al., 2006, Science)
Potential novel genes:
- A: genes with E-value < 10^-5 when searched against the CDD database
- B: genes annotated by IMG/M
- C: genes with E-value < 10^-5 when searched against the NCBI NR database
- Potential novel genes: A - B - C

Supporting findings for predicted novel genes
- infB: corresponds to translation initiation factor IF-2, which differs from the similar proteins in the Archaea and Eukaryotes and acts in delivering the initiator tRNA to the ribosome
- PRK12678: corresponds to transcriptional terminator factor Rho
- Several domains from DNA polymerase, such as PRK05182 and PRK12323.
http:/

Brain-gut axis disorder and Intestinal microbiology
Zhu Lab, Duan Lab

Irritable bowel syndrome (IBS)
- Abnormal Motility
- High Prevalence
- High Visceral Sensitivity
- Intestinal Inflammation
- Psychological Factors
- Low Cure Rate
1. Mayer, Emeran A., Tor Savidge, and Robert J. Shulman. Brain-gut microbiome interactions and functional bowel disorders. Gastroenterology 146.6 (2014): 1500-1512.

IBS and Gut Microbiota Microecology
- The intestinal micro-ecology of IBS patients is imbalanced.
- The ratio of Firmicutes/Bacteroidetes is significantly changed.
1. Jeffery, Ian B., et al. An irritable bowel syndrome subtype defined by species-specific alterations in faecal microbiota. Gut 61.7 (2012): 997-1006.
2. Carroll, Ian M., et al. Alterations in composition and diversity of the intestinal microbiota in patients with diarrhea-predominant irritable bowel syndrome. Neurogastroenterology & Motility 24.6 (2012): 521-e248.
3. Rajilić-Stojanović, Mirjana, et al. Global and deep molecular analysis of microbiota signatures in fecal samples from patients with irritable bowel syndrome. Gastroenterology 141.5 (2011): 1792-1801.

Mental Disorder Accompanied with IBS
Comorbidity Rate of IBS and Mental Disorder*
Headers: Illness Type; Detection Rate (%) (N=246)
Labels: FD; IBS; FD+IBS; CIDI-3.0 (MD); Somatoform Disorder
Values (cell alignment lost): 44.6, 39.8, 34.7, 21.8, 45.1, 39.2
Abbreviation: FD - Functional Diarrhea; PD - Personality Disorder; MD - Mental Disorder
*This survey was conducted by the Department of Gastroenterology, Peking University Third Hospital

Depression and Gut Microbiota Microecology
- The intestinal micro-ecology of depression patients is imbalanced.
- The ratio of Bacteroidetes/Firmicutes is significantly changed.
1. Finegold, Sydney M., et al. Pyrosequencing study of fecal microflora of autistic and control children. Anaerobe 16.4 (2010): 444-453.

Brain-Gut Axis Disorder
Brain-Gut Axis: Bidirectional Interactions between Brain and Gut (Mayer, 2014)
1. Mayer, Emeran A., Tor Savidge, and Robert J. Shulman. Brain-gut microbiome interactions and functional bowel disorders. Gastroenterology 146.6 (2014): 1500-1512.

Research Goal
- Compare the microbial community structure in the gut of IBS, depression and comorbidity patients.
- Clarify the gut microbial signature of patients.
- Identify pathogenic bacteria.
- Analyze the correlation between clinical symptoms and gut microbes.
- Explore new targets for clinical treatment.

- Compare the functional differences of gut flora in the gut of IBS, depression and comorbidity patients.
- Clarify the correlation between gut flora metabolic function and host disease severity.
- Analyze the causality between gut microbiota and disease.
- Explore new therapies that target the regulation of gut microbiota metabolic functions.

Determine the importance of gut flora in the morbidity of diseases related to brain-gut axis disorder.

Brain-gut axis disorder and Intestinal microbiology
- Gut Flora
- Microbial structural dysbiosis
- Microbial functional disorder
- Patients Symptoms Description

Microbial Structure Variation
Figure panels: Phylum Distribution (*: P < 0.05); Rarefaction Curve; Species Abundance; Family Distribution
Group Genus Abundance
Heatmap Analysis on Genus Abundance of 100 Samples
Bacterial Taxa differ between IBS-D, Depression, COMO and Healthy Control
Mean
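The phylum-level comparisons in these slides, including the Firmicutes/Bacteroidetes ratio reported as changed in IBS and depression, reduce to simple arithmetic on per-taxon read counts. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the counts are made-up toy numbers rather than study data, and a real analysis would also normalise sequencing depth and test group differences statistically.

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def firmicutes_bacteroidetes_ratio(counts):
    """Return the Firmicutes/Bacteroidetes read-count ratio,
    or None when no Bacteroidetes reads were observed."""
    bacteroidetes = counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        return None
    return counts.get("Firmicutes", 0) / bacteroidetes


# Hypothetical phylum-level read counts for one stool sample.
toy_sample = {
    "Firmicutes": 600,
    "Bacteroidetes": 300,
    "Proteobacteria": 80,
    "Actinobacteria": 20,
}
for phylum, fraction in relative_abundance(toy_sample).items():
    print(f"{phylum}: {fraction:.2f}")
print("F/B ratio:", firmicutes_bacteroidetes_ratio(toy_sample))
```

Computing the ratio per sample and then comparing its distribution between patient groups and controls keeps the comparison independent of each sample's total read count.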