NCBI FieldGuide

Genome Resources
Genomic Biology
Genome Projects: microb

LocusLink
A single query interface to:
- Sequences: RefSeqs, GenBank, Homologene
- Maps: MapViewer
- Entrez links
LocusLink will be replaced by Entrez Gene on March 1, 2005. Check the Gene FAQ for current information.

Entrez Gene
- The same single query interface to sequences, maps and Entrez links
- More organisms: all RefSeq genomes
- Entrez integration

Entrez Gene examples
- gsn[sym] (amyloidosis)
- Global Entrez: NADH2 (nadh2; 47)
- Entrez Gene: NADH2, 26 records
- Gene Record for Pongo NADH2
- Display Exons/Introns: Gene Table
- A Record With More Data: Human HFE (hemochromatosis)
- Gene Graphic Links: NM_, NP_
- Introns/Exons: Gene Table, links to sequence

Entrez SNP
- hfe[gene name] AND human[orgn] (52; hemochromatosis)
- Linking to SNP: chromosome location, gene location, sequence location
- SNP in Structure
- Link to OMIM
- Variants in OMIM

UniGene
Gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBlast
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents

A Cluster of ESTs
[Figure: query, 5' EST hits, 3' EST hits]

UniGene Collections
Example UniGene Cluster
Histogram of cluster sizes for UniGene Hs build 177
UniGene Cluster Hs.95351
UniGene Cluster Hs.95351: expression
UniGene Cluster Hs.95351: seqs
Download sequences: web page, ftp site

The New Homologene
Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.
- No longer UniGene based
- Protein similarities first
- Guided by taxonomic tree
- Includes orthologs and paralogs

Orthologs and paralogs are the two types of homologous sequences. Orthologs are proteins from different species that evolved by vertical descent (speciation) and typically retain the same function as the original protein. Paralogs are proteins within one species that arose by gene duplication and may evolve new functions related to the original one. See the literature for more information.

Paralogs vs Orthologs
[Figure: an early globin gene undergoes gene duplication into an A-chain gene and a B-chain gene. Frog A, chick A and mouse A are orthologs of one another, as are mouse B, chick B and frog B; the A and B genes are paralogs.]

The New Homologene: Homologene Build 37.2
[Table: Species; Number of genes (input, grouped); groups]

RAG1 Homologene
- rag1 (12): recombination activating gene
- RAG1, Amniota
- Homologene: RAG1

MapViewer
- List View
- Human MapViewer: adar (adenosine deaminase)
- MapViewer: Human ADAR (4)
- MV Hs ADAR: 3' UTR, 5' UTR

Maps & Options
- Sequence maps: Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST
- Cytogenetic maps: Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
- Genetic maps: deCODE, Genethon, Marshfield
- RH maps: GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC
- Also listed: Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation
- Maps & Options: SNP

MapViewer: UniGene, Component, Repeats, Gene
Master map: repeats
Gene, Phenotype, Variation

Strongylocentrotus purpuratus Traces

Basic Local Alignment Search Tool

Web Access
[Figure: BLAST, VAST, Entrez; Text, Sequence, Structure]

Basic Local Alignment Search Tool
- Why use sequence similarity?
- BLAST algorithm
- BLAST statistics
- BLAST output
- Examples

Why Do We Need Sequence Similarity Searching?
- To identify and annotate sequences
- To evaluate evolutionary relationships
- Other:
  - model genomic structure (e.g., Spidey)
  - check primer specificity in silico: NCBI's tool

BLAST Website Stats

Global vs Local Alignment
[Figure: Seq 1 and Seq 2 under global alignment and under local alignment]

Global vs Local Alignment
Seq1: WHEREISWALTERNOW (16 aa)
Seq2: HEWASHEREBUTNOWISHERE (21 aa)

Global
  Seq1:  1 W-HEREISWALTERNOW 16
           W HERE
  Seq2:  1 HEWASHEREBUTNOWISHERE 21

Local
  Seq1:  1 W--HERE  5        Seq1:  1 W--HERE  5
           W  HERE                    W  HERE
  Seq2:  3 WASHERE  9        Seq2: 15 WISHERE 21

The Flavors of BLAST
- Standard BLAST
  - traditional "contiguous" word hit
  - position independent scoring
  - nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
- Megablast
  - optimized for large batch searches
  - can use discontiguous words
- PSI-BLAST
  - constructs PSSMs automatically; uses as query
  - very sensitive protein search
- RPS BLAST
  - searches a database of PSSMs
  - tool for conserved domain searches

Basic Local Alignment Search Tool
- Widely used similarity search tool
- Heuristic approach based on Smith Waterman algorithm
- Finds best local alignments
- Provides statistical significance
- All combinations (DNA/Protein) query and database:
  - DNA vs DNA: blastn
  - DNA translation vs Protein: blastx
  - Protein vs Protein: blastp
  - Protein vs DNA translation: tblastn
  - DNA translation vs DNA translation: tblastx
- www, standalone, and network clients

Translated BLAST
  Program   Query   Database
  blastx    N       P
  tblastn   P       N
  tblastx   N       N
  (N = nucleotide, P = protein)
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.

How BLAST Works
- Make lookup table of "words" for query
- Scan database for hits
- Ungapped extensions of hits (initial HSPs)
- Gapped extensions (no traceback)
- Gapped extensions (traceback; alignment details)

Nucleotide Words
Query: GTACTGGACATGGACCCTACAGGAA
Make a lookup table of words (11-mers):
  GTACTGGACAT
  TACTGGACATG
  ACTGGACATGG
  CTGGACATGGA
  TGGACATGGAC
  GGACATGGACC
  GACATGGACCC
  ACATGGACCCT
  ...

  WORD SIZE   minimum   default
  blastn      7         11
  megablast   8         28

Protein Words
Query: GTQITVEDLFYNIATRRKALKN
Make a lookup table of words: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Neighborhood words: LTV, MTV, ISV, LSV, etc.
- Word size = 3 (default)
- Word size can only be 2 or 3
- -f 11 = blastp default

Minimum Requirements for a Hit
- Nucleotide BLAST requires one exact match
- Protein BLAST requires two neighboring matches within 40 aa
[Figure: protein query GTQITVEDLFYNI with neighborhood words SEI and YYN (two matches); nucleotide sequence ATCGCCATGCTTAATTGGGCTT with word CATGCTTAATT (one exact match)]
-A 40 = blastp default

BLASTP Summary
Word hits: YLS, HFL
  Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333
  Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI  47
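The word-lookup step and the neighborhood score threshold described in these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BLAST's actual implementation: the helper names `query_words` and `neighborhood` are hypothetical, the matrix values are the BLOSUM62 lower triangle as it appears in this deck, and the neighborhood is found by brute force over all candidate words rather than by any optimized method.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Lower triangle of BLOSUM62; rows and columns follow AMINO_ACIDS order.
BLOSUM62_ROWS = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""


def load_matrix(rows_text):
    """Expand the lower triangle into a symmetric {(a, b): score} dictionary."""
    matrix = {}
    for row_aa, line in zip(AMINO_ACIDS, rows_text.strip().splitlines()):
        for col_aa, value in zip(AMINO_ACIDS, line.split()):
            matrix[row_aa, col_aa] = matrix[col_aa, row_aa] = int(value)
    return matrix


BLOSUM62 = load_matrix(BLOSUM62_ROWS)


def query_words(query, size):
    """Map every word of the given size to the query offsets where it occurs."""
    table = {}
    for offset in range(len(query) - size + 1):
        table.setdefault(query[offset:offset + size], []).append(offset)
    return table


def neighborhood(word, threshold=11):
    """Return {neighbor: score} for all same-length words scoring >= threshold."""
    hits = {}
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(BLOSUM62[a, b] for a, b in zip(word, candidate))
        if score >= threshold:
            hits["".join(candidate)] = score
    return hits


# Nucleotide query from the slides, word size 11; protein query, word size 3.
nucleotide_table = query_words("GTACTGGACATGGACCCTACAGGAA", 11)
protein_table = query_words("GTQITVEDLFYNIATRRKALKN", 3)
yls_neighbors = neighborhood("YLS")
```

With these definitions, `neighborhood("YLS")` contains YLS (15), YLT (12) and YVS (12), while YIT scores 10 and so falls below the threshold of 11.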
Gapped extension with trace back
  Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV  50
            +E YA YL K F+YLSL+SP+DVNVHP+K VHFL+I+
  Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI 337

Final HSP: +E YA YL K F+L+SP+DVNVHP+K V +I
High-scoring pair (HSP)

Neighborhood words
- HFL 18, HFV 15, HFS 14, HWL 13, NFL 13, DFL 12, HWV 10, etc.
- YLS 15, YLT 12, YVS 12, YIT 10, etc.
Neighborhood score threshold: T (-f) = 11
Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV (example query words)

Scoring Systems - Nucleotides
Identity matrix:
       A   G   C   T
   A  +1  -3  -3  -3
   G  -3  +1  -3  -3
   C  -3  -3  +1  -3
   T  -3  -3  -3  +1

  CAGGTAGCAAGCTTGCATGTCA
  || ||||||||||||  |||||
  CACGTAGCAAGCTTG-GTGTCA
  raw score = 19 - 9 = 10
-r 1 -q -3

Scoring Systems - Proteins
Position Independent Matrices
- PAM Matrices (Percent Accepted Mutation)
  - Derived from observation; small dataset of alignments
  - Implicit model of evolution
  - All calculated from PAM1
  - PAM250 widely used
- BLOSUM Matrices (BLOck SUbstitution Matrices)
  - Derived from observation; large dataset of highly conserved blocks
  - Each matrix derived separately from blocks with a defined percent identity cutoff
  - BLOSUM62: default matrix for BLAST
Position Specific Score Matrices (PSSMs)
- PSI- and RPS-BLAST

BLOSUM62
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
- Negative for less likely substitutions (e.g. D/F)
- Positive for more likely substitutions (e.g. Y/F)

Position-Specific Score Matrix
DAF-1: Serine/Threonine protein kinases, catalytic loop
[Figure: PSSM scores; labels 174 and 54]

        A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
435 K  -1   0   0  -1  -2   3   0   3   0  -2  -2   1  -1  -1  -1  -1  -1  -1  -1  -2
436 E   0   1   0   2  -1   0   2  -1   0  -1  -1   0   0   0  -1   0   0  -1  -1  -1
437 S   0   0  -1   0   1   1   0   1   1   0  -1   0   0   0   2   0  -1  -1   0  -1
438 N  -1   0  -1  -1   1   0  -1   3   3  -1  -1   1  -1   0   0  -1  -1   1   1  -1
439 K  -2   1   1  -1  -2   0  -1  -2  -2  -1  -2   5   1  -2  -2  -1  -1  -2  -2  -1
440 P  -2  -2  -2  -2  -3  -2  -2  -2  -2  -1  -2  -1   0  -3   7  -1  -2  -3  -1  -1
441 A   3  -2   1  -2   0  -1   0   1  -2  -2  -2   0  -1  -2   3   1   0  -3  -3   0
442 M  -3  -4  -4  -4  -3  -4  -4  -5  -4   7   0  -4   1   0  -4  -4  -2  -4  -1   2
443 A   4  -4  -4  -4   0  -4  -4  -3  -4   4  -1  -4  -2  -3  -4  -1  -2  -4  -3   4
444 H  -4  -2  -1  -3  -5  -2  -2  -4  10  -6  -5  -3  -4  -3  -2  -3  -4  -5   0  -5
445 R  -4   8  -3  -4   0  -1  -2  -3  -2  -5  -4   0  -3  -2  -4  -3  -3   0  -4  -5
446 D  -4  -4  -1   8  -6  -2   0  -3  -3  -5  -6  -3  -5  -6  -4  -2  -3  -7  -5  -5
447 I  -4  -5  -6  -6  -3  -4  -5  -6  -5   3   5  -5   1   1  -5  -5  -3  -4  -3   1
448 K   0   0   1  -3  -5  -1  -1  -3  -3  -5  -5   7  -4  -5  -3  -1  -2  -5  -4  -4
449 S   0  -3  -2  -3   0  -2  -2  -3  -3  -4  -4  -2  -4  -5   2   6   2  -5  -4  -4
450 K   0   3   0   1  -5   0   0  -4  -1  -4  -3   4  -3  -2   2   1  -1  -5  -4  -4
451 N  -4  -3   8  -1  -5  -2  -2  -3  -1  -6  -6  -2  -4  -5  -4  -1  -2  -6  -4  -5
452 I  -3  -5  -5  -6   0  -5  -5  -6  -5   6   2  -5   2  -2  -5  -4  -3  -5  -3   3
453 M  -4  -4  -6  -6  -3  -4  -5  -6  -5   0   6  -5   1   0  -5  -4  -3  -4  -3   0
454 V  -3  -3  -5  -6  -3  -4  -5  -6  -5
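To make the difference between position-independent and position-specific scoring concrete, the following minimal Python sketch scores two short peptides against three rows of the PSSM (positions 444 H, 445 R, 446 D) and against the corresponding BLOSUM62 entries. The row values and the four BLOSUM62 entries are copied from the tables in this deck; the function names `pssm_score` and `blosum_score` are hypothetical and chosen only for illustration.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# PSSM rows for positions 444 (H), 445 (R) and 446 (D), in AMINO_ACIDS column order.
PSSM = {
    444: [-4, -2, -1, -3, -5, -2, -2, -4, 10, -6, -5, -3, -4, -3, -2, -3, -4, -5, 0, -5],
    445: [-4, 8, -3, -4, 0, -1, -2, -3, -2, -5, -4, 0, -3, -2, -4, -3, -3, 0, -4, -5],
    446: [-4, -4, -1, 8, -6, -2, 0, -3, -3, -5, -6, -3, -5, -6, -4, -2, -3, -7, -5, -5],
}

# Only the BLOSUM62 entries needed for this comparison.
BLOSUM62 = {("H", "H"): 8, ("R", "R"): 5, ("R", "K"): 2, ("D", "D"): 6}


def pssm_score(peptide, start):
    """Sum the PSSM column scores of each residue, one row per position."""
    return sum(
        PSSM[start + i][AMINO_ACIDS.index(residue)]
        for i, residue in enumerate(peptide)
    )


def blosum_score(reference, peptide):
    """Sum position-independent BLOSUM62 scores for an ungapped pair."""
    return sum(BLOSUM62[a, b] for a, b in zip(reference, peptide))


# Conserved motif versus a single R -> K substitution in the middle position.
pssm_hrd = pssm_score("HRD", 444)        # 10 + 8 + 8
pssm_hkd = pssm_score("HKD", 444)        # 10 + 0 + 8
blosum_hrd = blosum_score("HRD", "HRD")  # 8 + 5 + 6
blosum_hkd = blosum_score("HRD", "HKD")  # 8 + 2 + 6
```

In this sketch the R to K change costs 8 points under the PSSM (26 down to 18) but only 3 points under BLOSUM62 (19 down to 16), because the PSSM rewards the residues actually conserved at these catalytic-loop positions.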