Fundamentals of Sequencing Technology
Luo Longhai (罗龙海)
2009-03-02

Contents
1. Principles of Sanger sequencing
2. Principles of second-generation sequencing
3. Principles of third-generation sequencing

History of sequencing technology (timeline, 1950 to 2010)
1954: Chemical degradation method by Whitfield
1977: Development of Sanger sequencing
1977: Chemical degradation method by Maxam and Gilbert
1985: Invention of the automated fluorescent sequencer
1996: Invention of the capillary sequencer
2005: Invention of the 454 GS 20 sequencer
2006: Invention of the Illumina Genome Analyzer system
2007: Invention of the Applied Biosystems SOLiD system
After 2007: invention of the Heliscope single-molecule sequencer, single-molecule real-time (SMRT) DNA sequencing, and nanopore single-molecule sequencing (Oxford Nanopore)

Principles of Sanger sequencing
Dr. Fred Sanger developed the dideoxy sequencing technique (Sanger et al., 1977), that is, DNA sequencing by dideoxy chain termination.
Frederick Sanger was awarded the Nobel Prize in Chemistry in both 1958 and 1980. He was the fourth person in the world to receive two Nobel Prizes and, at the time, the only person to receive both in chemistry.
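The chain-termination logic can be illustrated with a minimal Python sketch. It assumes an idealized reaction in which every possible ddNTP-terminated product is present and there is no noise. The function names are hypothetical.

The template is given 5'→3', and the new strand is synthesized as its reverse complement. Each of the four ddNTP reactions yields fragments that end in one particular base. Sorting all fragments by length, which is what gel or capillary electrophoresis does, spells out the new strand.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def dideoxy_reactions(template: str) -> dict[str, list[int]]:
    """Idealized ddA/ddC/ddG/ddT reactions on a template written 5'->3'.

    The new strand is the reverse complement of the template. In the ddX
    reaction, synthesis can stop wherever X is incorporated, so we record the
    length (1-based, counted from the primer) of every such fragment.
    """
    new_strand = reverse_complement(template)
    lanes: dict[str, list[int]] = defaultdict(list)
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return {b: lanes.get(b, []) for b in "ACGT"}


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read fragments from shortest to longest, as in electrophoresis."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_reactions("TACGGATC")
print(lanes)                       # {'A': [2, 8], 'C': [4, 5], 'G': [1, 6], 'T': [3, 7]}
new_strand = read_ladder(lanes)
print(new_strand)                  # GATCCGTA
print(reverse_complement(new_strand))  # TACGGATC
```

Real traces also include noise, incomplete termination, and mobility shifts, none of which this sketch models.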
Second-generation sequencing
Platforms: Roche 454, Illumina Solexa (Genome Analyzer), ABI SOLiD

Year | Platform | Principle
2005 | 454 | Pyrosequencing
2006 | Solexa | Sequencing by synthesis
2007 | SOLiD | Sequencing by ligation

Illumina Solexa: reversible terminator technology

Flowcell
A flowcell contains 8 lanes (Lane 1 to Lane 8).
Each lane contains multiple tiles, 100 in total (see figure).
Each tile is imaged four times per cycle, one image per base.
[Figure: image from one tile]

Genome Analyzer workflow: library preparation (from genomic DNA, mRNA, small RNA, or ChIP-Seq samples) → Cluster Station → Genome Analyzer → analysis pipeline.

Illumina/GA workflow
Grow clusters: about 5 hours. The process can be split into stages with safe stopping points.
Start sequencing: evaluate cluster density, then take 4 images per tile per cycle. Run time is 2 to 3 days.
Analysis: Firecrest (image analysis), Bustard (base calling), Gerald (sequence alignment).

Overall: sample pretreatment → cluster generation by PCR → sequencing → assembly and analysis (stage times given as 5-6 h and 6-8 d).

Library construction (genomic DNA)
1. Start from purified genomic DNA.
2. Break it into small fragments (under 800 bp).
3. Modify the fragment ends, giving sticky-ended fragments with 5'-phosphate ends.
4. Ligate adapters, remove unligated adapters, and purify the ligation products.
5. The result is the genomic DNA library.
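The fragmentation, size-selection, and adapter-ligation steps above can be mimicked in a short Python sketch. The following are all illustrative assumptions:
- Random cut points stand in for physical shearing.
- `[ADAPTER_1]`/`[ADAPTER_2]` are hypothetical placeholder labels, not real adapter sequences.
- End modification is not modeled.

```python
import random


def shear(genome: str, mean_size: int, rng: random.Random) -> list[str]:
    """Cut a sequence at random positions (a stand-in for physical fragmentation)."""
    n_cuts = min(len(genome) - 1, max(1, len(genome) // mean_size))
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0, *cuts, len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], max_len: int = 800) -> list[str]:
    """Keep only fragments shorter than max_len bp."""
    return [f for f in fragments if len(f) < max_len]


def ligate_adapters(fragments: list[str],
                    left: str = "[ADAPTER_1]",
                    right: str = "[ADAPTER_2]") -> list[str]:
    """Attach hypothetical adapter labels to both ends of each fragment."""
    return [left + f + right for f in fragments]


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))  # synthetic "genome"
pieces = shear(genome, mean_size=400, rng=rng)
library = ligate_adapters(size_select(pieces))
print(len(pieces), len(library))
```

Because the cut points are uniform, fragment lengths vary widely. That spread is why a size-selection step is needed at all.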
Create a library of DNA fragments: genomic DNA → fragment library or mate-paired library.

Cluster generation
1. Prepare DNA fragments.
2. Ligate adapters.
3. Attach single molecules to the surface.
4. Amplify to form clusters.
Clusters form a random array on the flowcell (figure scale bar: 100 µm). There are about 1,000 molecules per cluster of about 1 µm, about 20,000 clusters per tile, and 32 to 40 million clusters per experiment. Cluster formation amplifies single molecules and fixes them in place.

Cluster generation: amplification (bridge PCR on a flowcell grafted with P5 and P7 oligos)
1. Template hybridization
2. Initial extension
3. 1st cycle: denaturation, annealing, extension
4. 2nd cycle: denaturation, annealing, extension (later cycles repeat)
5. Cluster amplification
6. Periodate linearization
7. Blocking of free 3' ends with ddNTPs
8. Denaturation and sequencing-primer hybridization (SBS3)

Sequencing by synthesis (SBS)
Each SBS cycle has three steps: 1. incorporation, 2. scanning (imaging), 3. cleavage.
Cycle 1: add the sequencing reagents, and the first base is incorporated. Then detect the fluorescent signal and cleave off the terminator and the dye.
Cycles 2 to n: add sequencing reagents and repeat.
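The incorporate/scan/cleave cycle, together with calling each base from the four per-cycle images, can be sketched in Python. This is an idealized model built on several assumptions:
- One cluster, no phasing, and no dye crosstalk.
- Intensity values and Gaussian noise are arbitrary.
- `sbs_images` and `call_bases` are hypothetical names, not part of the Firecrest/Bustard software.

```python
import random

BASES = "ACGT"
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_images(template: str, noise: float = 0.1, seed: int = 0) -> list[dict[str, float]]:
    """Simulate reversible-terminator SBS on one cluster (idealized).

    `template` lists template-strand bases in the order the polymerase reads
    them. Each cycle yields one intensity per dye channel, standing in for
    the four images taken of a tile.
    """
    rng = random.Random(seed)
    cycles = []
    for t in template:
        incorporated = PAIR[t]                      # 1. incorporation: one blocked, labeled base
        signal = {b: rng.gauss(0.0, noise) for b in BASES}
        signal[incorporated] += 1.0                 # 2. scan: that base's dye channel lights up
        cycles.append(signal)
        # 3. cleavage: dye and 3' block are removed, so the next cycle adds exactly one base
    return cycles


def call_bases(cycles: list[dict[str, float]]) -> str:
    """Base calling: pick the brightest channel in each cycle."""
    return "".join(max(signal, key=signal.get) for signal in cycles)


print(call_bases(sbs_images("TACGGATC", noise=0.0)))  # ATGCCTAG
```

The reversible terminator is what makes the argmax rule reasonable: exactly one base is added per cycle, so one channel should dominate each cycle's images.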
Base calling
The identity of each base of a cluster is read off from the sequential images.
[Figure: clusters imaged over successive cycles, with the called bases]

Paired end (double-ended, bidirectional sequencing)
Steps: sample preparation → cluster generation → sequencing by synthesis. Paired-end reads help assemble larger genes; each read is at most 100 bp.

Sample preparation: paired-end adapters are added to both ends of each fragment.

Grafted flowcells: single-read flowcells are linearized with periodate. Paired-end flowcells carry a uracil (U) in the P5 oligo and an 8-oxoguanine (8oxoG) in the P7 oligo. The uracil is cleaved by USER (Uracil-Specific Excision Reagent) and the 8oxoG by Fpg (formamidopyrimidine DNA glycosylase).

Paired-end cluster generation and sequencing
1. Template hybridization
2. Initial extension
3. Amplification cycles (denaturation, annealing, extension), n = 25 in total
4. P5 linearization (USER): cleavage at the uracil site
5. Blocking with ddNTPs
6. Denaturation and sequencing-primer hybridization (SBS3)
7. Sequencing, first read
8. Denaturation and dephosphorylation (PNK)
9. Resynthesis of the P5 strand
10. P7 linearization (Fpg)
11. Blocking with ddNTPs
12. Denaturation and sequencing-primer hybridization (SBS8)
13. Sequencing, second read

Illumina Solexa summary
Sequencing biochemistry: DNA clusters, reversible terminators, sequencing by synthesis (SBS)
Bases per run: 40-50 Gb
Time per run: 10 days
Read length: 2 × 75 bp (paired-end)
Dominant error type: substitution
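The relationship between the two reads of a pair can be shown with a small Python sketch. It assumes an idealized fragment with no adapter read-through, and the helper names are hypothetical.

The first read starts at one end of the fragment. The second read starts at the opposite end on the complementary strand, so it equals the reverse complement of the fragment's last bases. The known gap between the reads is what makes pairs useful for assembling longer regions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Return (first read, second read) for one library fragment.

    Read 1 covers the first `read_len` bases. Read 2 is sequenced from the
    other end on the resynthesized strand, so it is the reverse complement
    of the last `read_len` bases.
    """
    if not 0 < read_len <= len(fragment):
        raise ValueError("read_len must be between 1 and the fragment length")
    return fragment[:read_len], reverse_complement(fragment[-read_len:])


def inner_distance(fragment_len: int, read_len: int) -> int:
    """Unsequenced bases between the reads (negative if they overlap)."""
    return fragment_len - 2 * read_len


frag = "AACCGGTTAGCT"
r1, r2 = paired_end_reads(frag, 3)
print(r1, r2, inner_distance(len(frag), 3))  # AAC AGC 6
```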
SOLiD sequencing technology (ABI SOLiD)

AB/SOLiD workflow
1. Library preparation
2. Emulsion PCR
3. Bead enrichment
4. Bead deposition
5. Sequencing by ligation
6. Data analysis

1. Library preparation
DNA is broken into small fragments, randomly or directionally, by sonication, mechanical shearing, or enzymatic digestion. For large fragments, both ends are sequenced ("mate-paired" library).
The mate-paired steps are: circularization → cleavage → adapter addition → sequencing. Enzymatic digestion leaves sticky ends. To circularize complex molecules, ligation is performed at a low template concentration.

2. Emulsion PCR
Templates, enzyme, and dNTPs are combined with P1-coupled DNA beads (about 100,000 P1 sites per bead), starting with 2 billion beads per emulsion. The PCR aqueous phase is mixed into a water-in-oil emulsion (mineral oil plus surfactants), and emulsion PCR is carried out. Each droplet is a reactor containing template, bead, polymerase, and PCR reagents.
After emulsion PCR the beads are collected. Some carry amplified product (about 40,000 PCR products per bead); others carry no product.

3. Bead enrichment
Beads are centrifuged through a glycerol gradient. Captured beads (with templates) remain in the supernatant; uncaptured beads (no template) end up in the pellet.

4. Bead deposition
Template beads receive a 3'-end modification and are attached to a glass surface in a random array.

5. Sequencing by ligation (SOLiD 4-color ligation)
a. Ligation: a universal sequencing primer is hybridized to the adapter sequence on each 1 µm bead. Ligase then joins one of four fluorescently labeled probes (the A, C, G, or T probe) to it.
b. Visualization: the color of the ligated probe is imaged.
c. Reset: after a series of ligation cycles, the extended primer is removed.
d. First cycle after reset: a new universal primer, offset by one base (n-1), starts another round of ligation.

Consequences of 2-base encoding
- Detecting a single color does not identify a base.
- Each reading contains information from two bases.
- To decode the bases, you must know one of the bases in the sequence.
If the first base is known to be A, the second base is decoded immediately. For example, blue after a first base of A means the second base is also A. Each decoded base then decodes the next one.
[Figure: color matrix of 1st base × 2nd base, with an example decoding]

ABI SOLiD summary
Sequencing biochemistry: emulsion PCR, sequencing by ligation (SBL)
Bases per run: 50 Gb
Time per run: 10 days or more
Read length: 2 × 50 bp (paired)
Dominant error type: substitution
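The decoding rule can be made concrete with a minimal Python sketch of two-base ("color space") encoding. The source only establishes that blue means "same base" after A. The scheme below assumes the commonly published SOLiD convention:
- 2-bit codes A=0, C=1, G=2, T=3.
- Color = XOR of the two codes.
- Color 0 is blue (AA, CC, GG, TT).

The function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list[int]:
    """One color per overlapping base pair: color = code(b1) XOR code(b2)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Decoding needs one known base; each decoded base unlocks the next."""
    bases = [first_base]
    for c in colors:
        bases.append(BASES[CODE[bases[-1]] ^ c])
    return "".join(bases)


colors = encode("ATGGCA")
print(colors)                        # [3, 1, 0, 3, 1]
print(decode("A", colors))           # ATGGCA
# A single miscalled color corrupts every base after it...
print(decode("A", [3, 1, 1, 3, 1]))  # ATGTAC
# ...while a true SNP (G -> C at index 3) changes two adjacent colors.
print(encode("ATGCCA"))              # [3, 1, 3, 0, 1]
```

The last two lines show why the encoding matters for analysis. An isolated color change looks like a measurement error, whereas a real single-base difference always changes two neighboring colors.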
Roche 454
Development history: Genome Sequencer 20 System (2005), Genome Sequencer FLX System (2006), GS FLX Titanium (2008).

Roche/454 GS FLX workflow: DNA library preparation → emulsion PCR (emPCR) → sequencing.

1. DNA library preparation
The genome is fragmented by nebulization, and adaptors are ligated, creating a single-stranded template DNA (sstDNA) library. A/B fragments are selected by avidin-biotin purification. No cloning or colony picking is needed.

2. Emulsion-based clonal amplification
sstDNA is annealed to an excess of DNA capture beads. The beads and PCR reagents are emulsified in water-in-oil microreactors, where clonal amplification takes place. The microreactors are then broken and DNA-positive beads are enriched, yielding clonally amplified sstDNA attached to beads.

3. Loading DNA beads into the PicoTiterPlate

4. Sequencing by synthesis (pyrosequencing)
Photons generated in the wells are captured by a camera, creating the sequencing image. The light comes from a coupled reaction:
dNTP incorporation by DNA polymerase → PPi
PPi + APS → ATP
ATP + luciferin → oxyluciferin + light (catalyzed by luciferase)

The sequencing instrument consists of the following major subsystems:
(a) a fluidic assembly;
(b) a flow chamber that includes the well-containing fibre-optic slide;
(c) a CCD camera-based imaging assembly;
(d) a computer that provides the necessary user interface and instrument control.

Roche 454 summary
Sequencing biochemistry: emulsion PCR, polymerase pyrosequencing
Bases per run: 0.4-0.6 Gb
Time per run: 10 hours
Read length: 400 bp (maximum 800 bp)
Dominant error type: insertion and deletion

Summary of second-generation sequencing technologies

454 sequencer: after emulsion PCR amplification, beads carrying many copies of a template are placed into microwells on a chip, and the templates are sequenced by pyrosequencing. Each flow adds a single nucleotide species together with the reagents luciferin and adenosine phosphosulfate (APS). Whenever the polymerase incorporates that nucleotide into a template in a well, the well emits light. Apyrase then removes the excess nucleotides. Homopolymer runs such as poly(A) are measured inaccurately, because the light signal adds up over all the identical bases incorporated in one flow.

Solexa sequencer: templates are amplified directly on the chip by bridge PCR. All four modified deoxynucleotides are then added at once; each carries a fluorescent group and a removable terminating group. A modified DNA polymerase catalyzes the primer-extension sequencing reaction. After imaging, the fluorescent labels and terminating groups are cleaved off, and the reaction is repeated until sequencing is complete (sequencing by synthesis).

SOLiD sequencer: template fragments are amplified by emulsion PCR. The 1 µm magnetic beads carrying the amplified fragments are deposited to form a high-density sequencing chip, and sequencing uses ligase rather than polymerase. In each SOLiD reaction, a fluorescently labeled 8-bp probe is ligated to the end of the primer, and the probe's label encodes two of its bases. After ligation the probe's fluorescence is recorded. The dye-bearing part of the probe is then cleaved off, and the next round begins (sequencing by ligation).
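The homopolymer problem noted for 454 follows directly from how pyrosequencing signals arise, which a short Python sketch can show. Assumptions:
- The flow order `TACG` is a stand-in; the slides do not state one.
- The signal per flow is exactly proportional to the number of identical bases incorporated.
- Calls are made by simple rounding.

`flowgram` and `call_from_flowgram` are hypothetical names.

```python
FLOW_ORDER = "TACG"  # assumed flow cycle; not given in the slides


def flowgram(read: str, n_flows: int, order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow: one nucleotide species per flow, with the light
    proportional to how many identical bases are incorporated in a row.
    Bases not reached within n_flows are simply not reported."""
    signal = []
    pos = 0
    for i in range(n_flows):
        nuc = order[i % len(order)]
        run = 0
        while pos < len(read) and read[pos] == nuc:
            run += 1
            pos += 1
        signal.append(run)
    return signal


def call_from_flowgram(signal: list[float], order: str = FLOW_ORDER) -> str:
    """Round each flow's intensity to a homopolymer length."""
    return "".join(order[i % len(order)] * round(v) for i, v in enumerate(signal))


ideal = flowgram("TTTAGGC", 8)
print(ideal)                      # [3, 1, 0, 2, 0, 0, 1, 0]
print(call_from_flowgram(ideal))  # TTTAGGC
# A slightly too-bright first flow is read as four Ts: an insertion error.
print(call_from_flowgram([3.6, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0]))  # TTTTAGGC
```

Because the length of a run is inferred from brightness rather than counted base by base, small intensity errors become insertions or deletions. This is consistent with 454's dominant error type.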
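First-generation versus second-generation sequencing: current popular sequencing platforms

Company | Format | Read length (bases) | Expected throughput (Gb/run)
Applied Biosystems | Capillary electrophoresis | 1000 | 3-4 M
Solexa | Parallel microchip | 75 + 75 | 50
SOLiD | Sequencing by ligation | 50 + 50 | 50
454 Life Sciences | Parallel bead array | 200 (400) + 200 (400) | 0.4

Third-generation sequencing

Heliscope single-molecule sequencer (Helicos Biosciences)
Principle: sequencing by synthesis.
Features: the template does not need to be amplified. A high-sensitivity fluorescence detector sequences single-stranded DNA templates directly by synthesis. Read lengths range from 24 to 70 bases, with an average of 32 bases.
Workflow:
(1) Fragment the library to be sequenced.
(2) Add a poly(A) tail to each 3' end and hybridize it to poly(T) oligos fixed on the chip. This anchors the templates and produces the sequencing chip.
(3) DNA polymerase incorporates fluorescently labeled nucleotides onto the primer, one dNTP species per round.
(4) Collect the fluorescence signal, cleave off the fluorescent label, and start the next round.

Scientists at Stanford University recently used the Helicos Heliscope single-molecule sequencer to sequence the genome of a white male; the paper appeared in the online edition of Nature Biotechnology. The sequencing was done with one Heliscope instrument and four data-collection runs. The researchers reported billions of Heliscope reads, covering 90% of the human reference genome at 28-fold coverage. So far they have identified 2.8 million SNPs and 752 copy-number variants. Sequencing took 4 weeks, and reagents cost US$48,000.

Single-molecule real-time (SMRT) DNA sequencing (Pacific Biosciences)
(1) The carrier is the SMRT chip, a metal film with 3,000 nanoscale wells about 70 nm in diameter (zero-mode waveguides, ZMWs). DNA polymerase, the template, and dNTPs labeled with different fluorophores are placed in the ZMWs, where the synthesis reaction takes place.
(2) SMRT sequencing is fast, about 10 dNTPs per second. It runs at the intrinsic reaction speed of DNA polymerase, about 10 bases per second, which is 20,000 times the speed of chemical sequencing.
(3) It exploits the intrinsic processivity of DNA polymerase (its ability to synthesize a very long stretch in one go), so a single reaction can read a very long sequence. Second-generation sequencing currently reads a hundred or more bases, while third-generation sequencing can already read several thousand. This greatly helps the assembly of repetitive regions in genomes.
(4) Its accuracy is reported to be very high, up to 99.9999%.

Nanopore single-molecule sequencing (Oxford Nanopore)
Principle: bases are identified by the different electrical signals they produce.
Steps: a nanopore made of a special material has a molecular adapter, cyclodextrin, covalently bound inside it. An exonuclease cleaves single-stranded DNA, and each released base drops into the nanopore and interacts with the cyclodextrin. This briefly changes the current through the pore, and the size of the change is the signature of each base. A base stays in the nanopore for milliseconds on average, and a suitable applied voltage clears it from the pore after its signal has been recorded.
Unique feature: methylated cytosine can be read directly.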
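The readout principle, in which each released base briefly lowers the pore current by a characteristic amount, can be sketched in Python. Everything numeric here is a hypothetical assumption chosen only to illustrate the logic:
- the open-pore threshold;
- the per-base current levels, including a separate level for methylated cytosine (`mC`).

These are not measured values. The approach has two steps: segment the trace into blockade events, then assign each event to the nearest reference level.

```python
# Hypothetical mean residual currents (pA) per released nucleotide while it
# sits in the cyclodextrin-modified pore. Illustrative numbers only.
LEVELS = {"A": 50.0, "C": 44.0, "G": 56.0, "T": 40.0, "mC": 47.0}


def segment_events(trace: list[float], threshold: float = 80.0) -> list[float]:
    """Group consecutive samples below `threshold` (a blocked pore) into
    events and return the mean current of each event."""
    events, current = [], []
    for x in trace:
        if x < threshold:
            current.append(x)
        elif current:
            events.append(sum(current) / len(current))
            current = []
    if current:
        events.append(sum(current) / len(current))
    return events


def classify(event_current: float, levels: dict[str, float] = LEVELS) -> str:
    """Nearest-level classifier: the base whose reference current is closest."""
    return min(levels, key=lambda b: abs(levels[b] - event_current))


def read_events(events: list[float]) -> list[str]:
    return [classify(x) for x in events]


trace = [100, 100, 51, 49, 100, 45, 43, 100, 48, 46, 100]  # synthetic trace
print(read_events(segment_events(trace)))  # ['A', 'C', 'mC']
```

Because methylated cytosine is treated as its own current level, it is called directly rather than inferred. Real signals overlap and are noisy, and would need a statistical model rather than a nearest-level rule.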