2nd lecture
1. Gene diagnosis, paternity test and forensic science
2. Human genome project

Molecular markers
1. RFLP
2. VNTR
3. RAPD
4. AFLP
5. Microsatellite
6. SNP

What is an ideal molecular marker?
- Highly polymorphic
- Semidominant (codominant)
- Easily distinguishable alleles
- Widely and uniformly distributed in the genome
- No pleiotropy
- Simple and rapid detection
- Low cost
- Highly reproducible

Early molecular markers
- ABO blood types
- HLA loci
- Isozymes

RFLP (restriction fragment length polymorphism)
- Based on variation in restriction sites
- Semidominant
- Involves Southern hybridization

RFLP
RFLPs are based on the analysis of patterns derived from a DNA sequence cleaved with known restriction enzymes. When the fragment lengths differ between two samples, the restriction enzyme has cut the two DNAs at different locations. These similarities and differences can be used to differentiate species, races and strains from one another.

Restriction enzyme EcoRI
- Recognizes a palindromic sequence:
  5′-GAATTC-3′
  3′-CTTAAG-5′
- Cleavage frequency: $(1/4)^6$

Recognition frequency (RF) and cleavage frequency (CF)
- GAPuPyTC: $RF = CF = (1/4)^5$
- GAATTCNNNNNN: $RF = (1/4)^6$, $CF = 2 \times (1/4)^6$

Mismatch
- Match: 1/4
- Mismatch: 3/4

RFLP
Can we construct a 4 kb plasmid genomic library with a 6-cutter?
- Exact binomial probabilities

RFLP
Disadvantages:
1. Very long methodology before results are obtained
2. High labour requirements
3. High-quality DNA must be used, and in large quantities
4. Radioisotopes must frequently be used
5. Depending on the species, many probes are not available
6. Too many polymorphisms may be present for a short probe
7. Cost of development is very high because of the time and labour required
8. Low frequency of the desired polymorphisms in polyploid plants (e.g. wheat)

VNTR (variable number of tandem repeats)
- Restriction fragment length variability
- Also known as minisatellites
- Can have hundreds of alleles per locus
- Based on Southern hybridization

Paternity index (PI)
- PI = 0.5 / gene frequency
- The paternity index is the ratio of the probability that the alleged father supplied the paternal allele, and so is the child's biological father, to the probability that a random man supplied it.

Minisatellite
The genome contains repetitive sequences spanning 500 to 20,000 base pairs (a repeat unit is 5-30 base pairs); these sequences are called minisatellites or variable number of tandem repeats (VNTR). They are mainly located near chromosome ends. In some minisatellite sequences, frequent changes of the repeat number (up to 10%) are observed in germ cells. Such alterations may occur a hundred to a thousand times more frequently than mutations in a DNA region coding for a protein.

RAPD (random amplified polymorphic DNA)
- Amplification of DNA fragments using one random primer
- No prior knowledge of sequences is needed
- Multiple fragments are amplified
- No Southern hybridization or autoradiography is required
- Dominant
- Cost effective
- Low reproducibility

RAPD
The complexity of eukaryotic nuclear DNA is sufficiently high that, by chance, pairs of sites complementary to single octa- or decanucleotides may exist in the correct orientation and close enough to one another for PCR amplification. With some randomly chosen decanucleotides no sequences are amplified. With others, products of the same length are generated from the DNAs of different individuals. With still others, the patterns of bands (such as those illustrated) are not the same for every individual in a population. The variable bands are commonly called random amplified polymorphic DNA (RAPD) bands.

RAPD
- If one mismatch is allowed in a 10 bp primer sequence, except within the 3 bp at the 3′ end, then the annealing frequency would be:
  $(1/4)^{10} + 7 \times (3/4) \times (1/4)^{9}$

A bipartite amplification
- An amplification with a bipartite primer may increase specificity by raising the annealing temperature after several initial cycles.

AFLP (amplified fragment length polymorphism)
- Selective PCR amplification of restriction fragments
- An adaptor is required
- The number of PCR products can be adjusted by the number of random nucleotides at the 3′ termini of the primers
- Dominant or semidominant
- Compared to RAPD, fewer primers should be needed to screen all possible sites. The AFLP procedure typically detects more polymorphisms per reaction than RFLP or RAPD analysis.

AFLP
[Figure: restriction site sequences GAATTC and AATT]

AFLP
- Uses EcoRI and MseI, a combination of a rare cutter and a frequent cutter.
- MseI-only amplicons are scarce because:
  1. The AT-rich MseI priming site is amplified inefficiently at high annealing temperature
  2. The stem-loop structure formed by the homologous ends suppresses PCR

AFLP
[Figure: selective nucleotides added next to the GAATTC and TTAA sites; fractions labelled in the figure: 1/16 and 1/64]

AFLP
- In the first step of AFLP analysis, genomic DNA is digested with both a restriction enzyme that cuts frequently (MseI, 4 bp recognition sequence) and one that cuts less frequently (EcoRI, 6 bp recognition sequence).
- The resulting fragments are ligated to end-specific adaptor molecules.

AFLP
- A preselective PCR amplification is done using primers complementary to each of the two adaptor sequences, except for the presence of one additional base at the 3′ end. The additional base is chosen by the user. Only 1/16 of the EcoRI-MseI fragments are amplified.

AFLP
- In a second, selective PCR, using the products of the first as template, primers containing two further additional bases, chosen by the user, are used. The EcoRI-adaptor-specific primer bears a label (fluorescent or radioactive).
- Gel electrophoretic analysis reveals a pattern (fingerprint) of fragments representing about 1/4000 of the EcoRI-MseI fragments.

Microsatellites (simple tandem repeats)
- 2-6 bases per repeat unit
- Flanking unique sequences are used as primers for PCR amplification
- Semidominant
- Used for whole-genome scanning
- Paternity test and forensic science

SNP (single nucleotide polymorphism)
- About 1 SNP per 1,250 bp
- Semidominant
- Expected to be applied in linkage disequilibrium studies
- Haplotype
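
The frequency arithmetic quoted on the RFLP, RAPD and AFLP slides above can be checked with a short Python sketch. It is only an illustration under the usual simplifying assumption that the four bases are equally frequent and independent; the function names (`site_frequency`, `rapd_annealing_frequency`, `aflp_fraction`) are made up for this example and do not come from the lecture.

```python
from fractions import Fraction

# Assumption: each base occurs with probability 1/4, independently of its neighbours.
QUARTER = Fraction(1, 4)


def site_frequency(n_fixed: int, n_twofold: int = 0) -> Fraction:
    """Chance that a given position starts a recognition site.

    n_fixed   -- positions that must be one specific base (probability 1/4 each)
    n_twofold -- positions that accept a purine or a pyrimidine (probability 1/2 each)
    """
    return QUARTER ** n_fixed * Fraction(1, 2) ** n_twofold


def rapd_annealing_frequency(primer_len: int = 10, protected_3prime: int = 3) -> Fraction:
    """Perfect match, or exactly one mismatch outside the protected 3' bases."""
    perfect = QUARTER ** primer_len
    mismatch_positions = primer_len - protected_3prime
    one_mismatch = mismatch_positions * Fraction(3, 4) * QUARTER ** (primer_len - 1)
    return perfect + one_mismatch


def aflp_fraction(selective_bases_per_primer: int) -> Fraction:
    """Fraction of EcoRI-MseI fragments amplified when both primers carry
    this many selective bases at their 3' ends."""
    return QUARTER ** (2 * selective_bases_per_primer)


ecori = site_frequency(6)            # GAATTC
degenerate = site_frequency(4, 2)    # GAPuPyTC
print("EcoRI site frequency:", ecori, "-> one site per", 1 / ecori, "bp on average")
print("GAPuPyTC site frequency:", degenerate)
print("RAPD annealing frequency:", rapd_annealing_frequency())
print("AFLP preselective fraction:", aflp_fraction(1))
print("AFLP selective fraction:", aflp_fraction(3))
```

Under these assumptions a 6-cutter such as EcoRI cuts once every 4,096 bp on average, and three selective bases on each AFLP primer leave 1/4096 of the fragments, which is the "about 1/4000" quoted on the slide.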

- Human genome project
- Animal model and functional genomics
- Proteomics
- Drug leads
- Publicly funded human genome project
- Privately funded human genome project

Publicly funded human genome project (International Human Genome Sequencing Consortium)
- Genetic map
- Physical map
- Sequencing
- EST (expressed sequence tags)

Genetic map
- Predominantly microsatellites
- CEPH, France
- Genethon, France

Physical map
- Using STS (sequence tagged sites): linkage (STR), hybridization with YAC

Physical map
- Restriction mapping
  1. Complete restriction
  2. Partial restriction

Physical map
- Radiation hybrid: human chromosomes are fragmented by radiation and fused with Chinese hamster cells. How often two markers are retained together in the hybrid cells indicates the distance between them.

Various vectors
- YAC
- BAC
- PAC
- Cosmid

YAC
- The ade2 (ochre mutation) strain is red but does not grow in minimal medium.
- sup4 on the YAC vector suppresses the ochre mutation in ade2, so an empty-vector transformant is white.
- DNA inserts interrupt sup4, so transformants carrying an insert are red.

Separation of YAC fragments
- Rescue of plasmid end
- Ligation-mediated PCR
- Vector-Alu PCR
- Inverse PCR

Sequencing
- Local shotgun: clone by clone, or BAC by BAC

EST (expressed sequence tags)
- Sequence both ends of cDNA
- Normalized cDNA library

Privately funded human genome project, led by J. Craig Venter (Celera Genomics)
- Whole-genome assembly
- Compartmentalized shotgun assembly

Shotgun
- Fragmentation, cloning and sequencing
- To sequence the genome of a mammal (all mammalian genomes are about 3,000,000,000 base pairs) means that up to 30,000,000,000 base pairs must be sequenced, that is, about 10-fold coverage. This corresponds to about 60,000,000 individual DNA sequence reads. Fitting so many fragments together requires a vast amount of computing power and highly sophisticated software.
- Experience tells us that whole-genome shotgun sequencing leaves gaps. In March 2000, a consortium of publicly funded researchers and Celera Genomics published the sequence of much of the genome of the fruit fly Drosophila melanogaster. In about 120,000,000 base pairs of sequence there were about 1,600 gaps. This figure will improve with a finishing stage.
- The advantage of the whole-genome shotgun is that it requires no prior mapping. Disadvantages include the large IT resources required and the fact that, unlike the clone-by-clone approach, no large assemblies of contigs are produced until the end of the project.

Different libraries for sequencing
- BAC library: 50 kb inserts
- Plasmid library: 10 kb inserts
- Plasmid library: 2 kb inserts
- Mate ends: the 2 ends of an insert

5-fold coverage sequencing
- Poisson distribution

Whole-genome assembly
- Screener
- Overlapper
- Unitigger
- Scaffolder
- Repeat Resolver

Compartmentalized shotgun assembly
- Clustering Celera reads and bactigs into large, multiple-megabase regions of the genome
- Running the WGA assembler on the Celera data and on shredded, faux reads obtained from the bactig data

Gene annotation

Conclusions from the genome projects
- The low gene number in humans can be compensated by the combinatorial diversity generated at the level of:
  1. RNA splicing
  2. Translational control
  3. Posttranslational modification
  4. Shuffling of different domains
- SNP
- There appear to be about 30,000-40,000 protein-coding genes in the human genome, only about twice as many as in worm or fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products.
- The full set of proteins (the proteome) encoded by the human genome is more complex than those of invertebrates. This is due in part to the presence of vertebrate-specific protein domains and motifs (an estimated 7% of the total), but more to the fact that vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.
- Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements.
- Although about half of the human genome derives from transposable elements, there has been a marked decline in the overall activity of such elements in the hominid lineage. DNA transposons appear to have become completely inactive, and long-terminal repeat (LTR) retroposons may also have done so.
- The pericentromeric and subtelomeric regions of chromosomes are filled with large recent segmental duplications of sequence from elsewhere in the genome.
- The mutation rate is about twice as high in male as in female meiosis, showing that most mutation occurs in males.
- Recombination rates tend to be much higher in distal regions (around 20 megabases (Mb)) of chromosomes and on shorter chromosome arms in general, in a pattern that promotes the occurrence of at least one crossover per chromosome arm in each meiosis.

Screener
- Screener is used to remove known repetitive elements such as SINE (Alu), LINE, etc.

Overlapper
- At least 40 bp of sequence identity is required for an overlap.

Unitigger
- A substantially higher level of sequence coverage suggests the existence of repetitive sequence.
- Thresholds are defined to remove sequencing reads of high coverage at the Unitigger stage.

Scaffolder
- Mate ends

Repeat resolver

BLAST (Basic Local Alignment Search Tool)
- Nucleotide-nucleotide BLAST (blastn)
- Translated query vs. protein database (blastx)
- Protein query vs. translated database (tblastn)
- Translated query vs. translated database (tblastx)
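
The shotgun slides quote 30,000,000,000 sequenced bases in about 60,000,000 reads, and the "5-fold coverage" slide names the Poisson distribution without giving a formula. The Python sketch below works through both. It assumes that reads start at uniformly random positions, so that the number of reads covering a given base is Poisson distributed with mean equal to the fold coverage; that model and the function names are assumptions added here for illustration, not something stated on the slides.

```python
import math


def poisson_pmf(k: int, coverage: float) -> float:
    """P(a base is covered by exactly k reads) when the mean depth is `coverage`."""
    return math.exp(-coverage) * coverage ** k / math.factorial(k)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases covered by no read at all (k = 0)."""
    return poisson_pmf(0, coverage)


def fold_coverage(total_bases_sequenced: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return total_bases_sequenced / genome_size


def mean_read_length(total_bases_sequenced: int, n_reads: int) -> float:
    """Average read length implied by the total bases and the read count."""
    return total_bases_sequenced / n_reads


GENOME = 3_000_000_000          # mammalian genome size quoted on the slide
SEQUENCED = 30_000_000_000      # total bases to sequence, from the slide
READS = 60_000_000              # number of reads, from the slide

print("Fold coverage:", fold_coverage(SEQUENCED, GENOME))
print("Mean read length (bp):", mean_read_length(SEQUENCED, READS))
print(f"Fraction uncovered at 5-fold coverage: {fraction_uncovered(5):.4%}")
print(f"Expected uncovered bases at 5-fold: {GENOME * fraction_uncovered(5):,.0f}")
```

Under this model 5-fold coverage still leaves about 0.67% of bases unsequenced, which is roughly 20 million bases of a 3,000,000,000 bp genome and is consistent with the slide's remark that whole-genome shotgun sequencing leaves gaps.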