Pipelining

Outline
- An overview of pipelining
- A pipelined datapath
- Pipelined control
- Data hazards and forwarding
- Data hazards and stalls
- Branch hazards
- Exceptions
- Superscalar and dynamic pipelining

Pipelining Is Natural!
- Laundry example: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes
[Figure: loads A, B, C, D]

Sequential Laundry
- Sequential laundry takes 6 hours for 4 loads (4 x (30 + 40 + 20) = 360 minutes)
- If they learned pipelining, how long would it take?
[Figure: task order A to D vs. time from 6 PM to midnight; each load runs 30 + 40 + 20 minutes, one load after another]
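The question above can be checked with a small simulation. This is a Python sketch that rests on a few assumptions. Each machine handles one load at a time, and loads go in the order A to D. A load starts its next step as soon as both the load and the next machine are free, which is the "start ASAP" policy of the pipelined schedule. The helper names `sequential_minutes` and `pipelined_minutes` are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load goes washer -> dryer -> folder before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Start every step as soon as the load and that machine are both free."""
    free_at = [0] * len(stage_minutes)  # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load finished its previous step
        for s, duration in enumerate(stage_minutes):
            start = max(ready, free_at[s])
            ready = start + duration
            free_at[s] = ready
        finish = ready
    return finish


LAUNDRY = [30, 40, 20]  # washer, dryer, "folder" in minutes, from the slide
print(sequential_minutes(LAUNDRY, 4) / 60)  # 6.0 hours
print(pipelined_minutes(LAUNDRY, 4) / 60)   # 3.5 hours
```

Because the stages are unbalanced, the 40-minute dryer sets the pace. After the first wash, one load leaves the dryer every 40 minutes. This is why the speedup (360 / 210, about 1.7) stays well below the number of stages.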
Pipelined Laundry: Start ASAP
- Pipelined laundry takes 3.5 hours for 4 loads (30 + 4 x 40 + 20 = 210 minutes)
[Figure: task order A to D vs. time from 6 PM to midnight; step times 30, 40, 40, 40, 40, 20 minutes]

Pipelining Lessons
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- The pipeline rate is limited by the slowest stage
- Multiple tasks work at the same time using different resources
- Potential speedup = number of pipe stages
- Unbalanced stage lengths and the time to "fill" and "drain" the pipeline reduce the speedup
- Stall for dependences

Single Cycle vs. Pipeline
[Figure: clock cycles 1 to 10. Single-cycle implementation: Load and Store each take one long cycle (Cycle 1, Cycle 2), and Store wastes part of its cycle. Pipelined implementation: Load, Store, and R-type overlap, each passing through Ifetch, Reg, Exec, Mem, Wr.]

Pipeline Performance
- Single-cycle: Tc = 800 ps
- Pipelined: Tc = 200 ps

Why Pipeline? Because the Resources Are There!
[Figure: Inst 0 to Inst 4 in program order vs. time in clock cycles; each instruction uses Im, Reg, ALU, Dm, Reg of the single-cycle datapath in successive cycles, so different instructions use different units at the same time]

A Pipelined Datapath

Designing a Pipelined Processor
- Examine the datapath and control diagram
  - Start with the single-cycle datapath
  - Single-cycle control?
- Partition the datapath into stages: IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory
  access), and WB (write back)
- Associate resources with stages
- Ensure that flows do not conflict, or figure out how to resolve them
- Assert control in the appropriate stage

Multi-Execution Steps
- Instruction fetch (all instructions): IR = Memory[PC]; PC = PC + 4
- Instruction decode/register fetch (all instructions): A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
- Execution, address computation, branch/jump completion:
  - R-type: ALUOut = A op B
  - Memory reference: ALUOut = A + sign-extend(IR[15-0])
  - Branch: if (A == B) then PC = ALUOut
  - Jump: PC = PC[31-28] || (IR[25-0] << 2)
- Memory access or R-type completion:
  - R-type: Reg[IR[15-11]] = ALUOut
  - Load: MDR = Memory[ALUOut]; Store: Memory[ALUOut] = B
- Memory read completion:
  - Load: Reg[IR[20-16]] = MDR
But we use the single-cycle datapath.

Split Single-cycle Datapath
- What do we need to add to split the datapath into stages?
[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory, muxes) divided into IF: Instruction fetch, ID: Instruction decode/register file read, EX: Execute/address calculation, MEM: Memory access, WB: Write back; feedback paths run from later stages back to earlier ones]

Add Pipeline Registers
- Use registers between stages to carry data and control
[Figure: the same datapath with pipeline registers (latches) IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages]

Consider load
- IF: Instruction Fetch: fetch the instruction from the instruction memory
- ID: Instruction Decode: register fetch and instruction decode
- EX: calculate the memory address
- MEM: read the data from the data memory
- WB: write the data back to the register file
[Figure: load in cycles 1 to 5: Ifetch, Reg/Dec, Exec, Mem, Wr]
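The sketch below walks a single `lw` through the five stages listed above to show what the pipeline registers do. Each stage stores its results in a record named after the register it models (IF/ID, ID/EX, EX/MEM, MEM/WB). This is a hypothetical Python model, not the hardware. It runs the stages one after another for one instruction and uses dicts as memories. The helper `encode_lw` and all register and memory contents are made up for illustration. The destination register number (`rt`) is copied from register to register so that WB still has it four stages later.

```python
from dataclasses import dataclass


def sign_extend16(value):
    """Sign-extend the low 16 bits of value (IR[15-0]) to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def encode_lw(rt, offset, rs):
    """Hypothetical helper: MIPS I-format word for `lw rt, offset(rs)` (opcode 0x23)."""
    return (0x23 << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


@dataclass
class IFID:
    ir: int    # fetched instruction
    pc4: int   # PC + 4


@dataclass
class IDEX:
    a: int     # Reg[IR[25-21]]
    b: int     # Reg[IR[20-16]] (read anyway; lw does not use it)
    imm: int   # sign-extended IR[15-0]
    rt: int    # destination register number, carried forward for WB


@dataclass
class EXMEM:
    alu_out: int
    rt: int


@dataclass
class MEMWB:
    mdr: int
    rt: int


def run_lw(pc, imem, regs, dmem):
    """Move one lw through the five stages, latching results in each pipeline register.

    The stages run back to back for a single instruction; overlap is not modeled.
    Returns the updated PC.
    """
    if_id = IFID(ir=imem[pc], pc4=pc + 4)                    # IF: IR = mem[PC]; PC = PC + 4
    ir = if_id.ir
    id_ex = IDEX(a=regs[(ir >> 21) & 0x1F],                   # ID: A = Reg[IR[25-21]]
                 b=regs[(ir >> 16) & 0x1F],                   #     B = Reg[IR[20-16]]
                 imm=sign_extend16(ir),
                 rt=(ir >> 16) & 0x1F)
    ex_mem = EXMEM(alu_out=id_ex.a + id_ex.imm, rt=id_ex.rt)  # EX: ALUout = A + sign-ext(IR[15-0])
    mem_wb = MEMWB(mdr=dmem[ex_mem.alu_out], rt=ex_mem.rt)    # MEM: MDR = mem[ALUout]
    regs[mem_wb.rt] = mem_wb.mdr                              # WB: Reg[IR[20-16]] = MDR
    return if_id.pc4


regs = [0] * 32
regs[1] = 100                                    # $1 = 100 (made-up value)
data_memory = {120: 42}                          # word at address 120 (made-up value)
instruction_memory = {0: encode_lw(10, 20, 1)}   # lw $10, 20($1)
print(run_lw(0, instruction_memory, regs, data_memory), regs[10])  # 4 42
```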
Pipelining load
[Figure: clock cycles 1 to 7; the 1st, 2nd, and 3rd lw each pass through Ifetch, Reg/Dec, Exec, Mem, Wr, starting one cycle apart]
- The 5 functional units in the pipelined datapath are:
  - Instruction memory for the Ifetch stage
  - Register file's read ports (busA and busB) for the Reg/Dec stage
  - ALU for the Exec stage
  - Data memory for the MEM stage
  - Register file's write port (busW) for the WB stage

IF Stage of load
- IR = mem[PC]; PC = PC + 4
[Figure: two panels of the pipelined datapath, lw in instruction fetch and lw in instruction decode; IR and PC + 4 are latched into IF/ID]

ID Stage of load
- A = Reg[IR[25-21]]; B = Reg[IR[20-16]]
[Figure: the same two panels, lw in instruction fetch and lw in instruction decode]

EX Stage of load
- ALUout = A + sign-ext(IR[15-0])
[Figure: pipelined datapath with lw in execution]

MEM Stage of load
- MDR = mem[ALUout]
[Figure: two panels, lw in memory access and lw in write back (Patterson Figure 06.15)]

WB Stage of load
- Reg[IR[20-16]] = MDR
[Figure: the same memory-access and write-back panels, with the register file's write-register input marked]
- Who will supply this address (the write register number)?

The Four Stages of R-type
- IF: fetch the instruction from the instruction memory
- ID: register fetch and instruction decode
- EX: the ALU operates on the two register operands
- WB: write the ALU output back to the register file
[Figure: R-type in cycles 1 to 4: Ifetch, Reg/Dec, Exec, Wr]

Pipelining R-type and load
- We have a structural hazard: two instructions try to write to the register file at the same time, but it has only one write port
[Figure: clock cycles 1 to 9; R-type, R-type, Load, R-type, R-type issued one cycle apart; each R-type uses Ifetch, Reg/Dec, Exec, Wr and the load uses Ifetch, Reg/Dec, Exec, Mem, Wr]
Oops! We have a problem!
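You can reproduce the conflict on this slide by computing, for each instruction, the cycle in which it reaches its register-write stage. The sketch below is a hypothetical Python helper (`write_port_conflicts`). It assumes one instruction is issued per cycle with no stalls. The instruction sequence matches the figure: R-type, R-type, Load, R-type, R-type.

```python
def write_port_conflicts(program, stages):
    """Find cycles in which more than one instruction uses the register-file write port.

    program: instruction kinds issued one per cycle, starting in cycle 1.
    stages:  kind -> ordered stage names; "Wr" marks the register-file write.
    """
    writers = {}
    for issue_cycle, kind in enumerate(program, start=1):
        write_cycle = issue_cycle + stages[kind].index("Wr")
        writers.setdefault(write_cycle, []).append(f"{kind}#{issue_cycle}")
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


FOUR_STAGE_R_TYPE = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}
# Variant in which R-type gets a NOP Mem stage, so every write happens in stage 5.
FIVE_STAGE_R_TYPE = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}
PROGRAM = ["R-type", "R-type", "Load", "R-type", "R-type"]

print(write_port_conflicts(PROGRAM, FOUR_STAGE_R_TYPE))  # {7: ['Load#3', 'R-type#4']}
print(write_port_conflicts(PROGRAM, FIVE_STAGE_R_TYPE))  # {}
```

The load issued in cycle 3 and the R-type issued in cycle 4 both reach "Wr" in cycle 7. In the second call, every instruction writes in the same stage. Two instructions issued in different cycles can then never write in the same cycle.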
Important Observation
[Figure: Load uses stages 1 to 5 (Ifetch, Reg/Dec, Exec, Mem, Wr); R-type uses stages 1 to 4 (Ifetch, Reg/Dec, Exec, Wr)]
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses the register file's write port during its 5th stage
  - R-type uses the register file's write port during its 4th stage
- Several ways to solve: forwarding, adding a pipeline bubble, making instructions
  the same length

Solution: Delay R-type's Write
[Figure: clock cycles 1 to 9; the same sequence with every R-type now passing through Ifetch, Reg/Dec, Exec, Mem, Wr, so each instruction writes in its 5th stage]
- Delay R-type's register write by one cycle:
  - R-type also uses the register file's write port at stage 5
  - MEM is a NOP stage: nothing is done
  - R-type now also has 5 stages

The Four Stages of store
- IF: fetch the instruction from the instruction memory
- ID: register fetch and instruction decode
- EX: calculate the memory address
- MEM: write the data into the data memory
- Add an extra stage:
  - WB: NOP
[Figure: store in cycles 1 to 4: Ifetch, Reg/Dec, Exec, Mem, followed by Wr]

The Three Stages of beq
- IF: fetch the instruction from the instruction memory
- ID: register fetch and instruction decode
- EX: compare the two register operands, select the correct branch target address, and latch it into the PC
- Add two extra stages:
  - MEM: NOP
  - WB: NOP
[Figure: beq in cycles 1 to 4: Ifetch, Reg/Dec, Exec, followed by Mem and Wr]

Pipelined Datapath
[Figure: complete pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Graphically Representing Pipelines
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
- Helps in understanding datapaths
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 over clock cycles CC 1 to CC 6; each passes through IM, Reg, ALU, DM, Reg]

Example 1: Cycles 1 and 2
[Figure: Clock 1: lw $10, 20($1) in instruction fetch. Clock 2: sub $11, $2, $3 in instruction fetch; lw $10, 20($1) in instruction decode]

Example 1: Cycles 3 and 4
[Figure: Clock 3: sub $11, $2, $3 in instruction decode; lw $10, 20($1) in execution. Clock 4: sub $11, $2, $3 in execution; lw $10, 20($1) in memory access]
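You can answer the two questions on the Graphically Representing Pipelines slide by tabulating which stage each instruction occupies in each cycle. Below is a minimal Python sketch. It assumes an ideal five-stage pipeline with one instruction issued per cycle and no hazards. `pipeline_diagram` and `in_stage` are hypothetical helpers. For Example 1 the sketch reproduces the six clock cycles shown. It also reports that in cycle 4 the ALU is executing `sub` while `lw` is in memory access.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Map clock cycle -> {instruction: stage} for an ideal five-stage pipeline with no stalls."""
    diagram = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
            diagram.setdefault(i + 1 + s, {})[instr] = stage
    return diagram


def in_stage(diagram, cycle, stage):
    """Return the instruction occupying `stage` during `cycle`, or None if that stage is idle."""
    for instr, current in diagram.get(cycle, {}).items():
        if current == stage:
            return instr
    return None


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
diagram = pipeline_diagram(program)
for cycle in sorted(diagram):
    print(f"CC {cycle}: {diagram[cycle]}")
print(len(diagram))                # 6 cycles = 2 instructions + 5 stages - 1
print(in_stage(diagram, 4, "EX"))  # sub $11, $2, $3
```

Without stalls, n instructions take n + 4 cycles on a five-stage pipeline. With Tc = 200 ps, that is (n + 4) x 200 ps.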
Example 1: Cycles 5 and 6
[Figure: Clock 5: lw $10, 20($1) in write back; sub $11, $2, $3 in memory access. Clock 6: sub $11, $2, $3 in write back]

Pipelined Control

Pipeline Control: Control Signals
[Figure: pipelined datapath with the control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg; instruction fields [20-16] and [15-11] feed the RegDst mux, and [15-0] feeds sign extend and ALU control]

Group Signals According to Stages
- Can use the control signals of the single-cycle CPU
Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc. Memory access stage control lines: Branch, MemRead, MemWrite. Write-back stage control lines: RegWrite, MemtoReg.

Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg
R-format    |   1    |   1    |   0    |   0    |   0    |    0    |    0     |    1     |    0
lw          |   0    |   0    |   0    |   1    |   0    |    1    |    0     |    1     |    1
sw          |   X    |   0    |   0    |   1    |   0    |    0    |    1     |    0     |    X
beq         |   X    |   0    |   1    |   0    |   1    |    0    |    0     |    0     |    X
(Fig. 4.22)

Data Stationary Control
- Pass control signals along just like the data
- Main control generates the control signals during ID
[Figure (Fig. 4.50): control produces EX, M, and WB groups in ID; ID/EX carries EX, M, WB; EX/MEM carries M, WB; MEM/WB carries WB]

Data Stationary Control (cont.)
[Figure: main control in ID produces ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr; the ID/Ex register holds all of them, the Ex/MEM register holds Branch, MemWr, MemtoReg, RegWr, and the MEM/WB register holds MemtoReg, RegWr]
- Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
- Signals for MEM (MemWr, Branch) are used 2 cycles later
- Signals for WB (MemtoReg, RegWr) are used 3 cycles later

WB Stage of load
- Reg[IR[20-16]] = MDR
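The data stationary control scheme described above can be simulated by letting each pipeline register carry only the control groups that later stages still need. The Python sketch below is hypothetical. It takes its values from the control-lines table (Fig. 4.22) and uses `None` for the X (don't-care) entries. It follows one instruction with nothing else in flight. The EX group is read 1 cycle after decode, the M group 2 cycles after, and the WB group 3 cycles after.

```python
# Control lines per instruction from Fig. 4.22; None stands for X (don't care).
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw": {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
           "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw": {"EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
           "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq": {"EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
            "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def trace_control(op, cycles=4):
    """Follow one instruction's control bits from ID through WB.

    Cycle 0 is the ID cycle, when main control decodes `op`. Returns a list of
    (cycle, stage, bits) for every stage that reads its control group.
    """
    id_ex = ex_mem = mem_wb = None
    pending = op
    used = []
    for cycle in range(cycles):
        # Each stage reads the control group held in the pipeline register in front of it.
        if id_ex is not None:
            used.append((cycle, "EX", id_ex["EX"]))
        if ex_mem is not None:
            used.append((cycle, "MEM", ex_mem["M"]))
        if mem_wb is not None:
            used.append((cycle, "WB", mem_wb["WB"]))
        # Clock edge: shift back to front, dropping the group a stage has finished with.
        mem_wb = {"WB": ex_mem["WB"]} if ex_mem is not None else None
        ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]} if id_ex is not None else None
        id_ex = CONTROL[pending] if pending is not None else None
        pending = None
    return used


for cycle, stage, bits in trace_control("lw"):
    print(cycle, stage, bits)  # EX in cycle 1, MEM in cycle 2, WB in cycle 3
```

Shifting the registers from back to front on each clock edge mirrors how all pipeline registers latch at the same edge. Each register takes the value its predecessor held before the edge.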