IC Technology: What Will the Next Node Offer Us?

MOORE'S LAW
[Figure: transistors per microprocessor (log scale, 10^4 to 10^10) versus year, 1971 to 2017]
Source: Karl Rupp, 40 Years of Microprocessor Trend Data.

MOORE'S LAW: DENSITY AND COST PER FUNCTION
[Figure: relative manufacturing cost per component versus number of components per integrated circuit, with curves labeled 1960, 1965 and 1970]
Source: G. Moore, Electronics, 1965.

MOORE'S LAW IS WELL AND ALIVE
DENSITY: A NECESSARY ATTRIBUTE
[Figure: relative density versus year, 1970 to 2020, for standard cell inverter, high density SRAM, logic gates, and transistor density (microprocessors)]

IMAGINE: TRANSISTOR PERFORMANCE W/O DENSITY
Not enough memory
No multi-core chips
No accelerators
Wire delay slows big chips

TECHNOLOGY LEADERSHIP
N7, N5(P), N3
World's first 7 nm
Participated in all the products on 7 nm
Best performance
Highest density
Extensive EUV layers
Design ecosystem ready
In risk production

THE ELEPHANT IN THE ROOM
[Figure: length scale from 10^-1 m down to 10^-10 m: tennis ball (10 cm), strand of hair (0.1 mm), bacteria (2 μm), virus (50 nm), FinFET, carbon nanotube (1.2 nm), water molecule (0.28 nm), hydrogen atom (0.1 nm)]

MOORE'S LAW: A HISTORY OF INNOVATIONS
CONTINUOUS BENEFITS NODE AFTER NODE
Dennard scaling
Strained Si, high-k/metal gate
FinFET/DTCO

MULTIPLE ROADS LEAD TO ROME
CONTINUOUS BENEFITS NODE AFTER NODE
Innovations

INTEGRATING CHIPS INTO SYSTEMS
It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected. The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.
Source: G. Moore, Electronics, 1965.

CoWoS SYSTEM INTEGRATION
TSMC CoWoS fully assembled test chip: 1 SoC + 2 DRAMs
Source: 2013 TSMC Technology Symposium.

CoWoS SYSTEM INTEGRATION
2500 mm² interposer: 2 processors (600 mm²) + 8 HBM DRAM

SYSTEM INTEGRATION TECHNOLOGIES
[Figure labels: Integrated Si/Package Area, Reticle, I/O Pin Count, Package Size, Interposer Size (mm²)]
Devices shown: GP100 (courtesy of Nvidia), 7V580T heterogeneous integration (courtesy of Xilinx), 7V2000T homogeneous integration (courtesy of Xilinx), XCVU440 (courtesy of Xilinx), GV100 (courtesy of Nvidia)

CHIPLETS INTEGRATION REDUCES SYSTEM COST PER FUNCTION
[Figure: relative values labeled 2X, 1X and 1.5X]

SEMICONDUCTOR TECHNOLOGY EVOLVES
DRIVEN BY CHANGING APPLICATION LANDSCAPE
Application eras: Transistor Radio, Mini-Computer, PC/Internet, Mobile, AI/5G
1947  Invention of point-contact transistor
1958  Invention of IC
1971  Intel 4004
1973  Mobile phone
1974  Transistor Scaling Principle
1984  Flash Memory
1995  Pentium CPU
1999  FinFET
2002  3G
2007  iPhone
2009  4G
2017  GPU (21B Transistors)
2018  7nm FinFET
2020  5nm CMOS
2050 and beyond

DATA MOVEMENT HITS THE MEMORY WALL
ABUNDANT-DATA APPLICATIONS: ENERGY MEASUREMENTS
[Figure: memory versus compute energy split, with percentage pairs 15%/85%, 8%/92% and 20%/80%; workloads labeled ResNet-152 (CNN), AlexNet (CNN) and Language Model (LSTM)]
Deep Learning Accelerators
Intel performance counter monitors; 2 CPUs, 8-cores/CPU + 128GB DRAM
Source: S. Mitra (Stanford).

DEEP NEURAL NETWORKS REQUIRE LARGE MEMORY CAPACITY
Network (application)   Type   Training/Inference   Model Size    Memory Usage (GBytes)
ResNet (vision)         CNN    Training             120 MBytes    21*
                               Inference                          0.12
Language Model (NLP)    LSTM   Training             2.5 GBytes    40*
                               Inference                          2.5
* Training memory usage: batch size 64, word size 64-bit; memory can increase with greater batch sizes (footprint of activations, weights, errors and gradients).
Source: M. Lee, W. Hwang, Prof. S. Mitra (Stanford), M. Aly (NTU, Singapore), Y. Wang, K. Akarvardar (TSMC).

ON-CHIP SRAM CAPACITY: NEVER ENOUGH
[Figure: estimated on-chip SRAM (MB, 0 to 60) versus launch year (2006 to 2018) for CPUs and GPUs: Intel Xeon X5355, NVIDIA Tesla K40, NVIDIA Tesla V100, Intel Xeon E7-8890 v4; annotations: "3.8 Gbytes", "1.4 nm node"]
Source: W. Hwang, Prof. S. Mitra (Stanford).

CAN WE PUT LOTS OF MEMORY ON-CHIP?
WHAT KINDS OF MEMORY, FOR WHICH APPLICATION?

SUPER AI ACCELERATOR ENABLED BY CoWoS
Heterogeneous Integration: GPU + High Bandwidth Memory (HBM2)
CoWoS Module
Superior processing power, equal to that of 100 CPUs
300 B transistors
[Figure: module labeled with one GPU and four HBM2 stacks]
Source: "Inside Volta", Nvidia GPU Tech. Conf., May 10, 2017.

COMPUTE-MEMORY INTEGRATION
2D System (traditional baseline): Printed Circuit Board, Si Logic Die, Off-Chip DRAM, Limited I/O Connectivity
2.5D System: HBM-Type DRAM, Si Logic Die, Si Interposer, Micron Scale Connectivity
3D TSV System: HBM-Type DRAM, Si Logic Die, TSV + Bump Connectivity (Micron Scale)
N3XT System: Si Logic Die, Dense ILV Connectivity (Nanometer Scale), Energy Efficient Logic (Thin Device Layers), High Density On-Chip Nonvolatile Memory, High Speed On-Chip Nonvolatile Memory, Energy Efficient Memory Access Transistors, Nonvolatile Memory Cells
Source: W. Hwang, W. Wan, Y. Malviya, H. Li, M. Lee, M. Aly, H.-S. P. Wong, S. Mitra. Work in progress 2017-2019 w/ TSMC.

"NEW" MEMORIES FOR COMPUTE-MEMORY INTEGRATION
Random access, non-volatile, no erase before write, on-chip integration
PCM (phase change memory): top electrode, phase change material, switching region, oxide isolation, bottom electrode
RRAM (resistive switching random access memory): top electrode, metal oxide, oxygen ion, oxygen vacancy, filament, bottom electrode
CBRAM (conductive bridge random access memory): active top electrode, solid electrolyte, metal atoms, filament, bottom electrode
STT-MRAM (spin torque transfer magnetic random access memory): soft magnet, tunnel barrier (oxide), pinned magnet, current
FERAM (ferro-electric random access memory): top gate, ferroelectric layer, interface layer, n+, n+, p-Si
Source: H.-S. P. Wong, S. Salahuddin, Nature Nanotech (2015).

NEW MEMORY: HIGH-BANDWIDTH, HIGH-CAPACITY, ON-CHIP
High bandwidth, high capacity: both critical
2D baseline system: accelerator cores, SRAM on-chip memory
  Off-chip DRAM (LPDDR3)
    Capacity: 4 GBytes
    Latency: 50 ns
    BW: 12 GBytes/s
    Read/write energy: 17 pJ/bit
New system: accelerator cores, SRAM on-chip memory
  Off-chip DRAM (LPDDR3)
    Capacity: 4 GBytes minus new memory capacity
    Latency: 50 ns
    BW: 12 GBytes/s
    Read/write energy: 17 pJ/bit
  On-chip new memory
    Capacity: sweep (up to 4 GBytes)
    Latency: sweep (down to 3 ns)
    BW: sweep (up to 128 GBytes/s)
    Read/write energy: 5 pJ/bit
Source: Stanford/NTU: M. Aly, S. Mitra; TSMC: Yih (Eric) Wang, K. Akarvardar, 2019.

NEW MEMORY ESSENTIAL REQUIREMENT
ON-CHIP CAPACITY MUST EXCEED DATA SIZE
Conditions: 5 ns memory access latency, 5 pJ/bit access energy
[Figure: EDP benefits versus on-chip capacity (1 MByte to 4 GBytes) and bandwidth (10 to 128 GBytes/s) for ResNet-152 (CNN, 120 MByte data size) and the language model (LSTM, 2.5 GByte data size). Annotations as extracted: 1.3x, 4.2x, 1.3x, 2.9x-3.6x, 120 MByte, 64 GBytes/s; 2.1x 8x, 15x 30x, 2.5 GByte, 2.1x 8x, 50x, 100 GBytes/s]
[Figure: the same EDP benefits plotted as bandwidth (16 to 128 GBytes/s) versus capacity (1024 to 4096 MBytes) for both workloads]
Source: Stanford/NTU: M. Aly, S. Mitra; TSMC: Yih (Eric) Wang, K. Akarvardar, 2019.
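
A rough worked example may help make the memory parameters above concrete. The Python sketch below estimates the energy and time needed to stream each workload's data once through the off-chip LPDDR3 and once through the on-chip new memory, using only figures quoted on the slides (17 pJ/bit at 12 GBytes/s versus 5 pJ/bit at up to 128 GBytes/s; 120 MByte and 2.5 GByte data sizes). Everything else is an assumption made for illustration: the names `Memory`, `transfer_energy_joules`, `transfer_time_seconds`, `edp` and `edp_ratio` are hypothetical, MByte and GByte are taken as 10^6 and 10^9 bytes, the new memory is assumed to run at the top of its bandwidth sweep, and latency, compute, data reuse and capacity limits are ignored. It is not the system-level evaluation behind the EDP benefits reported on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """Bandwidth and access energy of one memory, as quoted on the slides."""
    name: str
    bandwidth_gbytes_per_s: float
    energy_pj_per_bit: float


# Slide values; the new memory uses the top of its bandwidth sweep (assumption).
LPDDR3 = Memory("off-chip DRAM (LPDDR3)", 12.0, 17.0)
NEW_MEMORY = Memory("on-chip new memory", 128.0, 5.0)

# Data sizes from the slides, assuming 1 MByte = 1e6 bytes and 1 GByte = 1e9 bytes.
WORKLOADS = {
    "ResNet-152 (CNN)": 120e6,
    "Language model (LSTM)": 2.5e9,
}


def transfer_energy_joules(mem: Memory, n_bytes: float) -> float:
    """Energy to read or write n_bytes once: bits times energy per bit."""
    return n_bytes * 8 * mem.energy_pj_per_bit * 1e-12


def transfer_time_seconds(mem: Memory, n_bytes: float) -> float:
    """Bandwidth-limited time to move n_bytes once (latency ignored)."""
    return n_bytes / (mem.bandwidth_gbytes_per_s * 1e9)


def edp(mem: Memory, n_bytes: float) -> float:
    """Energy-delay product of a single pass over the data."""
    return transfer_energy_joules(mem, n_bytes) * transfer_time_seconds(mem, n_bytes)


def edp_ratio(n_bytes: float) -> float:
    """Data-movement-only EDP of LPDDR3 relative to the new memory."""
    return edp(LPDDR3, n_bytes) / edp(NEW_MEMORY, n_bytes)


if __name__ == "__main__":
    for workload, n_bytes in WORKLOADS.items():
        for mem in (LPDDR3, NEW_MEMORY):
            energy_mj = transfer_energy_joules(mem, n_bytes) * 1e3
            time_ms = transfer_time_seconds(mem, n_bytes) * 1e3
            print(f"{workload:24s} {mem.name:24s} {energy_mj:8.2f} mJ {time_ms:8.2f} ms")
        print(f"{workload:24s} data-movement EDP ratio: {edp_ratio(n_bytes):.1f}x")
```

Under these assumptions the data-movement-only EDP ratio is (17/5) x (128/12), about 36x, and does not depend on data size. The benefits shown on the slides differ because they come from a full-system evaluation that also accounts for capacity, latency and compute.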
NEW MEMORY ESSENTIAL REQUIREMENT
HIGH BANDWIDTH MORE CRITICAL THAN LATENCY
Condition: 5 pJ/bit access energy
[Figure: EDP benefits versus bandwidth (10 to 128 GBytes/s) and latency (3 to 50 ns) for ResNet-152 (CNN, 120 MByte data size) and the language model (LSTM, 2.5 GByte data size). Annotations as extracted: 4.2x, 4.1x, 1.1x-3x, 20 ns, 64 GBytes/s; 50x, 20x 35x, 1.1x 20x, 15x 30x, 10 ns, 100 GBytes/s; 2.4x 3.9x]
[Figure: the same EDP benefits plotted as bandwidth (16 to 128 GBytes/s) versus latency (5 to 80 ns) for both workloads]
Source: Stanford/NTU: M. Aly, S. Mitra; TSMC: Yih (Eric) Wang, K. Akarvardar, 2019.

N3XT: UP TO 2,000X ENERGY EFFICIENCY BENEFITS
System-Level Benefits
Workload: Inference on ML Accelerator
[Figure: energy and execution time benefits (log scale, 1 to 10000); labeled values 1971X, 525X, 320X, 159X, 63X; workloads Lang. Model (LSTM), AlexNet (CNN), Captioning (LSTM), ResNet152 (CNN), VGG19 (CNN)]
N3XT benefits: relative to 2D baseline system (28nm silicon CMOS, LPDDR3)
Inference: 16-bit data, batch size of 1
Source: Stanford/NTU: M. Aly, T. Wu, A. Bartolo, H.-S. P. Wong, S. Mitra et al., Proc. IEEE 2019.

N3XT SYSTEM
Si Logic Die
Energy Efficient Logic (Thin Device Layers)
Dense ILV Connectivity (Nanometer Scale)
High Density On-Chip Nonvolatile Memory
High Speed On-Chip Nonvolatile Memory
Energy Efficient Memory Access Transistors
Nonvolatile Memory Cells

NANOMETER-THIN TRANSISTOR CHANNEL
1D carbon nanotube (CNT)
2D TMD (MoS2, WSe2, WS2)
[Images: channel materials, each with a 1 nm scale marker]
Photo credit: B. Radisavljevic et al., Nature Nanotech., p. 147, 2011.
[Figure: mobility (cm²/V-s, log scale, 1 to 10,000) versus channel thickness (0 to 4 nm) for MoS2, WS2, WSe2, Si, Ge and CNT; filled symbols: electron, open symbols: hole]
Source: S.-K. Su, L.-J. Li (TSMC), Nature Nanotech., 2019.

2D LAYERED MATERIALS (WS2, WSe2)
Photo credit: User Mstroeck on en.wikipedia.
[Figure: mobility (cm²/V-s, 10 to 1000) versus effective mass (0.1 to 0.5 m0), with points for MoS2-e and WSe2-h]
[Figure: ION (μA/μm, 200 to 800) for 20 nm and 10 nm devices; terminals labeled G, S, D]
Source: C.-C. Cheng et al. (TSMC), Symp. VLSI Tech. 2019.

SHORT-CHANNEL CARBON NANOTUBE TRANSISTORS
10 nm Gate Length; 5 nm Gate Length
[Figure: Ids (A, 10^-8 to 10^-5) versus Vgs (V, -1.0 to 0.5) at VDS = -0.4 V and VDS = 0.4 V; SS = 70 mV/Dec for both]
[Figure: Ids (A, 10^-10 to 10^-5) versus Vgs - Vt (V, -1.0 to 0.5) at VDS = -0.1 V; SS = 73 mV/Dec, Lg = 5 nm]
Source: C. Qiu, L-M. Peng (PKU), Science, 2017.

CARBON NANOTUBE COMPUTER
[Figure: chip blocks labeled instruction fetch, data fetch, arithmetic block, write-back]
Source: M. Shulaker, H.-S. P. Wong, S. Mitra (Stanford), Nature, 2013.

CARBON NANOTUBE FET CMOS SRAM
1 Kbit 6T SRAM (6144 CNFETs)
Source: P. Kanhaiya, M. Shulaker (MIT), Symp. VLSI Tech., 2019.

MEMORY INTEGRATION ON LOGIC PLATFORM
Better transistor alone
Transistors integrated with memory in 3D

SYSTEM INTEGRATION
A CONTINUUM FROM FAR BACK-END TO FRONT-END
[Figure: normalized density (log scale, 10^0 to 10^8) for Interposer, Chip-on-wafer, Wafer-on-wafer and Monolithic 3D]
Source: IMEC.

SOCIETAL NEED FOR ADVANCED TECHNOLOGY IS INSATIABLE
ADVANCED TECHNOLOGY: A KEY DIFFERENTIATOR

MULTIPLE ROADS LEAD TO ROME
CONTINUOUS BENEFITS NODE AFTER NODE
Continuous transistor & memory advances
Memory-logic integration
System integration with high connectivity

A CALL TO ACTION: EARLY ENGAGEMENT
SYSTEM, TECHNOLOGY
ACADEMIA, INDUSTRY, RESEARCH

End of Talk
Questions?

COMMITTED TO PROVIDING THE MOST ADVANCED TECHNOLOGIES