Computer Organization: complete course slides

Computer Organization
Computer architecture is the logical design of a computer, while computer organization is the implementation of this logical design.
For example, whether a computer has a multiplication function is an architecture issue, while how the multiplier is implemented is an organization issue.

Ch. 1 - 1.1
| Computer implementation
  Def. 1: the physical implementation of the computer components defined by the organization
  Def. 2: the hardware out of which we make computer systems
| Transparency / transparent: when something that really exists (an object or an attribute) appears, from a given point of view, not to exist, it is said to be transparent

| Architectural attributes: instruction set, word length, I/O mechanism, addressing, etc.
| Organizational attributes: control signals, interfaces, memory technology, bus technology; hardware details that are transparent to the programmer
| Implementation attributes: integrated circuits (ICs), printed circuit (PC) boards, power supplies, chassis, connectors and cables, etc.

| Series computers: computers with the same architecture but different organizations
| (Software) compatibility / compatible: software runs on all computers with the same architecture. The results are the same; only the running time differs

| Upward compatibility: a program written for a low-end computer runs on a high-end computer without modification
| Backward compatibility: a program written for the current computer runs on later computers without modification

1.2 Structure and Function
| A computer is a complex system
  Millions of basic electronic components
| How to describe a computer?
  Hierarchically
| At each level, the designer is concerned only with structure and function

| Structure: the way in which components relate to each other
| Function: the operation of individual components as part of the structure
| The description follows a top-down approach

| All computer functions are:
  Data processing
  Data storage
  Data movement
  Control

| A functional view of the computer
  [Figure: Data Movement Apparatus, Control Mechanism, Data Storage Facility, Data Processing Facility]

| Data movement
  e.g. disk to memory

| Data storage
  e.g. Internet download to disk

| Data processing from/to storage
  e.g. editing a picture in Photoshop
| Processing from storage to I/O
  e.g. a transaction at an ATM

| Structure
  [Figure: Structure - Top Level. Computer: Central Processing Unit, Main Memory, Input/Output, Systems Interconnection. Outside the computer: Peripherals, Communication lines]
  [Figure: CPU. Arithmetic and Logic Unit, Control Unit, Registers, Internal CPU Interconnection. Around the CPU: I/O, Memory, System Bus]
  [Figure: CU. Sequencing Logic, Control Unit Registers and Decoders, Control Memory. Around the Control Unit: ALU, Registers, Internal Bus]

| Why study this course?
  To some degree, it teaches how to trade off cost against performance.
  As designers, we may need to program a processor that is embedded in a real-time or larger system.

Vocabulary
  Central Processing Unit: 中央处理单元/CPU
  Main memory: 主存
  I/O subsystem: 输入/输出子系统
  Interconnection: 互连
  Component: 部件/组件
  Arithmetic and logic unit: 算术逻辑单元
  Register: 寄存器
  Single-chip microcomputer: 单片机
  Integrated circuit: 集成电路

Vocabulary
  Architecture

Pentium: superscalar, instructions executed in parallel
Pentium Pro: branch prediction, data flow analysis, speculative execution
Pentium II: 32-bit; 64-bit instructions: MMX
Pentium III: new floating-point instructions, 128-bit: SSE; supports 3-D graphics processing
Pentium 4: 32-bit; provides 128-bit instructions: SSE2

Ch.2-2.2
| Classification of Computers
  Single-chip computer
  Single-board computer
  Microcomputer
  Minicomputer
  Medium computer
  Large computer
  Supercomputer

| Relationship between software and hardware

| Software hierarchy of a computer (top to bottom)
  Application language
  High-level language
  Assembly language
  OS (job control language)
  Machine language (machine instruction system)
  Microprogram (microinstruction system)

Vocabulary
  Pipelining and parallel execution: 流水与并行执行
  Speculative execution: 推测执行
  Cache: 快速缓存
  Decimal: 十进制
  Binary: 二进制
  General purpose computer: 通用计算机
  Von Neumann machine: 冯-诺依曼计算机
  Opcode = operation code: 操作码
  Instruction cycle: 指令周期
  Fetch cycle: 取(读)周期

Vocabulary
  Flowchart: 流程图
  Conditional branch: 条件转移
  Data transfer: 数据传送
  Upward compatible: 向上兼容
  Multiplexor: 复用器
  Bus: 总线
  Magnetic-core memory: 磁芯存储器
  End user: 端用户
  Speech recognition: 语音识别
  Videoconferencing: 视频会议

Vocabulary
  Multimedia authoring: 多媒体编著
  Workstation: 工作站
  Client-server: 客户机-服务器
  DRAM (dynamic random access memory): 动态随机存取存储器
  Branch prediction: 转移预测
  Throughput: 吞吐率
  Trade-off: 折衷
  Supercomputer: 超级计算机/巨型机
  Parallelism: 并行性

Key points
  What was the first computer in the world?
  What are the features of the von Neumann machine? What is its structure?
  What is Moore's law?
  How are computers typically classified?

Computer Organization

  External memory: in bytes
  Number of words
  How many words or bytes

Ch.4-4.1
| Unit of transfer
  Internal
    Usually governed by the data bus width
  External
    Usually a block, which is much larger than a word
  Addressable unit
    Smallest location which can be uniquely addressed
    Normally the word; byte addressing may also be allowed
  Address length

  Disk: hard disk and floppy disk, clustered file storage
  Optical: CD, DVD
  Tape
  MO

| Organization of the memory hierarchy (who manages each level)
  registers <-> memory: by the compiler
  cache <-> memory: by hardware
  memory <-> disks: by hardware and the OS (virtual memory); by the programmer (files)

4.2 Cache memory principles
| Cache
  Small amount of fast memory
  Sits between normal main memory and the CPU
  May be located on the CPU chip or module

| Cache
  Cache memory is a critical component of the memory hierarchy
  Compared to main memory, the cache is relatively small
  Operates at or near the speed of the processor
  Very expensive compared to main memory
  Contains copies of sections of main memory

[Figure: Cache/Main Memory Structure]

| Specification for figure 4.4
  Main memory contains 2^n words (addressable units); K words make up a block
  Total number of blocks: M = 2^n / K
  The cache contains C lines; each line holds K memory words
  A cache line is the same size as a memory block
  C << M
  At any time, at most C blocks can reside in the cache
  A line cannot be permanently dedicated to one fixed block
  Each line has a tag that identifies which memory block it currently holds
  Many memory blocks can map to the same cache line

| Cache operation overview
  Part of the contents of main memory is also held in the cache
  The CPU requests the contents of a memory location
  Check the cache for this data
  If present, get it from the cache (fast)
  If not present, read the required block from main memory into the cache
  Then deliver it from the cache to the CPU
  The cache includes tags to identify which block of main memory is in each cache slot

[Figure: Cache Read Operation - Flowchart]
[Figure: Typical Cache Organization]

4.3 Cache Design
| Key techniques in cache design
  Size
  Mapping function
  Replacement algorithm
  Write policy
  Block size
  Number of caches

| Cache size selection
  Small size
    Cheap: the overall cost of cache plus main memory stays close to that of main memory alone
    Low hit rate; average access may even be slower than accessing main memory directly
  Large size
    High hit rate (close to 100%); average access speed approaches that of the cache
    Costly
    More gates, so slightly slower than a small cache
    Occupies a large area of the CPU chip
  Trade-off between capacity, price and speed
  There is no single optimum cache size; sizes from 1K to 512K words are all effective

| Mapping function
  C << M, so a mapping mechanism is needed
  It maps memory blocks to cache lines: which memory block occupies which cache line
  The mapping function is implemented in hardware
  The mapping function determines the cache structure
  Typical mapping functions:
    Direct mapping
    Associative mapping
    Set associative mapping

| Direct mapping
  The simplest mapping: block -> fixed line
  Each main memory block is assigned to one specific line in the cache:
    i = j modulo m
  where i is the cache line number assigned to main memory block j, and m is the number of lines in the cache.

  Cache line    Main memory blocks held
  0             0, m, 2m, 3m, ..., 2^s - m
  1             1, m+1, 2m+1, ..., 2^s - m + 1
  ...
  m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1

  [Figure: a cache with lines 000-111 and memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101]

| Implementation of direct mapping
  Given an address, how is the cache read? Which cache line? Which word?
  Method: the memory address is split into 3 parts:
    The low w bits identify the word within the block
    The middle r bits identify the cache line
    The remaining high bits form the tag, which identifies the memory block and so shows whether the line holds the wanted data

[Figure: Direct Mapping Cache Organization]

| Direct mapping address structure
  24-bit address
  2-bit word identifier (4-byte block)
  22-bit block identifier
    8-bit tag (= 22 - 14)
    14-bit slot or line
  No two blocks that map to the same line have the same tag field
  Check the contents of the cache by finding the line and checking the tag

| For all three mapping functions, the example includes the following elements:
  The cache can hold 64 KB
  Data is transferred between main memory and the cache in blocks of 4 bytes each, so the cache is organized as 16K = 2^14 lines of 4 bytes each
  Main memory consists of 16 MB, with each byte directly addressable by a 24-bit address (2^24 = 16M); main memory therefore consists of 4M blocks of 4 bytes each

[Figure: Direct Mapping Example]
  Notes: 2^14 lines, 4 bytes per line; 2^8 different blocks can be placed in any one line

| Direct mapping summary
  Address length = (s + w) bits
  Number of addressable units = 2^(s+w) words or bytes
  Block size = line size = 2^w words or bytes
  Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
  Number of lines in cache = m = 2^r
  Size of tag = (s - r) bits

| Direct mapping pros

  Full associative mapping between two sets
  Set associative mapping = direct mapping + full associative mapping
  A cache of m lines is divided into v sets of k lines each, so m = v x k
  Cache set number i = (memory block number) mod v
  A given block may map to any line in its set

[Figure: K-Way Set Associative Cache Organization (block number mod v = 0)]

| Set associative mapping example
  13-bit set number
  The block number in main memory is taken modulo 2^13
  Addresses 000000, 008000, 010000, 018000, ... map to the same set

| Set associative mapping address structure
  Use the set field to determine which cache set to look in
  Compare the tag field to see if we have a hit
  e.g.
    Address     Tag   Data       Set number
    1FF 7FFC    1FF   12345678   1FFF
    001 7FFC    001   11223344   1FFF
  Tag + set number identify a block of memory
  Within a set, block tags are unique: no two lines in a set have the same tag
  For m = v x k: k = 1 gives direct mapping; v = 1 gives associative mapping
  2-way set associative mapping is in common use; its hit rate is much higher than that of direct mapping
  The hit rate does not increase in proportion to k

[Figure: Set Associative Mapping Example]

| Set associative mapping summary
  Address length = (s + w) bits
  Number of addressable units = 2^(s+w) words or bytes
  Block size = line size = 2^w words or bytes
  Number of blocks in main memory = 2^s
  Number of lines in a set = k
  Number of sets = v = 2^d
  Number of lines in cache = kv = k x 2^d
  Size of tag = (s - d) bits

| An example
  Suppose a computer has a 4 MB main memory and a 16 KB cache. The block size is 8 words, the word length is 32 bits, and memory is addressed by word. Design a 4-way set associative cache organization. Requirements:
  1) Draw the structure of the main memory address and mark the number of bits in each field.
  2) Initially the cache is empty. The CPU fetches 100 words from memory units 0, 1, ..., 99 in sequence and repeats this sequence 8 times. What is the cache hit rate?
  3) If the cache is 6 times as fast as main memory, by what factor does introducing the cache speed up the CPU's memory accesses?

| Solutions
  1) Step 1: obtain the address length
       4 MB / 4 B = 1M words, so the address length is 20 bits
     Step 2: obtain the number of cache lines (blocks) and sets
       Word-in-block field = log2 8 = 3 bits
       16 KB / (8 x 4 B) = 0.5K = 2^9 lines
       4-way set associative means 4 lines/set, so 2^9 / 4 = 2^7 sets and the set field is 7 bits
     Step 3: draw the address fields
       Tag (20 - 7 - 3 = 10 bits) | Set (7 bits) | Word (3 bits)
  2) The 100 words occupy 13 blocks, so the first pass has 13 misses and 87 hits; the other 7 passes hit on all 700 accesses.
       (87 + 700) / 800 = 98%
  3) Suppose the cache access time is t; then the main memory access time is 6t, and a miss costs t + 6t = 7t.
       6t / (98% x t + 2% x 7t) = 6t / 1.12t, about 5.36 times
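The worked example above can be checked with a small simulation. The following Python sketch is illustrative only: the function names (`address_fields`, `split_address`, `simulate`) are hypothetical, and LRU replacement is an assumption that the slides do not state (no line is evicted in this trace, so the policy does not change the result).

```python
from collections import OrderedDict


def address_fields(addr_bits, num_sets, block_words):
    """Return (tag_bits, set_bits, word_bits); sizes must be powers of two."""
    word_bits = block_words.bit_length() - 1   # log2(block size)
    set_bits = num_sets.bit_length() - 1       # log2(number of sets)
    return addr_bits - set_bits - word_bits, set_bits, word_bits


def split_address(addr, set_bits, word_bits):
    """Split an address into (tag, set number, word offset)."""
    word = addr & ((1 << word_bits) - 1)
    set_no = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_no, word


def simulate(addresses, num_sets, ways, block_words):
    """Count hits in a k-way set associative cache (LRU is an assumption)."""
    set_bits = num_sets.bit_length() - 1
    word_bits = block_words.bit_length() - 1
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> present, in LRU order
    hits = 0
    for addr in addresses:
        tag, set_no, _ = split_address(addr, set_bits, word_bits)
        lines = sets[set_no]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True              # load the block into the set
    return hits


# Parameters of the worked example: 4 MB memory, 16 KB cache,
# 8-word blocks, 32-bit (4-byte) words, word addressing, 4-way sets.
MEM_WORDS = 4 * 2**20 // 4                      # 1M words
ADDR_BITS = MEM_WORDS.bit_length() - 1          # 20-bit address
BLOCK_WORDS = 8
CACHE_LINES = 16 * 2**10 // (BLOCK_WORDS * 4)   # 2^9 lines
WAYS = 4
NUM_SETS = CACHE_LINES // WAYS                  # 2^7 sets

fields = address_fields(ADDR_BITS, NUM_SETS, BLOCK_WORDS)

trace = list(range(100)) * 8                    # units 0..99, repeated 8 times
hits = simulate(trace, NUM_SETS, WAYS, BLOCK_WORDS)
hit_rate = hits / len(trace)

t_cache, t_mem = 1.0, 6.0                       # cache is 6 times as fast
avg_time = hit_rate * t_cache + (1 - hit_rate) * (t_cache + t_mem)
speedup = t_mem / avg_time

print(fields, hits, round(hit_rate, 5), round(speedup, 2))
```

The simulation counts 787 hits out of 800 accesses, an exact hit rate of 98.375%, which gives a speedup of about 5.47; rounding the hit rate to 98% first gives 6 / 1.12, about 5.36. The same `split_address` helper reproduces the 24-bit example: with a 13-bit set field and a 2-bit word field, addresses FFFFFC and 00FFFC both fall in set 1FFF, with tags 1FF and 001.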
| Replacement algorithms (1) - direct mapping
  No choice
  Each block maps to only one (fixed) line
  Replace that line

| Replacement algorithms (2) - associative

  Which methods can be used to improve the hit ratio?

Computer Organization

  RAS: row address select
  CAS: column address select
  WE: write enable
  OE: output enable
  Refreshing is done row by row
  No operations while refreshing
  Refresh before reading or writing

Ch.5-5.1
[Figure: Module Organization - a simple example]
[Figure: Module Organization - word length extension]

| Module organization - word number extension
  Two 1K x 8-bit chips -> a 2K x 8-bit memory

[Figure: Module Organization - word length and number extension]

| Connection of CPU and memory
  Connection of address lines
    The CPU usually has more address lines than the memory chip
    Low-order bits -> low-order bits
    High-order CPU address bits are reserved or used for chip select (CS)
  Connection of data lines
    The CPU's data lines must match the data width of the memory; if necessary, chips are combined to extend the bit width
  Connection of command lines
    Read/write lines are connected directly to those of the memory
    CS is driven from MREQ and the high-order address bits of the CPU
    A logic circuit, such as a decoder, may be used
  Select the chip types and the number of chips correctly
    ROM is used for the system area
    RAM is used for the user area

| An example of CPU-memory connection
  Suppose the CPU has 16 address lines and 8 data lines. MREQ is the memory access control signal and WR is the read/write control signal. The following chips are available: 1K x 4 RAM; 4K x 8 RAM; 8K x 8 RAM; 2K x 8 ROM; 4K x 8 ROM; 8K x 8 ROM; a 74LS138 decoder and all kinds of gates, as in the figure. Draw the diagram connecting the CPU to memory under these conditions:
  1. 6000H-67FFH is the system area; 6800H-6BFFH is the user area;
  2. Select reasonable chips. How many of each are used?

| Solutions
  Step 1: determine the memory capacity
    System area: 67FFH - 6000H = 7FFH -> 2K; data unit length 8 bits -> 2K x 8 ROM
    User area: 6BFFH - 6800H = 3FFH -> 1K -> 1K x 8 RAM
  Step 2: select chips
    One 2K x 8 ROM
    Two 1K x 4 RAM (connected in parallel)
  Step 3: allocate the CPU address lines
    A0-A10 of the CPU connect to the address lines of the ROM
    A0-A9 connect to the address lines of both RAM chips
    The remaining high-order bits and MREQ are used for chip selection

5.2 Error Correction
| Hard failure
  Permanent defect
| Soft error
  Random, non-destructive
  No permanent damage to memory
| Detected using Hamming error correcting code

[Figure: Error Correcting Code Function]

5.3 Advanced DRAM Organization
| Newer RAM technology
  Basic DRAM has stayed the same since the first RAM chips
  Enhanced DRAM
    Contains a small SRAM as well
    SRAM holds the last line read
  Cache DRAM
    Larger SRAM component
    Used as cache or serial buffer
  Synchronous DRAM (SDRAM)
    Access is synchronized with an external clock
    Address is presented to RAM
    RAM finds data (CPU waits in con