1、4:56优秀课件,精彩无限!1Primary storage:Main memory(RAM)Secondary Storage:Peripheral devicesDisk drivesTape drives4:56优秀课件,精彩无限!2RAM is usually volatile.RAM is about 1/4 million times faster than disk.MediumEarly 1996 Mid 1997 Early 2000RAM$45.007.001.50Disk0.250.100.01Floppy0.500.360.25Tape0.030.010.0014:56
2、优秀课件,精彩无限!3Minimize the number of disk accesses!1.Arrange information so that you get what you want with few disk accesses.2.Arrange information to minimize future disk accesses.An organization for data on disk is often called a file structure(p261).Disk-based space/time tradeoff(p74):Compress infor
3、mation to save processing time by reducing disk accesses.4:56优秀课件,精彩无限!44:56优秀课件,精彩无限!5A sector is the basic unit of I/O.Interleaving factor:Physical distance between logically adjacent sectors on a track.4:56优秀课件,精彩无限!6Locality of Reference:When record is read from disk,next request is likely to co
4、me from near the same place in the file.Cluster:Smallest unit of file allocation,usually several sectors.Extent:A group of physically contiguous clusters.Internal fragmentation:Wasted space within sector if record size does not match sector size;wasted space within cluster if file size is not a mult
5、iple of cluster size.4:56优秀课件,精彩无限!7Seek time:Time for I/O head to reach desired track.Largely determined by distance between I/O head and desired track.Track-to-track time:Minimum time to move from one track to an adjacent track.Average Seek time:Average time to reach a track for random access.4:56
6、优秀课件,精彩无限!8Rotational Delay or Latency:Time for data to rotate under I/O head.One half of a rotation on average.At 7200 rpm,this is 8.3/2=4.2ms.Transfer time:Time for data to move under the I/O head.At 7200 rpm:Number of sectors read/Number of sectors per track*8.3ms.4:56优秀课件,精彩无限!916.8 GB disk on 1
7、0 platters=1.68GB/platter13,085 tracks/platter256 sectors/track512 bytes/sectorTrack-to-track seek time:2.2 msAverage seek time:9.5ms4KB clusters,32 clusters/track.Interleaving factor of 3.5400RPM4:56优秀课件,精彩无限!10Read a 1MB file divided into 2048 records of 512 bytes(1 sector)each.Assume all records
8、are on 8 contiguous tracks.First track:9.5+11.1/2+3 x 11.1=48.4 msRemaining 7 tracks:2.2+11.1/2+3 x 11.1=41.1 ms.Total:48.4+7*41.1=335.7ms4:56优秀课件,精彩无限!11Read a 1MB file divided into 2048 records of 512 bytes(1 sector)each.Assume all file clusters are randomly spread across the disk.256 clusters.Clu
9、ster read time is (3 x 8)/256 of a rotation for about 1 ms.256(9.5+11.1/2+11.1x3 x(8/256)is about 4119.2 ms.4:56优秀课件,精彩无限!12Read time for one track:9.5+11.1/2+3 x 11.1=48.4ms.Read time for one sector:9.5+11.1/2+(1/256)11.1=15.1ms.Read time for one byte:9.5+11.1/2=15.05 ms.Nearly all disk drives read
10、/write one sector at every I/O access.Also referred to as a page.P288:8.2 if clusters are spread randomly across the disk?4:56优秀课件,精彩无限!13The information in a sector is stored in a buffer or cache.If the next I/O access is to the same buffer,then no need to go to disk.There are usually one or more i
11、nput buffers and one or more output buffers.4:56优秀课件,精彩无限!14A series of buffers used by an application to cache disk data is called a buffer pool.Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and“swapping”between disk and RAM.4:56优秀课件,精彩无限!15
12、4:56优秀课件,精彩无限!16Which buffer should be replaced when new data must be read?First-in,First-out:Use the first one on the queue.Least Frequently Used(LFU):Count buffer accesses,reuse the least used.Least Recently used(LRU):Keep buffers on a linked list.When buffer is accessed,bring it to front.Reuse th
13、e one at end.If 3 9 10 4 7 3 5 3 8 2 7 come into 3-buffer pool(empty initially)?4:56优秀课件,精彩无限!17class BufferPool /(1)Message Passingpublic:virtual void insert(void*space,int sz,int pos)=0;virtual void getbytes(void*space,int sz,int pos)=0;class BufferPool /(2)Buffer Passingpublic:virtual void*getblo
14、ck(int block)=0;virtual void dirtyblock(int block)=0;virtual int blocksize()=0;4:56优秀课件,精彩无限!18Disadvantage of message passing:Messages are copied and passed back and forth.Disadvantages of buffer passing:The user is given access to system memory(the buffer itself)The user must explicitly tell the b
15、uffer pool when buffer contents have been modified,so that modified data can be rewritten to disk when the buffer is flushed.The pointer might become stale when the buffer pool replaces the contents of a buffer.4:56优秀课件,精彩无限!19Logical view of files:An array of bytes.A file pointer marks the current
16、position.Three fundamental operations:Read bytes from current position(move file pointer)Write bytes to current position(move file pointer)Set file pointer to specified byte position.4:56优秀课件,精彩无限!20#include void fstream:open(char*name,openmode mode);Example:ios:in|ios:binaryvoid fstream:close();fst
17、ream:read(char*ptr,int numbytes);fstream:write(char*ptr,int numbtyes);fstream:seekg(int pos);fstream:seekg(int pos,ios:curr);fstream:seekp(int pos);fstream:seekp(int pos,ios:end);4:56优秀课件,精彩无限!21Problem:Sorting data sets too large to fit into main memory.Assume data are stored on disk drive.To sort,
18、portions of the data must be brought into main memory,processed,and returned to disk.An external sort should minimize disk accesses.4:56优秀课件,精彩无限!22Secondary memory is divided into equal-sized blocks(512,1024,etc)A basic I/O operation transfers the contents of one disk block to/from main memory.Unde
19、r certain circumstances,reading blocks of a file in sequential order is more efficient.(When?)Primary goal is to minimize I/O operations.Assume only one disk drive is available.4:56优秀课件,精彩无限!23Often,records are large,keys are small.Ex:Payroll entries keyed on ID numberApproach 1:Read in entire recor
20、ds,sort them,then write them out again.Approach 2:Read only the key values,store with each key the location on disk of its associated record.After keys are sorted the records can be read and rewritten in sorted order.4:56优秀课件,精彩无限!24Quicksort requires random access to the entire set of records.Bette
21、r:Modified Merge-sort algorithm.Process n elements in Q(log n)passes.A group of sorted records is called a run 顺串.4:56优秀课件,精彩无限!25Split the file into two files.Read in a block from each file.Take first record from each block,output them in sorted order.Take next record from each block,output them to
22、 a second file in sorted order.Repeat until finished,alternating between output files.Read new input blocks as needed.Repeat steps 2-5,except this time input files have runs of two sorted records that are merged together.Each pass through the files provides larger runs.4:56优秀课件,精彩无限!264:56优秀课件,精彩无限!
23、27Is each pass through input and output files sequential?What happens if all work is done on a single disk drive?How can we reduce the number of Merge-sort passes?In general,external sorting consists of two phases:Break the files into initial runsMerge the runs together into a single run.4:56优秀课件,精彩
24、无限!28General approach:Read as much of the file into memory as possible.Perform an in-memory sort.Output this group of records as a single run.4:56优秀课件,精彩无限!29Break available memory into an array for the heap,an input buffer,and an output buffer.Fill the array from disk.Make a min-heap.Send the small
25、est value(root)to the output buffer.4:56优秀课件,精彩无限!30If the next key in the file is greater than the last value output,thenReplace the root with this keyelseReplace the root with the last key in the arrayAdd the next record in the file to a new heap(actually,stick it at the end of the array).4:56优秀课件
26、,精彩无限!314:56优秀课件,精彩无限!32Simple merge-sort:Place runs into two files.Merge the first two runs to output file,then next two runs,etc.Repeat process until only one run remains.How many passes for r initial runs?Is there benefit from sequential reading?Is working memory well used?Need a way to reduce th
27、e number of passes.4:56优秀课件,精彩无限!33With replacement selection,each initial run is several blocks long.Assume each run is placed in separate file.Read the first block from each file into memory and perform an r-way merge.When a buffer becomes empty,read a block from the appropriate run file.Each reco
28、rd is read only once from disk during the merge process.4:56优秀课件,精彩无限!34In practice,use only one file and seek to appropriate block.4:56优秀课件,精彩无限!35Assume working memory is b blocks in size.How many runs can be processed at one time?The runs are 2b blocks long(on average).How big a file can be merge
29、d in one pass?4:56优秀课件,精彩无限!36Larger files will need more passes-but the run size grows quickly!In k merge passes,we process 2b(k+1)blocks.Example:0.5MB working memory,4KB blocks,yield 128 blocks for working memory.Average run size is 1MB,so 128MB can be sorted in one pass on average.16GB in two mer
30、ge passes.4:56优秀课件,精彩无限!37A good external sorting algorithm will seek to do the following:Make the initial runs as long as possible.At all stages,overlap重叠、并行 input,processing and output as much as possible.Use as much working memory as possible.Applying more memory usually speeds processing.If possible,use additional disk drives for more overlapping of processing with I/O,and allow for more sequential file processing.4:56优秀课件,精彩无限!38ExerciseP2888.2,8.3,8.8