CS 251 Assignment 6 Winter 2014

CS 251, Fall 2011, Assignment 6.1.
3% of course mark
Due Week 12, Friday, April 4, 2:00 PM
Print these pages and write your solutions in the space provided. Staple your solutions to the assignment cover sheet from the course webpage (with the cover sheet first) and deposit your assignment in the drop-box outside MC4065. You will receive a 0 on the assignment if you do not include the cover page.

1. (15 points) Here is a series of address references given as word addresses in both decimal and binary. We also list the relative time at which each reference occurs:
  Addr   Binary   Time
  0      00000    1
  1      00001    2
  2      00010    3
  3      00011    4
  8      01000    5
  9      01001    6
  10     01010    7
  11     01011    8
  0      00000    9
  1      00001    10
  2      00010    11
  7      00111    12
  3      00011    13
  0      00000    14
  1      00001    15
Below are four different 8-word caches (similar to Figure 5.14 of the text). For each cache type, assume the cache is initially empty. Show the final contents of the cache, and in the table at the bottom, show how many cache hits and misses there are for each type of cache. Write your solution in the tables below, assuming the above word addresses are 5-bit binary numbers. Write the binary form of the tag in the tables below. For the fully associative cache, you may write the decimal form of the tag instead. Assume an LRU replacement scheme. When inserting an element into the cache, if there are multiple empty slots for that index, put the new element in the left-most empty slot.

Direct mapped

  Block   Tag   Data
  0
  1
  2
  3
  4
  5
  6
  7

Two-way set associative

  Set   Tag   Data   Tag   Data
  0
  1
  2
  3

Four-way set associative

  Set   Tag   Data   Tag   Data   Tag   Data   Tag   Data
  0
  1

Fully associative

  Set   Tag   Data   Tag   Data   Tag   Data   Tag   Data   Tag   Data   Tag   Data   Tag   Data   Tag   Data
  0

Write the number of cache hits and misses for each scheme in the table below:

            Direct Mapped   Two-way s.a.   Four-way s.a.   Fully Associative
  Hits
  Misses
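You can check hit and miss counts by hand by simulating the cache. Below is a minimal Python sketch. It assumes one-word blocks, the 8-word capacity described above, and LRU replacement within each set. The function name `simulate` is hypothetical. The sketch keeps each set's tags in recency order rather than by physical slot. It therefore reproduces hit and miss counts, but not the left-most-empty-slot layout the tables ask for.

```python
def simulate(addresses, num_blocks=8, ways=1):
    """Count hits and misses for an LRU cache with one-word blocks.

    ways=1 gives a direct-mapped cache and ways=num_blocks a fully
    associative one. Returns (hits, misses, sets), where each set is a
    list of tags ordered from least to most recently used.
    """
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        index = addr % num_sets   # low-order bits select the set
        tag = addr // num_sets    # remaining high-order bits form the tag
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.remove(tag)     # re-appended below as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)      # evict the least recently used tag
        lines.append(tag)
    return hits, misses, sets


if __name__ == "__main__":
    # Word addresses from the reference table, in time order.
    trace = [0, 1, 2, 3, 8, 9, 10, 11, 0, 1, 2, 7, 3, 0, 1]
    for ways, name in [(1, "direct mapped"), (2, "two-way"),
                       (4, "four-way"), (8, "fully associative")]:
        hits, misses, _ = simulate(trace, num_blocks=8, ways=ways)
        print(f"{name}: {hits} hits, {misses} misses")
```

Setting `ways` to 1, 2, 4, or 8 with `num_blocks=8` selects the four organizations above.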
2. (10 points) Consider the following MIPS code for summing the values in an array in memory. Assume that register $4 has already been set up to point to the first value of this array.

    100  addi $1, $0, 240
    104  addi $2, $0, 0
    108  lw   $3, 0($4)
    112  addi $1, $1, -1
    116  add  $2, $2, $3
    120  addi $4, $4, 4
    124  bne  $1, $0, -5
    128  nop
Assume that this code runs on the datapath of Figure 4.65, page 325 of the text, page 619 of the course notes (i.e., a datapath that does data forwarding and stalls the instruction following a branch if the branch is taken). Further, assume the following:

- Each instruction takes 1 clock cycle.
- Memory is read into cache in blocks.
- Reading a block of n words of memory takes 49 + n clock cycles. (E.g., if the block size is 1, it takes 50 clock cycles to read a word from memory. If the block size is 2, it takes 51 clock cycles to read two consecutive words. If the block size is 4, it takes 53 clock cycles to read four consecutive words.)

How long will this segment of code take to execute, assuming cache block sizes of 1, 2, 4, 8, and 16? Fill in the following table to determine your answer. Assume that the pipeline is full (i.e., do not worry about clock cycles to initially fill the pipeline). Assume that the program starts in cache and stays in cache during the entire execution. Further assume that anything read into cache during the execution of the program remains in cache for the rest of the execution.

  Cache Block Size   Block size 1   Block size 2   Block size 4   Block size 8   Block size 16
  Instructions
  Memory
  Total
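The memory cost model above can be written as a few small helpers. This is a minimal sketch. It assumes the array starts on a block boundary, so every block holding array data is read exactly once. The function names are hypothetical. The instruction count follows the convention question 5 uses: 2 cycles for lines 100 to 104, then 6 cycles per loop iteration for lines 108 to 128.

```python
import math


def block_read_cycles(block_size: int) -> int:
    """Cycles to read one block of `block_size` words: 49 + n."""
    return 49 + block_size


def data_read_cycles(words: int, block_size: int) -> int:
    """Cycles spent reading `words` consecutive array words into cache.

    Assumes the array starts on a block boundary, so ceil(words / n)
    blocks are read, each exactly once.
    """
    blocks = math.ceil(words / block_size)
    return blocks * block_read_cycles(block_size)


def instruction_cycles(iterations: int) -> int:
    """2 cycles for lines 100-104, then 6 per iteration of lines 108-128
    (the counting convention used in question 5)."""
    return 2 + 6 * iterations


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, block_read_cycles(n))
```

The last assumption matters. If the pipeline stalls for each full block read, a Total entry is the sum of the instruction and memory cycles.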
3. (5 points) Suppose we have a 32-bit computer with 1 GB of memory. For virtual memory on this computer, we need to translate a 32-bit virtual address into a 30-bit physical address, which is done via the page table. Below is part of the page table for translating the 32-bit virtual address to a 30-bit physical address. The page size is 4 KB. Below the page table is a list of five virtual addresses. Using the page table, convert the virtual addresses to physical addresses. If the table cannot be used to convert a particular virtual address, write XXX on the line. The numbers to the left of the page table are the binary indexes into the table.
Page Table

  Index (binary)              V   Physical Address
  0000 0000 0000 0000 0000    0   00 0000 0000 0011 0000
  0000 0000 0000 0000 0001    1   01 0000 0000 0000 0000
  0000 0000 0000 0000 0010    1   10 1010 0101 0011 0000
  0000 0000 0000 0000 0011    1   00 0011 0001 0000 1010
  ...
  0000 0000 0101 0000 0000    1   00 0001 0000 0011 0011
  0000 0000 0101 0000 0001    1   00 0001 1010 0000 0100
  0000 0000 0101 0000 0010    1   00 0001 1100 0011 0101
  0000 0000 0101 0000 0011    1   00 0001 0000 0000 0110
  ...
  Virtual Address                            Physical Address
  0000 0000 0000 0000 0001 0101 1000 0000
  0000 0000 0000 0000 0000 0101 0000 0001
  0000 0000 0101 0000 0011 0100 0000 0000
  0000 0000 0000 0000 0011 0111 1000 1000
  0000 0000 0101 0000 0011 0100 0000 1111
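The translation itself is mechanical. The upper 20 bits of the virtual address index the page table. The lower 12 bits, the offset within a 4 KB page, are copied unchanged. Below is a minimal Python sketch. It assumes the page table is given as a dict from virtual page number to a (valid, physical page) pair. The names `parse`, `group`, and `translate` are hypothetical, and the test entries are made up rather than taken from the table above.

```python
OFFSET_BITS = 12        # 4 KB pages -> 12-bit page offset
PHYS_ADDR_BITS = 30     # 1 GB of physical memory


def parse(bits: str) -> int:
    """Turn a string such as '0000 0101' into an integer."""
    return int(bits.replace(" ", ""), 2)


def group(bits: str) -> str:
    """Format a 30-bit string as '00 0000 0000 ...' (2 bits, then nibbles)."""
    return bits[:2] + " " + " ".join(
        bits[i:i + 4] for i in range(2, PHYS_ADDR_BITS, 4))


def translate(va: int, page_table: dict) -> str:
    """Translate a 32-bit virtual address, or return 'XXX' if the page
    table entry is missing or invalid."""
    vpn = va >> OFFSET_BITS                    # upper 20 bits
    offset = va & ((1 << OFFSET_BITS) - 1)     # lower 12 bits
    entry = page_table.get(vpn)
    if entry is None or not entry[0]:
        return "XXX"
    _valid, ppn = entry
    pa = (ppn << OFFSET_BITS) | offset         # physical page + same offset
    return group(format(pa, f"0{PHYS_ADDR_BITS}b"))
```

For example, `translate(parse("0000 0000 0000 0000 0001 1010 1011 1100"), {1: (1, 2)})` keeps the offset `1010 1011 1100` and replaces the virtual page number with physical page 2.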
4. (10 points) Below are the TLB and part of the page table for translating a 32-bit virtual address to a 30-bit physical address. The page size is 4 KB. The letters to the left of the TLB are labels to be used in parts (b) and (c) of this question. The numbers to the left of the page table are the decimal indexes into the table.
Page Table

  Index   Valid   Dirty   Ref   Physical page OR Disk address
  0       1       1       1     00 0000 1000 0000 0100
  1       1       1       1     00 0000 0000 0000 1000
  2       1       0       0     00 0100 0110 0000 0000
  3       1       0       1     00 0000 1111 0000 1111
  4       1       0       0     00 1111 0000 1110 0000
  5       1       0       1     10 0000 0010 0000 1011
  6       1       0       1     00 1000 1000 1000 1000
  7       1       1       0     11 1000 0100 0010 0001
  8       0       0       0     00 1000 0001 0100 0010
  9       1       1       0     10 1010 1010 1010 0101
  10      1       1       1     00 0000 1000 0000 0011

TLB

  Entry   Valid   Dirty   Ref   Tag                         Physical page address
  A       1       0       1     0000 0000 0100 0011 0000    00 0000 1000 0000 0100
  B       1       1       1     0000 0000 0000 0000 1111    00 0000 0111 0000 0000
  C       1       0       0     0010 0100 0010 0100 0010    00 0000 0000 0000 0101
  D       1       1       1     0000 0000 0000 0000 0001    00 0000 0000 0000 1000
  E       0       0       0     0000 0000 0000 0000 0110    00 0000 0000 1111 0000
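The lookup order used in part (a) can be sketched in Python: check every valid TLB tag first, then fall back to the page table. This is a minimal sketch under our own reading of the instructions. An invalid page table entry is reported as source "PT" with address "DISK". A page number outside the visible part of the table is reported as "NEITHER". The class and function names are hypothetical, and the test entries are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

OFFSET_BITS = 12  # 4 KB pages


@dataclass
class TLBEntry:
    valid: bool
    tag: int   # virtual page number (upper 20 bits of the virtual address)
    ppn: int   # physical page number (upper 18 bits of the physical address)


@dataclass
class PTEntry:
    valid: bool
    ppn_or_disk: int  # physical page number if valid, else a disk address


def translate(va: int, tlb: List[TLBEntry],
              page_table: Dict[int, PTEntry]
              ) -> Tuple[str, Optional[Union[int, str]]]:
    """Return (source, physical address) following the TLB -> page table order."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    for entry in tlb:                      # fully associative: compare every tag
        if entry.valid and entry.tag == vpn:
            return "TLB", (entry.ppn << OFFSET_BITS) | offset
    pte = page_table.get(vpn)
    if pte is None:
        return "NEITHER", None             # not in the visible part of the table
    if not pte.valid:
        return "PT", "DISK"                # page is on disk
    return "PT", (pte.ppn_or_disk << OFFSET_BITS) | offset
```

An invalid TLB entry is skipped even when its tag matches, which is why the valid bit is checked before the tag.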
(a) (5 points) The following is a list of five virtual addresses. Using the TLB and the page table, convert the virtual addresses to physical addresses. In the table, note where the physical address comes from (e.g., TLB, PT, or NEITHER if the address is not in the TLB or page table). If the page is not in memory, write DISK instead of the physical address.

  Virtual Address                            Source   Physical Address
  0010 0100 0010 0100 0010 0000 0000 0001             00 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0001 0000 0000 0001
  0000 0000 0000 0000 0110 0000 0010 1010
  0000 0000 0000 0000 1111 0000 0000 0001
  0000 0000 0000 0000 0100 0000 1100 0001

(b) (2 points) Suppose we access memory address 0000 0000 1111 0000 0000 0000 0000 0000. This page is not in the TLB. Into which TLB entry (A, B, C, D, E) should we place this page? Justify your answer.
(c) (2 points) Having made the memory access in the previous question, we now make a second memory access, this time to memory address 0000 0000 1111 0000 0001 0000 0000 0000. This page is also not in the TLB. In which TLB entry (A,B,C,D,E) should we place this page? Justify your answer.
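Parts (b) and (c) turn on a replacement rule. The question does not state one, so the sketch below assumes a common approximation of LRU that uses the reference bit. It prefers an invalid entry, then the first entry whose reference bit is 0, and otherwise falls back to the first entry. The newly filled entry is marked valid and referenced. The names `choose_victim` and `fill` are hypothetical, and the test entries are made up.

```python
from typing import Dict


def choose_victim(tlb: Dict[str, dict]) -> str:
    """Pick the TLB entry to overwrite: an invalid entry if there is one,
    otherwise the first entry whose reference bit is 0, otherwise the first."""
    for label, entry in tlb.items():
        if not entry["valid"]:
            return label
    for label, entry in tlb.items():
        if not entry["ref"]:
            return label
    return next(iter(tlb))


def fill(tlb: Dict[str, dict], tag: int) -> str:
    """Place `tag` in the chosen entry and mark it valid and referenced."""
    label = choose_victim(tlb)
    tlb[label] = {"valid": 1, "ref": 1, "tag": tag}
    return label
```

Because `fill` sets the reference bit, a second miss right after the first will not pick the entry just filled.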
5. (10 points) In this question, you will consider the effects of the cache and the TLB on virtual memory performance. Consider the (slightly modified) code sequence from question 2:

    100  addi $1, $0, 256
    104  addi $2, $0, 0
    108  lw   $3, 0($4)
    112  addi $1, $1, -1
    116  add  $2, $2, $3
    120  addi $4, $4, 4
    124  bne  $1, $0, -5
    128  nop
You are to compute timings for this code, with and without cache and TLB. The cache block size is 1. Assume that the program is resident in memory, and that the cache and the TLB are empty to start with. Further assume that once in cache, the program will remain in cache, and that reading one word of memory costs 50 clock cycles.

Notes:

- Again assume that register $4 has already been set up to point to the first value of the array, and that all the values in this array are on a single page of memory.
- If an instruction is in cache and its page address is in the TLB, then it takes no clock cycles to read the instruction.
- If an instruction is in cache but its page address is not in the TLB, then it takes one memory access to look up the physical page in the page table, but no additional memory accesses to read the instruction.
- If an instruction is not in cache and there is no TLB (or its page address is not in the TLB), then it takes two memory accesses to read the instruction.
- Assume that the entire pipeline stalls while waiting for a memory request (i.e., instructions already in the pipeline do not complete while a new instruction is being read from memory).

As an example of the analysis we want, consider the case when there is no TLB and no cache. The instructions themselves take 1538 clock cycles to execute: 2 for lines 100-104, and then lines 108-128 are executed 256 times each (2 + 6 × 256 = 1538). Since we have no cache or TLB, each instruction takes one memory access to determine the physical page from the virtual page, and a second memory access to fetch the instruction from memory. Likewise, each of the 256 lw instructions takes two memory accesses. This gives a total of (1538 + 256) × 2 = 3588 memory accesses. Each access takes 50 clock cycles, for 179,400 clock cycles of memory time. Adding the 1538 instruction cycles gives a total time of 180,938 clock cycles, as shown in the table.
Since the number of cycles to actually execute the instructions is fixed for this example, we have filled in the Inst. column for you. You still need to add this to the number of clock cycles for the memory accesses to compute the total time. If you wish, you may include an analysis like that of the previous paragraph for each case on a separate sheet of paper.
  System              Inst.   Memory                             Total
  No TLB, No Cache    1538    (1538 + 256) × 2 × 50 = 179,400    180,938
  TLB, No Cache       1538
  Cache, No TLB       1538
  TLB and Cache       1538
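The arithmetic of the worked example can be checked with a few lines of Python. This sketch encodes only the no-TLB, no-cache row given above. For the other rows, you would substitute your own count of memory accesses. The helper name `timing` is hypothetical.

```python
CYCLES_PER_ACCESS = 50  # one word of memory


def timing(inst_cycles: int, memory_accesses: int,
           cycles_per_access: int = CYCLES_PER_ACCESS) -> tuple:
    """Return (memory cycles, total cycles) for one row of the table."""
    memory = memory_accesses * cycles_per_access
    return memory, inst_cycles + memory


INST = 2 + 6 * 256   # lines 100-104 once, lines 108-128 256 times
LOADS = 256          # one lw per iteration

# No TLB, no cache: every instruction fetch and every lw needs two memory
# accesses (a page-table lookup, then the word itself).
print(timing(INST, (INST + LOADS) * 2))   # (179400, 180938)
```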
If we wanted to improve the performance of this computer (with both TLB and cache) in this example, which would have a bigger impact: making the CPU run twice as fast, or using a block size 2 cache where it costs 51 of the old clock cycles to fetch 2 words of memory? (Assume that memory speed does not increase.)
The remaining questions will NOT be used to compute your assignment mark; they are included here as additional questions you may want to try to aid your understanding of the course material.
6. Exercises from the textbook: 5.1, 5.2.1, 5.2.2, 5.3, 5.7.1, 5.7.2, 5.7.3, 5.11, 5.12, 5.13.

7. Suppose we have a virtual memory system with a 4-entry, fully associative TLB but no cache. Assume the following:

- Each access of a physical word of memory takes 5 ns.
- All updates necessary for a page fault are completed in 10 ms.
- The page size is 1 megabyte, giving us a 12-bit virtual page number (see Figure 7.21).
- Our program (executable and data) fits within 8 megabytes and is stored in the first 8 megabytes of virtual memory, so only the first 8 entries of the page table are needed.
- The page table starts and remains resident in physical memory.
- Assume an LRU replacement scheme for all tables.
- Do not worry about the time required to update the TLB.

Suppose we make the following virtual memory accesses, starting from the situation shown in the figure below. Fill in the Action entries in the table, using the choices from the Action Table on the next page, and give the time required for each action. Compute the total amount of time required for these memory accesses. In this table, we have written the High 12 Bits as a 4-digit decimal number and the Low 20 Bits as a 7-digit decimal number.

  High 12 Bits   Low 20 Bits   Action   Time
  0000           0001000
  0000           0001004
  0001           0001008
  0000           0002000
  0002           0001012
  0003           0001016
  0004           0002000
  0000           0001020
  0001           0001000
  0002           0001024
Action Table

  Name         Action
  Page Fault   Page fault
  TLB          Read memory using TLB
  Page Table   Read memory using page table
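One way to organize the bookkeeping is a small simulation of the 4-entry LRU TLB. This is a minimal sketch, and it rests on several assumptions the question leaves open:

- The starting TLB contents and the set of resident pages come from the figure, which is not reproduced here, so they are parameters.
- A page fault is charged a flat 10 ms.
- A page-table read costs two memory accesses (page table entry, then the word).
- A faulting page fits in memory without evicting another.

The function name `run` is hypothetical.

```python
from collections import OrderedDict

ACCESS_NS = 5                  # one physical memory access
PAGE_FAULT_NS = 10_000_000     # 10 ms, assumed to cover the whole fault


def run(accesses, tlb_pages, resident_pages, tlb_size=4):
    """Simulate (page, offset) accesses; return ([(action, ns), ...], total_ns).

    tlb_pages is listed from least to most recently used.
    """
    tlb = OrderedDict((page, True) for page in tlb_pages)
    resident = set(resident_pages)
    log = []
    for page, _offset in accesses:
        if page in tlb:
            tlb.move_to_end(page)               # now most recently used
            log.append(("TLB", ACCESS_NS))
            continue
        if page in resident:
            log.append(("Page Table", 2 * ACCESS_NS))
        else:
            log.append(("Page Fault", PAGE_FAULT_NS))
            resident.add(page)                  # assumes a free frame exists
        if len(tlb) == tlb_size:
            tlb.popitem(last=False)             # evict least recently used
        tlb[page] = True
    return log, sum(ns for _, ns in log)
```

The action names match the Name column of the Action Table. Each row of the access table would be passed as a (high, low) pair such as `(0, 1000)`.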