## 國立臺灣科技大學 (National Taiwan University of Science and Technology)

1. For the following set of variables {CPI, clock rate, cycle time, MIPS, I, C}, identify all of the subsets that can be used to calculate execution time. Each subset should be minimal; that is, it should not contain any variable that is not needed. An example subset is {CPI, I, cycle time}. Let I = number of instructions in program and C = number of cycles in program. (15%)

2. One extension of the MIPS instruction set architecture has two new instructions called movn (move if not zero) and movz (move if zero). For example, the instruction movn \$8, \$11, \$4 copies the contents of register 11 into register 8, provided that the value in register 4 is nonzero (otherwise it does nothing). The movz instruction is similar, but copying takes place only if the register's value is zero. Show how to use the new instructions to put whichever is larger, register 8's value or register 9's value, into register 10. If the values are equal, copy either into register 10. You may use register 1 as an extra register for temporary use. Do not use any conditional branches. (10%)

3. In the embedded market, where cost is crucial, processors sometimes implement floating point only in software. We are interested in two implementations of a computer, one with and one without special floating-point hardware. Consider a program, P, with the following mix of operations:

   | Operation | Share of instructions |
   |-----------|-----------------------|
   | Floating-point multiply | 10% |
   | Floating-point add | 15% |
   | Floating-point divide | 5% |
   | Integer instructions | 70% |

   Computer MFP (computer with floating point) has floating-point hardware and can therefore implement the floating-point operations directly. It requires the following number of clock cycles for each instruction class:

   | Instruction class | Clock cycles |
   |-------------------|--------------|
   | Floating-point multiply | [...] |
   | Floating-point add | 4 |
   | Floating-point divide | 20 |
   | Integer instructions | 2 |

   Computer MNFP (computer with no floating point) has no floating-point hardware and so must emulate the floating-point operations using integer instructions. The integer instructions all take 2 clock cycles. The number of integer instructions needed to implement each of the floating-point operations is as follows:

   | Operation | Integer instructions |
   |-----------|----------------------|
   | Floating-point multiply | 30 |
   | Floating-point add | 20 |
   | Floating-point divide | 50 |

   Both computers have a clock rate of 1000 MHz.

   (a) Find the native MIPS ratings for both computers. (10%)

   (b) If the computer MFP needs 300 million instructions for this program, how many integer instructions does the computer MNFP require for the same program? (10%)

   (c) Assuming the instruction counts from (b), what is the execution time (in seconds) for the program run on MFP and MNFP? (5%)

4. Two important hurdles make parallel processing challenging.

   (a) The limited parallelism available in programs: Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the original computation can be sequential? (10%)

   (b) The relatively high cost of communications: Suppose we have an application running on a 32-processor multiprocessor, which has a 200 ns time to handle a reference to a remote memory. For this application, assume that all the references except those involving communication hit in the local memory hierarchy. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference? (10%)
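The two parallel-processing hurdles in question 4 reduce to short calculations, sketched below in Python. This is one possible way to set up the arithmetic, not an official solution. It assumes the usual form of Amdahl's law, speedup = 1 / (s + (1 - s) / N), and that each remote reference stalls the processor for the full remote latency. The function names `max_sequential_fraction` and `communication_slowdown` are hypothetical helpers introduced only for this illustration.

```python
def max_sequential_fraction(speedup, processors):
    """Solve Amdahl's law, speedup = 1 / (s + (1 - s) / processors), for s.

    Rearranging gives s = (processors / speedup - 1) / (processors - 1).
    """
    return (processors / speedup - 1) / (processors - 1)


def communication_slowdown(base_cpi, clock_hz, remote_latency_s, remote_fraction):
    """Ratio of effective CPI (with remote references) to base CPI.

    Assumes the processor stalls for the whole remote latency on every
    remote reference, as the question states.
    """
    remote_cycles = remote_latency_s * clock_hz        # stall length in clock cycles
    effective_cpi = base_cpi + remote_fraction * remote_cycles
    return effective_cpi / base_cpi


# Part (a): target speedup of 80 on 100 processors.
seq = max_sequential_fraction(speedup=80, processors=100)

# Part (b): 2 GHz clock, 200 ns remote latency, 0.2% remote references, base CPI 0.5.
ratio = communication_slowdown(base_cpi=0.5, clock_hz=2e9,
                               remote_latency_s=200e-9, remote_fraction=0.002)

print(f"sequential fraction: {seq * 100:.3f}%")   # prints "sequential fraction: 0.253%"
print(f"no-communication speed advantage: {ratio:.2f}x")  # prints "no-communication speed advantage: 2.60x"
```

Under these assumptions, only about a quarter of one percent of the computation may be sequential, and the 400-cycle remote stall raises the effective CPI from 0.5 to 1.3.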
5. Average memory access time (AMAT) is commonly used to examine alternative cache designs. Average memory access time is the average time to access memory considering both hits and misses and the frequency of different accesses:

   AMAT = Hit time + Miss rate × Miss penalty

   Consider a processor with a 2 ns clock, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per instruction, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls.

   (a) Find the AMAT for the processor. (5%)

   (b) Suppose we can improve the miss rate to 0.03 misses per reference by doubling the cache size. This causes the cache access time to increase to 1.2 clock cycles. Using the AMAT as a metric, determine if this is a good trade-off. (10%)

6. Consider three processors with different cache configurations. The following measurements have been made:

   | Cache | Organization | Instruction miss rate | Data miss rate |
   |-------|--------------|-----------------------|----------------|
   | Cache 1 | Direct-mapped with one-word blocks | 4% | 6% |
   | Cache 2 | Direct-mapped with four-word blocks | 2% | 4% |
   | Cache 3 | Two-way set associative with four-word blocks | 2% | 3% |

   Half of the instructions contain a data reference. The miss penalty is 6 + block size in words. The CPI was measured on a processor with Cache 1 and was [...] spends the most cycles on [...]
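The AMAT formula, AMAT = Hit time + Miss rate × Miss penalty, can be evaluated directly for the two cache designs described in the AMAT question. The Python sketch below is a minimal illustration rather than an official solution. It assumes the miss penalty stays at 20 clock cycles after the cache is doubled and that the miss rates can be applied per memory access. The helper name `amat_cycles` is hypothetical.

```python
CLOCK_NS = 2.0  # clock period given in the question


def amat_cycles(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit time + Miss rate x Miss penalty, all in clock cycles."""
    return hit_time + miss_rate * miss_penalty


# Original design: 1-cycle hit time, 0.05 miss rate, 20-cycle miss penalty.
base = amat_cycles(hit_time=1.0, miss_rate=0.05, miss_penalty=20)

# Doubled cache: 1.2-cycle hit time, 0.03 miss rate, same penalty (assumed).
doubled = amat_cycles(hit_time=1.2, miss_rate=0.03, miss_penalty=20)

print(f"original: {base:.2f} cycles ({base * CLOCK_NS:.2f} ns)")      # prints "original: 2.00 cycles (4.00 ns)"
print(f"doubled:  {doubled:.2f} cycles ({doubled * CLOCK_NS:.2f} ns)")  # prints "doubled:  1.80 cycles (3.60 ns)"
print("good trade-off" if doubled < base else "not a good trade-off")
```

With these inputs the lower miss rate outweighs the slower hit time, so by the AMAT metric alone the larger cache comes out ahead.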