-
Lecture 1: Measuring Performance
-
• Topics: (Sections 1.1, 1.4, 1.5, 1.8)
  - Technology trends
  - Performance summaries
  - Performance equations
-
-
Historical Microprocessor Performance
-
15x performance growth can be attributed to architectural innovations
Processor Technology Trends
-
• Shrinking of transistor sizes: 250nm (1997) → 130nm (2002) → 65nm (2007) → 32nm (2010)
• Transistor density increases by 35% per year and die size increases by 10-20% per year → more cores!
• Transistor speed improves linearly with size (complex equation involving voltages, resistances, capacitances) → can lead to clock speed improvements!
• Wire delays do not scale down at the same rate as logic delays
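As a rough sketch of how the density and die-size growth rates above compound, the snippet below multiplies the two yearly factors. It assumes that transistors per chip = density x die size, which the slide implies but does not state; the function name is illustrative only.

```python
def transistor_growth(density_growth, die_growth):
    """Yearly growth factor in transistors per chip.

    Assumes transistor count = density x die size (an assumption of
    this sketch, not a statement from the slide).
    """
    return (1 + density_growth) * (1 + die_growth)

# Rates from the slide: density +35%/year, die size +10% to +20%/year
low = transistor_growth(0.35, 0.10)
high = transistor_growth(0.35, 0.20)
print(f"transistors per chip grow roughly {low:.2f}x to {high:.2f}x per year")
```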
-
Power Consumption Trends
• Dynamic power ∝ activity x capacitance x voltage² x frequency
• Capacitance per transistor and voltage are decreasing, but the number of transistors is increasing at a faster rate; hence clock frequency must be kept steady
• Leakage power is also rising
• Power consumption is already between 100-150W in high-performance processors today
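To make the dynamic power relation concrete, here is a minimal sketch that evaluates it in relative terms. Each argument is a scaling factor against some baseline design; the function name and the two scenarios are hypothetical, chosen only to show the quadratic effect of voltage.

```python
def relative_dynamic_power(activity=1.0, capacitance=1.0,
                           voltage=1.0, frequency=1.0):
    """Dynamic power relative to a baseline.

    Each argument is a scaling factor versus the baseline
    (1.0 means unchanged). Power scales with voltage squared.
    """
    return activity * capacitance * voltage ** 2 * frequency

# Hypothetical scenario 1: lower the voltage by 10%, all else fixed
print(f"{relative_dynamic_power(voltage=0.9):.2f}x")    # 0.81x

# Hypothetical scenario 2: double the frequency at the same voltage
print(f"{relative_dynamic_power(frequency=2.0):.2f}x")  # 2.00x
```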
-
Where Are We Headed?
• Modern trends:
  - Clock speed improvements are slowing
    - power constraints
    - already doing less work per stage
  - Difficult to further optimize a single core for performance
  - Multi-cores: each new processor generation will
-
-
Recent Microprocessor Trends
[Figure: microprocessor trends, 2004 to 2010. Source: Micron University Symp.]
Transistors: 1.43x / year
Cores: 1.2-1.4x
Performance: 1.15x
Frequency: 1.05x
Power: 1.04x
Modern Processor Today
• Intel Core i7
  - Clock frequency: 3.2-3.33 GHz
  - 45nm and 32nm products
  - Cores: 4-6
  - Power: 95-130 W
  - Two threads per core
  - 3-level cache, 12 MB L3 cache
  - Price: $300-$1000
-
Other Technology Trends
• DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases
• Disk density improves by 100% every year, latency improvement similar to DRAM
• Emergence of NVRAM technologies that can provide a bridge between DRAM and hard disk drives
-
Measuring Performance
• Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time)
• To optimize throughput, must ensure that there is minimal waste of resources
• Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user
  - SPEC CPU 2006: cpu-oriented programs (for desktops)
  - SPECweb, TPC: throughput-oriented (for servers)
  - EEMBC: for embedded processors/workloads
-
-
-
• Consider 25 programs from a benchmark set: how do we capture the behavior of all 25 programs with a single number?

          P1   P2   P3
  Sys-A   10    8   25
  Sys-B   12    9   20
  Sys-C    8    8   30

  - Total (average) execution time
  - Total (average) weighted execution time or Average of normalized execution times
  - Geometric mean of normalized execution times
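The three summaries listed above can be sketched on the Sys-A/B/C numbers as follows. The slide does not name a reference system for normalization, so choosing Sys-A here is an assumption, and the helper names are illustrative.

```python
import math

# Execution times for P1, P2, P3 from the slide (units not given)
times = {
    "Sys-A": [10, 8, 25],
    "Sys-B": [12, 9, 20],
    "Sys-C": [8, 8, 30],
}
reference = "Sys-A"  # assumption: the slide does not name a reference

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def normalized(system):
    """Per-program execution times divided by the reference system's."""
    return [t / r for t, r in zip(times[system], times[reference])]

for name in times:
    norm = normalized(name)
    print(f"{name}: total={sum(times[name])}, "
          f"AM(norm)={arithmetic_mean(norm):.3f}, "
          f"GM(norm)={geometric_mean(norm):.3f}")
```

With these numbers the summaries do not agree on a winner: Sys-B has the lowest total time (41, against 43 and 46), yet Sys-C has the lowest GM of normalized times (about 0.99) when Sys-A is the reference.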
-
-
AM Example
• We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second
• The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2 seconds
• With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X; perhaps not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal #cycles)
• With GM, you may find inconsistencies
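A quick check of this example, as a sketch: since every program ran for 1 second on X, the execution times on Y are already normalized. The variable names are illustrative.

```python
import math

# Execution times on Y; each program took 1 second on X,
# so these values are already normalized to the reference machine.
y_times = [0.8, 1.1, 0.5, 2.0]

am = sum(y_times) / len(y_times)
gm = math.prod(y_times) ** (1 / len(y_times))

print(f"AM = {am:.3f}")  # 1.100: Y is 1.1x slower on this workload
print(f"GM = {gm:.3f}")  # 0.969: GM would suggest Y is slightly faster
```

The two means point in opposite directions here, which is one form the inconsistency can take.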
-
GM Example

        Computer-A   Computer-B   Computer-C
  P1    1 sec        10 secs      20 secs
  P2    1000 secs    100 secs     20 secs

Conclusion with GMs: (i) A=B
(ii) C is ~1.6 times faster
• For (i) to be true, P1 must occur 100 times for every occurrence of P2
• With the above assumption, (ii) is no longer true
Hence, GM can lead to inconsistencies
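The inconsistency can be verified numerically. In the sketch below, the 100:1 mix follows from equating total times on A and B (n x 1 + 1000 = n x 10 + 100 gives n = 100); the variable names are illustrative.

```python
import math

# Execution times in seconds from the slide: (P1, P2)
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}

def gm(xs):
    return math.prod(xs) ** (1 / len(xs))

gms = {name: gm(t) for name, t in times.items()}
for name, value in gms.items():
    print(f"GM({name}) = {value:.1f}")  # A and B tie at 31.6; C is 20.0

print(f"GM ratio A/C = {gms['A'] / gms['C']:.2f}")  # 1.58, i.e. ~1.6

# Workload implied by claim (i): 100 runs of P1 per run of P2
totals = {name: 100 * p1 + p2 for name, (p1, p2) in times.items()}
print(totals)  # {'A': 1100, 'B': 1100, 'C': 2020}
```

On the one workload where A and B really are equal, C takes 2020 seconds against 1100, so it is slower rather than 1.6 times faster.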
-
Summarizing Performance
• GM: does not require a reference machine, but does not predict performance very well
  - So we multiplied execution times and determined that sys-A is 1.2x faster... but on what workload?
• AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine
  - Every year or so, the reference machine will have
-
-
-
Normalized Execution Times
-
• Advantage of GM: no reference machine required
• Disadvantage of GM: does not represent any "real entity" and may not accurately predict performance
• Disadvantage of AM of normalized: need weights (which may change over time)
• Advantage: can represent a real workload
-
-
CPU Performance Equation
• Clock cycle time = 1 / clock speed
• CPU time = clock cycle time x cycles per instruction x number of instructions
• Influencing factors for each:
  - clock cycle time: technology and pipeline
  - CPI: architecture and instruction set design
  - instruction count: instruction set design and compiler
• CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically
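A minimal sketch of the CPU time equation follows. The program parameters (1 billion instructions, CPI of 1.5, 2 GHz clock) are hypothetical values chosen for round numbers, not figures from the lecture.

```python
def cpu_time(clock_hz, cpi, instruction_count):
    """CPU time in seconds = clock cycle time x CPI x instruction count."""
    clock_cycle_time = 1 / clock_hz  # seconds per cycle
    return clock_cycle_time * cpi * instruction_count

# Hypothetical program: 1 billion instructions, CPI of 1.5, 2 GHz clock
print(f"{cpu_time(2e9, 1.5, 1e9):.2f} s")  # 0.75 s
```

Halving the CPI or doubling the clock speed would each halve the result, which is why an innovation that changes only one factor can be judged by that factor alone.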
-
Measuring System CPI
• Assume that an architectural innovation only affects CPI
• For 3 programs, base CPIs: 1.2, 1.8, 2.5
  CPIs for proposed model: 1.4, 1.9, 2.3
• What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?
-
Example
• AM of CPI for base case = (1.2 cyc/instr + 1.8 cyc/instr + 2.5 cyc/instr) / 3
  5.5 cycles is the execution time if each program ran for one instruction; therefore, AM of CPI defines a workload where every program runs for an equal #instrs
• HM of CPI = 1 / AM of IPC; defines a workload where every program runs for an equal number of cycles
• GM of CPI: warm fuzzy number, not necessarily representing any workload
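The three means can be computed for the base and proposed CPIs with Python's standard `statistics` module. This is only a sketch; it also checks the identity HM of CPI = 1 / AM of IPC for the base case.

```python
from statistics import mean, harmonic_mean, geometric_mean

base = [1.2, 1.8, 2.5]      # base CPIs
proposed = [1.4, 1.9, 2.3]  # CPIs for the proposed model

for label, cpis in (("base", base), ("proposed", proposed)):
    print(f"{label}: AM={mean(cpis):.3f}, "
          f"HM={harmonic_mean(cpis):.3f}, "
          f"GM={geometric_mean(cpis):.3f}")

# HM of CPI equals 1 / AM of IPC, where IPC = 1 / CPI per program
am_ipc = mean([1 / c for c in base])
print(f"1 / AM of IPC = {1 / am_ipc:.3f}")
```

For these particular CPIs all three means come out higher for the proposed model (AM about 1.87 against 1.83), so the choice of mean changes the size of the gap but not the ranking.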
-
-
-
• "Speedup" is a ratio
• "Improvement", "Increase", "Decrease" usually refer to percentage relative to the baseline
• A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop
  - What is the speedup?
  - What is the percentage increase in performance?
  - What is the reduction in execution time?
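One way to work through the three questions is sketched below. It assumes the usual conventions that speedup = old time / new time and that performance is the reciprocal of execution time; the variable names are illustrative.

```python
old_time = 100.0  # seconds on the old laptop
new_time = 70.0   # seconds on the new laptop

# Speedup is a ratio of execution times (old / new)
speedup = old_time / new_time

# Performance = 1 / time, so the relative increase is speedup - 1
perf_increase_pct = (speedup - 1) * 100

# Reduction in execution time, relative to the baseline (old) time
time_reduction_pct = (old_time - new_time) / old_time * 100

print(f"speedup = {speedup:.2f}")                               # 1.43
print(f"performance increase = {perf_increase_pct:.1f}%")       # 42.9%
print(f"execution time reduction = {time_reduction_pct:.1f}%")  # 30.0%
```

The 43% performance increase and the 30% time reduction describe the same change, which is why it matters to say which baseline and which quantity a percentage refers to.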
-
-