2. Performance Concepts

Microprocessor Speed

Techniques built into contemporary processors

Pipelining: The processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously

Branch Prediction: The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next

Superscalar execution: Ability to issue more than one instruction in every processor clock cycle (multiple parallel pipelines are used)

Data flow analysis: The processor analyzes which instructions are dependent on each other's results or data, in order to create an optimized schedule of instructions.

Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible.

Performance Balance

Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components

  • Increase the number of bits retrieved at one time by making DRAMs wider rather than deeper and by using wide bus data paths.

  • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory

  • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip

  • Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow

Improvements in Chip Organization and Architecture

  1. Increase hardware speed of the processor

    • Fundamentally due to shrinking logic gate size

      • More gates, packed more tightly, increasing clock rate

      • Propagation time for signals reduced

  2. Increase the size and speed of caches

    • Dedicating a portion of the processor chip to cache

      • Cache access times drop significantly

  3. Change processor organization and architecture

    1. Increase the effective speed of instruction execution

    2. Parallelism

Problems with Clock Speed and Logic Density

Power

  • Power density increases with the density of logic and clock speed

  • Dissipating the heat generated on high-density, high-speed chips becomes difficult

RC Delay

  • The speed at which electrons can flow on a chip is limited by the resistance and capacitance of the metal wires connecting them

  • Delay increases as the RC product increases

  • As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance

  • Wires are also placed closer together, increasing capacitance

Memory latency and throughput

  • Memory access speed (latency) and transfer speed (throughput) lag processor speeds

Multi-Core

The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate

The strategy is to use two simpler processors on the chip rather than one more complex processor

With two processors, larger caches are justified

As caches became larger, it made performance sense to create two and then three levels of cache on the chip

MIC - Many Integrated Core

Multicore and MIC involve a homogeneous collection of general-purpose processors on a single chip

GPU

Designed to render 2D and 3D graphics

Operate as vector processors, performing operations in parallel on many data elements

Clock

  • Quartz Crystal (Analog) -> A-to-D converter -> System Clock (Digital)

  • A computer clock runs at a constant rate and determines when events take place in hardware

  • The clock cycle time is the amount of time for one clock period to elapse

  • The clock rate (Hz) is the inverse of the clock cycle time (sec)

Clock Cycle Time * Clock Rate = 1
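
A minimal sketch (not part of the original notes) of this reciprocal relationship, converting between clock rate and clock cycle time:

```python
# Sketch: Clock Cycle Time * Clock Rate = 1, so each is the reciprocal of the other.

def cycle_time_from_rate(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz

def rate_from_cycle_time(cycle_time_s: float) -> float:
    """Clock rate in Hz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_s

# Example: a 4 GHz clock has a 0.25 ns cycle time.
print(cycle_time_from_rate(4e9))      # 2.5e-10 s  (0.25 ns)
print(rate_from_cycle_time(2.5e-10))  # 4e9 Hz     (4 GHz)
```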


Computing CPU Time / Execution Time

CPU Time

CPU Time = CPU Clock Cycles * Clock Cycle Time

CPU Time = CPU Clock Cycles / Clock Rate

Instruction Count

Instruction Count = Instructions / Program

CPI (Cycles per Instruction)

CPI = Clock Cycles / Instruction Count

Clock Cycles

CPU Clock Cycles = (Instructions / Program) * (Clock Cycles / Instruction)

CPU Clock Cycles = Instruction Count * CPI

CPU Time

CPU Time = Instruction Count * CPI * Clock Cycle Time

CPU Time = Instruction Count * CPI / Clock Rate
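
The sketch below (with hypothetical numbers, not from the notes) applies the CPU time equations above:

```python
# Sketch of CPU Time = Instruction Count * CPI / Clock Rate.

def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time in seconds."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2 GHz clock.
print(cpu_time(2_000_000_000, 1.5, 2e9))  # 1.5 seconds
```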

Computing CPI

CPI is the average number of cycles per instruction.

F_i - the relative frequency of instruction class i, i.e. the fraction of the instruction count it contributes

$$CPI = \sum_{i=1}^{n} CPI_{i} \times F_{i}$$
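
As an illustration only (the instruction classes and numbers below are made up), the weighted-average CPI can be computed as:

```python
# Sketch of CPI = sum(CPI_i * F_i), where F_i is the fraction of the
# instruction count contributed by instruction class i.

def average_cpi(classes):
    """classes is a list of (CPI_i, F_i) pairs."""
    return sum(cpi_i * f_i for cpi_i, f_i in classes)

# Hypothetical mix: ALU ops (CPI 1, 50%), loads/stores (CPI 3, 30%), branches (CPI 2, 20%).
mix = [(1.0, 0.5), (3.0, 0.3), (2.0, 0.2)]
print(average_cpi(mix))  # 0.5 + 0.9 + 0.4 = 1.8
```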

MIPS / MFLOPS

Common marketing metrics for computer performance include MIPS and MFLOPS

MIPS

Millions of instructions per second

  • MIPS = INSTRUCTION COUNT / (EXECUTION TIME * 10^6)

  • Advantage: easy to understand and measure

  • Disadvantage: may not reflect actual performance, since programs dominated by simple instructions achieve higher MIPS ratings

MFLOPS

Millions of floating-point operations per second

  • MFLOPS = FLOATING POINT OPERATIONS / (EXECUTION TIME * 10^6)

  • Advantage: easy to understand and measure

  • Disadvantage: only measures floating-point performance
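
A short sketch of both rating formulas above, with made-up measurements:

```python
# Sketch: MIPS and MFLOPS ratings from measured counts and execution time.

def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = Instruction Count / (Execution Time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mflops(fp_operations: int, execution_time_s: float) -> float:
    """MFLOPS = Floating-Point Operations / (Execution Time * 10^6)."""
    return fp_operations / (execution_time_s * 1e6)

# Hypothetical run: 500 million instructions (40 million floating point) in 2 seconds.
print(mips(500_000_000, 2.0))   # 250.0 MIPS
print(mflops(40_000_000, 2.0))  # 20.0 MFLOPS
```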

Benchmark Principles

  • Written in a high-level language - portable across different machines

  • Representative of a particular kind of programming domain or paradigm

  • Easily measured

  • Wide distribution
