Pipelining

From CPUdev wiki

A limiting factor in CPU performance is transistor propagation delay: the time it takes a signal to pass through the logic between one storage cell and the next. The clock period must be at least as long as the longest such delay, so reducing the delay before a signal reaches a storage cell allows the clock speed to be increased.

For example, take the timing profile of a naive CPU design that executes a single instruction per clock cycle at 0.33MHz, so that each instruction takes 3µs:

             0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5    |--------------------|
ror x2,x4                         |--------------------|
xor x1,x3                                              |--------------------|

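By splitting instruction fetch (IF), decode (ID), and execute (EX) into separate stages that each take one clock cycle, we might be able to increase the clock rate to 1.00MHz. On its own this gains nothing, because each instruction now needs three 1µs cycles: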

             0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5    |--IF--|--ID--|--EX--|
ror x2,x4                         |--IF--|--ID--|--EX--|
xor x1,x3                                              |--IF--|--ID--|--EX--|
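
The clock rates quoted above follow from the clock period having to cover the slowest path between storage cells. A minimal Python sketch of that arithmetic, assuming each stage takes exactly 1µs (as drawn in the diagrams) and ignoring any overhead added by the storage cells between stages; the names `STAGE_DELAYS_US` and `max_clock_mhz` are illustrative only:

```python
# Assumed per-stage delays in microseconds, read off the diagrams (1 µs each).
STAGE_DELAYS_US = {"IF": 1.0, "ID": 1.0, "EX": 1.0}


def max_clock_mhz(critical_path_us: float) -> float:
    """Highest clock rate (MHz) whose period still covers the given delay (µs)."""
    return 1.0 / critical_path_us


# Single-cycle design: one clock period must cover all three stages in series.
single_cycle_mhz = max_clock_mhz(sum(STAGE_DELAYS_US.values()))

# Staged design: with storage cells between the stages, the period only has
# to cover the slowest individual stage.
staged_mhz = max_clock_mhz(max(STAGE_DELAYS_US.values()))

print(f"single-cycle: {single_cycle_mhz:.2f} MHz")
print(f"staged:       {staged_mhz:.2f} MHz")
```

If the stages were not equally long, the slowest one would set the clock period, which is one reason the 1.00MHz figure is only what we "might" reach.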

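Note that the fetch, decode, and execute stages are independent: while one instruction is being decoded, the next can already be fetched. We can therefore overlap the stages of consecutive instructions...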

             0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5    |--IF--|--ID--|--EX--|
ror x2,x4           |--IF--|--ID--|--EX--|
xor x1,x3                  |--IF--|--ID--|--EX--|

... reducing total execution time from 9µs to 5µs!
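
The 9µs and 5µs totals can be reproduced with a short calculation. The sketch below assumes an ideal pipeline: every stage takes the same time, one new instruction enters per cycle, and nothing ever stalls. The function names are hypothetical and chosen for illustration.

```python
def sequential_time_us(n_instructions: int, n_stages: int, stage_us: float) -> float:
    """Total time when each instruction finishes before the next one starts."""
    return n_instructions * n_stages * stage_us


def pipelined_time_us(n_instructions: int, n_stages: int, stage_us: float) -> float:
    """Total time when a new instruction enters the pipeline every cycle.

    The first instruction needs n_stages cycles; each later instruction
    completes one cycle after its predecessor (assumes no stalls).
    """
    if n_instructions == 0:
        return 0.0
    return (n_stages + n_instructions - 1) * stage_us


def stage_start_times(n_instructions: int, stages=("IF", "ID", "EX"), stage_us: float = 1.0):
    """Start time (µs) of every stage of every instruction, assuming no stalls."""
    return [
        {stage: (i + s) * stage_us for s, stage in enumerate(stages)}
        for i in range(n_instructions)
    ]


program = ["add x1,x5", "ror x2,x4", "xor x1,x3"]
print("sequential:", sequential_time_us(len(program), 3, 1.0), "µs")
print("pipelined: ", pipelined_time_us(len(program), 3, 1.0), "µs")
for text, starts in zip(program, stage_start_times(len(program))):
    print(f"{text:<10}", starts)
```

With three stages and three instructions, the pipelined total is (3 + 3 - 1) cycles of 1µs each. For longer instruction sequences the fill time of the first instruction matters less and the ideal speedup approaches the number of stages.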