Pipelining
A limiting factor in CPU performance is propagation delay: the time it takes for a signal to travel through the transistors and wires between one storage cell and the next. The clock period cannot be shorter than the slowest such path, so reducing the delay before a signal can reach a storage cell allows a higher clock speed.
For example, take the time profile of a naive CPU design which executes a single instruction per cycle, at 0.33MHz (one 3µs cycle per instruction):
           0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5  |--------------------|
ror x2,x4                       |--------------------|
xor x1,x3                                            |--------------------|
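By splitting instruction fetch (IF), decode (ID), and execute (EX) into separate stages, each with a shorter delay, we might be able to increase the clock rate to 1.00MHz. Each instruction now takes three 1µs cycles, so the total execution time is unchanged: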
           0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5  |--IF--|--ID--|--EX--|
ror x2,x4                       |--IF--|--ID--|--EX--|
xor x1,x3                                            |--IF--|--ID--|--EX--|
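The two clock rates quoted above follow from the cycle times. As a minimal sketch, assuming the clock period is set by the slowest stage and nothing else (the helper name `max_clock_mhz` is made up for this illustration), the relationship can be checked like this:

```python
def max_clock_mhz(stage_delays_us):
    """Highest clock rate in MHz, assuming the clock period equals the
    slowest stage delay, given in microseconds (simplified model)."""
    # One cycle per microsecond is exactly 1 MHz, so 1/period_us is in MHz.
    return 1.0 / max(stage_delays_us)

# Naive design: the whole instruction is one 3µs stage.
single_cycle = max_clock_mhz([3.0])
# Split design: fetch, decode and execute take 1µs each.
three_stage = max_clock_mhz([1.0, 1.0, 1.0])

print(f"{single_cycle:.2f} MHz, {three_stage:.2f} MHz")  # 0.33 MHz, 1.00 MHz
```

In this model, uneven stages would be limited by the slowest one, which is why the split only reaches 1.00MHz if each stage really fits in 1µs.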
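Note that the fetch, decode and execute stages are independent: while one instruction is being decoded, the fetch stage is idle and could already be fetching the next one. We can overlap these stages...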
           0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5  |--IF--|--ID--|--EX--|
ror x2,x4         |--IF--|--ID--|--EX--|
xor x1,x3                |--IF--|--ID--|--EX--|
... reducing total execution time from 9µs to 5µs!
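The 9µs and 5µs totals can be reproduced with a little arithmetic. The sketch below assumes an idealised pipeline in which every stage takes exactly one cycle and no instruction ever has to wait for another; the function name `total_time_us` and its parameters are illustrative only.

```python
def total_time_us(n_instructions, n_stages, cycle_us, pipelined):
    """Total execution time in microseconds, assuming every stage takes
    exactly one cycle and the pipeline never stalls (idealised model)."""
    if n_instructions == 0:
        return 0.0
    if pipelined:
        # The first instruction needs n_stages cycles to pass through;
        # each later instruction finishes one cycle after the previous one.
        cycles = n_stages + (n_instructions - 1)
    else:
        # Each instruction runs all of its stages before the next one starts.
        cycles = n_stages * n_instructions
    return cycles * cycle_us

sequential = total_time_us(3, 3, 1.0, pipelined=False)
overlapped = total_time_us(3, 3, 1.0, pipelined=True)
print(sequential, overlapped)  # 9.0 5.0
```

Under the same assumptions, a long run of instructions approaches one completed instruction per cycle, since the cost of filling the pipeline is paid only once.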