Pipelining
Latest revision as of 19:39, 19 March 2025
A limiting factor in CPU performance is transistor propagation delay: the time it takes for a signal to travel through a chain of logic from its start to its end. The clock period must be long enough for a signal to pass through the slowest path before it reaches the next storage cell (register), so reducing that delay allows a higher clock speed.
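The relationship between path delay and clock rate can be sketched numerically. The clock period has to cover the slowest register-to-register path, so the maximum clock rate is the reciprocal of that delay. This is a minimal Python sketch that assumes the illustrative delays of the example that follows (one 3µs path, or three 1µs stages) and ignores the setup and output delay that real storage cells add. `max_clock_mhz` is a hypothetical helper, not an existing API.

```python
def max_clock_mhz(path_delays_us):
    """Return the maximum clock rate (MHz) for a design whose
    register-to-register paths have the given delays (µs).

    The clock period must be at least as long as the slowest path,
    otherwise a signal would not reach its storage cell in time.
    Real designs also add register setup/clock-to-output time,
    which this sketch ignores.
    """
    critical_path_us = max(path_delays_us)
    return 1.0 / critical_path_us  # 1 / µs == MHz


# Single-cycle design: fetch, decode and execute form one 3µs path.
single_cycle = max_clock_mhz([3.0])
# Split design: three separate 1µs paths (fetch, decode, execute).
split_stages = max_clock_mhz([1.0, 1.0, 1.0])
# Hypothetical unbalanced split: the slowest stage sets the clock.
unbalanced = max_clock_mhz([0.5, 1.5, 1.0])

print(f"single-cycle: {single_cycle:.2f} MHz")  # single-cycle: 0.33 MHz
print(f"split stages: {split_stages:.2f} MHz")  # split stages: 1.00 MHz
print(f"unbalanced:   {unbalanced:.2f} MHz")    # unbalanced:   0.67 MHz
```

Because only the slowest path counts, splitting the work into stages pays off best when the stages are roughly balanced, as the hypothetical unbalanced split shows.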
For example, take the time profile of a naive CPU design which executes a single instruction per cycle at 0.33MHz, i.e. with a 3µs clock period:
          0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5 |--------------------|
ror x2,x4                      |--------------------|
xor x1,x3                                           |--------------------|
By splitting instruction fetch (IF), decode (ID), and execute (EX) into separate stages, each short enough to complete in 1µs, we might be able to increase the clock rate to 1.00MHz:
          0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5 |--IF--|--ID--|--EX--|
ror x2,x4                      |--IF--|--ID--|--EX--|
xor x1,x3                                           |--IF--|--ID--|--EX--|
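Splitting the work does not make this program faster by itself: every instruction still passes through all three 1µs stages before the next one is fetched. A small Python sketch of the bookkeeping, assuming each stage takes exactly one clock cycle and that instructions do not overlap (`sequential_time_us` is a hypothetical name):

```python
def sequential_time_us(num_instructions, stages_per_instruction, cycle_us):
    """Total run time when each instruction must finish all of its
    stages before the next instruction is fetched (no overlap)."""
    cycles = num_instructions * stages_per_instruction
    return cycles * cycle_us


# 0.33MHz single-cycle design: 3 instructions x 1 stage x 3µs each
single_cycle_total = sequential_time_us(3, 1, 3.0)
# 1.00MHz split design: 3 instructions x 3 stages x 1µs each
split_total = sequential_time_us(3, 3, 1.0)

print(single_cycle_total, split_total)  # 9.0 9.0
```

The faster clock is exactly cancelled by the extra cycles each instruction now needs.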
Note that the fetch, decode, and execute stages are independent: each uses its own logic, so they can work on different instructions at the same time. We can overlap these stages...:
          0µs    1µs    2µs    3µs    4µs    5µs    6µs    7µs    8µs    9µs
add x1,x5 |--IF--|--ID--|--EX--|
ror x2,x4        |--IF--|--ID--|--EX--|
xor x1,x3               |--IF--|--ID--|--EX--|
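... reducing total execution time from 9µs to 5µs! Each instruction still takes 3µs, but once the pipeline is full, one instruction completes every 1µs cycle.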
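The gain can be captured in a formula. With k stages of one cycle each and n instructions, an ideal pipeline needs k cycles for the first instruction and then finishes one more per cycle, i.e. k + n - 1 cycles in total (3 + 3 - 1 = 5 here). The Python sketch below computes this and redraws the overlapped timing diagram. It assumes an ideal pipeline with no stalls, and `pipelined_time_us` and `render_pipeline` are hypothetical helper names.

```python
def pipelined_time_us(num_instructions, num_stages, cycle_us):
    """Total run time for an ideal pipeline: the first instruction needs
    num_stages cycles to pass through, then one more instruction
    completes every cycle. Assumes no stalls or hazards."""
    if num_instructions == 0:
        return 0.0
    cycles = num_stages + (num_instructions - 1)
    return cycles * cycle_us


def render_pipeline(instructions, stages=("IF", "ID", "EX"), cell=7):
    """Draw a text timing diagram where instruction i enters the
    pipeline one cycle (cell characters) after instruction i - 1."""
    label_width = max(len(text) for text in instructions) + 1
    # Each stage is drawn as e.g. "--IF--|": the name centred in cell - 1 dashes.
    stage_cells = "".join(f"{name:-^{cell - 1}}|" for name in stages)
    lines = []
    for i, text in enumerate(instructions):
        offset = " " * (i * cell)  # shift right by one cycle per instruction
        lines.append(text.ljust(label_width) + offset + "|" + stage_cells)
    return "\n".join(lines)


program = ["add x1,x5", "ror x2,x4", "xor x1,x3"]
print(render_pipeline(program))

overlapped = pipelined_time_us(len(program), 3, 1.0)
not_overlapped = len(program) * 3 * 1.0  # each instruction runs all 3 stages alone
print(overlapped, not_overlapped)  # 5.0 9.0
```

For long instruction streams the k - 1 fill cycles become negligible, so throughput approaches one instruction per cycle, close to a k-fold speedup over the non-overlapped split design. Real pipelines fall short of this ideal because dependent instructions can cause hazards and stalls, which this sketch does not model.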